{"id":23784,"date":"2023-09-01T05:00:00","date_gmt":"2023-09-01T03:00:00","guid":{"rendered":"https:\/\/sii.pl\/blog\/?p=23784"},"modified":"2023-08-27T07:09:58","modified_gmt":"2023-08-27T05:09:58","slug":"microsoft-purview-introduction-and-review-of-a-data-governance-solution","status":"publish","type":"post","link":"https:\/\/sii.pl\/blog\/en\/microsoft-purview-introduction-and-review-of-a-data-governance-solution\/","title":{"rendered":"Microsoft Purview \u2013 introduction and review of a data governance solution"},"content":{"rendered":"\n<p>In September 2021, Microsoft launched a beta version of a cloud-native <strong>data governance<\/strong> service called <strong>Microsoft Purview<\/strong>. The tool promised users the ability to govern their on-premises, multicloud and SaaS data. It\u2019s been two years since then, and I have been wondering whether this \u201cnew\u201d solution, no longer so new, can still be called promising, and what the service actually provides.<\/p>\n\n\n\n<p>Together with some friends from Sii, I conducted a proof of concept (PoC) for our client. I want to share some of our findings with you.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Getting started<\/strong><\/h2>\n\n\n\n<p>You can create a Microsoft Purview account in the Azure portal by creating a new resource and searching for Purview in the Marketplace. During the creation process, you will be asked to specify the <strong>Resource Group<\/strong> and <strong>Purview Account<\/strong> name. Additionally, you can configure the network connection and the platform size of your account.<\/p>\n\n\n\n<p>For testing purposes, a platform size of \u201c4 capacity units\u201d is more than sufficient. 
On the network tab, a <strong>\u201cPrivate endpoint\u201d<\/strong> is the better choice for security reasons, but <strong>\u201cAll networks\u201d<\/strong> is fine in a non-production environment.<\/p>\n\n\n\n<p>To visualize the data in the organization, we have to create a few <strong>collections<\/strong> and <strong>subcollections<\/strong> in a tree structure that mirrors the layout of the architecture holding our data. Below is an example of what we came up with for the PoC.<\/p>\n\n\n\n<p>As you can see, at the top we have the name of the account we created, then a collection called \u201cSII\u201d, and under that several subcollections representing various subsidiaries of the company. Each subsidiary has an application running and a department that uses it. In our example, \u201cDzia\u0142_1\u201d (Department 1) in \u201cJednostka_1\u201d (Unit 1) is based on Power BI reports, so we created a <strong>data source<\/strong> that will <strong>scan<\/strong> all <strong>Power BI<\/strong> reports on that <strong>tenant<\/strong> to show which reports hold which data. For a full list of data sources that can be registered and scanned directly from Purview, <a href=\"https:\/\/learn.microsoft.com\/en-us\/purview\/microsoft-purview-connector-overview\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\" rel=\"nofollow\" >follow the link.<\/a><\/p>\n\n\n\n<p>The same department in \u201cJednostka_2\u201d (Unit 2) gets its data from a <strong>SQL database<\/strong>, so this is the data source we created and scanned to find out what <strong>assets<\/strong> (data) are stored there.<\/p>\n\n\n\n<p>For the last subsidiary, we <strong>import<\/strong> <strong>assets<\/strong> directly from an <strong>Excel template<\/strong> using a custom <strong>application<\/strong> in <strong>Python<\/strong> that we are building. 
This is a solution for importing information about custom flat file reports and outputs that are not covered by the data source scans. We will discuss both <strong>scanning<\/strong> and <strong>template import<\/strong> in more detail below.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1.jpg\"><img decoding=\"async\" width=\"1024\" height=\"703\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1-1024x703.jpg\" alt=\"Collections and subcollections in Microsoft Purview\" class=\"wp-image-23799\" srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1-1024x703.jpg 1024w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1-300x206.jpg 300w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1-768x528.jpg 768w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/1.jpg 1099w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\">Fig. 1 Collections and subcollections in Microsoft Purview<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Data scans<\/strong><\/h2>\n\n\n\n<p>One way to automatically collect information about the data in an environment is to add a data source to a particular collection and run a scan on it. In our example, we created a <strong>Power BI<\/strong> data source that scans the same tenant that hosts our Purview account. In this case, the scan results show not only Power BI <strong>reports<\/strong>, but also <strong>workspaces<\/strong>, <strong>datasets<\/strong> and <strong>dashboards<\/strong>.<\/p>\n\n\n\n<p>In the details of a given asset, you can see under which <strong>collection path<\/strong> and in which <strong>hierarchy<\/strong> in the Power BI data source it is located. 
You can also check the <strong>lineage<\/strong> of the data and, if the asset can be displayed as a table, even its <strong>schema<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/2-1.jpg\"><img decoding=\"async\" width=\"1008\" height=\"720\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/2-1.jpg\" alt=\"Lineage of the data\" class=\"wp-image-23801\" srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/2-1.jpg 1008w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/2-1-300x214.jpg 300w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/2-1-768x549.jpg 768w\" sizes=\"(max-width: 1008px) 100vw, 1008px\" \/><\/a><figcaption class=\"wp-element-caption\">Fig. 2 Lineage of the data<\/figcaption><\/figure>\n\n\n\n<p>Detailed instructions for creating this configuration <a href=\"https:\/\/learn.microsoft.com\/en-us\/purview\/register-scan-power-bi-tenant?tabs=Scenario1\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\" rel=\"nofollow\" >can be found at this link<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Bulk import of assets<\/strong><\/h2>\n\n\n\n<p>One of the most common things you can find in any IT environment is an Excel-based report or a flat file of data that someone uses in their daily work, so there must be a way to bring this information and data structure into Purview. To do this, some coding in Python is required.<\/p>\n\n\n\n<p>While doing this, you will find that, underneath, Purview actually uses <strong>Apache Atlas<\/strong> to import assets. 
The coding is fairly simple, and a tutorial with examples of how to do this <a href=\"https:\/\/www.youtube.com\/watch?v=27jRUydL6qE&amp;t=555s\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\" rel=\"nofollow\" >can be found here.<\/a><\/p>\n\n\n\n<p>For our PoC, we built a small application that loads Excel templates using <strong>apacheatlas<\/strong> and has a graphical interface built with <strong>customtkinter<\/strong>. The application shows which environment you are connected to, lets you browse your PC for files\/templates to upload, and performs the upload itself.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"611\" height=\"514\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/3-1.jpg\" alt=\"Purview \u2013 Bulk Loader\" class=\"wp-image-23804\" srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/3-1.jpg 611w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/3-1-300x252.jpg 300w\" sizes=\"(max-width: 611px) 100vw, 611px\" \/><figcaption class=\"wp-element-caption\">Fig. 3 Purview \u2013 Bulk Loader<\/figcaption><\/figure>\n\n\n\n<p>It is worth noting at this point that filling in the Excel template required by this process takes a significant amount of work from the uploader. 
We need to take the report apart and know all of its details to fill in the template, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>source type,<\/li>\n\n\n\n<li>column names,<\/li>\n\n\n\n<li>column types,<\/li>\n\n\n\n<li>qualified names,<\/li>\n\n\n\n<li>relationships,<\/li>\n\n\n\n<li>column mappings, etc.<\/li>\n<\/ul>\n\n\n\n<p>To build the lineage, we either need to already have all the components (<strong>source<\/strong> <strong>system<\/strong>, <strong>target system<\/strong>, <strong>loading<\/strong> <strong>process<\/strong>) in Purview, or create them using the template.<\/p>\n\n\n\n<p>The end result is decent: we get a nice visualization, as in the example below. We can trace our &#8220;PESEL&#8221; field (the Polish national identification number), which we have in the <strong>archive system<\/strong> on the right, to the <strong>SAP HANA view<\/strong>, which gets that value from the <strong>SAP table<\/strong> called &#8220;SII_02&#8221;. Performance is an advantage: loading even a fairly complex structure of 14 assets takes only a few seconds.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1.jpg\"><img decoding=\"async\" width=\"1024\" height=\"298\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1-1024x298.jpg\" alt=\"Building the lineage \" class=\"wp-image-23806\" srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1-1024x298.jpg 1024w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1-300x87.jpg 300w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1-768x223.jpg 768w, https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/4-1.jpg 1101w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\">Fig. 
4 Building the lineage<\/figcaption><\/figure>\n\n\n\n<p>If you are interested in running some tests on your own, I\u2019ve attached a simple code sample with two import files that can be used for bulk import into Microsoft Purview. The example does not include a graphical interface.<\/p>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-dd5269c7-70a5-423d-bd0c-073cbd4eca2e\" href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/SAP_SII_01_import2.xlsx\" target=\"_blank\" rel=\"noreferrer noopener\">SAP_SII_01_import2<\/a><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/SAP_SII_01_import2.xlsx\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-dd5269c7-70a5-423d-bd0c-073cbd4eca2e\">Download<\/a><\/div>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-aef444f3-7239-4451-8535-9965ab66fa8e\" href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/SAP_SII_02_import1.xlsx\" target=\"_blank\" rel=\"noreferrer noopener\">SAP_SII_02_import1<\/a><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/SAP_SII_02_import1.xlsx\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-aef444f3-7239-4451-8535-9965ab66fa8e\">Download<\/a><\/div>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-de844d12-e0ee-4adf-91ca-6b17f784b486\" href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/Purview_Example.py_.txt\">Purview_Example.py_<\/a><a href=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/Purview_Example.py_.txt\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-de844d12-e0ee-4adf-91ca-6b17f784b486\">Download<\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h2>\n\n\n\n<p>As you can see, we were able to use Microsoft Purview, build out the collections, create automated scans of data sources, import assets from 
flat files and trace data lineage in the tool. Now we can try to answer some important questions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What can we do with Purview? What can we use it for?<\/strong><\/h3>\n\n\n\n<p>We can use it to create a master data catalog, document what data we use in a report, or show a controller, via a scan, that no personal or confidential data is exposed in our system.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Is this tool easy to use for someone who wants to start creating a data catalog?<\/strong><\/h3>\n\n\n\n<p>Unfortunately, the entry threshold is high for new Purview users. Setting up a scan requires very good knowledge of Azure on the administrative side, and even the bulk import requires basic knowledge of Python. Once all that is done, the tool is simple and easy to use, but creating a data catalog with this solution is a very technical task.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Is it a production-ready product?<\/strong><\/h3>\n\n\n\n<p>This service has been in beta for 2 years for many reasons.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automated scanning of data sources requires a technical person to set it up, even on the same tenant. Scanning across different tenants with additional network security would require a whole team.<\/li>\n\n\n\n<li>Manual entry requires coding, so you need a developer or engineer to do this. Even then, not all the functions needed to easily work with collections and asset placement exist.<\/li>\n\n\n\n<li>An intelligent asset import (or import wizard) for unsupported data sources, such as an Excel report, does not exist.<\/li>\n\n\n\n<li>Data lineage, even between scanned sources, must be adjusted manually.<\/li>\n\n\n\n<li>Almost all major data sources are available in Purview, but they are not complete. 
For example: you can have tables from all SAP types, but the views and columns that you need to create for them are not available.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Would you recommend using Purview?<\/strong><\/h3>\n\n\n\n<p>I think it is worth knowing about this tool and checking what updates are available for it. It is a good solution for managing data in the cloud, especially in large organizations, and its value will increase. <\/p>\n\n\n\n<p>Nevertheless, I would not recommend it at this stage, as the amount of work required to get it up and running outweighs the benefits it brings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In September 2021, Microsoft launched a beta version of a cloud-native data governance service called Microsoft Purview. 
The tool promised &hellip; <a class=\"continued-btn\" href=\"https:\/\/sii.pl\/blog\/en\/microsoft-purview-introduction-and-review-of-a-data-governance-solution\/\">Continued<\/a><\/p>\n","protected":false},"author":556,"featured_media":23795,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","inline_featured_image":false,"footnotes":""},"categories":[1319],"tags":[1783,1782,1590,1398],"class_list":["post-23784","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-soft-development","tag-data-analytics","tag-microsoft-purview","tag-tools","tag-azure-en"],"acf":[],"aioseo_notices":[],"republish_history":[],"featured_media_url":"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2023\/08\/Microsoft-Purview-\u2013-introduction-and-review-of-a-data-governance-solution.jpg","category_names":["Soft 
development"],"_links":{"self":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/23784"}],"collection":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/users\/556"}],"replies":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/comments?post=23784"}],"version-history":[{"count":2,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/23784\/revisions"}],"predecessor-version":[{"id":23820,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/23784\/revisions\/23820"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media\/23795"}],"wp:attachment":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media?parent=23784"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/categories?post=23784"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/tags?post=23784"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}