Microsoft Purview – a data governance solution

Microsoft Purview – introduction and review of a data governance solution

01.09.2023

In September 2021, Microsoft launched a beta version of a cloud-native data governance service called Microsoft Purview. The tool promised users the ability to govern their on-premises, multicloud and SaaS data. It’s been 2 years since that date, and I have been wondering if this “new” and promising solution, although now aged, can still be called promising and what the service provides.

Together with some friends from Sii, we conducted a PoC for our client. I want to share some of our findings with you.

Getting started

You can create a Microsoft Purview account in the Azure Portal site by simply creating a new resource and searching for Purview in the Marketplace. During the creation process, you will be asked to specify the Resource Group and Purview Account name. Additionally, you can configure the network connection and the platform size of your account.

For testing purposes, “4 capacity units” is more than sufficient in terms of platform size. On the network tab, it would be good to provide a “Private endpoint” for security reasons, but “All networks” set up in a non-production environment is fine.

To visualize the data in the organization, we have to create a few collections and subcollections in a tree structure to depict the layout of the architecture holding our data. Below is an example of what we came up with for the PoC.

As you can see, at the top we have the name of the account we created, then a collection called “SII”, and under that we have several subcollections showing various subsidiaries of the company. Each subsidiary has an application running and a department that uses it. In our example “Dział_1” in “Jednostka_1” is based on Power BI reports, so we created a data source that will scan all Power Bi reports on that tenant to depict what reports hold what data. For a full list of data sources that can be created directly from the purview and scan, follow the link.

That same department in ”Jednostka_2” gets its data from a SQL database, so this is the data source we created and scanned to find out what assets(data) are stored there.

For the last subsidiary, we import assets directly from the Excel template using a custom application in Python that we are building. This is a solution for importing information on custom flat file reports and outputs that are not covered by the scans of data sources. We will discuss both scanning and template import topics in more detail later.

Fig. 1 Collections and subcollections in Microsoft Pureview

Data scans

One way to automatically obtain data information from the environment is to add a data source to a particular collection and run a scan on it. In our example, we have created a Power BI data source that can be used to scan the same tenant where we have our purview account on for resources. The result of the scan in this case will show not only Power Bi reports, but also workspaces, datasets and dashboards.

In the details of a given asset, you will be able to see under which collection path and in which hierarchy in the Power Bi data source it is located. It will also be possible to check the lineage of the data and even the schema, if the asset can be displayed as a table.

Detailed instructions for creating this configuration you can find at this link.

Bulk import of assets

Since one of the most common things, you can find in any IT environment is an Excel-based report or a flat file of data that someone uses in their daily work, there must be a way to introduce this information and data structure into the purview. To do this, some coding in Python is required.

While doing this, you will find that underneath purview actually uses apacheatlas to import assets. The coding is fairly simple, and a tutorial on how to do this with examples can be found here.

For our PoC, we built a small application for loading Excel templates using apacheatlas and a graphical interface using customtkinter. The application shows what environment you are connected to, allows you to browse your PC for files/templates to upload and perform the upload itself.

What is really worth noting at this point is that the Excel template that must be used in this process requires a significant amount of work from the uploader. We need to take the report apart and know all the details about it to fill in the template, such as:

source type,
column names,
column types,
qualified names,
relationships,
column mappings etc.

To build the lineage, we either need to already have all the components (source system, target system, loading process) in Purview, or create them using the template.

The end result is decent, as we get a nice visualization, as in the example below. We can trace our “PESEL” field, which we have in the archive system on the right, to the SAP Hana View, which gets that value from the SAP table called “SII_02.” Performance, unlike the load of even a fairly complex structure of 14 assets, takes only a few seconds which is an adventage.

If you are interested in running some tests on your own, I’ve attached a simple code sample with two import files that can be used for bulk import into Microsoft Purview. The example does not include a graphical interface.

SAP_SII_01_import2 Pobierz

SAP_SII_02_import1 Pobierz

Purview_Example.py_Pobierz

Summary

As you can see, we were able to use Microsoft Purview, build out the collections, create automated scans of data sources, import assets from flat files and trace data lineage in the tool. Now we can try to answer some important questions.

What we can do with purview? Where we can use it for?

Creating a master data catalogue, documenting what data we use in a report or showing via a scan to a controller that we indeed do not have personal or confidential data exposed in our system.

Is this tool easy to use for someone who wants to start creating a data catalog?

Unfortunately, the entry threshold is high for new Purview users. Setting up the scan requires a very good knowledge of Azure Cloud on the administrative side, and even the bulk import requires a basic knowledge of Python. Once all that is done, the tool is simple and easy, but creating a data catalog with this this solution is a very technical task.

Is it a production-ready product?

This service has been in beta for 2 years for many reasons.

Automated scanning of data sources requires the participation of a technical person to set up even on the same tenant. Between different tenants with additional network security would require a whole team.
Manual entry requires coding, so you need a developer or engineer to do this. Even at that, not all functions exist to easily work with collections and asset placement.
An intelligent asset import (or import wizard) for unsupported data sources, such as an Excel report, does not exist.
Data lineage even between scanned sources must be adjusted manually.
Almost all major data sources are available in Purview, but they are not complete. For example: you can have tables from all SAP types, but the views and columns that you need to create for them are not available.

Would you recommend using Purview?

I think it is worth knowing about this tool and checking what updates are available for it. It is a good solution for managing data in the cloud, especially in large organizations, and its value will increase.

Nevertheless, I would not recommend it at this stage, as the amount of work required to get it up and running outweighs the benefits it brings.

4.7/5

About the author

Przemysław Maciejczyk

He graduated in informatics from the Faculty of Computer Science and Management at Wrocław University of Technology. He is a senior Business Analyst with over 12 years of experience in IT. He has worked on many data migration and report development projects in the healthcare, banking, and sales industries. His current focus is on Microsoft Azure, ADF, Power BI, ML, AI, and data analytics. After work, he enjoys spending time with his family and renovating his home

All articles written by the author

Offices in Poland

Sii Sweden

Sii Ukraine

Sii India

Automating financial analysis with AI

Berlingske Media enters the digital-first era – platform modernization with Sii Poland

Quality Control Center for ABB – gaining control over IT system quality

E-commerce modernization to support growth and smooth shopping experience

Atlassian Team ’26 takeaways: from tools to AI-human orchestration

New episodes of the Sii Talks podcast: from AI strategy to real implementation challenges in organizations

OVHcloud and Sii Poland to deliver secure and sovereign Cloud Solutions for enterprises

Cyber Resilience Act Demystified: Risks, Regulations, and the DevSecOps Approach

The AI-Powered Ecosystem: Scaling Marketing Analytics

Mobile WMS for Process Manufacturing. Powered by Business Central.

Multi‑Agent AI for Embedded Software Development

Take control over your cloud costs with FinOps

HashiCorp Vault as a central hub for key and certificate rotation

Blazing fast websites and how Adobe Experience Manager can help achieve that

What does the code from Sii Testing Lab look like? Repositories deep-dive

How to set up branch policies in Azure DevOps: Complete guide for PR validation

Microsoft Purview – introduction and review of a data governance solution

Getting started

Data scans

Bulk import of assets

Summary

What we can do with purview? Where we can use it for?

Is this tool easy to use for someone who wants to start creating a data catalog?

Is it a production-ready product?

Would you recommend using Purview?

About the author

Leave a comment

Cancel reply

Join our team

Senior Azure Data & Analytics Architect (f/m/x)

Data Engineer with Microsoft Fabric (f/m/x)

You might also like

Intelligent integration: How AI is revolutionizing IT

How UX and UI can discover new possibilities for the automotive industry and autonomous cars

A new dimension of Data Analytics with SAP Business Data Cloud

All certifications are scams

ServiceNow and SLA

How to build a secure RAG GenAI chatbot on AWS with Amazon Bedrock

Intelligent boundaries: How to use AI without losing control

Selected aspects of the Now Assist solution – why is it worth your attention?

Speak, and AI will do the rest – a new era of human-machine interaction

Azure Virtual Desktop: A game-changer for modern workplaces

Using Salesforce in banking processes

FDA final guidance on Medical Device cybersecurity – what is new in 2025?

SUBSCRIBE AND DON'T FALL BEHIND

Join our team

Senior Azure Data & Analytics Architect (f/m/x)

Data Engineer with Microsoft Fabric (f/m/x)

Processing...

What we do angle-down

Industries angle-down

Who we are angle-down

Career angle-down

Trainings angle-down

News angle-down

Contact angle-down

Ta treść jest dostępna tylko w jednej wersji językowej. Nastąpi przekierowanie do strony głównej.

What we do

Industries

Who we are

Career

Trainings

News

Contact

Ta treść jest dostępna tylko w jednej wersji językowej.
Nastąpi przekierowanie do strony głównej.