Implementation of the Modern Data Warehouse in Azure cloud

Implementation of the Modern Data Warehouse architecture in the Azure cloud

17.10.2022

For a long time, the focus of companies has been on data.

Professional entities don’t want to make decisions based on opinions or hunches, instead, they are guided by facts. It doesn’t matter whether we’re talking about making strategic decisions, whether we want to check sales performance, or maybe confirm the address of one of our contractors – both small and large decisions need data to be able to take them consciously.

Without data, you are just another person with an opinion
W.E.Deming

Why Modern Data Warehouses were created?

In recent years we have seen a significant increase in data volumes, sources, formats and frequency of delivery. Data warehouses have stopped relying solely on structured data sources, but have increasingly begun to store and use semi- and unstructured data, such as logs or bitmaps, for data processing and enrichment and as a source for machine learning models.

As a result, the standard data warehouse model, customized to store and analyze structured data, was no longer sufficient and could not respond to the growing demand for data consumption and storage.

At the same time, the trend in which more services are being moved from on-premises to the cloud was increasingly evident, and in this regard, databases are no exception. The development of the cloud has also brought technologies that separate storage from computing power, which in turn has made it possible to create solutions that are both cost- and performance-efficient.

For the above reasons, an architecture called Modern Data Warehouse (MDW) emerged. It incorporates a traditional data warehouse, provides the ability to work with ML and data science models directly on files from source systems, and allows the creation of advanced reports for business users.

In this post, I will try to introduce the implementation of the Modern Data Warehouse architecture in the Azure cloud and discuss its various components.

The implementation of the Modern Data Warehouse architecture in the Azure cloud

Below is a sample diagram showing the implementation of the architecture with the data flows between components highlighted:

First step – data storage in Azure Data Lakes

The entry point is structured data, i.e. data organized in tabular form that has a specific structure and is consistent in terms of the type of data stored, coming from applications and business systems, and unstructured data, i.e. data that does not have the aforementioned structure, such as logs, text files or bitmaps. They can reside in systems or servers located in the cloud or on-premises.

Using Pipelines functionality from either Azure Synapse Analytics or Azure Data Factory, we can create processes that download files to Azure Data Lake, where all source files will be stored. Both Azure Data Factory and Pipelines in Azure Synapse Analytics have built-in functionality to run specific processes according to a specific schedule or in response to specific events, such as creating a file on another data lake.

Some applications or business systems have their own processes that can export files to a specific location in the cloud. On the one hand, this simplifies the whole process, while on the other hand, the security and access issues that these applications should then have must also be taken into consideration.

Azure Data Lake is an Azure cloud file storage service designed for the scalability and efficiency of big data processing. Data can be stored in any size and format. The introduction of file system features helps speed up file reading and metadata management, as well as managing file access at the individual folder level. Thanks to built-in rules, it is possible to run data processing pipelines (steps 2. and 3.) immediately after the fact of writing a file to a given storage account.

Second step – moving data from the data lake to the staging table

Data that has been stored on Azure Data Lake and is to be loaded into the SQL pool in Azure Synapse Analytics must first be moved into so-called staging tables. Their job is to store the current version of the data model and prepare it before loading it into the target tables.

A common practice is to prepare a staging table in having an identical structure to the target table, and then compare the contents of the staging table with the currently held data in the database. Depending on the scenario, the data can be added as new rows to the target tables, used to update existing rows or used to identify rows to be deleted.

To move data from the data lake to the staging table in the SQL pool in Azure Synapse Analytics in the fastest and most efficient way, it is recommended to use PolyBase technology. This is a technology that allows you to access and query external data, such as that stored in the Azure Data Lake service, using T-SQL. As an alternative to Polybase, data can be copied via properly configured pipelines in Azure Data Factory or Azure Synapse Analytics.

Once the data is stored in the staging table, running the appropriate stored procedures allows the data to be moved to the target tables. In this step, the data can undergo various transformations to match the model definitions in the target tables.

Third step – using data in the train ML models

Not all data needs to be transferred directly to the database, some of it can be used as input for processes that train ML models. These processes will be run through the Notebooks functionality in Azure Synapse Analytics or Azure Databricks. It should be noted that these are mostly complex compute-intensive processes and therefore dedicated clusters running on Apache Spark technology will be used to support them.

The results from the trained models will be saved to staging tables in the SQL pool in Azure Synapse Analytics, from where, after running the appropriate procedures, the data can be transferred directly to the target tables or used for data transformation.

Fourth step – calculating data models

The data collected in the target tables will be the basis for calculating data models defined in Azure Analysis Services (AAS).

AAS is a tool for modeling data by defining relationships between data entities, creating measures, dimensions, hierarchies, and security features within a tabular data model. Since AAS recommends defining data models according to Kimball’s methodology, it is recommended to design the database according to its guidelines, which will allow visualization tools to use the data effectively.

Fifth step – data analysis in Power BI

End users analyze aggregated data in Power BI, using defined dimensions and measures in data models, or create their own using the Data Analysis Expressions (DAX) language. Power BI has a wide range of visualization tools and allows users to connect to a variety of data sources located both in the cloud and on-prem, including text files, relational and non-relational databases or even streaming data sources, among others.

Advantages of the Modern Data Warehouse solution

What are the benefits of implementing MDW? First of all, we get an efficient solution that can deal with the huge amount of data flowing in every day. The architecture used is fully scalable, which makes it possible to adjust the computing power to the current demand.

The solutions used to make it possible to combine data coming from different sources and in different formats, and on their basis create advanced visualizations and reports that allow users to grasp relevant information and draw conclusions from the collected data.

The individual components of the solution can be configured in such a way as to ensure high availability of the whole even in the event of failure of a single component, and built-in disaster recovery mechanisms will help protect against accidental data loss.

Other possibilities

The above scheme is indicative only and may be subject to modification. For smaller databases (smaller than 1 TB), it will make more sense to use, for example, Azure SQL Database or Azure Database for PostgreSQL, as most likely due to the size of the data, we will not be able to take full advantage of all the MPP capabilities offered by the SQL pool in Azure Synapse Analytics.

While Azure Analysis Services is an excellent tool for big data analysis, it is also relatively expensive. For this reason, it is also an element that is often overlooked for implementation and replaced by directly querying databases or creating models directly in Power BI. Power BI itself can also be replaced by another visualization tool, such as Qlik or Tableau.

In this case, connectivity to particular types of data sources or individual user preferences plays a big role.

***

If you are interested in the subject of data warehousing, I recommend another article (PL) by our expert: Architektura Lakehouse, koncepcja Delta Lake w usłudze DataBricks.

5/5

About the author

Adam Czeczot

He is an experienced data professional with extensive experience in the Microsoft technology stack. In his professional life, he is a cloud and information technologies enthusiast trying to combine the best of these two in his day-to-day work. He truly believes that the data shall be at the heart of every business decision and keeping that in mind he tries to develop solutions delivering value for the customers. Privately his hobbies are: football, running, hiking, financial markets and cooking.

All articles written by the author

Offices in Poland

Sii Sweden

Sii Ukraine

Sii India

Automating financial analysis with AI

Berlingske Media enters the digital-first era – platform modernization with Sii Poland

Quality Control Center for ABB – gaining control over IT system quality

E-commerce modernization to support growth and smooth shopping experience

Infoshare 2026 wrap-up: AI has become part of everyday work – and that’s what makes it interesting

When young talents meet the experts – this is what Open Days at Sii looked like

You Build It, You Run It (YBIYRI): The model that hurts – and that's why it works

What is needed for a successfull AI adoption? Lessons from Gartner Data & Analytics Summit 2026 in London

The AI-Powered Ecosystem: Scaling Marketing Analytics

Mobile WMS for Process Manufacturing. Powered by Business Central.

From demo to systems you can trust: how real AI works in business

Multi‑Agent AI for Embedded Software Development

Copilot in Microsoft Intune: How AI is transforming the daily work of endpoint administrators

Why SAP S/4HANA releases still fail – even when everything was "tested"

Building an RTK Rover: Web-controlled precision navigation

HashiCorp Vault as a central hub for key and certificate rotation

Implementation of the Modern Data Warehouse architecture in the Azure cloud

Why Modern Data Warehouses were created?

The implementation of the Modern Data Warehouse architecture in the Azure cloud

First step ­– data storage in Azure Data Lakes

Second step – moving data from the data lake to the staging table

Third step – using data in the train ML models

Fourth step – calculating data models

Fifth step – data analysis in Power BI

Advantages of the Modern Data Warehouse solution

Other possibilities

About the author

Leave a comment

Cancel reply

Join our team

Data Warehouse Analyst – Banking Sector (f/m/x)

Data Engineer with Microsoft Fabric (f/m/x)

You might also like

Intelligent integration: How AI is revolutionizing IT

How UX and UI can discover new possibilities for the automotive industry and autonomous cars

A new dimension of Data Analytics with SAP Business Data Cloud

All certifications are scams

ServiceNow and SLA

How to build a secure RAG GenAI chatbot on AWS with Amazon Bedrock

Intelligent boundaries: How to use AI without losing control

Selected aspects of the Now Assist solution – why is it worth your attention?

Speak, and AI will do the rest – a new era of human-machine interaction

Azure Virtual Desktop: A game-changer for modern workplaces

Using Salesforce in banking processes

FDA final guidance on Medical Device cybersecurity – what is new in 2025?

SUBSCRIBE AND DON'T FALL BEHIND

Join our team

Data Warehouse Analyst – Banking Sector (f/m/x)

Data Engineer with Microsoft Fabric (f/m/x)

Processing...

What we do angle-down

Industries angle-down

Who we are angle-down

Career angle-down

Trainings angle-down

News angle-down

Contact angle-down

Ta treść jest dostępna tylko w jednej wersji językowej. Nastąpi przekierowanie do strony głównej.

First step – data storage in Azure Data Lakes

What we do

Industries

Who we are

Career

Trainings

News

Contact

Ta treść jest dostępna tylko w jednej wersji językowej.
Nastąpi przekierowanie do strony głównej.