Data Warehouse Architecture: The Basic Structure
There are a few data warehouse architectural options you can explore but all contain these four must-have components regardless of your architectural choice:
- The data source layer – where the original information “lives” in the database
- The data staging layer – where information is extracted and transformed so it’s ready for the data warehouse
- The data storage layer – where information is stored
- The data presentation layer – where end-users interact with information
What Are Your Data Warehouse Architecture Options?
You have four basic data warehouse architecture options:
In the top-down approach, the data warehouse is the centre of the analytic environment. From the data warehouse, information is distributed to data marts (a subject-specific repository of information).
The reason it’s called a “top-down” approach is that the data warehouse is the departure point for all information and as a result, information is consistent and standardised. However, because the top-down approach requires a detailed enterprise data model as well as the physical infrastructure for the staging area, data warehouse, and marts, it’s more expensive than other architectural choices.
Data marts are deployed quickly through the bottom-down approach. The idea is that these marts will contain all of the information users will ever want or need. Each data mart builds on the next one, so users can query information across marts for comprehensive results.
The bottom-down approach tends to be less successful in distributed, decentralised organisations because business units or departments are inclined to create independent marts that aren’t integrated.
A hybrid model combines the best of the top-down and bottom-up approaches. It relies upon an ETL to store and manage local models in data marts while synchronising differences between them. After you deploy a few dependent data marts, you’ll flesh out the data warehouse by back-filling it with information.
The benefit of a hybrid approach is that it’s fast, although back-filling a data warehouse is a disruptive process that doesn’t really deliver value. Also, users might get confused about which data mart to query.
The federated approach is all about integration. It involves integrating a variety of heterogeneous architectures into one for greater simplicity. Some call it the “architecture of architectures,” because it involves unifying a bunch of disparate architectures into one.
One of the shortcomings of the federated approach is that it’s not well-documented. Moreover, it encourages fragmenting analytics, so you don’t have a single organisational view.
Data Warehouse Best Practices
When creating the architecture for your data estate, keep in mind the following data warehouse best practices.
For a start, think about your future business needs, rather than focusing on what’s happening right now. Experts say that data warehouses can take several years to deliver ROI, so carefully consider what your requirements will be down the road, and what possible technologies are in the pipeline so you’re not locked into an outdated approach.
Second, data quality matters enormously. If your data isn’t clean and accurate, you’ll have problems later on. Errors tend to creep in at the transactional level. If someone doesn’t enter information correctly before it even goes into the data warehouse, it will have a negative impact on analytics.
Third, you must think of creating a data warehouse as a business-wide initiative, rather than just something the IT department is doing. The point of a data warehouse is to benefit the whole organisation; it enables effective data management, which saves time, effort, and money. Moreover, you’ll get deeper insights into your information, so you can make better business decisions.
Azure SQL Data Warehouse: Putting Data Architecture Best Practices into Play
The Microsoft Azure SQL Data Warehouse helps you put best practices into play at your firm. One of the benefits of using the Azure SQL Data Warehouse is that the partner ecosystem is huge. It spans data preparation providers, ingestion services, and visualisation tools. Further, the ecosystem will just keep growing, meaning that the data warehouse will only become more useful with time.
In May 2018, Microsoft announced that the Azure Databricks Service would be available. This Apache-based analytics service allows you to develop ETL processes, otherwise known as data cleaning and also has built-in support for the Azure SQL Data Warehouse.
A third advantage of using the Azure SQL Data Warehouse is that it allows you to democratise data throughout the enterprise. This means workers across the company have access to information that they didn’t have before, which can lead to better decision making and new opportunities.
Although the Azure SQL Data Warehouse makes information more accessible, it doesn’t sacrifice availability for security. The data warehouse features auditing, threat detection, and data encryption (among other features) so that your information is always safe.
Enlighten: Helping You Implement Data Warehouse Architecture Best Practices
Enlighten is a proud Microsoft partner, winning two awards in 2017, including one for data platforms and analytics. We have over 20 years of experience helping our customers get the most out of their data. To learn more about implementing data warehouse architecture best practices at your company, contact us today