Data warehousing concepts

Long before Hadoop and Big Data made a big splash on the IT stage, the necessity for “Data warehouses” was growing in the business world. There was disparate data on Excel sheets, flat files, data bases and there was no single understanding of this data. We have seen all this and more in our previous blog post.

“Data warehouses” were created and a well designed data warehouse comprised of cleaned data from different sources that enables an organization to make critical business decisions.  We dig deeper into Data warehousing by discussing the Father of Data warehousing followed by the different Data warehousing concepts in this blog post.

 

Father of Data warehousing:

Bill Inmon known as the father of data warehousing preferred the top down approach to designing data warehouses. Ralph Kimball on the other hand preferred a bottom up approach to building a data warehouse. Both approaches have their pros and cons and it is up to the organization to choose the approach that best suits their business needs.

 

Data warehousing concepts:

The data that comes from CRM systems, SCM systems, ERPs and other legacy systems will be duplicated and in different formats. Hence, this data is cleansed, extracted and loaded into the data warehouse. The data warehouse is then used to generate reports on different business goals. This post will cover the concepts related to Kimball’s approach to data warehousing.

Source systems: The data from different systems will be in the form of flat files, legacy systems and these are the different source systems.  The data from the different systems are natural “stovepipes” as their dimensions are not conformed.

Data staging area: The data staging area is the area between the source systems and the presentation server. It is a temporary area that holds data that needs to be cleaned, de-duplicated and transformed into data for the data warehouse. It is important to note that the data in the data staging area is not present on a single machine but it is scattered across multiple machines.

Presentation server: The data on the presentation server is ready for querying and preparing reports. The data can be in the form of dimensional model (Kimball model) or as relational model (Inmon model)

Dimensional model: The Kimball approach to data modeling is ‘dimensional modeling’. In dimensional models, data is not represented by ER diagrams but rather by dimension tables and fact tables.

Dimension tables: Dimension tables hold a type of information and ‘Attributes’ are values within a dimension. As an example, ‘customer’ is a type of ‘dimension’ and ‘custid’ is an attribute. Dimension tables are connected to fact tables.

Fact tables: “Fact tables” contain the “measurements of business” (Ralph Kimball). Each fact table is connected to the dimension table but not to each other.

Data mart: An EDW (Enterprise Data warehouse) is the sum of all “data marts”. A “data mart” is a smaller portion of the data warehouse and each data mart can have their dimensional models for different business requirements.

OLAP: OLAP is ‘Online Analytic Processing’ that is primarily used to query the data warehouse and generate reports. OLAP tools “drill down” and “drill up” to query the data that is in multidimensional format.

DataWarehouseConcepy

We have explored the various concepts of Data warehousing in this blog post. Understanding these concepts will enable us to make better business decisions.

About Pavan Gumaste

Pavan Rao is a programmer / Developer by Profession and Cloud Computing Professional by choice with in-depth knowledge in AWS, Azure, Google Cloud Platform. He helps the organisation figure out what to build, ensure successful delivery, and incorporate user learning to improve the strategy and product further.

Leave a Comment

Your email address will not be published. Required fields are marked *


Scroll to Top