Monday, January 27, 2014

Why "Kimball model" is the preferred way to build a Data Warehouse?

Okay! we got to agree with the philosophical thought that this "world is full of contradictions". Else, how would one explain this sudden change in approach to building a database (data warehouse is also a database).

It took us almost a full semester to understand normalized database model taught in the EDM class. Come and go winter break, things have changed. Father of Data warehouse/Business Intelligence, Mr. Kimball, says he has nothing to do with 'normalization'. He wants it all the 'de-normalized' way; strange isn't it!?

So why is Kimball advocating a completely different approach to a database (data warehouse) model design? Other than the obvious reason stated above (world if full of contradictions), what are the main reasons behind Kimball advocating such a polar opposite design approach?

Doing some study of various resources like blogs, whitepapers, wikipedia, text books allowed me to find many factors that support Kimball and his de-normalized, dimensional, star-schema  approach to data warehouse. So to kill all our confusions and dilemmas let us now look into some of these factors that are in favor of "Kimball model" as we will call it here.


Complex ER Model

Simple Kimball (Star-Schema) Model
Ease of use for the business users: The BI and DW solutions are designed to be consumed by business users. Try telling them that, to find count of Employees in a particular department of their organization they have to find their way through 6 normalized tables and have 5 join conditions and a couple of where clause filters. That will drive them crazy and as a BI resource your job will be on the line for doing that! So in such a scenario think of a Kimball model. Isn't it easy for the users when they can find all the information needed about an employee in one dimension table and may be some more measures concerned in a fact table with just 1 join between these 2 tables? That is the first and foremost advantage that Kimball model brings to the table.

Performance: Think of this ER model query and data set; 10 discrete tables, 9 joins, sorting and some more column level calculations on couple of million rows of data. The complexity of this query means that the database software would be really slow in churning out the result set to the user. This is where Kimball model comes up trumps. Dimensional tables surrounding a centralized fact table means that the software is almost sure in predicting its query execution plan. This means that the performance of the database is quicker and therefore the BI report users get answers to their ad-hoc query almost instantly.

Enhancement, maintenance or extensibility: Let us now think from the perspective of a database modeler or designer. The model is developed and being used. Now comes a change request that demands another attribute to be shown in the report. If this was an ER model, then each such modification would have to be considered carefully, because of the complex dependencies between database tables and would have ripple effects. But a Kimball model supports such changes easily. All that the modeler has to do is find the correct dimension and add a column or alter the table without affecting any other tables.

No wonder then with all these great advantages over traditional transaction supporting database model, the Kimball model is being seen as the best model to build a data warehouse. And knowing these advantages makes one understand and accept the contradictions easily and efficiently.

Now that we have accepted to follow Kimball and his model, we will look into more interesting things that this type of database modeling throws at us. But I will reserve that for another day and time. By then I hope, we would have acquitted ourselves well with terminologies like data warehouse, business intelligence, star-schema, fact and dimension tables, types of facts and dimensions

References: 
http://en.wikipedia.org/wiki/Dimensional_modeling
http://www.kimballgroup.com/1997/08/02/a-dimensional-modeling-manifesto/