Okay! we got to agree with the philosophical thought that
this "world is full of contradictions". Else, how would one explain
this sudden change in approach to building a database (data warehouse is also a
database).
It took us almost a full semester to understand normalized
database model taught in the EDM class. Come and go winter break, things have
changed. Father of Data warehouse/Business Intelligence, Mr. Kimball, says he
has nothing to do with 'normalization'. He wants it all the 'de-normalized' way;
strange isn't it!?
So why is Kimball advocating a completely different approach
to a database (data warehouse) model design? Other than the obvious reason
stated above (world if full of contradictions), what are the main reasons behind
Kimball advocating such a polar opposite design approach?
Doing some study of various resources like blogs, whitepapers,
wikipedia, text books allowed me to find many factors that support Kimball and
his de-normalized, dimensional, star-schema approach to data warehouse. So to kill all our
confusions and dilemmas let us now look into some of these factors that are in favor
of "Kimball model" as we will call it here.
Now that we have accepted to follow Kimball and his model,
we will look into more interesting things that this type of database modeling
throws at us. But I will reserve that for another day and time. By then I hope,
we would have acquitted ourselves well with terminologies like data warehouse,
business intelligence, star-schema, fact and dimension tables, types of facts
and dimensions
References:
http://en.wikipedia.org/wiki/Dimensional_modeling
http://www.kimballgroup.com/1997/08/02/a-dimensional-modeling-manifesto/
Complex ER Model |
Simple Kimball (Star-Schema) Model |
Ease of use for
the business users: The BI and DW solutions are designed to be consumed by
business users. Try telling them that, to find count of Employees in a
particular department of their organization they have to find their way through
6 normalized tables and have 5 join conditions and a couple of where clause
filters. That will drive them crazy and as a BI resource your job will be on
the line for doing that! So in such a scenario think of a Kimball model. Isn't
it easy for the users when they can find all the information needed about an employee
in one dimension table and may be some more measures concerned in a fact table
with just 1 join between these 2 tables? That is the first and foremost
advantage that Kimball model brings to the table.
Performance: Think
of this ER model query and data set; 10 discrete tables, 9 joins, sorting and
some more column level calculations on couple of million rows of data. The complexity
of this query means that the database software would be really slow in churning
out the result set to the user. This is where Kimball model comes up trumps.
Dimensional tables surrounding a centralized fact table means that the software
is almost sure in predicting its query execution plan. This means that the
performance of the database is quicker and therefore the BI report users get
answers to their ad-hoc query almost instantly.
Enhancement,
maintenance or extensibility: Let us now think from the perspective of a database
modeler or designer. The model is developed and being used. Now comes a change
request that demands another attribute to be shown in the report. If this was
an ER model, then each such modification would have to be considered carefully,
because of the complex dependencies between database tables and would have
ripple effects. But a Kimball model supports such changes easily. All that the
modeler has to do is find the correct dimension and add a column or alter the
table without affecting any other tables.
No wonder then with all these great advantages over
traditional transaction supporting database model, the Kimball model is being
seen as the best model to build a data warehouse. And knowing these advantages
makes one understand and accept the contradictions easily and efficiently.
References:
http://en.wikipedia.org/wiki/Dimensional_modeling
http://www.kimballgroup.com/1997/08/02/a-dimensional-modeling-manifesto/