Thursday, February 13, 2014

Star and Snowflake schemas


Selecting a particular type of dimensional model approach is a difficult art. Most of us will run into this common dilemma : star schema or snow-flake? Which one to choose, when and why? In this post I will try to decompose the basic difference between star schemas and snow-flake. Then we will also see the scenarios when snowflake schema is preferred over a star schema.

To understand the concept well let me introduce the following OLTP tables and then create both star and snowflake schemas from these OLTP tables.

OLTP tables:

Product (ProductID, ProductName, PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)

Note that this Product table has redundancy because a Category of products can have multiple Products in it and hence this table is not normalized to 3NF.


Store (StoreID, StoreName, LocationID, LocationName, StateCode, State) 

Again notice that even this table is not normalized to 3NF and has the following functional dependency: State -> Location, Store.
  
Sale (TransactionID, ProductID, StoreID, Quality, Amount, SaleDate)

This table is a transactional table which records the point of sales data that happen in each store.


Star schema model

So if we decide to build a star-schema, then we wouldn't choose to normalize the Product and Store tables and hence end up joining the above tables the way shown below to achieve a dimensional model. For the sake of simplicity no other dimension tables, including Date or Time dimension, have been introduced in this case.

Star schema tables:

In this case the OLTP tables have not changed much in structure at all apart from getting surrogate primary keys, of which we are in full control.

D_Product (ProductID, ProductName, PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)


D_Store (StoreID, StoreName, LocationID, LocationName, StateCode, State)


F_Transaction (TransactionID, ProductID, StoreID, Quality, Amount, SaleDate)



Snowflake model

To do a snowflake dimensional model, we will have to normalize the Product and Store tables and create multiple 3NF dimensional tables while retaining foreign keys to the separated dimensional tables in the dimensional table having most granular data. For example, foreign key to D_Location remains in the D_Store table since D_Store is at the lowest level of granularity. The snowflake model shown below clearly shows the increase in dimensional tables and relationships in the model.


Snowflake schema tables

D_Product (ProductID, ProductName, PricePerUnit_$, CategoryCode)


D_Category (CategoryCode, CategoryName, CategorDesc) 


 D_Store (StoreID, StoreName, LocationID)


D_Location
(LocationID, LocationName, StateCode) 


D_State
(StateCode, State) 


F_Transaction (TransactionID, ProductID, StoreID, Quality, Amount, SaleDate)
  

When do you use Snowflake Schema Implementation?

Ralph Kimball, the data warehousing guru, proposes three cases where snowflake implementation is not only acceptable but is also the key to a successful design:
               
·         Large customer dimensions where, for example, 80 percent of the fact table measurements involve anonymous visitors about whom you collect little detail, and 20 percent involve reliably registered customers about whom you collect much detailed data by tracking many dimensions

·         Financial product dimensions for banks, brokerage houses, and insurance companies, because each of the individual products has a host of special attributes not shared by other products

·         Multi-enterprise calendar dimensions because each organization has idiosyncratic fiscal periods, seasons, and holidays

Ralph Kimball recommends that in most of the other cases, star schemas are a better solution. Although redundancy is reduced in a normalized snowflake, more joins are required. Kimball usually advises that it is not a good idea to expose end users to a physical snowflake design, because it almost always compromises understandability and performance. So star-schema should be your preferred schema when designing your data warehouse unless your requirements falls into one of the three points mentioned above. Stay posted here, I will come up with more on Business Intelligence reporting next time.

Reference: http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/starandsnowflake.htm

No comments:

Post a Comment