Selecting a particular type of dimensional model approach is a
difficult art. Most of us will run into this common dilemma : star schema or
snow-flake? Which one to choose, when and why? In this post I will try to
decompose the basic difference between star schemas and snow-flake. Then we
will also see the scenarios when snowflake schema is preferred over a star
schema.
To understand the concept well let me introduce the following OLTP
tables and then create both star and snowflake schemas from these OLTP tables.
OLTP tables:
Product (ProductID, ProductName,
PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)
Note that this Product table
has redundancy because a Category of products can have multiple Products in it
and hence this table is not normalized to 3NF.
Again notice that even this table is not normalized to 3NF and has the
following functional dependency: State -> Location, Store.
Sale (TransactionID,
ProductID, StoreID, Quality, Amount, SaleDate)
This table is a transactional table which records the point of sales
data that happen in each store.
Star schema model
So if we decide to build a star-schema, then we wouldn't choose to
normalize the Product and Store tables and hence end up joining the above
tables the way shown below to achieve a dimensional model. For the sake of simplicity
no other dimension tables, including Date or Time dimension, have been
introduced in this case.
Star schema tables:
In this case the OLTP tables have not changed much in structure at all
apart from getting surrogate primary keys, of which we are in full control.
D_Product (ProductID,
ProductName, PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)
D_Store (StoreID,
StoreName, LocationID, LocationName, StateCode, State)
F_Transaction (TransactionID,
ProductID, StoreID, Quality, Amount, SaleDate)
Snowflake model
To do a snowflake dimensional model, we will have to normalize the Product
and Store tables and create multiple 3NF dimensional tables while retaining
foreign keys to the separated dimensional tables in the dimensional table
having most granular data. For example, foreign key to D_Location remains in
the D_Store table since D_Store is at the lowest level of granularity. The
snowflake model shown below clearly shows the increase in dimensional tables and relationships
in the model.
Snowflake schema tables
D_Product (ProductID,
ProductName, PricePerUnit_$, CategoryCode)
D_Category (CategoryCode,
CategoryName, CategorDesc)
D_Store (StoreID,
StoreName, LocationID)
D_Location (LocationID, LocationName, StateCode)
D_State (StateCode, State)
F_Transaction (TransactionID,
ProductID, StoreID, Quality, Amount, SaleDate)
When do you use Snowflake Schema
Implementation?
Ralph Kimball, the data warehousing guru, proposes three cases where
snowflake implementation is not only acceptable but is also the key to a
successful design:
·
Large customer dimensions where, for example, 80
percent of the fact table measurements involve anonymous visitors about whom
you collect little detail, and 20 percent involve reliably registered customers
about whom you collect much detailed data by tracking many dimensions
·
Financial product dimensions for banks,
brokerage houses, and insurance companies, because each of the individual
products has a host of special attributes not shared by other products
·
Multi-enterprise calendar dimensions because
each organization has idiosyncratic fiscal periods, seasons, and holidays
Ralph Kimball recommends that in most of the other cases, star schemas
are a better solution. Although redundancy is reduced in a normalized
snowflake, more joins are required. Kimball usually advises that it is not a
good idea to expose end users to a physical snowflake design, because it almost
always compromises understandability and performance. So star-schema should be
your preferred schema when designing your data warehouse unless your
requirements falls into one of the three points mentioned above. Stay posted
here, I will come up with more on Business Intelligence reporting next time.
Reference: http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/starandsnowflake.htm
No comments:
Post a Comment