Thursday, February 27, 2014

Business Intelligence - tool based metadata modeling

A data warehouse has been designed, built, and loaded with data. Fast forward to the next phase of the project: most likely, implementing a sophisticated Business Intelligence (BI) reporting and analytics platform. Enterprises big, medium, and small all like to use sophisticated BI tools, software, or services to aid decision making.

In this post, let us explore how a "BI metadata model" is framed to enable capabilities for authoring, viewing, and modifying BI reports and interactive visualizations - online or offline, in Microsoft Office applications or in-process applications, in the office or on the go. In the majority of organizations, the tool used for BI modeling is most likely to be provided by one of the leading BI software/services vendors shown below in the Gartner BI Magic Quadrant image:

Gartner BI Magic Quadrant - 2014 
A BI tool supports sourcing data from heterogeneous data sources (OLAP cubes, relational databases, spreadsheets, flat files, etc.) and is not restricted to just a data warehouse as its data source. An ideal BI metadata model can be defined in terms of 3 layers: 1) Physical layer 2) Business layer 3) Presentation layer. Each layer can be thought of as a folder containing multiple database objects, either in their original or in a modified form. The first image shown below is a diagrammatic representation of an ideal BI metadata model. The second image below shows an actual real-world Oracle BI metadata model with these 3 layers. Let us go through each of the 3 layers to gain a basic understanding of why a metadata model is built this way.

3 layer BI Metadata Model

Real-world Oracle BI Metadata Model
Physical Layer: The BI modeler imports all the required database objects from the source data warehouse. Generally, tables, views, materialized views, and stored procedures are imported into this layer with few or no modifications. In this layer, these objects are joined to one another to form the star schema model. If there are multiple star schemas, the fact tables are also joined through conformed dimensions to enable drill-through capabilities.
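As a minimal sketch in SQL, with all table and column names hypothetical, the physical layer essentially captures star joins like this one between a fact table and its dimensions:

-- Hypothetical physical-layer star schema: the fact table joins its
-- dimension tables on surrogate keys to form the star.
SELECT d.CalendarYear,
       p.ProductName,
       SUM(f.SalesAmount) AS TotalSales
FROM   F_Sales   f
JOIN   D_Date    d ON f.DateKey    = d.DateKey
JOIN   D_Product p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.ProductName;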

Business Layer: This layer pulls metadata (objects) from the physical layer and supports any logical modifications required to handle various business and user requirements. The database objects arrive named the way a data modeler understood them, not the way a business user would understand them, so they are renamed here. Static filters can also be built into the tables or views to implement data-level security. More such modifications arising from business requirements are all handled in this layer.
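For instance, a business-layer object might behave like this simple view (a sketch with hypothetical names) that renames cryptic physical columns and bakes in a static data-level security filter:

-- Hypothetical business-layer view: renames physical columns to
-- business-friendly names and applies a static row-level filter.
CREATE VIEW Biz_ActiveCustomerSales AS
SELECT c.CUST_NM      AS CustomerName,
       c.RGN_CD       AS Region,
       f.SalesAmount  AS SalesAmount
FROM   F_Sales    f
JOIN   D_Customer c ON f.CustomerKey = c.CustomerKey
WHERE  c.ACTIVE_FLG = 'Y';   -- static filter: expose only active customers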

Presentation Layer: As the name suggests, the purpose of this layer is to present the metadata model to business users in its most consumable form. Grouping objects by user group or by subject area, and adding object-level security based on user groups and roles, are some of the main things done at this level.
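The BI tool manages this in its own repository, but in plain SQL terms the idea amounts to grants like these (role and object names are hypothetical):

-- Hypothetical presentation-layer security: each role sees only the
-- objects belonging to its subject area.
CREATE ROLE sales_analyst;
GRANT SELECT ON Biz_ActiveCustomerSales TO sales_analyst;  -- Sales subject area

CREATE ROLE finance_analyst;
GRANT SELECT ON Biz_GeneralLedger TO finance_analyst;      -- Finance subject area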

Following this 3-layer approach gives a BI metadata model the following advantages:
  • Separates the logical behavior of the application from the physical model
  • Provides the ability to federate multiple physical sources to the same logical object, enabling aggregate navigation and partitioning
  • Provides dimension conformance and isolation from changes in the physical sources
  • Enables the creation of portable BI Applications
  • Hides the complexity of the source data models from the user
There are many other uses and advantages to each of the 3 layers in a BI metadata model; it would take a separate blog post per layer to discuss them in depth. In its core essence, however, the 3-layer model supports ease of use and simplicity of maintenance.

If your curiosity has now jumped to another level and you are wondering what comes after BI metadata modeling, stay tuned here for my next post on BI report development.

References:
http://www.informationweek.com/big-data/big-data-analytics/gartner-bi-magic-quadrant-winners-and-losers/d/d-id/1114013
http://docs.oracle.com/cd/E17904_01/bi.1111/e10540/intro.htm
http://www.rittmanmead.com/2011/12/agile-exadata-obiee-puzzle-pieces/

Thursday, February 13, 2014

Star and Snowflake schemas


Selecting a dimensional modeling approach is a difficult art. Most of us will run into this common dilemma: star schema or snowflake? Which one to choose, when, and why? In this post I will try to break down the basic difference between star and snowflake schemas. Then we will also see the scenarios in which a snowflake schema is preferred over a star schema.

To understand the concept well, let me introduce the following OLTP tables and then create both star and snowflake schemas from them.

OLTP tables:

Product (ProductID, ProductName, PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)

Note that this Product table has redundancy: a category can contain multiple products, so the category attributes repeat for every product in that category. The transitive dependency CategoryCode -> CategoryName, CategoryDesc means this table is not normalized to 3NF.


Store (StoreID, StoreName, LocationID, LocationName, StateCode, State) 

Again, notice that this table is not normalized to 3NF; it has the transitive functional dependencies LocationID -> LocationName, StateCode and StateCode -> State.
  
Sale (TransactionID, ProductID, StoreID, Quantity, Amount, SaleDate)

This is a transactional table that records the point-of-sale data generated in each store.


Star schema model

So if we decide to build a star schema, we choose not to normalize the Product and Store tables, and we end up joining the above tables as shown below to achieve a dimensional model. For the sake of simplicity, no other dimension tables, including a Date or Time dimension, have been introduced in this case.

Star schema tables:

In this case the OLTP tables have hardly changed in structure at all, apart from getting surrogate primary keys, over which we have full control.

D_Product (ProductID, ProductName, PricePerUnit_$, CategoryCode, CategoryName, CategoryDesc)


D_Store (StoreID, StoreName, LocationID, LocationName, StateCode, State)


F_Transaction (TransactionID, ProductID, StoreID, Quantity, Amount, SaleDate)
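As a rough DDL sketch of the star tables above (data types are assumed; note that the $ in PricePerUnit_$ is legal in Oracle identifiers but may need quoting in other databases):

-- Star schema DDL sketch: denormalized dimensions, one fact table.
CREATE TABLE D_Product (
    ProductID      INT PRIMARY KEY,   -- surrogate key
    ProductName    VARCHAR(100),
    PricePerUnit_$ DECIMAL(10,2),
    CategoryCode   VARCHAR(10),
    CategoryName   VARCHAR(50),
    CategoryDesc   VARCHAR(255)
);

CREATE TABLE D_Store (
    StoreID      INT PRIMARY KEY,     -- surrogate key
    StoreName    VARCHAR(100),
    LocationID   INT,
    LocationName VARCHAR(100),
    StateCode    CHAR(2),
    State        VARCHAR(50)
);

CREATE TABLE F_Transaction (
    TransactionID INT PRIMARY KEY,
    ProductID     INT REFERENCES D_Product (ProductID),
    StoreID       INT REFERENCES D_Store (StoreID),
    Quantity      INT,
    Amount        DECIMAL(12,2),
    SaleDate      DATE
);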



Snowflake model

To build a snowflake dimensional model, we have to normalize the Product and Store tables into multiple 3NF dimension tables, with the table at the most granular level retaining a foreign key to each dimension table separated out of it. For example, the foreign key to D_Location remains in the D_Store table, since D_Store is at the lowest level of granularity. The snowflake model shown below clearly shows the increase in the number of dimension tables and relationships in the model.


Snowflake schema tables:

D_Product (ProductID, ProductName, PricePerUnit_$, CategoryCode)


D_Category (CategoryCode, CategoryName, CategoryDesc)


D_Store (StoreID, StoreName, LocationID)


D_Location (LocationID, LocationName, StateCode)


D_State (StateCode, State)


F_Transaction (TransactionID, ProductID, StoreID, Quantity, Amount, SaleDate)
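To make the trade-off concrete, here is a sketch of the same rollup, total sales by state, against both models; the snowflake version needs two extra joins to reach the State attribute:

-- Star schema: State lives directly on D_Store, so one join suffices.
SELECT s.State, SUM(f.Amount) AS TotalSales
FROM   F_Transaction f
JOIN   D_Store s ON f.StoreID = s.StoreID
GROUP BY s.State;

-- Snowflake schema: the same rollup walks the normalized chain
-- F_Transaction -> D_Store -> D_Location -> D_State.
SELECT st.State, SUM(f.Amount) AS TotalSales
FROM   F_Transaction f
JOIN   D_Store    s  ON f.StoreID    = s.StoreID
JOIN   D_Location l  ON s.LocationID = l.LocationID
JOIN   D_State    st ON l.StateCode  = st.StateCode
GROUP BY st.State;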
  

When do you use a snowflake schema implementation?

Ralph Kimball, the data warehousing guru, proposes three cases where a snowflake implementation is not only acceptable but is also the key to a successful design:
               
  • Large customer dimensions where, for example, 80 percent of the fact table measurements involve anonymous visitors about whom you collect little detail, and 20 percent involve reliably registered customers about whom you collect much detailed data by tracking many dimensions
  • Financial product dimensions for banks, brokerage houses, and insurance companies, because each of the individual products has a host of special attributes not shared by other products
  • Multi-enterprise calendar dimensions because each organization has idiosyncratic fiscal periods, seasons, and holidays

Kimball recommends that in most other cases, star schemas are the better solution. Although redundancy is reduced in a normalized snowflake, more joins are required. Kimball usually advises against exposing end users to a physical snowflake design, because it almost always compromises understandability and performance. So a star schema should be your preferred design when modeling your data warehouse, unless your requirements fall into one of the three cases mentioned above. Stay posted here; I will come up with more on Business Intelligence reporting next time.

Reference: http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/starandsnowflake.htm

Tuesday, February 4, 2014

Implementation of a DW/BI project - by a greenhorn designer!

Things are exciting in the world of DW and BI. Many of us are comfortable with the terminology and concepts: dimensions, facts, star schemas, snowflake schemas, the types and categories of facts and dimensions, and so on. While still learning DW/BI in detail, some of us might already be tempted to get real and start designing and modeling a data warehouse.

This kind of curiosity leads us to think about how DW/BI project phases and implementation play out in a real-world scenario. To understand this at a very high level, we will consider the case of an accounting firm and its need for a business analytics capability. As DW/BI solution experts, our job is to gather requirements, analyze them, and design and implement a technical solution.

1. Requirement gathering and understanding the data: Business processes and data are central to our project and solution. Hence, the first and foremost task is to know the business processes and understand the data. The processes and sources that create the data, and the grain and flow of that data, all need to be understood and documented first. Diagram 1, shown below, represents the current data flow of the accounting company for which this DW/BI solution is being built: multiple sources, a path, and a grain that together define the data. Once this picture is captured, the next job is to plan the project implementation.

Diagram 1

2. Project plan and phases: After gathering the requirements and gaining a basic understanding of the data, one is in good stead to know how the project can be implemented and to propose a plan to the client. This accounting company has no database or data warehouse in place, so the project will start from understanding the data, 'ETL'ing it, and warehousing it. Creating the ETL logic, the staging database, and the data warehouse are all part of the project; a minimal sketch of one such ETL step follows Diagram 2 below. After the data warehouse is created, data can be fed to an OLAP cube and ultimately be used for BI purposes in the form of reports, charts, scorecards, ad-hoc queries, etc. Apart from the milestone events shown in Diagram 2, more than one sub-event will generally precede or follow each main event.

Diagram 2
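As a minimal, hypothetical sketch of one such ETL step (staging and dimension table names are assumed), loading new client records from the staging database into a warehouse dimension might look like this:

-- Hypothetical ETL step: deduplicate client records in staging and
-- insert only those not already present in the warehouse dimension.
INSERT INTO D_Client (ClientID, ClientName, Industry)
SELECT DISTINCT stg.ClientID, stg.ClientName, stg.Industry
FROM   STG_Client stg
WHERE  NOT EXISTS (SELECT 1
                   FROM   D_Client d
                   WHERE  d.ClientID = stg.ClientID);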

3. The outcome - DW/BI solution: Diagram 3 represents the high-level design of the DW/BI solution that will provide business analytics capabilities to the accounting firm. The firm will then be able to fire "what-if" queries at the BI system and obtain answers. The company's decisions, and its recommendations to clients, will be based on the answers the BI solution provides to the accounting firm's business users and decision makers!

Diagram 3


The accounting firm is counting on us while making its decisions! As DW/BI solution experts, it is our job to provide the correct and most accurate BI answers. To be successful in such a job we need strong technical skills and expertise along with domain knowledge. Count on Kimball and Ross to provide us just that!

Inputs:
1) Typical data flow in an accounting firm - Professor Boulger, MIS Dept., Eller College of Management