Just a note on Data before we get started, i loved this quote which i recently got to read –

“Data are not taken for museum purpose; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action.” [an i quote from W. Edwards Deming, a renowned Statistician]

Before we could jump into details of types of Facts, let’s get some clarity on what a fact table is and what way it plays a role in a Warehouse. In the traditional Data warehouse model we have the Facts and dimensions, where the Facts play a major role giving more meaning and information to data and for the end users to get to use it on trend analysis and on many more reporting purposes.

As is noted and said by Kimball – “Fact tables are the foundation of the data warehouse. They contain the fundamental measurements of the enterprise, and they are the ultimate target of most data warehouse queries.”

Additive Fact tables

An additive Fact table is one, which has the transaction data giving the information on different dimensions of each transaction and the measurements (Amount/hours/value) on each. This is a kind of a fact table that most of the Data model would have used, with the measurements giving way for plenty of analysis and information. This is a kind of fact table which can bee summed/aggregated through all the dimension columns to get a grouped by analysis and result giving the summed Measurement/amount with respect to different dimension combinations.

Example is a sales table, where you have dimension like Customer, Date of sales, Sales ID and the measurements like Quantity and Amount/Price. In this case you can Group by a customer or the Day of sale or even by the Sales_ID to get an aggregated Amount and Quantity or both.

Types_of_Facts_additive

 

Semi Additive Facts

A semi additive Facts are the one which comprises of Dimensions and the measurement information, and the facts are summed up with respect to some of the dimensions and can not be summed on the rest.

A good example can be the banking balance tracking table, that gives the debit and credit information and the balance post any transaction of the respective Customer. Here it would not make any sense to sum up the amounts with respect to the date, which would be meaningless measurement. Whereas the same can be helpful to get the track on the balance amount on Customer wise.

Types_of_Facts_semiadditive

Non Additive Facts

A non additive Fact would be with the measurements and dimensions where there would be no summing/aggregation that can give any meaning to the data.

An example can be the table giving information on the Discounts with respect to the Products and on the respective dates. Here there can be no meaning made with the Discount being summed with any of the dimensions to make the data informative.

Types_of_Facts_nonadditive

Factless Facts

The fact tables, completely built on the Dimensions only are called as the Fact-less ones. These facts basically help in getting the combination of different dimensions to make a meaning with the intersection of them.

A very good example can be a table giving information on the Promotional code for each product on a date, here all are dimensions and they would still make sense in the end user’s point to build it as a fact table bringing the Product for a date with the right Promo code assigned to it.

Types_of_Facts_Factless

Hope this was helpful in understanding the basic types of facts that we can build as per the need and requirement on our data model. And the decision to pick the right Fact type has to always be a based on the scenario and the need of the end users (Reporting/Analytic).

Had a chance to discuss and debate on Surrogate keys at work today. SToo many assumptions too many confusions. So its time lets make it clear and put a full stop in place of the question mark.

A surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application. (as said by Hall, Owlett and Codd).

A surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application. (as said by Wieringa and De Jonge).

OMG this is good enough to carry you off from the real concept. Lets get into some lay men understanding,
The surrogate key was mailnly brought in use to act as a primary key in place of the natural primary key as in the operational database. As in any datawarehouse where there are multiple occurences of the Primary key ID’s used and so to have a unique key to individually refer to each record we make use of the Surrogate Primary key.

As a standard these Surrogate keys are the column of join between the Dimension or Hierarchy tables to the Fact tables, can be visualized as a star schema having one fact table joined to several Dimensions and Hierarchies.

Surrogate_key_blog_banner

Example:
Take an instance where for the employee ‘Emp1’ the Business Unit changes from B1 to B2:
If you used the natural primary key ‘Emp1’ for your employee within your datawarehouse everything would be allocated to Business Unit ‘B2’ even what actualy belongs to ‘B1.’

If you use surrogate keys, you could create on the 17th May a new record for the Employee ‘Emp1’ in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (i.e before 17th May) with the SID of the Employee ‘Emp1’ >> ‘B1.’ All new data (i.e after 17th May) would take the SID of the employee ‘Emp1’ >> ‘B2.’