Sunday, 24 June 2018

Data Warehouse Concepts

Dataware Housing Document

1.The fact table mainly consists of business facts and foreign keys that refer to primary keys in the dimension tables. A dimension table consists mainly of descriptive attributes that are textual fields.

2.A dimension table contains a surrogate key, natural key, and a set of attributes. On the contrary, a fact table contains a foreign key, measurements, and degenerated dimensions.

3.Dimension tables provide descriptive or contextual information for the measurement of a fact table. On the other hand, fact tables provide the measurements of an enterprise.

4.When comparing the size of the two tables, a fact table is bigger than a dimensional table. In a comparison table, more dimensions are presented than the fact tables. In a fact table, less numbers of facts are observed.

5.The dimension table has to be loaded first. While loading the fact tables, one should have to look at the dimension table. This is because the fact table has measures, facts, and foreign keys that are the primary keys in the dimension table.



Read more: Dimension Table and Fact Table | Difference Between http://www.differencebetween.net/technology/hardware-technology/dimension-table-and-fact-table/#ixzz4zVLQgywC



Types of Facts
There are three types of facts:

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns:


Date
Store
Product
Sales_Amount
The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table -- date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.

Say we are a bank with the following fact table:


Date
Account
Current_Balance
Profit_Margin
The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add them up for the account level or the day level.

Types of Fact Tables
Based on the above classifications, there are two types of fact tables:

Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.
Snapshot: This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In our example, recall we originally have the following table:

Customer Key Name State
1001 Christina Illinois
After Christina moved from Illinois to California, the new information replaces the new record, and we have the following table:

Customer Key Name State
1001 Christina California
Advantages:

- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:

- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.

Usage:

About 50% of the time.

When to use Type 1:

Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes

No comments:

Post a Comment