Datastage - Slowly Changing Dimensions - Talentain

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

(http://talentain.com/)

Datastage Slowly Changing Dimensions


(http://talentain.com)

Resources (http://talentain.com/resources) Datastage Slowly Changing Dimensions

by Shradha Kelkar, Talentain Technologies

Basics of SCD
Slowly Changing Dimensions (SCDs) are dimensions that have data that
changes slowly,rather than changing on a time-based, regular schedule.
Type 1
The Type 1 methodology overwrites old data with new data, and therefore
does not track historical data at all.
Here is an example of a database table that keeps supplier information:

ShradhaKelkar

Supplier_KeySupplier_CodeSupplier_Name Supplier_State
123

ABC

Acme Supply CoCA

In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the
surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code).
However, the joins will perform better on an integer than on a character string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply
overwrite this record:
Supplier_KeySupplier_CodeSupplier_Name Supplier_State
123

ABC

Acme Supply CoIL

Type 2
The Type 2 method tracks historical data by creating multiple records for a given natural key in the
dimensional tables with separate surrogate keys and/or dierent version numbers. With Type 2, we have
unlimited history preservation as a new record is inserted each time a change is made.
In the same example, if the supplier moves to Illinois, the table could look like this, with incremented
version numbers to indicate the sequence of changes:
Supplier_KeySupplier_CodeSupplier_Name Supplier_StateVersion
123

ABC

Acme Supply CoCA

http://talentain.com/resources/datastageslowlychangingdimensions

0
1/7

7/3/2016

124

DatastageSlowlyChangingDimensions|Talentain

ABC

Acme Supply CoIL

Another popular method for tuple versioning is to add eective date columns.
Supplier_KeySupplier_CodeSupplier_Name Supplier_StateStart_Date End_Date
123

ABC

Acme Supply CoCA

01-Jan-2000 21-Dec-2004

124

ABC

Acme Supply CoIL

22-Dec-2004

The null End_Date in row two indicates the current tuple version. In some cases, a standardized surrogate
high date (e.g. 9999-12-31) may be used as an end date, so that the eld can be included in an index, and
so that null-value substitution is not required when querying.

How to Implement SCD using DataStage 8.1 SCD


stage?
Step 1:Create a datastage job with the below structure1. Source le that comes from the OLTP sources
2. Old dimesion refernce table link
3. The SCD stage
4. Target Fact Table
5. Dimesion Update/Insert link

(http://i0.wp.com/talentain.com/wordpress/wpcontent/uploads/2011/08/scd-step1.jpg)
Figure 1

Step 2:To set up the SCD properties in the SCD stage ,open the stage and access the Fast Path

http://talentain.com/resources/datastageslowlychangingdimensions

2/7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

Figure 2

Step 3:The tab 2 of SCD stage is used specify the purpose of each of the pulled keys from the referenced
dimension tables.

Figure 3

Step 4:Tab 3 is used to provide the seqence generator le/table name which is used to generate the new
surrogate keys for the new or latest dimesion records.These are keys which also get passed to the fact
tables for direct load.

http://talentain.com/resources/datastageslowlychangingdimensions

3/7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

Figure 4

Step 5: The Tab 4 is used to set the properties for conguring the data population logic for the new and
old dimension rows. The type of activies that we can congure as a part of this tab are:
1. Generation the new Surrogate key values to be passed to the dimension and fact table
2. Mapping the source columns with the source column
3. Setting up of the expired values for the old rows
4. Dening the values to mark the current active rows out of multiple type rows

Figure 5
http://talentain.com/resources/datastageslowlychangingdimensions

4/7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

Step6:Set the derivation logic for the fact as a part of the last tab.

Figure 6

Step 7:Complete the remaining set up, run the job

Figure 7

Shradha Kelkar is an ETL Lead Developer at Talentain and has extensive experience in design, development and
deployment of large scale ETL projects in Banking and Telecom.
http://talentain.com/resources/datastageslowlychangingdimensions

5/7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

Contact us
Talentain Technologies Pvt. Ltd.
4, Whispering Woods,
2nd Floor, New DP Road,
Baner, Pune 411 045
Phone: +91 20 6520 0233
Connect with us on

(http://www.linkedin.com/company/talentain-technologies)

(https://twitter.com/talentain)

Other Oces (http://talentain.com/contact-us)

Employee Corner

(https://apps.paybooks.in/mylogin.aspx)

Quick Links
Technologies (http://talentain.com/technologies)
Talent Management Process (http://talentain.com/talent-management-services/process)
Business Models (http://talentain.com/talent-management-services/business-models)
Executive Hire (http://talentain.com/talent-management-services/executive-hire)
Talent Development Centre (http://talentain.com/talent-management-services/talent-developmentcentre)
Talent Engagement Program (http://talentain.com/careers-2/talent-engagement-program)
Client Engagement Program (http://talentain.com/clients/client-engagement-program)

Subscribe to Job Posts


Email *
http://talentain.com/resources/datastageslowlychangingdimensions

6/7

7/3/2016

DatastageSlowlyChangingDimensions|Talentain

Subscribe

Copyright 2015. All rights reserved.

http://talentain.com/resources/datastageslowlychangingdimensions

7/7

You might also like