Starting Slow: The General Structure
…s flavours is the set of predefined ETL mappings, sessions and workflows that come with it. Although there is a good chance that the OLTP data source is highly customised, the online Oracle documentation is full of information that can make the ETL developer's life easier. That said, there are some important ETL tasks whose logic isn't very easy to find. They are like black boxes: you customise them a little (a few fields behind the X_CUSTOM placeholder here, a small datatype change in the target table there) and they work their magic. So let's lift the veil on a very important piece of out-of-the-box ETL logic in OBI Apps: the Slowly Changing Dimension management mappings.

(NOTE: The rest of this article assumes that you already know what a Slowly Changing Dimension (SCD) is, along with all its different types.)
Figure 1: The Navigator. Aha! The Product dimension is an SCD!

The extraction, transformation and loading of an SCD is essentially managed by three mappings:
SDE_<entity>Dimension
SIL_<entity>Dimension
SIL_<entity>Dimension_SCDUpdate

The first mapping is available in an Adapter folder, related to the data source you are using to feed your precious data warehouse, and looks like Figure 2. It basically joins the main OLTP table with auxiliary ones to extract the data to send to the staging table (which usually takes the name W_<entity>_DS).
Figure 2: A typical SDE mapping. Just don't look into the mapplet.

The second mapping is in SILOS and manages the insert and update of the rows coming from the staging table into the final reporting table (usually called W_<entity>_D). Since it deals with surrogate keys and lookups, it is likely to be more complex than the SDE mapping.
Figure 3: The SIL mapping. See the red circles? Hic sunt leones...

The third mapping is in SILOS as well, and its only task is to update the effective dates and a set of column values in the SCD. The scope of the update (only the most recent records, or the whole history) depends on specific variable settings.
Figure 4: The SCDUpdate mapping. Even without, er, an update strategy transformation.
The SDE mapping (besides taking the EFFECTIVE_TO dates straight from the data source, if available) keeps track of the last updated date on the main entity, plus all relevant auxiliary tables. This information will then be used to evaluate whether a specific row is eligible for a Type II update or not.
The SIL mapping contains the bulk of the logic to update the Type II fields and populate the effective dates.
The update logic boils down to three questions:
Is it a new row (i.e. does the lookup into the target table retrieve anything)?
Have any of the Type II fields changed?
Are any of the last updated dates on the incoming row greater than the ones on the existing row?
Based on the answers, the Update Flag field (UPDATE_FLG) will provide the update logic as per Figure 7.

Figure 7: All possible values of the Update Flag. This time, X does NOT mark the spot.

UPDATE_FLG decides the surrogate primary key population as well: if the row is to be inserted, it takes the current value of a sequence generator; otherwise it uses the key of the record to be updated.
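To make the three questions a little less cryptic, here is a minimal Python sketch of how an update flag and a surrogate key could be derived for one incoming row. It is not the OBI Apps implementation: the TargetRow shape and the update_flag and surrogate_key names are hypothetical, the soft-deletion values (B and D) and any other value shown in Figure 7 are ignored, and it assumes a Type II change (S) is only raised when the incoming row is also more recent. itertools.count stands in for the sequence generator.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count
from typing import Iterator, Optional, Tuple


@dataclass
class TargetRow:
    """Hypothetical, trimmed-down row returned by the lookup on W_<entity>_D."""
    row_wid: int                                        # surrogate key
    type2_values: Tuple[str, ...]                       # Type II tracked columns
    last_update_dates: Tuple[Optional[datetime], ...]   # assumed: main entity first, then auxiliary tables


def update_flag(incoming_type2: Tuple[str, ...],
                incoming_dates: Tuple[Optional[datetime], ...],
                existing: Optional[TargetRow]) -> str:
    """Simplified UPDATE_FLG: only I, S and U; soft deletion (B, D) is ignored."""
    # Question 1: the lookup found nothing, so this is a new row.
    if existing is None:
        return "I"
    # Question 2: has any Type II column changed?
    type2_changed = incoming_type2 != existing.type2_values
    # Question 3: is any incoming last updated date more recent than the stored one?
    newer = any(
        new is not None and (old is None or new > old)
        for new, old in zip(incoming_dates, existing.last_update_dates)
    )
    # Assumption: a new Type II version needs both a change and a newer date.
    return "S" if type2_changed and newer else "U"


def surrogate_key(flag: str, existing: Optional[TargetRow], sequence: Iterator[int]) -> int:
    """Inserted rows (I, B, S) get a new key; updates reuse the key of the target row."""
    if flag in ("I", "B", "S"):
        return next(sequence)          # stands in for the sequence generator
    return existing.row_wid


if __name__ == "__main__":
    sequence = count(1000)
    stored = TargetRow(42, ("Widget", "Blue"),
                       (datetime(2010, 1, 1), datetime(2010, 1, 1)))
    flag = update_flag(("Widget", "Red"),
                       (datetime(2010, 2, 1), datetime(2010, 1, 1)), stored)
    print(flag, surrogate_key(flag, stored, sequence))  # S 1000
```

Note how a Type II change (S) is treated as an insert for key purposes: the new version gets a fresh surrogate key while the old version keeps its own.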
EFFECTIVE_FROM_DT is chosen among the auxiliary fields depending on the value of the Update Flag. If the record is to be inserted (UPDATE_FLG = I or B), it returns the EFFECTIVE_FROM date from the data source, defaulting it to 01/01/1899 if it is null. If the record is a Type II change (S), it returns the greatest last updated date among all the source tables involved or, if that is null, the session start time. If the record is a simple update (U), it returns the EFFECTIVE_FROM date of the target row to be updated. In all remaining cases (Update Flag null or D), it returns NULL.

EFFECTIVE_TO_DT follows a similar pattern. If the record is a new row not marked for soft deletion (UPDATE_FLG = I or S), it returns I_DATE_VAR (either the non-defaulted EFFECTIVE_TO date from the data source, or 01/01/3714). If the record is a new row marked for soft deletion (B), it returns either the non-defaulted EFFECTIVE_TO date from the data source or the first non-null value between INP_DELETED_ON_DT (which in almost all cases is NULL anyway) and the session start time. If the record is a simple update, it returns I_DATE_VAR, except in the rare case where the incoming EFFECTIVE_FROM date is greater than I_DATE_VAR itself, in which case it returns the original EFFECTIVE_TO date from the data source.

Finally, the SIL mapping decides the update strategy of the target reporting table as follows: if the record is a new one (UPDATE_FLG = I, B or S), then Insert; otherwise (UPDATE_FLG = U or D), Update.

Clear as mud? In the likely case that the aforementioned pieces of logic are a bit cryptic, my suggestion is to have a thorough look at the variables/fields and write down an example.

The SCDUpdate mapping, as stated before, takes care of updating the historical records already present in the target table. The Source Qualifier transformation contains a SQL override structured as follows:
Figure 8: The SQL override in the SCDUpdate mapping. For all those who always dreamt of becoming a DBA.

The DELTA_TABLE sub-query makes a first cut on the rows to retrieve, selecting only the target records that were processed in the preceding load. The outer sub-query, SCD_HISTORY, retrieves only records that have been historically tracked at least once. Finally, the whole query checks the $$UDATE_ALL_HISTORY (sic!) parameter to decide whether it has to consider all historical records or only the most current ones. In other words, if the parameter is set to Y, each Type I change will be propagated across all records related to a specific natural key; otherwise, only to the records flagged as current. Note how the recordset is ordered by EFFECTIVE_FROM date in decreasing order, which means that the most current record comes first for each natural key.
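The selection logic of the override can be mimicked in Python to see which rows actually reach the SCDUpdate fields. This is a hedged sketch, not the real query: DimRow, scd_update and delta_keys are hypothetical names (delta_keys replaces the DELTA_TABLE test on the preceding load), SCD_HISTORY is read as "natural keys with more than one version", and closing each older version with the EFFECTIVE_FROM of the next newer one, plus copying Type I values from the most current version, is an assumption about what the fields in Figure 9 do.

```python
from dataclasses import dataclass, replace
from datetime import date
from itertools import groupby
from typing import Dict, List, Set


@dataclass(frozen=True)
class DimRow:
    """Hypothetical, trimmed-down W_<entity>_D row."""
    natural_key: str
    effective_from: date
    effective_to: date
    current_flg: str      # 'Y' or 'N'
    type1_value: str      # stands in for any Type I column


def scd_update(rows: List[DimRow], delta_keys: Set[str],
               update_all_history: str) -> List[DimRow]:
    # DELTA_TABLE: only natural keys touched by the preceding load.
    candidates = [r for r in rows if r.natural_key in delta_keys]

    # SCD_HISTORY (assumed reading): only keys with more than one version.
    versions: Dict[str, int] = {}
    for r in candidates:
        versions[r.natural_key] = versions.get(r.natural_key, 0) + 1
    candidates = [r for r in candidates if versions[r.natural_key] > 1]

    # Parameter check: whole history ('Y') or only rows flagged as current.
    if update_all_history != "Y":
        candidates = [r for r in candidates if r.current_flg == "Y"]

    # ORDER BY natural key, EFFECTIVE_FROM descending: most current version first.
    candidates.sort(key=lambda r: (r.natural_key, -r.effective_from.toordinal()))

    result: List[DimRow] = []
    for _, group in groupby(candidates, key=lambda r: r.natural_key):
        versions_desc = list(group)
        latest = versions_desc[0]
        result.append(latest)
        # Assumption: each older version closes when the next newer one opens,
        # and Type I values are copied from the most current version.
        for newer, older in zip(versions_desc, versions_desc[1:]):
            result.append(replace(older,
                                  effective_to=newer.effective_from,
                                  current_flg="N",
                                  type1_value=latest.type1_value))
    return result


if __name__ == "__main__":
    rows = [
        DimRow("P1", date(2010, 1, 1), date(3714, 1, 1), "Y", "old name"),
        DimRow("P1", date(2010, 6, 1), date(3714, 1, 1), "Y", "new name"),
        DimRow("P2", date(2010, 1, 1), date(3714, 1, 1), "Y", "unchanged"),
    ]
    # P2 is skipped: it has a single version, so it was never historically tracked.
    for row in scd_update(rows, {"P1", "P2"}, update_all_history="N"):
        print(row)
```

With update_all_history set to "N", only the two rows still flagged as current for P1 are touched: the older one is closed and marked as not current. Setting it to "Y" would also revisit versions that were already closed, which is how a Type I change would reach the whole history.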
Figure 9: SCDUpdate fields. All this work just for a date!

So, that's basically it. Complex? Sure is. Improvable? Possibly. You will notice a few small differences depending on the entity (all examples here have been derived from the Product SCD Dimension in SILOS), but the general strategy remains the same.