Transformation

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 28

What is Transformation?

A transformation is a repository object which reads the data, modifies the data and passes the
data. Transformations in a mapping represent the operations that the integration service performs
on the data.

Transformations can be classified as active or passive, connected or unconnected.

Active Transformations:
A transformation can be called as an active transformation if it performs any of the following
actions.

 Change the number of rows: For example, the filter transformation is active because it
removes the rows that do not meet the filter condition. All multi-group transformations
are active because they might change the number of rows that pass through the
transformation.
 Change the transaction boundary: The transaction control transformation is active
because it defines a commit or roll back transaction.
 Change the row type: Update strategy is active because it flags the rows for insert, delete,
update or reject.

Note: You cannot connect multiple active transformations or an active and passive


transformation to the downstream transformation or transformation same input group. This is
because the integration service may not be able to concatenate the rows generated by active
transformations. This rule is not applicable for sequence generator transformation.

Passive Transformations:

Transformations which does not change the number of rows passed through them, maintains the
transaction boundary and row type are called passive transformation.

Connected Transformations:

Transformations which are connected to the other transformations in the mapping are called
connected transformations.

Unconnected Transformations:
An unconnected transformation is not connected to other transformations in the mapping and is
called within another transformation, and returns a value to that.
Source Qualifier Transformation
The source qualifier transformation is an active, connected transformation used to represent the
rows that the integrations service reads when it runs a session. You need to connect the source
qualifier transformation to the relational or flat file definition in a mapping. The source qualifier
transformation converts the source data types to the Informatica native data types. So, you should
not alter the data types of the ports in the source qualifier transformation.

The source qualifier transformation is used to do the following tasks:

 Joins: You can join two or more tables from the same source database. By default the
sources are joined based on the primary key-foreign key relationships. This can be
changed by explicitly specifying the join condition in the "user-defined join" property.
 Filter rows: You can filter the rows from the source database. The integration service
adds a WHERE clause to the default query.
 Sorting input: You can sort the source data by specifying the number for sorted ports.
The Integration Service adds an ORDER BY clause to the default SQL query
 Distinct rows: You can get distinct rows from the source by choosing the "Select
Distinct" property. The Integration Service adds a SELECT DISTINCT statement to the
default SQL query.
 Custom SQL Query: You can write your own SQL query to do calculations.

Property Description
SQL Query To specify a custom query which replaces the default query.
User-Defined Join Condition used for joining multiple sources.
Specifies the filter condition the Integration Service applies when querying
Source Filter
rows.
Number of Sorted
Used for sorting the source data
Ports
Sets the amount of detail included in the session log when you run a session
Tracing Level
containing this transformation.
Select Distinct To select only unique rows from the source.
Pre-session SQL commands to run against the source database before the
Pre-SQL
Integration Service reads the source.
Post-session SQL commands to run against the source database after the
Post-SQL
Integration Service writes to the target.
Output is
Specify only when the source output does not change between session runs.
Deterministic
Output is Specify only when the order of the source output is same between the
Repeatable session runs.
Note: For flat file source definitions, all the properties except the Tracing level will be disabled.

Viewing the Default Query or Generating the SQL query:

For relational sources, the Integration Service generates a query for each Source Qualifier
transformation when it runs a session. To view the default query generated, just follow the below
steps:

 Go to the Properties tab, select "SQL Query" property. Then open the SQL Editor, select
the "ODBC data source" and enter the username, password.
 Click Generate SQL.
 Click Cancel to exit.

You can write your own SQL query rather than relaying the default query for performing
calculations.

Note: You can generate the SQL query only if the output ports of source qualifier transformation
is connected to any other transformation in the mapping. The SQL query generated contains only
the columns or ports which are connected to the downstream transformations.

Specifying the "Source Filter, Number Of Sorted Ports and Select Distinct" properties:

Follow the below steps for specifying the filter condition, sorting the source data and for
selecting the distinct rows.

 Go to the properties tab.


 Select "Source Filter" property, open the editor and enter the filter condition (Example:
employees.department_id=100) and click OK.
 Go to the "Number Of Sorted Ports" property and enter a value (Example: 2). This value
(2) means to sort the data on the first two ports in the source qualifier transformation.
 Tick the check box for the "Select Distinct" property.

Joins:

The SQL transformation can be used to join sources from the same database. By default it joins
the sources based on the primary-key, foreign-key relationships. To join heterogeneous sources,
use Joiner Transformation.
A foreign-key is created on the department_id column of the employees table, which references
the primary-key column, department_id, of the departments table.
Follow the below steps to see the default join

Create only one source qualifier transformation for both the employees and departments.
Go to the properties tab of the source qualifier transformation, select the "SQL QUERY"
property and generate the SQL query.

There might be case where there won't be any relationship between the sources. In that case, we
need to override the default join. To do this we have to specify the join condition in the "User
Defined Join" Property. Using this property we can specify outer joins also. The join conditions
entered here are database specific.
As an example, if we want to join the employees and departments table on the manager_id
column, then in the "User Defined Join" property specify the join condition as
"departments.manager_id=employees.manager_id". Now generate the SQL and observe the
WHERE clause.

Pre and Post SQL:

You can add the Pre-SQL and Post-SQL commands. The integration service runs the Pre-SQL
and Post-SQL before and after reading the source data respectively.

What is Filter Transformation?

Filter transformation is an active, connected transformation. The filter transformation is used to


filter out rows in a mapping. As the filter transformation is an active transformation, it may
change the number of rows passed through it. You have to specify a filter condition in the filter
transformation. The rows that meet the specified filter condition are passed to other
transformations. The rows that do not meet the filter condition are dropped.

Default condition for Filter Transformation is TRUE.


TRUE can also be written as any non zero value (1,2,9,99,999 … ).
TRUE will process all records and pass through the transformation.

FALSE will drop all the records from Processing and load nothing to target.
FALSE can also be written as ZERO.

Specifying Filter Condition


To configure the filter condition, go to the properties tab and in the filter condition section open
the expression editor. Enter the filter condition you want to apply. Click on validate button to
verify the syntax and then click OK.

Components of Filter Transformation

The filter transformation has the following components.

 Transformation: You can enter the name and description of the transformation.
 Ports: Create new ports and configure them
 Properties: You can specify the filter condition to filter the rows. You can also configure
the tracing levels.
 Metadata Extensions: Specify the metadata details like name, datatype etc.

Configuring Filter Transformation

The following properties needs to be configured on the ports tab in filter transformation

 Port name: Enter the name of the ports created.


 Datatype, precision, and scale: Configure the data type and set the precision and scale
for each port.
 Port type: All the ports in filter transformation are input/output.

Performance Tuning Tips

 Use the filter transformation as close as possible to the sources in the mapping. This will
reduce the number of rows to be processed in the downstream transformations.
 In case of relational sources, if possible use the source qualifier transformation to filter
the rows. This will reduce the number of rows to be read from the source.

Note: The input ports to the filter transformation mush come from a single transformation. You
cannot connect ports from more than one transformation to the filter.

Sorter Transformation
Sorter transformation is an active and connected transformation used to sort the data. The data
can be sorted in ascending or descending order by specifying the sort key. You can specify one
or more ports as a sort key and configure each sort key port to sort in ascending or descending
order. You can also configure the order of the ports in which the integration service applies to
sort the data.

The sorter transformation is used to sort the data from relational or flat file sources. The sorter
transformation can also be used for case-sensitive sorting and can be used to specify whether the
output rows should be distinct or not.

Configuring Sorter Transformation

Configure the below properties of sorter transformation

 Case Sensitive: The integration service considers the string case when sorting the data.
The integration service sorts the uppercase characters higher than the lowercase
characters.
 Work Directory: The integration service creates temporary files in the work directory
when it is sorting the data. After the integration service sorts the data, it deletes the
temporary files.
 Distinct Output Rows: The integration service produces distinct rows in the output when
this option is configured.
 Tracing Level: Configure the amount of data needs to be logged in the session log file.
 Null Treated Low: Enable the property, to treat null values as lower when performing
the sort operation. When disabled, the integration service treats the null values as higher
than any other value.
 Sorter Cache Size: The integration service uses the sorter cache size property to
determine the amount of memory it can allocate to perform sort operation

Performance improvement Tip

Use the sorter transformation before the aggregator and joiner transformation and sort the data
for better performance.

Aggregator Transformation
Aggregator transformation is an active transformation used to perform calculations such as sums,
averages, counts on groups of data. The integration service stores the data group and row data in
aggregate cache. The Aggregator Transformation provides more advantages than the SQL, you
can use conditional clauses to filter rows.

Creating an Aggregator Transformation:


Follow the below steps to create an aggregator transformation

 Go to the Mapping Designer, click on transformation in the toolbar -> create.


 Select the Aggregator transformation, enter the name and click create. Then click Done.
This will create an aggregator transformation without ports.
 To create ports, you can either drag the ports to the aggregator transformation or create in
the ports tab of the aggregator.

 Configuring the aggregator transformation:

You can configure the following components in aggregator transformation

 Aggregate Cache: The integration service stores the group values in the index cache and
row data in the data cache.
 Aggregate Expression: You can enter expressions in the output port or variable port.
 Group by Port: This tells the integration service how to create groups. You can configure
input, input/output or variable ports for the group.
 Sorted Input: This option can be used to improve the session performance. You can use
this option only when the input to the aggregator transformation in sorted on group by
ports.

Properties of Aggregator Transformation:

The below table illustrates the properties of aggregator transformation

Property Description
Cache Directory The Integration Service creates the index and data cache files.
Tracing Level Amount of detail displayed in the session log for this transformation.
Indicates input data is already sorted by groups. Select this option only if
Sorted Input
the input to the Aggregator transformation is sorted.
Aggregator Data
Default cache size is 2,000,000 bytes. Data cache stores row data.
Cache Size
Aggregator Index Default cache size is 1,000,000 bytes. Index cache stores group by ports
Cache Size data
Specifies how the Integration Service applies the transformation logic to
Transformation Scope
incoming data

Group By Ports:

The integration service performs aggregate calculations and produces one row for each group. If
you do not specify any group by ports, the integration service returns one row for all input rows.
By default, the integration service returns the last row received for each group along with the
result of aggregation. By using the FIRST function, you can specify the integration service to
return the first row of the group.

Aggregate Expressions:
This transformation offers even more functionality than SQL’s group by statements since one
can apply conditional logic to groups within the aggregator transformation. Many different
aggregate functions can be applied to individual output ports within the transformation. One is
also able to code nested aggregate functions as well. Below is a list of these aggregate functions:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE

Configuring the aggregator transformation


You can configure the following components in aggregator transformation in informatica.
Aggregate Cache: The integration service stores the group values in the index cache and row
data in the data cache.
Aggregate Expression: You can enter expressions in the output port or variable port.
Group by Port: This tells the integration service how to create groups. You can configure input,
input/output or variable ports for the group.
Sorted Input: This option can be used to improve the session performance. You can use this
option only when the input to the aggregator transformation in sorted on group by ports.

Informatica Nested Aggregate Functions


You can nest one aggregate function within another aggregate function. You can either use
single-level aggregate functions or multiple nested functions in an aggregate transformation. You
cannot use both single-level and nested aggregate functions in an aggregator transformation
informatica. The Mapping designer marks the mapping as invalid if an aggregator transformation
contains both single-level and nested aggregate functions. If you want to create both single-level
and nested aggregate functions, create separate aggregate transformations.
Examples: MAX(SUM(sales))

Conditional clauses
You can reduce the number of rows processed in the aggregation by specifying a conditional
clause.
Example: SUM(salary, salary >1000)
This will include only the salaries which are greater than 1000 in the SUM calculation.
Non Conditional clauses
You can also use non-aggregate functions in aggregator transformation.
Example: IIF( SUM(sales) <20000, SUM(sales),0)
Note: By default, the Integration Service treats null values as NULL in aggregate functions. You
can change this by configuring the integration service.

Incremental Aggregation in Informatica


After you create a session that includes an Aggregator transformation, you can enable the session
option, Incremental Aggregation. When the Integration Service performs incremental
aggregation, it passes source data through the mapping and uses historical cache data to perform
aggregation calculations incrementally.

Sorted Input:

You can improve the performance of aggregator transformation by specifying the sorted input.
The Integration Service assumes all the data is sorted by group and it performs aggregate
calculations as it reads rows for a group. If you specify the sorted input option without actually
sorting the data, then integration service fails the session.

Router Transformation
Router transformation is an active and connected transformation. It is similar to the filter
transformation used to test a condition and filter the data. In a filter transformation, you can
specify only one condition and drops the rows that do not satisfy the condition. Where as in a
router transformation, you can specify more than one condition and provides the ability for route
the data that meet the test condition. Use router transformation if you need to test the same input
data on multiple conditions.

Creating Router Transformation

Follow the below steps to create a router transformation


1. In the mapping designer, create a new mapping or open an existing mapping
2. Go the toolbar->Click on Transformation->Create
3. Select the Router Transformation, enter the name, click on create and then click
on Done.
4. Select the ports from the upstream transformation and drag them to the router
transformation. You can also create input ports manually on the ports tab.

Configuring Router Transformation

The router transformation has input and output groups. You need to configure these groups.

 Input groups: The designer copies the input ports properties to create a set of output
ports for each output group.
 Output groups: Router transformation has two output groups. They are user-defined
groups and default group.

User-defined groups: Create a user-defined group to test a condition based on the incoming
data. Each user-defined group consists of output ports and a group filter condition. You can
create or modify the user-defined groups on the groups tab. Create one user-defined group for
each condition you want to specify.

Default group: The designer creates only one default group when you create one new user-
defined group. You cannot edit or delete the default group. The default group does not have a
group filter condition. If all the conditions evaluate to FALSE, the integration service passes the
row to the default group.

Specifying Group Filter Condition

Specify the group filter condition on the groups tab using the expression editor. You can enter
any expression that returns a single value. The group filter condition returns TRUE or FALSE
for each row that passes through the transformation.

Advantages of Using Router over Filter Transformation

Use router transformation to test multiple conditions on the same input data. If you use more
than one filter transformation, the integration service needs to process the input for each filter
transformation. In case of router transformation, the integration service processes the input data
only once and thereby improving the performance.

Union Transformation
Union transformation is an active and connected transformation. It is multi input group
transformation used to merge the data from multiple pipelines into a single pipeline. Basically it
merges data from multiples sources just like the UNION ALL set operator in SQL. The union
transformation does not remove any duplicate rows.

Union Transformation Guidelines

The following rules and guidelines should be used when using a union transformation in a
mapping

 Union transformation contains only one output group and can have multiple input groups.
 The input groups and output groups should have matching ports. The datatype, precision
and scale must be same.
 Union transformation does not remove duplicates. To remove the duplicate rows use
sorter transformation with "select distinct" option after the union transformation.
 The union transformation does not generate transactions.
 You cannot connect a sequence generator transformation to the union transformation.
 Union transformation does not generate transactions.

Components of union transformation

Configure the following tabs of union transformation

 Transformation: You can enter name and description of the transformation


 Properties: Specify the amount of tracing level to be tracked in the session log.
 Groups Tab: You can create new input groups or delete existing input groups.
 Group Ports Tab: You can create and delete ports for the input groups.

Note: The ports tab displays the groups and ports you create. You cannot edit the port or group
information in the ports tab. To do changes use the groups tab and group ports tab.

Why union transformation is active

Union is an active transformation because it combines two or more data streams into one.
Though the total number of rows passing into the Union is the same as the total number of rows
passing out of it, and the sequence of rows from any given input stream is preserved in the
output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might
not be row number 1 in the output stream. Union does not even guarantee that the output is
repeatable.

Sequence Generator Transformation


Sequence generator transformation is a passive and connected transformation. The sequence
generator transformation is used for
 Generating unique primary key values.
 Replace missing primary keys
 Generate surrogate keys for dimension tables in SCDs.
 Cycle through a sequential range of numbers.

Configuring Sequence Generator Transformation:

Configure the following properties of sequence generator transformation:

 Start Value: Specify the start value of the generated sequence that you want the
integration service to use the cycle option. If you select cycle, the integration service
cycles back to this value when it reaches the end value.
 Increment By: Difference between two consecutive values from the NEXTVAL port.
Default value is 1. Maximum value you can specify is 2,147,483,647.
 End Value: Maximum sequence value the integration service generates. If the integration
service reaches this value during the session and the sequence is not configured to cycle,
the session fails. Maximum value is 9,223,372,036,854,775,807.
 Current Value: Current Value of the sequence. This value is used as the first value in the
sequence. If cycle option is configured, then this value must be greater than or equal to
start value and less than end value.
 Cycle: The integration service cycles through the sequence range.
 Number of Cached Values: Number of sequential values the integration service caches
at a time. Use this option when multiple sessions use the same reusable generator. Default
value for non-reusable sequence generator is 0 and reusable sequence generator is 1000.
Maximum value is ,223,372,036,854,775,807.
 Reset: The integration service generate values based on the original current value for
each session. Otherwise, the integration service updates the current value to reflect the
last-generated value for the session plus one.
 Tracing level: The level of detail to be logged in the session log file.

Sequence Generator Transformation Ports:

The sequence generator transformation contains only two output ports. They are CURRVAL and
NEXTVAL output ports.

NEXTVAL Port:

You can connect the NEXTVAL port to multiple transformations to generate the unique values
for each row in the transformation. The NEXTVAL port generates the sequence numbers base on
the Current Value and Increment By properties. If the sequence generator is not configure to
Cycle, then the NEXTVAL port generates the sequence numbers up to the configured End
Value.
The sequence generator transformation generates a block of numbers at a time. Once the block of
numbers is used then it generates the next block of sequence numbers. As an example, let say
you connected the nextval port to two targets in a mapping, the integration service generates a
block of numbers (eg:1 to 10) for the first target and then another block of numbers (eg:11 to 20)
for the second target.

If you want the same sequence values to be generated for more than one target, then connect the
sequence generator to an expression transformation and connect the expression transformation
port to the targets. Another option is create sequence generator transformation for each target.

CURRVAL Port:

The CURRVAL is the NEXTVAL plus the Increment By value. You rarely connect the
CURRVAL port to other transformations. When a row enters a transformation connected to the
CURRVAL port, the integration service passes the NEXTVAL value plus the Increment By
value. For example, when you configure the Current Value=1 and Increment By=1, then the
integration service generates the following values for NEXTVAL and CURRVAL ports.

NEXTVAL CURRVAL

---------------

1 2

2 3

3 4

4 5

5 6

If you connect only the CURRVAL port without connecting the NEXTVAL port, then the
integration service passes a constant value for each row.

RANK Transformation
Rank transformation is an active and connected transformation. The rank transformation is used
to select the top or bottom rank of data. The rank transformation is used to select the smallest or
largest numeric/string values. The integration service caches the input data and then performs the
rank calculations.

Configuring the Rank Transformation

Configure the following properties of Rank transformation

 Cache Directory: Directory where the integration service creates the index and data
cache files.
 Top/Bottom: Specify whether you want to select the top or bottom rank of data.
 Number of Ranks: specify the number of rows you want to rank.
 Case-Sensitive String Comparison: Used to sort the strings using case sensitive or not.
 Tracing Level: Amount of logging to be tracked in the session log file.
 Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. You can
set a numeric value, or Auto for the data cache size. In case of Auto, the Integration
Service determines the cache size at runtime.
 Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. You can
set a numeric value, or Auto for the index cache size. In case of Auto, the Integration
Service determines the cache size at runtime.

Transaction Control Transformation in Informatica


Transaction Control is an active and connected transformation. The transaction control
transformation is used to control the commit and rollback of transactions. You can define a
transaction based on varying number of input rows. As an example, you can define a transaction
on a group rows in the employees data using the department Id as a key.

In the informatica power center, you can define the transaction at the following levels:

 Mapping level: Use the transaction control transformation to define the transactions.
 Session level: You can specify the "Commit Type" option in the session properties tab.
The different options of "Commit Type" are Target, Source and User Defined. If you
have used the transaction control transformation in the mapping, then the "Commit Type"
will always be "User Defined"
When you run a session, the integration service evaluates the expression for each row in the
transaction control transformation. When it evaluates the expression as commit, then it commits
all the rows in the transaction to the target(s). When the integration service evaluates the
expression as rollback, then it roll back all the rows in the transaction from the target(s).

When you have flat file as the target, then the integration service creates an output file for each
time it commits the transaction. You can dynamically name the target flat files. Look at the
example for creating flat files dynamically - Dynamic flat file creation.

Configuring Transaction Control Transformation

You can configure the following components in the transaction control transformation:

 Transformation Tab: You can rename the transformation and add a description.
 Ports Tab: You can create input/output ports
 Properties Tab: You can define the transaction control expression and tracing level.
 Metadata Extensions Tab: You can add metadata information.

Transaction Control Expression

You can enter the transaction control expression in the Transaction Control Condition option in
the properties tab. The transaction control expression uses the IIF function to test each row
against the condition. Use the following syntax for the expression

Syntax:

IIF (condition, value1, value2)

Example:

IIF(dept_id=10, TC_COMMIT_BEFORE,TC_ROLLBACK_BEFORE)

Use the following built-in variables in the expression editor of the transaction control
transformation:

 TC_CONTINUE_TRANSACTION: The Integration Service does not perform any


transaction change for this row. This is the default value of the expression.
 TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a
new transaction, and writes the current row to the target. The current row is in the new
transaction.
 TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the transaction, and begins a new transaction. The current row is in the
committed transaction.
 TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in
the new transaction.
 TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back the transaction, and begins a new transaction. The current row is in the rolled
back transaction.

If the transaction control transformation evaluates to a value other than the commit, rollback or
continue, then the integration service fails the session.

Transaction Control Transformation in Mapping

Transaction control transformation defines or redefines the transaction boundaries in a mapping.


It creates a new transaction boundary or drops any incoming transaction boundary coming from
upstream active source or transaction control transformation.

Transaction control transformation can be effective or ineffective for the downstream


transformations and targets in the mapping. The transaction control transformation can become
ineffective for downstream transformations or targets if you have used transformation that drops
the incoming transaction boundaries after it. The following transformations drop the transaction
boundaries.

 Aggregator transformation with Transformation scope as "All Input".


 Joiner transformation with Transformation scope as "All Input".
 Rank transformation with Transformation scope as "All Input".
 Sorter transformation with Transformation scope as "All Input".
 Custom transformation with Transformation scope as "All Input".
 Custom transformation configured to generate transactions
 Transaction Control transformation
 A multiple input group transformation, such as a Custom transformation, connected to
multiple upstream transaction control points.

Mapping Guidelines and Validation

Use the following rules and guidelines when you create a mapping with a Transaction Control
transformation:
 If the mapping includes an XML target, and you choose to append or create a new
document on commit, the input groups must receive data from the same transaction
control point.
 Transaction Control transformations connected to any target other than relational, XML,
or dynamic MQSeries targets are ineffective for those targets.
 You must connect each target instance to a Transaction Control transformation.
 You can connect multiple targets to a single Transaction Control transformation.
 You can connect only one effective Transaction Control transformation to a target.
 You cannot place a Transaction Control transformation in a pipeline branch that starts
with a Sequence Generator transformation.
 If you use a dynamic Lookup transformation and a Transaction Control transformation in
the same mapping, a rolled-back transaction might result in unsynchronized target data.
 A Transaction Control transformation may be effective for one target and ineffective for
another target. If each target is connected to an effective Transaction Control
transformation, the mapping is valid.
 Either all targets or none of the targets in the mapping should be connected to an effective
Transaction Control transformation.

Update Strategy Transformation in Informatica


Update strategy transformation is an active and connected transformation. Update strategy
transformation is used to insert, update, and delete records in the target table. It can also reject
the records without reaching the target table. When you design a target table, you need to decide
what data should be stored in the target.

When you want to maintain a history or source in the target table, then for every change in the
source record you want to insert a new record in the target table.
When you want an exact copy of source data to be maintained in the target table, then if the
source data changes you have to update the corresponding records in the target.

The design of the target table decides how to handle the changes to existing rows. In the
informatica, you can set the update strategy at two different levels:

 Session Level: Configuring at session level instructs the integration service to either treat
all rows in the same way (Insert or update or delete) or use instructions coded in the
session mapping to flag for different database operations.
 Mapping Level: Use update strategy transformation to flag rows for inert, update, delete
or reject.

Flagging Rows in Mapping with Update Strategy:

You have to flag each row for inserting, updating, deleting or rejecting. The constants and their
numeric equivalents for each database operation are listed below.

 DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.


 DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.
 DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.
 DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.

The integration service treats any other numeric value as an insert.

Update Strategy Expression:

You have to flag rows by assigning the constant numeric values using the update strategy
expression. The update strategy expression property is available in the properties tab of the
update strategy transformation.

Each row is tested against the condition specified in the update strategy expression and a
constant value is assigned to it. A sample expression is show below:

IIF(department_id=10, DD_UPDATE, DD_INSERT)

Mostly IIF and DECODE functions are used to test for a condition in update strategy
transformation.

Update Strategy and Lookup Transformations:

Update strategy transformation is used mostly with lookup transformation. The row from the
source qualifier is compared with row from lookup transformation to determine whether it is
already exists or a new record. Based on this comparison, the row is flagged to insert or update
using the update strategy transformation.

Update Strategy and Aggregator Transformations:

If you place an update strategy before an aggregator transformation, the way the aggregator
transformation performs aggregate calculations depends on the flagging of the row. For example,
if you flag a row for delete and then later use the row to calculate the sum, then the integration
service subtracts the value appearing in this row. If it’s flagged for insert, then the aggregator
adds its value to the sum.

Important Note:

Update strategy works only when we have a primary key on the target table. If there is no
primary key available on the target table, then you have to specify a primary key in the target
definition in the mapping for update strategy transformation to work.
Lookup Transformation in Informatica
Lookup transformation is used to look up data in a flat file, relational table, view or synonym.
Lookup is a passive/active transformation and can be used in both connected/unconnected
modes. From informatica version 9 onwards lookup is an active transformation. The lookup
transformation can return a single row or multiple rows.

You can import the definition of lookup from any flat file or relational database or even from a
source qualifier. The integration service queries the lookup source based on the ports, lookup
condition and returns the result to other transformations or target in the mapping.

The lookup transformation is used to perform the following tasks:

 Get a Related Value: You can get a value from the lookup table based on the source
value. As an example, we can get the related value like city name for the zip code value.
 Get Multiple Values: You can get multiple rows from a lookup table. As an example,
get all the states in a country.
 Perform Calculation. We can use the value from the lookup table and use it in
calculations.
 Update Slowly Changing Dimension tables: Lookup transformation can be used to
determine whether a row exists in the target or not.

You can configure the lookup transformation in the following types of lookup:

 Flat File or Relational lookup: You can perform the lookup on the flat file or relational
database. When you create a lookup using flat file as lookup source, the designer invokes
flat file wizard. If you used relational table as lookup source, then you can connect to the
lookup source using ODBC and import the table definition.
 Pipeline Lookup: You can perform lookup on application sources such as JMS, MSMQ
or SAP. You have to drag the source into the mapping and associate the lookup
transformation with the source qualifier. Improve the performance by configuring
partitions to retrieve source data for the lookup cache.
 Connected or Unconnected lookup: A connected lookup receives source data, performs
a lookup and returns data to the pipeline. An unconnected lookup is not connected to
source or target or any other transformation. A transformation in the pipeline calls the
lookup transformation with the :LKP expression. The unconnected lookup returns one
column to the calling transformation.
 Cached or Uncached Lookup: You can improve the performance of the lookup by
caching the lookup source. If you cache the lookup source, you can use a dynamic or
static cache. By default, the lookup cache is static and the cache does not change during
the session. If you use a dynamic cache, the integratiion service inserts or updates row in
the cache. You can lookup values in the cache to determine if the values exist in the
target, then you can mark the row for insert or update in the target.

Tuning Lookup Transformation


This is a frequently asked question in informatica interview. Follow the below steps to tune a
lookup transformation:

Cache the lookup transformation: This will query the lookup source once and stores the data
in the cache. Whenever a row enters the lookup, the lookup retrieves the data from the lookup
source rather than querying the lookup source again. This will improve the performance of
lookup a lot.

Restrict Order by columns: By default, the integration orders by on all ports in the lookup
transformation. Override this default order by clause to include few ports in the lookup.

Persistent Cache: If your lookup source is not going change at all (example: countries, zip
codes). Use persistent cache in this case.

Prefer Static Cache over Dynamic Cache: If you use dynamic cache, the lookup may update
the cache. Updating the lookup cache is overhead. Avoid dynamic cache.

Restrict Number of lookup ports: Make sure that you include only the required ports in the
lookup transformation. Unnecessary ports in the lookup make the lookup to take time in
querying the lookup source, building the lookup cache.

Sort the flat file lookups: If the lookup source is a flat file, using the sorted input option
improves the performance.

Indexing the columns: If you have used any columns in the where clause, creating any index (in
case of relational lookups) on these columns improves the performance of querying the lookup
source.

Database level tuning: For relational lookups you can improve the performance by doing some
tuning at database level.
Lookup Transformation is Active from Informatica version 9.x
One of the changes that made in informatica version 9 was making the lookup transformation as
active transformation. The lookup transformation can return all the matching rows.

When creating the lookup transformation itself you have to specify whether the lookup
transformation returns multiple rows or not. Once you make the lookup transformation as active
transformation, you cannot change it back to passive transformation. The "Lookup Policy on
Multiple Match" property value will become "Use All Values". This property becomes read-only
and you cannot change this property.

As an example, for each country you can configure the lookup transformation to return all the
states in that country. You can cache the lookup table to improve performance. If you configure
the lookup transformation for caching, the integration service caches all the rows form the
lookup source. The integration service caches all rows for a lookup key by the key index.

Guidelines for Returning Multiple Rows:

Follow the below guidelines when you configure the lookup transformation to return multiple
rows:

 You can cache all the rows from the lookup source for cached lookups.
 You can customize the SQL Override for both cached and uncache lookup that return
multiple rows.
 You cannot use dynamic cache for Lookup transformation that returns multiple rows.
 You cannot return multiple rows from an unconnected Lookup transformation.
 You can configure multiple Lookup transformations to share a named cache if the
Lookup transformations have matching caching lookup on multiple match policies.
 Lookup transformation that returns multiple rows cannot share a cache with a Lookup
transformation that returns one matching row for each input row.

Lookup Transformation Properties


Lookup transformation has so many proprties which you can configure. Depending on the
lookup source (flat file or relational lookup), you can configure the below properties of lookup
transformation:

Lookup
Lookup Property Description
Type

Lookup SQL Override the default sql query generated by the lookup
Relational
Override transformation. Use this option when lookup cache is enabled.
Lookup Table Pipeline You can choose a source, target or source qualifier as the lookup
table name. This is the lookup source which will be used to query
or cache the data.
Name Relational
If you have override the sql query, then you can ignore this option
Lookup Source You can filter looking up in the cache based on the value of data
Relational
Filter in the lookup ports. Works only when lookup cache is enabled.
When lookup cache is enabled, the integration service queries the
lookup source once and caches the entire data. Caching the lookup
Flat File source improves the performance. If the caching is disabled, the
Lookup Caching
Pipeline integration service queries the lookup source for each row.
Enabled
Relational
The integration service always caches the flat file and pipeline
lookups
Which row to return when the lookup transformation finds
multiple rows that match the lookup condition.
Report Error: Reports error and does not return a row.
Flat File
Lookup Policy on Use Last Value: Returns the last row that matches the lookup
Pipeline
Multiple Match condition.
Relational
Use All Values: Returns all matched rows.
Use Any Value: Returns the first value that matches the lookup
condition.
Flat File
You can define the lookup condition in the condition tab. The
Lookup Condition Pipeline
lookup condition is displayed here.
Relational
Connection
Relational specifies the database that contains the lookup table.
Information
Flat File
Indicates the lookup source type: flat file or relational table or
Source Type Pipeline
source qualifier.
Relational
Flat File
Tracing Level Pipeline Set amount of detail to be included in the lookup
Relational
Flat File
Lookup Cache
Pipeline Specifies the directory used to build the lookup cache files
Directory Name
Relational
Use when the lookup source data does not change at all.
Flat File Examples: zipcodes, countries, states etc.
Lookup Cache
Pipeline
Persistent
Relational The lookup caches the data once and it uses the cache even in
multiple session runs.
Lookup Data
Flat File
Cache Size
Pipeline Cache sizes of the lookup data and lookup index
Lookup Index
Relational
Cache Size
Flat File
Dynamic Lookup Indicates to use a dynamic lookup cache. Inserts or updates rows
Pipeline
Cache in the lookup cache as it passes rows to the target table.
Relational
Use with dynamic caching enabled. When you enable this
property, the Integration Service outputs old values out of the
Flat File lookup/output ports. When the Integration Service updates a row
Output Old Value
Pipeline in the cache, it outputs the value that existed in the lookup cache
On Update
Relational before it updated the row based on the input data. When the
Integration Service inserts a row in the cache, it outputs null
values.
An expression that indicates whether to update dynamic cache.
Create an expression using lookup ports or input ports. The
Flat File
Update Dynamic expression can contain input values or values in the lookup cache.
Pipeline
Cache Condition The Integration Service updates the cache when the condition is
Relational
true and the data exists in the cache. Use with dynamic caching
enabled. Default is true.
Flat File
Cache File Name Use with persistent lookup cache. Specifies the file name prefix to
Pipeline
Prefix use with persistent lookup cache files.
Relational
Flat File
Recache From
Pipeline The integration service rebuilds the lookup cache.
Lookup Source
Relational
Flat File
Use with dynamic caching enabled. Applies to rows entering the
Insert Else Update Pipeline
Lookup transformation with the row type of insert.
Relational
Flat File
Use with dynamic caching enabled. Applies to rows entering the
Update Else Insert Pipeline
Lookup transformation with the row type of update.
Relational
Datetime Format Flat File Specify the date format for the date fields in the file.
Thousand
Flat File specify the thousand separator for the port.
Separator
Decimal Separator Flat File Specify the Decimal Separator for the port.
Case-Sensitive The Integration Service uses case sensitive string comparisons
Flat File
String Comparison when performing lookups on string columns.
Flat File
Null Ordering Specifies how to sort null data.
Pipeline
Flat File
Sorted Input Indicates whether the lookup source data is in sorted order or not.
Pipeline
Flat File
Lookup Source is
Pipeline The lookup source does not change in a session.
Static
Relational
Pre-build Lookup Flat File Allows the Integration Service to build the lookup cache before
Cache Pipeline the Lookup transformation receives the data. The Integration
Service can build multiple lookup cache files at the same time to
Relational
improve performance.
Subsecond
Relational Specifies the subsecond precision for datetime ports.
Precision

Connected and Unconnected Lookup Transformation

The lookup transformation can be used in both connected and unconnected mode. The difference
between the connected and unconnected lookup transformations are listed in the below table:

Connected Lookup Transformation Unconnected Lookup Transformation


Receives input values from the :LKP expression
Receives input values directly from the
in another transformation such as expression
upstream transformations in the pipeline.
transformation.
You can use either static or dynamic cache. You can use only static cache.
Lookup cache contains both the lookup Lookup cache contains all lookup/output ports
condition column and lookup source columns in the lookup condition and the lookup/return
that are output ports. port.
Returns multiple columns for the same row or Returns only one column which is designated as
inserts into the dynamic lookup cache. return port for each row.
When there is no match for the lookup
condition, the integration service returns the When there is no match for the lookup
default values for output ports. In case of condition, the integration service returns NULL
dynamic cache, the integration service inserts value for the return port.
the row into the cache.
When there is a match for the lookup condition,
the integration service returns all the output When there is a match for the lookup condition,
ports. In case of dynamic cache, the integration the integration service returns the value from
service either updates the row in the cache or the return port.
leaves the row unchanged.
Passes multiple values to downstream Passes single output value to another
transformations. transformation.
Supports user-defined default values. user defined default values are not supported.

Lookup Transformation Source Types in Informatica


First step when creating a lookup transformation is choosing the lookup source. You can select a
relational table, flat file or a source qualifier as the lookup source.

Relational lookups:
When you want to use a relational table as a lookup source in the lookup transformation, you
have to connect to the lookup source using a ODBC and import the table definition as the
structure for the lookup transformation. You can use the below options for relational lookups:

 You can override the default sql query and write your own customized sql to add a
WHERE clause or query multiple tables.
 You can sort null data based on the database support.
 You can perform case-sensitive comparison based on the database support.

Flat File lookups:

When you want to use a flat file as a lookup source in the lookup transformation, select the flat
file definition in the repository or import the source when you create the transformation. When
you want to import the flat file lookup source, the designer invokes the flat file wizard. You can
use the below options for flat file lookups:

 You can use indirect files as lookup sources by configuring a file list as the lookup file
name.
 You can use sorted input for the lookup.
 You can sort null data high or low.
 You can use case-sensitive string comparison with flat file lookups.

Sorted Input for Flat File Lookups:

For flat file lookup source, you can improve the performance by sorting the flat files on the
columns which are specified in the lookup condition. The condition columns in the lookup
transformation must be treated as a group for sorting the flat file. Sort the flat file on the
condition columns for optimal performance.

Normalizer Transformation
Normalizer transformation type is Active & Connected. The Normalizer transformation is used
in place of Source Qualifier transformations when you wish to read the data from the COBOL
copy book source. Also, a Normalizer transformation is used to convert column-wise data to
row-wise data. This is similar to the transpose feature of MS Excel. You can use this feature if
your source is a COBOL copybook file or relational database table. The Normalizer
transformation converts columns to rows and also generates an index for each converted row.

This is type an active from the transaction which allows you to read the data from cobal
files. Every Cobol source definition default associates with normalizing transformation.
Use the normalize transformation to convert a single input record into multiple output records,
this is known as horizontal port
Normalizer Transformation Uses
Normalizer can be used to deal with

1. multiple-occurring columns and


2. multiple record types created using redefines.

Normalizer Transformation Properties


Active /Passive : Normalizer Transformation is an Active transformation as it creates multiple
rows for each input row.
Connected/Unconnected Transformation : Normalizer Transformation is
a connected Transformation.
Types of ports in Normalizer Transformation
Input:  For reading input Data
Output: For providing output Data
GCID:  The normalizer transformation generates thisID column for each set of multiple
occurring column. It act as the undex for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1,2 or 3
in the generated column ID. We can use this GCID to give some name to those value. We will
see its more use in our example.
GK_VALUE:  This port increasethe sequence value for each input row. Unlike GCID,(which
was repeating itself for each set of group GK_VALUE keep on increasing with each source row
processed.

Normalizer Transformation Components


When you configure a Normalizer transformation, define the following components:
Transformation tab: You can rename the transformation and add a description.
Properties tab: You can specify the tracing level.
Normalizer tab: Here you define the input/output port along with occurrence. For port which
receive values from multiple column we gives occurs greater than 1 .
Ports tab: Once port under normalizer is created , you will see input and output port for
normalizer. For nomalizer ports with occurs greater than 1 , will see multiple input port but only
one output port. ( see example for more details)
NOTE: The Designer automatically create input /output groups once normalizer port are created
under normalizer port.
Configuring Normalizer Transformation Ports
You can create input ports by copying them from another transformation or by manually creating
them on the Group Ports tab.The Designer creates output ports by copying the following
properties from the input ports.
Port Details

 Port name:  Add the name of the port.


 Datatype, precision, and scale:  Configure the datatype and set the precision and scale
for each port.
 Default Value: In case port data is null , you can define its default value.

Normalizer Transformation Generated Keys


The Normalizer transformation returns at least one generated key column(note this is different
than GCID column) in the output row. The Integration Service increments the generated key
sequence number each time it processes a source row. When you create a Normalizer
transformation, the generated key value is 1 by default. The naming convention for the
Normalizer generated key is GK_< redefined_field_name >.
You can view the current generated key values on the Normalizer transformation Ports tab. At
the end of each session, the Integration Service updates the generated key value in the
Normalizer transformation to the last

value generated for the session plus one. The maximum generated key value is
9,223,372,036,854,775,807.

Changing Normalizer Transformation Generated Keys


Reset the generated key sequence: When you reset the generated key sequence, the Integration
Service resets the generated key start value back to the value it was before the session. Can be set
on the properties tab.

Restart the generated key sequence: When you restart the generated key sequence, the
Integration Service starts the generated key sequence at 1 the next time it runs a session.

When you restart the generated key sequence, the generated key start value does not change in
the Normalizer transformation until you run a session. When you run the session, the Integration
Service overrides the sequence number value on the Ports tab.

When you reset or restart the generated key sequence, the reset or restart affects the generated
key sequence values the next time you run a session. You do not change the current generated
key sequence values in the
Normalizer transformation. When you reset or restart the generated key sequence, the option is
enabled for every session until you disable the option.

You might also like