Informatica Best Practices


Informatica - Best Practices

  - HOW TO: Take repository backup on a daily basis
  - HOW TO: Retain multiple session and workflow log files in the session log folder
  - HOW TO: Rename an Informatica Repository
 
HOW TO: Take Informatica repository backup on a daily basis
 
It is always recommended to take a full backup of the development repository on a daily basis and keep it for at least a week, so that an accidentally deleted mapping can be restored from the backup.
 
A basic batch program that automatically takes a repository backup, zips it, and keeps it for a week:
REM ----------------------------------------------------------
REM --SET VARIABLES
REM ----------------------------------------------------------
set LOGFILENM=<Repository Name>
set REP=<Repository Name>
set USER=Administrator
set PWD=<Admin Password>
set BKPPATH=Drive:\<Location>\
set DAY=%DATE:~0,3%
REM ----------------------------------------------------------
ECHO. |TIME > %LOGFILENM%.log
ECHO "Backup Started & Connecting to Repository" >> %LOGFILENM%.log
pmrep connect -r %REP% -n %USER% -x %PWD% -h DEVMAX -o 5001
ECHO "Repository Connected" >> %LOGFILENM%.log
REM -f overwrites an existing backup file; -b, -j and -q skip workflow/session logs,
REM deployment group history and MX data to keep the backup file small.
pmrep Backup -o %BKPPATH%%LOGFILENM%_%DAY%.rep -f -b -j -q
ECHO. |TIME >> %LOGFILENM%.log
ECHO "Backup Completed" >> %LOGFILENM%.log
zip -r -9 -S -m %BKPPATH%%LOGFILENM%_%DAY%.rep.zip %BKPPATH%%LOGFILENM%_%DAY%.rep
ECHO "Zipping complete!!!" >> %LOGFILENM%.log
 
Example: if LOGFILENM = DV_REP, the output of the above script will be DV_REP_Mon.rep.zip.
 
The above batch program can be scheduled on the Informatica server to run nightly.
 
 HOW TO: Retain multiple session and workflow log files in the session log folder

You can set sessions to retain log files according to the following options:

1. Session Runs. The Integration Service creates a designated number of session log files. Configure
the number of session logs in the Save Session Log for These Runs option. The Integration
Service does not archive binary logs.
2. Session Time Stamp. The Integration Service creates a log for all sessions, appending a time
stamp to each log. When you save a session log by time stamp, the Integration Service archives
the binary logs and text log files.
To retain multiple session and workflow log files in the session log folder edit the default session
configuration as follows:
1.     Open the Workflow Manager.
2.     Select Tasks > Session Configuration > Edit.
3.     Click the Properties tab.
4.     For Save Session log by, select Session runs or Session timestamp from the list.
5.     Optional: if you selected Session runs in step 4, enter a numeric value for the Save session log for these runs attribute.
 

Note: These settings take effect for new sessions you create.
For existing sessions, the Revert option is available; if you click Revert, the session picks up the modified session configuration settings.

 
 This change will be applicable to all new sessions.
$PMSessionLogCount Integration Service Variable
You can also use the $PMSessionLogCount service variable to create the configured number of session
logs for the Integration Service.
1.     Go to Administration Console.
2.     Go to the Integration Service > Properties > General Properties.
3.     Set the value of $PMSessionLogCount.
In this example it is set to 2, which keeps a history of 3 (2+1) log files.
 
 

 
4.     Open the Workflow Manager.
5.     Select Tasks > Session Configuration > Edit.
6.     Select the Properties tab.
7.     For Save session log for these runs, use the $PMSessionLogCount variable set in the Administration Console.
 
 
HOW TO: Rename an Informatica Repository

There may be cases when you would like to rename an Informatica repository; an effective way of doing so is described below.
Make sure that none of the developers are connected to the Informatica repository while performing the following operations.
Step 1: Using Repository Server Administration, create a repository backup with the file name "BI_Informatica_Rep" and skip workflow and session logs (this saves time and space), as shown in Figure 1.

 
Figure 1: Repository Backup - Advanced Options
 
Step 2: Validate that the backup has been created in X:\Infa_714\RepositoryServer\bin\Backup.
 
Step 3: Create a copy of the license file: copy OldName.lic to NewName.lic. The license file can be found at X:\Infa_714\RepositoryServer\bin.
 
Step 4: Using Repository Server Administration, drop the existing repository.
 
Step 5: Using Repository Server Administration, create the new repository by restoring the backup file "BI_Informatica_Rep", as shown in Figure 2.
 

 
Figure 2: Create New Repository using Restore
Informatica Performance Improvement Tips

We often come across situations where the Data Transformation Manager (DTM) takes more time to read from a source or to write to a target. The following standards/guidelines can improve the overall performance.

 Use a single Source Qualifier if the source tables reside in the same schema.
 Make use of the Source Qualifier "Filter" property if the source type is relational.
 If subsequent sessions look up on the same table, use a persistent cache in the first session; the data remains in the cache and is available for the subsequent sessions to use.
 Use integer flags, as integer comparison is faster than string comparison.
 Use the table with the lesser number of records as the master table for joins.
 While reading from flat files, define the appropriate data type instead of reading everything as String and converting it later.
 Connect only the ports that are required by subsequent transformations; otherwise check whether those ports can be removed.
 Suppress the generated ORDER BY by placing '--' at the end of the query in Lookup transformations (an example follows this list).
 Minimize the number of Update Strategy transformations.
 Group by simple columns in transformations such as Aggregator and Source Qualifier.
 Use a Router transformation in place of multiple Filter transformations.
 Turn off verbose logging while moving mappings to the UAT/Production environment.
 For large volumes of data, drop indexes before loading and recreate them after the load.
 For large volumes of records, use bulk load and increase the commit interval to a higher value.
 Set 'Commit on Target' in the sessions.
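
For example, a minimal sketch of a lookup SQL override with the ORDER BY suppressed (the table LKP_PRODUCT and its columns are hypothetical; '--' starts a comment on Oracle and most databases):

SELECT PRODUCT_ID, PRODUCT_NAME
FROM   LKP_PRODUCT
ORDER BY PRODUCT_ID --

The Integration Service appends its own ORDER BY on all lookup ports to the override; ending the query with '--' turns that generated clause into a comment, so only the ORDER BY supplied on the condition ports is executed.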

Leveraging Metadata in Informatica Workflow-Session/Analysis

We can leverage the metadata collected in the Informatica repository for many interesting analyses; a few of the scenarios where I have leveraged Informatica metadata are given below. These SQL queries can be executed against an Oracle repository database with no changes and require only minor modification for other databases.

Failed Sessions

The following query lists the failed sessions of the last day. To make it work for the last 'n' days, replace SYSDATE-1 with SYSDATE-n.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       LAST_ERROR AS ERROR_MESSAGE,
       DECODE (RUN_STATUS_CODE, 3, 'Failed', 4, 'Stopped', 5, 'Aborted') AS STATUS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
FROM   REP_SESS_LOG
WHERE  RUN_STATUS_CODE != 1
AND    TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)

Long running Sessions

The following query lists successful sessions of the last day that ran for more than 10 minutes. To make it work for the last 'n' days, replace SYSDATE-1 with SYSDATE-n.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,
       SUCCESSFUL_ROWS AS TARGET_ROWS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
FROM   REP_SESS_LOG
WHERE  RUN_STATUS_CODE = 1
AND    TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
AND    (SESSION_TIMESTAMP - ACTUAL_START) > (10/(24*60))
ORDER BY SESSION_TIMESTAMP

Invalid Tasks

The following query lists folder name, task type, task name, version number, and last-saved time for all invalid tasks.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       DECODE(IS_REUSABLE, 1, 'Reusable', ' ') || ' ' || TASK_TYPE_NAME AS TASK_TYPE,
       TASK_NAME AS OBJECT_NAME,
       VERSION_NUMBER,
       LAST_SAVED
FROM   REP_ALL_TASKS
WHERE  IS_VALID = 0
AND    IS_ENABLED = 1
ORDER BY SUBJECT_AREA, TASK_NAME

Merge Rows as Columns / Transpose records

The Requirement: Converting rows to columns


Input:

Customer   Product   Cost
Cust1      P1        10
Cust1      P2        20
Cust1      P3        30
Cust2      ABC       10
Cust2      P2        25
Cust2      Def       10

Output:

Customer   Product 1   Cost 1   Product 2   Cost 2   Product 3   Cost 3
Cust1      P1          10       P2          20       P3          30
Cust2      ABC         10       P2          25       Def         10
The above illustration would help in understanding the requirement. We had to merge multiple records
into one record based on certain criteria. The design had to be reusable since each dimension within the
data mart required this flattening logic.
1. Approach:

Using an Aggregator transformation would group the records by a key, but retrieving the values of a particular column as individual columns is a challenge; hence we designed a component, 'Flattener', based on an Expression transformation.
Flattener is a reusable component, a mapplet that performs the function of flattening records.
Flattener consists of an Expression and a Filter transformation. The Expression is used to club each incoming record with its group based on certain logic, and the decision to write the record to the target is taken by the Filter transformation.
2. Design:
The mapplet can receive up to five inputs, of the following data types:
i_Col1 (string),  Customer
i_Col2 (string), Product
i_Col3 (decimal), Cost
i_Col4 (decimal) and
i_Col5 (date/time)
The names are kept generic and the ports accept different data types, so that the mapplet can be used in any scenario where records need to be flattened.
The mapplet gives out 15 sets of 5 output ports each, in the following manner:
o_F1_1 (string), Customer
o_F2_1 (string), Product1
o_F3_1 (decimal), Cost1
o_F4_1 (decimal) and
o_F5_1 (date/time)
o_F1_2 (string), Customer
o_F2_2 (string), Product2
o_F3_2 (decimal), Cost2
o_F4_2 (decimal) and
o_F5_2 (date/time)
… … and so on
The output record will have repetitive sets of 5 columns each (each set refers to one incoming row). Based on the requirement, the number of occurrences of these sets can be increased, and only the required fields need to be used/mapped. For the above example we use just two strings and one decimal to map Customer, Product and Cost.
The mapplet receives records from its parent mapping. The Expression would initially save each
incoming value to a variable and compare it with its counterpart that came in earlier and is held in its
cache as long as the condition to flatten is satisfied.

Syntax to store current and previous values:

Port Name    Data Type   Port Type   Expression
i_Col2       string      input
prv_Col2     string      variable    curr_Col2
curr_Col2    string      variable    i_Col2
The condition/logic to flatten records is parameterized and decided before the mapping is called, thus increasing the code's scalability. The parameterized logic is passed to the Expression transformation via a mapplet parameter. The value is used as an expression to perform the evaluation, and the result is a flag value of either '1' or '2'.

Syntax for port - flag:

Port Name    Data Type   Port Type   Expression
flag         integer     variable    $$Expr_compare

An example of the parameterized expression:
$$Expr_compare = iif (curr_Col1 = prv_Col1 AND curr_Col2 != prv_Col2, 1, iif (curr_Col1 != prv_Col1, 2))

A variable port named "rec_count" is incremented based on the flag.

Syntax for port - rec_count:

Port Name    Data Type   Port Type   Expression
rec_count    integer     variable    iif (flag=2, 0, iif (flag=1, rec_count + 1, rec_count))
The Expression transformation now uses the values in the ports "flag" and "rec_count" to decide the placeholder for each incoming input value, i.e. the column in the target table into which this data will ultimately move. This process is iterative and goes on as long as the comparison logic ($$Expr_compare) holds good, i.e. until all records are flattened per the logic. An example of a placeholder expression is shown below:

Port Name    Data Type     Port Type   Expression
v_Field1     (data type)   variable    iif(flag=2 AND rec_count=0, curr_Col1, v_Field1)

Port "write_flag_1" is set to 1 when the comparison logic fails (meaning flattening is complete), and the Filter transformation passes the record to the target only once it is completely transposed.

Filter condition:

Port Name     Data Type   Port Type   Expression
write_flag_1  integer     variable    iif (flag=2 AND write_flag>1, 1, 0)
3. Outcome:
After developing the code and implementing the same we found it to be a useful utility, so thought of
sharing it and would like to hear suggestions from readers on performing the same functionality in a
different way. Please do share your views.

Informatica and Stored Procedures

A. Described below is a scenario where the requirement is to use a stored procedure that returns a cursor as a source. By and large, PowerCenter does not support a stored procedure that returns a cursor as a source. The workaround is as follows:

1. Create the procedure that will load the data into a new table:

CREATE OR REPLACE PROCEDURE load (p_initial_date IN DATE, p_final_date IN DATE) AS
  str_load  VARCHAR2 (500);
  str_clean VARCHAR2 (500);
BEGIN
  str_clean := 'DELETE FROM EMP';
  -- Bind the date parameters instead of embedding their names in the string,
  -- so the dynamic SQL resolves them correctly.
  str_load  := 'INSERT INTO EMP SELECT * FROM EMPLOYEE WHERE DOJ BETWEEN TRUNC(:1) AND TRUNC(:2)';
  EXECUTE IMMEDIATE str_clean;
  EXECUTE IMMEDIATE str_load USING p_initial_date, p_final_date;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
END load;

2. Create the table that will receive the data from the procedure:

SQL> create table EMP as SELECT * from EMPLOYEE where 1 > 2;

3. Add a Stored Procedure transformation to the PowerCenter mapping. This transformation will execute the new procedure, called LOAD in this example.

4. Set the run method to Source Pre-load, so that the procedure is executed before the source table is read.
5. Import the EMP table as a Source Definition. This table will be populated by the new stored procedure.

If the original stored procedure is used by the customer application and you cannot change its source code, you can create a new stored procedure that calls the original one (without changing it) and performs the insert into the new table by looping over the returned cursor, as sketched below.
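
A minimal sketch of such a wrapper, assuming the original procedure is called GET_EMPLOYEES and returns a SYS_REFCURSOR of EMPLOYEE rows (both the procedure name and its signature are hypothetical):

-- EMP is the staging table created in step 2 (same structure as EMPLOYEE).
CREATE OR REPLACE PROCEDURE load_from_cursor (p_initial_date IN DATE, p_final_date IN DATE) AS
  c_emp SYS_REFCURSOR;
  r_emp EMPLOYEE%ROWTYPE;
BEGIN
  DELETE FROM EMP;
  -- Call the original (unchanged) procedure to obtain the cursor.
  GET_EMPLOYEES (p_initial_date, p_final_date, c_emp);
  LOOP
    FETCH c_emp INTO r_emp;
    EXIT WHEN c_emp%NOTFOUND;
    INSERT INTO EMP VALUES r_emp;
  END LOOP;
  CLOSE c_emp;
  COMMIT;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
    RAISE;
END load_from_cursor;

This wrapper can then be called from the Source Pre-load Stored Procedure transformation exactly as in steps 3 to 5 above.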

B. Given below is a situation where you want to pass a mapping variable to a Stored Procedure transformation (it can be either connected or unconnected).

Connected Stored Procedure

The parameters that are passed to a connected Stored Procedure have to be linked from another
transformation.
Given below are the steps to pass mapping variable to a connected Stored Procedure transformation:

1. Create an Expression transformation.


2. Create an output port in the Expression transformation with the following expression:

$$mapping_variable

This sets the value of this output port to the mapping variable.

3. Link this output port to the Stored Procedure transformation.

Unconnected Stored Procedure

For unconnected Stored Procedure transformations you can use the mapping variable in the expression calling the stored procedure.
Follow the steps below to pass a mapping variable to an unconnected Stored Procedure transformation:

1. Create an Expression transformation.


2. Create an output port in the Expression transformation with the following expression:

:SP.GET_NAME_FROM_ID ($$mapping_variable, PROC_RESULT)

If you attempt to use a mapping variable to store the output value of the stored
procedure, the session will fail with the error below.

“TE_7002 Transformation Parse Fatal Error; transformation stopped: invalid function reference. Failed
to Initialize Server Transformation.”

To resolve the issue replace the mapping variable with the PROC_RESULT system variable.

Example:
Incorrect, using a mapping variable:
:SP.PROCEDURE(FIELD1, $$mapping_variable)
Correct, using the PROC_RESULT system variable:
:SP.PROCEDURE(FIELD1,PROC_RESULT)
Or
:SP.PROCEDURE($$mapping_variable,PROC_RESULT)

The PROC_RESULT system variable assigns the stored procedure output to the port with this expression.

Output Files in Informatica

The Integration Service process generates output files when we run workflows and sessions. By default,
the Integration Service logs status and error messages to log event files.

Log event files are binary files that the Log Manager uses to display log events. When we run each
session, the Integration Service also creates a reject file. Depending on transformation cache settings
and target types, the Integration Service may create additional files as well.
The Integration Service creates the following output files:

Output Files

Session Details/logs:

 When we run a session, the Integration Service creates a session log file with the load statistics, table names, error information, threads created, etc., based on the tracing level set in the session properties.
 We can monitor the session details in the session run properties while the session is running and after it has failed or succeeded.

Workflow Log:

 Workflow log is available in Workflow Monitor.


 The Integration Service process creates a workflow log for each workflow it runs.
 It writes information in the workflow log such as
o Initialization of processes,
o Workflow task run information,
o Errors encountered and
o Workflows run summary.
 The Integration Service can also be configured to suppress writing messages to the workflow log
file.
 As with Integration Service logs and session logs, the Integration Service process enters a code
number into the workflow log file message along with message text.

Performance Detail File:

 The Integration Service process generates performance details for session runs.
 Through the performance details file we can determine where session performance can be
improved.
 Performance details provide transformation-by-transformation information on the flow of data
through the session.

Reject Files:

 By default, the Integration Service process creates a reject file for each target in the session. The
reject file contains rows of data that the writer does not write to targets.
 The writer may reject a row in the following circumstances:
o It is flagged for reject by an Update Strategy or Custom transformation.
o It violates a database constraint such as primary key constraint
o A field in the row was truncated or overflowed
o The target database is configured to reject truncated or overflowed data.

Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name at the session level:

 Open the session, select any of the targets, and view the options:

o Reject file directory
o Reject file name
 If you enable row error logging, the Integration Service process does not create a reject file.

Row Error Logs:

 When we configure a session, we can choose to log row errors in a central location.
 When a row error occurs, the Integration Service process logs error information that allows you to determine the cause and source of the error.
 The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.
 If we enable flat file logging, by default the Integration Service process saves the file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.

Recovery Tables Files:

 The Integration Service process creates recovery tables on the target database system when it
runs a session enabled for recovery.
 When you run a session in recovery mode, the Integration Service process uses information in
the recovery tables to complete the session.
 When the Integration Service process performs recovery, it restores the state of operations to
recover the workflow from the point of interruption.
 The workflow state of operations includes information such as active service requests,
completed and running status, workflow variable values, running workflows and sessions, and
workflow schedules.

Control File:

 When we run a session that uses an external loader, the Integration Service process creates a
control file and a target flat file.
 The control file contains information about the target flat file such as data format and loading
instructions for the external loader.
 The control file has an extension of .ctl. The Integration Service process creates the control file
and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default.

Email:

 We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer; the Email task can be placed in a workflow or associated with a session.
 The Email task allows us to automatically communicate information about a workflow or session run to designated recipients.
 Email tasks in the workflow send email depending on the conditional links connected to the
task. For post-session email, we can create two different messages, one to be sent if the session
completes successfully, the other if the session fails.
 We can also use variables to generate information about the session name, status, and total
rows loaded.

Indicator File:

 If we use a flat file as a target, we can configure the Integration Service to create an indicator file
for target row type information.
 For each target row, the indicator file contains a number to indicate whether the row was
marked for insert, update, delete, or reject.
 The Integration Service process names this file target_name.ind and stores it in the Integration
Service variable directory, $PMTargetFileDir, by default.

Target or Output File:


 If the session writes to a target file, the Integration Service process creates the target file based
on a file target definition.
 By default, the Integration Service process names the target file based on the target definition
name.
 If a mapping contains multiple instances of the same target, the Integration Service process
names the target files based on the target instance name.
 The Integration Service process creates this file in the Integration Service variable directory,
$PMTargetFileDir, by default.

Cache Files:

 When the Integration Service process creates memory cache, it also creates cache files. The
Integration Service process creates cache files for the following mapping objects:
o Aggregator transformation
o Joiner transformation
o Rank transformation
o Lookup transformation
o Sorter transformation
o XML target
 By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup
transformations and XML targets in the directory configured for the $PMCacheDir service
process variable.

Processing Multiple XML Files through Informatica – 1


 Problem Statement: The data to be processed in Informatica arrived as XML files. The number of XML files to be processed was dynamic, and the name of the XML file from which each record was processed also had to be captured.
 Resolution:
Option 1 – Using File list as part of Indirect File Sources in session
Option 2 – Using Parameter File and workflow variable
 Implementation details for option 1: using a file list
 The XML file names to be processed were read using a batch script, and a file list containing the XML file names was created. This file list name was set under the source properties at the session level. The XML files were read sequentially and the data pertaining to every XML file was processed. Since the number of XML files to be processed was dynamic, the need of the hour was to achieve looping in Informatica.
 Challenge in using a file list: a file list is created in a session to run multiple source files for one source instance in the mapping. When a file list is used in a mapping as multiple source files for one source instance, the properties of all files must match the source definition. File lists are configured in the session properties by entering the file name of the file list in the Source Filename field and the location of the file list in the Source File Directory field. When the session starts, the Integration Service reads the file list, then locates and reads the first file source in the list. After the Integration Service reads the first file, it locates and reads the next file in the list. The issue with using XML file names in a file list was further compounded by Informatica grouping records pertaining to similar XML nodes together, which made it difficult to identify which record belonged to which XML file.
 Batch script: batch scripts controlled the overall looping in Informatica by performing the tasks below:
• Reading the XML file names from the staging location and creating the file list containing the XML file names.
• Moving the XML files from the staging location to the archive location.
• Verifying whether there are any more XML files to be processed and, depending on the outcome, either looping the process by invoking the first workflow or ending the process.
• Using pmcmd commands to invoke the appropriate workflows.
 Workflow Details –
There were two Informatica workflows designed to achieve looping:
• First workflow: creates the indirect file to be used as the source in the session properties and triggers the second workflow. Details of the workflow are:
o A command task executes a DOS batch script which creates the indirect file after reading the XML file names from a pre-defined location on the server.
o A command task executes the second workflow to process the data within the XML files.

• Second workflow: reads and processes the XML files and populates the staging tables. Details of the workflow are:
o A session reads the XML files using the indirect file and loads the staging tables.
o A command task moves the XML files just processed into an archive folder using a batch script.
o A command task executes a batch script which:
checks whether there are any more XML files to be processed;
if yes, triggers the first workflow again, ensuring all XML files are processed and loaded into the staging tables;
if no, the process completes.

Informatica Development Best Practice – Workflow


 Workflow Manager default properties can be modified to improve the overall performance; a few of them are listed below. These properties can impact the ETL runtime directly and need to be configured based on:
 i)  Source database
ii)  Target database
iii) Data volume

Category: Session Properties

 While loading staging tables for FULL loads, the "Truncate target table" option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or DELETE statement:

Database     Primary Key Defined          No Primary Key
DB2          TRUNCATE                     TRUNCATE
INFORMIX     DELETE                       DELETE
ODBC         DELETE                       DELETE
ORACLE       DELETE UNRECOVERABLE         TRUNCATE
MSSQL        DELETE                       TRUNCATE
SYBASE       TRUNCATE                     TRUNCATE

 The workflow property "Commit Interval" (default value: 10,000) should be increased for volumes of more than 1 million records. The database rollback segment size should also be increased while increasing the "Commit Interval".

 Insert/Update/Delete options should be set as determined by the target population method:

Target Option            Integration Service
Insert                   Uses the Target update option (Update as Update,
                         Update as Insert, Update else Insert)
Update as Update         Updates all rows as Update
Update as Insert         Inserts all rows
Update else Insert       Updates existing rows, else inserts

Category: Partition

 The maximum number of partitions for a session should be 1.5 times the number of processors on the Informatica server, i.e. 1.5 x 4 processors = 6 partitions.
 Key Value partitions should be used only when an even distribution of data can be obtained; in other cases, Pass Through partitions should be used.
 A source filter should be added to evenly distribute the data between Pass Through partitions. Key Value should have ONLY numeric values, e.g. MOD(NVL(<Numeric Key Value>, 0), <No. of partitions defined>), such as MOD(NVL(product_sys_no,0),6).
 If a session contains "N" partitions, increase the DTM Buffer Size to at least "N" times the value used for the session with one partition.

Category: Pushdown Optimization

 If the source or target database is an MPP (Massively Parallel Processing) database, enable Pushdown Optimization. By enabling this, the Integration Service will push as much transformation logic as possible to the source database, the target database, or both (FULL), based on the settings. This property can be ignored for conventional databases.

Informatica Development Best Practices – Mapping

The following are generally accepted "Best Practices" for Informatica PowerCenter ETL development; if implemented, they can significantly improve the overall performance.

Category: Source Extracts

 Technique: Loading data from fixed-width files takes less time than loading from delimited files, since delimited files require extra parsing. In the case of fixed-width files, the Integration Service knows the start and end position of each column upfront, which reduces processing time.
   Benefit: Performance Improvement

 Technique: Flat files located on the server machine load faster than a database located on the server machine.
   Benefit: Performance Improvement

Category: Mapping Designer

 Technique: There should be a placeholder transformation (Expression) immediately after the source and one before the target. Data type and data width changes are bound to happen during the development phase, and these placeholder transformations are used to preserve the port links between transformations.
   Benefit: Best Practices

 Technique: Connect only the ports that are required in targets to subsequent transformations. Also, active transformations that reduce the number of records should be used as early in the mapping as possible.
   Benefit: Code Optimization

 Technique: If a join must be used in the mapping, select an appropriate driving/master table; the table with the lesser number of rows should be the driving/master table.
   Benefit: Performance Improvement

Category: Transformations

 Technique: If there are multiple lookup conditions, place the condition with the "=" sign first in order to optimize lookup performance. Also, indexes on the database table should include every column used in the lookup condition.
   Benefit: Code Optimization

 Technique: Persistent caches should be used if the lookup data is not expected to change often. These cache files are saved and can be reused for subsequent runs, eliminating querying of the database.
   Benefit: Performance Improvement

 Technique: The Integration Service processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.
   Benefit: Code Optimization

 Technique: Replace complex filter expressions with a flag (Y/N). Complex logic should be moved to an Expression transformation and the result stored in a port; the Filter expression then takes less time to evaluate this port than to execute the entire logic in the Filter expression.
   Benefit: Best Practices

 Technique: The PowerCenter Server automatically makes conversions between compatible data types, which can slow down performance considerably. For example, if a mapping moves data from an Integer port to a Decimal port and then back to an Integer port, the conversion may be unnecessary.
   Benefit: Performance Improvement

 Technique: Assigning default values to a port and transformation errors written to the session log will always slow down session performance. Try removing default values and eliminating transformation errors.
   Benefit: Performance Improvement

How can we use the pmcmd command in a workflow or to run a session?

In pmcmd interactive mode, after connecting to the Integration Service, a workflow can be started with:

pmcmd> startworkflow -f foldername workflowname

Unit testing is of two types:

1. Quantitative testing

2. Qualitative testing

Steps:

1. First validate the mapping.

2. Create a session on the mapping and then run the workflow.

Once the session has succeeded, right-click the session and go to the statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing.

If the rows are successfully loaded, we then go for qualitative testing.

Steps:

1. Take the DATM (the document where all business rules are mapped to the corresponding source columns) and check whether the data has been loaded into the target table according to the DATM. If any data is not loaded according to the DATM, go and check the code and rectify it.

This is called qualitative testing.

This is what a developer will do in unit testing.

Difference between static and dynamic cache

Static: once the data is cached, it does not change. For example, an unconnected lookup uses a static cache.
Dynamic: the cache is updated to reflect updates in the table (or source) it refers to (for example, a connected lookup).

While using a static cache in a lookup, we can use all operators such as =, <, > when specifying the condition in the condition tab, but with a dynamic cache we can use only the = operator.

Static cache: it is a read-only cache.
Dynamic cache: it is read and write.
Static: Informatica returns a value when the condition is true; if it is false, it returns the default value in a connected lookup and NULL in an unconnected lookup.
Dynamic: it returns a value only if the condition is false.

Connected Lookup vs Unconnected Lookup

Connected lookup: receives input values directly from the pipeline.
Unconnected lookup: receives input values from the result of a :LKP expression in another transformation.

Connected lookup: you can use a dynamic or static cache.
Unconnected lookup: you can use a static cache only.

Connected lookup: the cache includes all lookup columns used in the mapping.
Unconnected lookup: the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Connected lookup: supports user-defined default values.
Unconnected lookup: does not support user-defined default values.

Stop Vs Abort

stop: _______If the session u want to stop is a part of batch you must stop the batch,

if the batch is part of nested batch, Stop the outer most bacth

Abort:----

You can issue the abort command , it is similar to stop command except it has 60 second
time out .

If the server cannot finish processing and committing data within 60 sec

Here's the difference:

ABORT is equivalent to:


1. Kill -9 on Unix (NOT kill -7) but YES, Kill -9
2. SIGTERM ABEND (Force ABEND) on Mainframe
3. Windows FORCE QUIT on application.

What does this do?


Each session uses SHARED/LOCKED (semaphores) memory blocks. The ABORT function
kills JUST THE CODE threads, leaving the memory LOCKED and SHARED and allocated.
The good news: It appears as if AIX Operating system cleans up these lost memory
blocks. The bad news? Most other operating systems DO NOT CLEAR THE MEMORY,
leaving the memory "taken" from the system. The only way to clear this memory is to
warm-boot/cold-boot (restart) the informatica SERVER machine, yes, the entire box must
be re-started to get the memory back.

If you find your box running slower and slower over time, or not having enough memory
to allocate new sessions, then I suggest that ABORT not be used.

So then the question is: When I ask for a STOP, it takes forever. How do I get the session
to stop fast?

well, first things first. STOP is a REQUEST to stop. It fires a request (equivalent to a
control-c in SQL*PLUS) to the source database, waits for the source database to clean up.
The bigger the data in the source query, the more time it takes to "roll-back" the source
query, to maintain transaction consistency in the source database. (ie: join of huge tables,
big group by, big order by).

It then cleans up the buffers in memory by releasing the data (without writing to the
target) but it WILL run the data all the way through to the target buffers, never sending it
to the target DB. The bigger the session memory allocations, the longer it takes to clean
up.

Then it fires a request to stop against the target DB, and waits for the target to roll-back.
The higher the commit point, the more data the target DB has to "roll-back".

FINALLY, it shuts the session down.

WHAT IF I NEED THE SESSION STOPPED NOW?


Pick up the phone and call the source system DBA, have them KILL the source query IN
THE DATABASE. This will send an EOF (end of file) downstream to Informatica, and Infa
will take less time to stop the session.

If you use abort, be aware, you are choosing to "LOSE" memory on the server in which
Informatica is running (except AIX).

If you use ABORT and you then re-start the session, chances are, not only have you lost
memory - but now you have TWO competing queries on the source system after the same
data, and you've locked out any hope of performance in the source database. You're
competing for resources with a defunct query that's STILL rolling back.

There are 3 ways to improve session performance for an Aggregator transformation:

A) Size the caches appropriately:

1) Size of the data cache = bytes required for variable columns + bytes required for output columns.

2) Size of the index cache = size of the ports used in the group by clause.

B) If you provide sorted data for the group by ports, aggregation will be faster; so sort the ports used in the Aggregator's group by in an upstream Sorter transformation.

C) We can use incremental aggregation if we expect no change in the data that has already been aggregated.

When we create the target as a flat file and the source as Oracle, how can we specify the first row as column names in the flat file?

You can use a pre-SQL statement, but this is a hard-coded method: if you change the column names or add extra columns to the flat file, you will have to change the insert statement. You can also achieve this by changing the setting in the Informatica Repository Manager to display the column headings; the only disadvantage is that it will be applied to all the files generated by this server.

Another option is an expression such as:

IIF(CUME(1)=1, 'col1,col2,col3,col4'||CHR(10)||TO_CHAR(col1), TO_CHAR(col1))

In version 8.2 of Informatica, the session properties have an option called "Header Options". Use that to get the field names as the first row in the target flat file.

When importing a flat file into the Target Designer, a flat file import wizard appears. It has an option 'Import field names from first line'; check this option so the Integration Service treats the first row values as column names.

Can anyone explain error handling in Informatica with examples, so that it will be easy to explain in an interview?

There is a file called the bad file, which generally has the format *.bad, and it contains the records rejected by the Informatica server. There are two indicators: one for the type of row and the other for the type of column. The row indicator signifies what operation was going to take place (i.e. insertion, deletion, update, etc.). The column indicators contain information on why the column was rejected (such as violation of a not-null constraint, value error, overflow, etc.). If you rectify the errors in the data present in the bad file and then reload the data into the target, the table will contain only valid data.

What is the difference between the IIF and DECODE functions?

You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:

IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )

You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:

DECODE( TRUE,
SALES > 0 AND SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)

The DECODE function can be used in a SQL statement, whereas an IIF statement cannot be used in SQL.

DECODE is similar to the switch statement in the C language, where we test for multiple conditions. A single IIF can test only one condition; to test multiple conditions we have to nest IIFs. DECODE is more readable than nested IIF.
 Question: What is Target Update Override? What is its use?

Answers
It overrides the default UPDATE statement in the target properties.

Target update override works like a source qualifier override; it is useful for updating the target without using an Update Strategy transformation.

When we don't have primary keys defined at the database level and still need to update the target from Informatica, we need to define keys at the Informatica level and use the update override in the target properties. This way we can update the table.

The Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.

When the Integration Service executes SQL against a source, target, or lookup database, it searches the reserved words file stored in the Integration Service installation directory and encloses matching reserved words in quotes. If you use a target update override, you must manually put all reserved words in quotes.

For a mapping without an Update Strategy transformation or a Custom transformation with the update strategy property enabled, configure the session to mark source rows as update. The Target Update option only affects source rows marked as update; the Integration Service processes all rows marked as insert, delete, or reject normally.

When you configure the session, mark source rows as data-driven. The Target Update Override only affects source rows marked as update by the Update Strategy or Custom transformation.

For example, a mapping passes the total sales for each salesperson to the T_SALES table. The Designer generates the following default UPDATE statement for the target T_SALES:

UPDATE T_SALES SET EMP_NAME = :TU.EMP_NAME, DATE_SHIPPED = :TU.DATE_SHIPPED,
TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_ID = :TU.EMP_ID

Because the target ports must match the target column names, the update statement includes the keyword :TU to specify the ports in the target transformation.

If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.

You can override the WHERE clause to include non-key columns.

For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES
WHERE :TU.EMP_NAME = EMP_NAME AND EMP_NAME = 'MIKE SMITH'

Target update (using an Update Strategy transformation) works only when we have a primary key on the target; the update happens based on this key. If we don't have a primary key, the update will not work.

To overcome this, if our target does not have any key column, or if we want to update the target based on a column other than the primary key column, we need to use the Target Update Override option available in the target properties.

How did you implement the Update Strategy?

Ans: Target Update Override


By default, the Informatica Server updates targets based on key values. However, you can override the
default UPDATE statement for each target in a mapping. You might want to update the target based on
non-key columns.
For a mapping without an Update Strategy transformation, configure the session to mark source records
as update. If your mapping includes an Update Strategy transformation, the Target Update option only
affects source records marked as update. The Informatica Server processes all records marked as insert,
delete, or reject normally. When you configure the session, mark source records as data-driven. The
Target Update Override only affects source rows marked as update by the Update Strategy
transformation.
Overriding the WHERE Clause
You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED,


TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME and
EMP_NAME = 'MIKE SMITH'
Entering a Target Update Statement
Follow these instructions to create an update statement.
To enter a target update statement:
1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field. The SQL Editor displays.
4. Select Generate SQL. The default UPDATE statement appears.
5. Modify the update statement. You can override the WHERE clause to include non-key columns.
6. Click OK.

How to list the top 10 salaries without using a Rank transformation

Use a Sorter transformation with SAL as the sort key (descending), followed by an Expression transformation (or Sequence Generator) that assigns a running row number, and a Filter transformation that passes only the first 10 records. Alternatively, push the logic into the database as shown below.
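
If the source is relational, the same result can be obtained in the Source Qualifier SQL override. A minimal sketch, assuming an Oracle source and a hypothetical EMPLOYEE table with a SAL column:

SELECT *
FROM   (SELECT EMPNO, ENAME, SAL
        FROM   EMPLOYEE
        ORDER BY SAL DESC)
WHERE  ROWNUM <= 10

The inner query sorts by salary in descending order and the outer ROWNUM filter keeps only the first 10 rows.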

To configure a Joiner transformation to emulate a Lookup transformation in a PowerCenter mapping do the following:
1. Create a Source Definition based on the database table used for the lookup.
2. Add this Source Definition to the mapping.
3. Connect this Source Definition to the Joiner transformation as the master source.
4. Connect the original source table to the Joiner as the detail source.
5. Change the Join Type to Master Outer Join.
6. Configure the join condition.
The join condition must use the same ports as the lookup condition.

Limitations
A Joiner can be used to emulate the lookup, but with certain limitations:

 Only equi-join conditions are supported.


 Duplicate matching is not supported.
 The Joiner will always cache; non-cached operation is not an option.

The Lookup always returns one value when it can find data, and you can choose to get the first or the last value using the lookup property Lookup Policy on Multiple Match. However, as the Joiner does not have that property, you will have to ensure that all key values occur only once (a quick uniqueness check is sketched below). If this is not ensured, you will allow duplicates of your input data to flow into the mapping.
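
As a sanity check before relying on the Joiner, a query like the sketch below can confirm that the join keys in the master (lookup) table are unique; the table LKP_TABLE and the column LKP_KEY are hypothetical:

SELECT   LKP_KEY, COUNT(*)
FROM     LKP_TABLE
GROUP BY LKP_KEY
HAVING   COUNT(*) > 1

Any rows returned identify duplicate keys that the Joiner would multiply into the data flow.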
Scenario
Very often you might have to compare incoming data with previously loaded data in the warehouse. Often the target table is loaded into a lookup, comparisons are made and actions are taken. This is not always the most efficient way: searching for a value in a cache with millions of rows takes longer than searching in a cache with a few hundred rows, which can result in poor performance. You might want to keep the caches small; however, a data warehouse only gets bigger and performance goes down. The input data, on the other hand, often has about the same size for each delivery, since deltas are usually created by some smart mechanism. So how would you cache your input data instead of the target data and make the comparisons that way? How can you make the cache smaller and yet increase performance?
You could put your input data in a lookup cache and read all rows from the warehouse table as a source, then make comparisons and act. However, this cannot detect new data, because new data will never be looked up; it does not yet exist in the warehouse table.
One way to address this issue is to use the Joiner transformation. The Lookup transformation is a specialized Joiner transformation; when you use the outer join option the Joiner can operate much like a Lookup, and if you use the sorted input option a very small cache is created.

Yes, we can reproduce the functionality of a Lookup transformation using a Joiner transformation.

The only difference is the behaviour on multiple matches: a Joiner returns all matched rows, whereas a Lookup returns either the first row or the last row, or reports an error.

In terms of performance, a Joiner is more costly than a Lookup.

SCD 123

http://www.dwforum.net/data_warehousing_concepts/scd_12_129.0.html

The SCD2 concept is that if a source record finds a match in the target, it is still inserted into the target as a new record. So in SCD2 there will be multiple records in the target for the same natural key; i.e., for a single Empno there can be more than one record, and that is how history is maintained.

 
The points are as below:

1) When the matched record is inserted into the target dimension table, it keeps the source key value (let's say Empno), but the target table's primary key is the surrogate key, which is not built on the employee key. That is, in the target table the employee number may be repeated, but the surrogate key will be unique.

2) The Lookup transformation condition is always Src.empno = Tgt.empno. When a single source record matches more than one record in the target, you can choose First, Last or Any if you use a static lookup, so a static lookup makes this easier. With a dynamic lookup, if a single source record finds more than one corresponding record in the target, the session will fail; so you have to write a SQL override in the dynamic lookup that extracts only one matching record (the last record) from the target, then it will work, otherwise not.

3) In a dynamic lookup, if NewLookupRow is 0, you can drop the row; if it is 1, insert it, using a Sequence Generator for the surrogate key; if it is 2, it will also be inserted, but use a new Sequence Generator whose start value is higher than the previous Sequence Generator's last value, so that the two do not clash.

4) Override the SQL query in the dynamic lookup so that in every run, for each source record that finds a match in the target, it always returns the last matched record (see the sketch below).

Then it works fine. But the better solution is a static lookup for SCD2.
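
To illustrate point 4, a minimal sketch of such a lookup SQL override, assuming a hypothetical dimension table EMP_DIM with columns EMP_KEY (surrogate key), EMPNO, ENAME and EFFECTIVE_DATE:

SELECT EMP_KEY, EMPNO, ENAME, EFFECTIVE_DATE
FROM   EMP_DIM D
WHERE  EFFECTIVE_DATE = (SELECT MAX(EFFECTIVE_DATE)
                         FROM   EMP_DIM
                         WHERE  EMPNO = D.EMPNO)

Only the most recent version of each EMPNO is cached, so the dynamic lookup never sees more than one match for a source record.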

SQL Override

Target update override (done in the Designer on the target instance; also visible in the session's target options)
Update Override is an option available on the TARGET instance.
===============================================
Update override with a non-primary-key field in the WHERE condition:
Ex: target table Target1 has A1, B1 defined as the primary key (local to Informatica, in the Designer).
UPDATE OVERRIDE SQL ON TARGET TABLE:

UPDATE Target1 SET C = :TU.C, D = :TU.D
WHERE A1 = :TU.A1 AND B1 = :TU.B1

--// already generated by Informatica
================================================================
SQL Override and Lookup SQL Override
==============================================================
The SQL override is an option available in the Source Qualifier, and the Lookup SQL Override is available in the Lookup transformation; in both you can include joins, filters, GROUP BY and ORDER BY. There are some differences between the two overrides:
1. The SQL override is used to limit the number of incoming rows entering the mapping pipeline; the Lookup SQL Override is used to limit the number of lookup rows, avoiding a whole-table scan and saving lookup time and cache space (see the example below).
2. The Lookup SQL Override uses an ORDER BY clause by default; the SQL override does not, and you must provide it in your query if you require it.
3. The SQL override can provide any kind of join by writing the query; the Lookup SQL Override provides only non-equi joins.
4. The Lookup SQL Override gives only one record even if it finds multiple records for a single condition; the SQL override does not do that.
=========================================================
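
For example, a minimal sketch of a Lookup SQL Override that limits the rows read into the lookup cache (the table CUSTOMER_DIM and its columns are hypothetical; only active customers are cached):

SELECT CUST_ID, CUST_NAME, CUST_STATUS
FROM   CUSTOMER_DIM
WHERE  CUST_STATUS = 'ACTIVE'

Compared with caching the whole table, only the rows that can actually match are read, which reduces both cache-build time and cache size.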
=========================================================
What is the difference between ODBC, DSN, NATIVE and TCP/IP connections?
===========================================================
ODBC refers to the database drivers used to connect from the client to the relational database.
DSN is the data source name, i.e. the name of the ODBC entry given by the user in order to access the database from the client.
Native drivers are the database vendor's own drivers, which allow the client to access the source and target directly when loading their definitions from the database into the workspace.
TCP/IP is the protocol used for connecting to remote computers; here the repository client communicates with the Informatica Repository Server through TCP/IP.
