A Guide

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 9

Transformations

Source Qualifier : Passive Aggregator : Active


Connected Connected
♦ Homogenous Sources ♦ Performs calculation on groups unlike an Expression Transformation
♦ SQL Overwrite which performs row by row basis
♦ Properties ♦ Can use conditional clauses to filter records, providing more flexibility
• SQL Query than SQL language
• User defined Joins ♦ Properties
• Source filter • Cache directory
• No. Of Sorted Ports • Tracing level
• Tracing level • Sorted Input
• Select distinct ♦ Components
♦ Optimization • Aggregate Expression
• Use the Source Qualifier to filter. The Source Qualifier limits the • Group by port
row set extracted from the source whereas Filter limits the row set • Sorted I/P option
sent to a target. • Aggregate Cache
♦ Ports  Input, Output, and Value
Filter : Active ♦ Optimization
Connected • Filter before aggregating
• Use Sorted Input to decrease the use of aggregate caches
♦ Limits row set to target ♦ Cannot use Sorted Input if any of the following condition is True
♦ Any Sources • The aggregate expression contains nested aggregate functions.
♦ Properties • The session uses incremental aggregation.
♦ Filter condition drops data that does not match the condition • Source data is data-driven.
♦ Tracing level • The mapping is upgraded from PowerMart 3.5.
♦ Optimization Expression : Passive
• Use the Filter Transformation early in the mapping Connected
• The Filter Condition is Case-sensitive, and queries in some ♦ Row by Row calculation
databases do not take this into ♦ Multiple Expressions can be entered in a single Expression
account Transformation
♦ Properties
• Tracing level
♦ Ports  Input, Output, and Value • Cycle
Router : Active • Number of Cached values
Connected • Reset
• Tracing level
♦ Processing the incoming data only once and passes the output to ♦ Optimization
multiple groups and routes data to the default o/p group that do not • Use Reusable Sequence Generator if the same Sequence
meet the condition generator is to be used in more than one sessions
♦ Properties
• Tracing level Rank : Active
♦ Components Connected
• Input and Output groups ♦ Select only the Top or Bottom Rank of data
• Input and Output ports ♦ Properties
• Group filter conditions • Cache directory
♦ Tips • Top / Bottom
• One Group can be connected to One transformation or target • Number of Ranks
• One Output port in a group can be connected to multiple • Case Sensitive Sting Comparison
transformations or targets • Tracing level
• Multiple Output ports in one group can be connected to ♦ Ports
multiple transformations or targets Input -- minimum of one
• More than one group Cannot be connected to one Output -- minimum of one
transformation or target Variable – stores values or calculations to use in an expression
Rank – Only one
Sequence Generator : Passive
Connected Update Strategy : Active
Connected
♦ Create Unique Primary key values
♦ Replace missing values ♦ Sets at 2 levels
♦ Cycle through a sequential range of numbers Within a Session – treat all records in the same way (for
♦ Properties example,
treat all records as Inserts)
• Start value
Within a Mapping – flag records for Insert, Update, Delete or
• Increment by Reject
• End value ♦ Data driven
• Current value ♦ Properties
• Update Strategy Expression • Both input pipelines originate from the same Joiner
• Forward Rejected Rows transformation
• Tracing level • Either input pipelines contains an Update Strategy
transformation
Joiner : Active • Either input pipelines contains a connected or unconnected
Connected Sequence Generator transformation
Lookup : Passive
♦ Joining data from Heterogeneous Sources Connected / Unconnected
♦ Only two Sources with one joiner ♦ Get a related value
♦ Properties ♦ Perform a Calculation
• Case Sensitive String Comparison ♦ Update slowly changing dimension tables
• Cache directory ♦ Properties
• Join Condition • SQL Override
• Joiner Type (Normal, Master Outer, Detail Outer, Full Outer) • Lookup Table name
• Null ordering in Master • Lookup caching enabled
• Null ordering in Detail • Lookup policy on multiple match
• Tracing level • Lookup condition
♦ Ports • Location information
Input, Output, M (to switch the Master Details relationship for the • Source Type
Sources) • Recache if Stale
♦ The following types of Sources can be used in a Joiner • Tracing level
• Two relational tables existing in separate databases • Lookup cache directory name
• Two flat files in potentially different file systems • Lookup cache initialize
• Two different ODBC sources ♦ Ports
• Two instances of the same XML source Input – minimum of one
• A relational table and a flat file source Output – minimum of one
• A relational table and an XML source Lookup – minimum of one
Return – Only one
♦ A Joiner cannot contain the following types of Sources
♦ Components
• Both pipelines begin with the same original data source
• Lookup table
• Both input pipelines originate from the same Source Qualifier
• Ports
transformation
• Properties
• Both input pipelines originate from the same Normalizer
transformation • Condition
♦ Lookup Caches ♦ Normalizer is designed to handle data read from COBOL sources
• Persistent Cache ♦ Can also be used to denormalize data from any type of Sources
• Recache from Database ♦ Can also be used to handle multiple levels of denormalization in the
• Static Cache Same record
• Dynamic Cache ♦ Properties
• Shared Cache • Reset
• Restart
• Tracing level
Stored Procedure : Passive SCD:
Connected / Unconnected • Type I – No history, overwrite existing dimensions
♦ Stored Procedure Transformation can be used to • Type II – Keeping Versions
• Drop and recreate Indexes • Version No.
• Check the status of the Target database before moving • Flag
Records into it • Date Range
• Determine if enough exists in a database • Type III – Stores current & previous values
• Perform a specialized calculation
♦ Options for running a Stored Procedure Transformation Tracing Levels
• Normal
• Pre-load of the Source • Normal – logs Initialization, errors, stopped rows,
• Post-load of the Source Summarized session results – not allowed
• Pre-load of the Target • Terse – logs Initialization Info, error notification of
• Post-load of the Target Rejected data
• Verbose Init – Normal & addl. Initialization names
♦ Properties
of Index & data files. Detailed transformation
• Stored Procedure Name
statistics
• Connection Information • Verbose Data -- verbose Init + logs each row passing
• Call Text Info mapping & default transformation statistics.
• Stored Procedure Type Truncates string data to column size.
• Execution Order
• Tracing level Directories of Server Variables – Inside root directory
♦ Ports  Input, Output and Return • SessLog
• BadFiles
Normalizer : Active • Cache
Connected • SrcFiles
• TgtFiles • Writer
• Temp • Transformation
• ExtProc

Output Files Following Transformations using Caches


• Server Log
• Session Log • Aggregator
• Session detail • Joiner
• Reject File • Lookup
• Control File • Rank
• Output File
3 Parameters LM allocates to Sessions
• Cache File
• Post Session / Pre Session Email
• Max Sessions
• LM Shared Memory
Load Manager & DTM
• Keep repeating session stored in memory (be default disabled)
LM:
PMCMD
• Locks & reads Sessions, Parameters & variables
Pmcmd Start <Username> <Password> TCP/IP:<hostname:portno>
• Verify permissions and privileges
<Foldername>:<Sessionname>/<batchname>:pf (parameter file):Session
• Creates Session log file flag Wait flag
• Creates DTM Process
• Sends Post-Session email Session flag:
DTM: 1 – Session
• Allocate DTM Process memory 0 – batch
• Initialize Sessions and fetches mapping Wait flag:
• Execute Pre-session commands 1 – Session Complete
• Creates threads 0 – Request Sent
• Writes to repository
Mapping Optimizations
Threads:
• Reduce No. of Transformations
• Master thread
• Reduce No. of expressions
• Mapping thread – 1 thread for each session
• Delete unnecessary links between transformations
• Pre & Post Sessions threads
• Filter before aggregator • Transformations
• For transformations using cache reduce connected ports to • Session level override
reduce data in data cache • Aggregate behavior
• Optimize data type conversion

Session Optimizations

• Run Concurrent batches


• Partition sessions
• Reduce error tracing
• Remove staging areas
• Tune session parameters
Target Database Optimization

• Drop Indexes & Key Constraints


• Drop Checkpoints
• Use bulk loading
• Use external loading
• Turn Off Recovery
• Turn Off db network packet size

Session Properties Tab

• General Tab
• General Options
• Source Options
• Target Options
• Session Commands
• Email
• Performance Options
• Commit interval
• Event bases Scheduling
• Time Tab
• Error handling
Performance Tunings • Optimize Oracle target database
Checking the storage clause, space allocation, and rollback segments
Optimizing the Source Database
Optimizing the Mapping
• Flat file Source
By default, the Informatica Server reads 1024 bytes per line. • Reduce the number of transformations in mappings and delete
The Session Performance can be improved by setting the number of unnecessary links between transformations
bytes the Informatica Server reads per line.
• Limit the connected input/output or output ports for the transformations
• Relational file Source that use data cache (Aggregator, Joiner, Lookup, and Rank). Limiting
Suggestions to improve the Session Performance the number of input/output or output ports reduces the amount of data
• Optimize the query the transformation store in the data cache.
Using Hints, using GROUP BY ORDER BY clauses, using
Parallel queries • Configure single pass reading
• Create tempdb as in-memory database
• Use conditional filters • Optimizing datatype conversions
• Increase database network packet size Eliminate unnecessary datatype conversions from mapping
Allowing larger packets of data to cross the network at one Use Integer values in place of other datatypes when performing
time Comparisons using Lookup and Filter transformations
For Oracle, increase the packet size in listner.ora and
tnsnames.ora • Eliminating Transformation Errors
• Connect to Oracle databases using IPC protocol Occurs when the Informatica Server encounters Conversion error,
Set up Oracle database Connection in listner.ora and Conflicting mapping logic and any condition set up as an error, such as
tnsnames.ora null input

Optimizing the Target Database • Optimizing Lookup Transformations


 Reducing the Number of Cached Rows by using the WHERE
• Drop Indexes and key constraints clause in the Lookup SQL Override Option
• Increase Checkpoint Intervals  Optimizing the Lookup condition by placing the conditions with an
• Use bulk loading equal sign first, if there are more than one filter condition
 indexing the columns in the Lookup ORDER BY and the columns
For Sybase and MS SQL Server
in the condition
• Use external loading
 Using Shared or Persistent Cache
For Teradata, Oracle and Sybase IQ
• Turn Off Recovery
• Optimizing Filter Transformations
• Increase database network packet size
 Use Filter transformation early in the mapping
 Avoid using complex expressions, instead of using simple integer Optimizing the Session
or true/false expressions in the filter condition
• Running concurrent batches
• Optimizing Aggregator Transformations  If there are Independent sessions that use separate sources
 Group by simple Columns like numbers instead of string or date and separate mappings to populate different targets, place
 Use sorted input them in a concurrent batch and run them simultaneously
 Use incremental aggregation  If there is a Complex mapping with multiple sources, separate
the mapping into several simpler mappings with separate
• Optimizing Joiner Transformation sources. Then create a session for each mapping and place the
 Though Joiner transformation needs a data cache to hold the sessions in a concurrent batch to run in parallel
master table rows and the index cache to hold the join columns
from the master table, make sure that the system is having enough • Partitioning Sessions
memory to hold the data and the index cache  When partition a Session, the PowerCenter Server opens
 Normal joins are faster than outer joins and result in fewer tows, multiple connections to source and target databases and
when possible use database joins for homogeneous sources extracts, transforms, and loads data in parallel pipelines

• Optimizing Sequence Generator Transformation • Allocation Buffer memory


 Creating a Reusable Sequence Generator and use it in multiple By default a session has enough buffer blocks for 83 sources and
mappings simultaneously targets, if a session contains more than 83 sources and targets the
 Number of Cached values should be greater than 1000 number of available memory blocks can be increased by adjusting
the following session parameters
• Optimizing Expressions  Increase the DTM buffer pool size, the default setting is
 Factoring out common logic 12,000,000 bytes
 Minimizing aggregator function calls. For e.g. Use SUM(A+B)  Decrease the buffer block size, the default setting is 64,000
instead of using SUM(A)+SUM(B) bytes
 Replacing Common sub expressions with Local variables To configure these settings, first determine the number of memory
 Choosing Numeric Versus String Operations blocks the Informatica Server requires initializing the session, then
 Choosing DECODE function Versus LOOKUP function configuring the DTM buffer pool size and the Buffer block size.
 Using Operators instead of Functions. For e.g. Use
Customers.First_Name ||’ ‘||Customers.Last_Name instead of For e.g. a non-partitioned session using a mapping that contains
CONCAT(CONCAT(Customers.First_Name,’ ‘), 50 sources and 50 targets
Customers.Last_name)  Then the Session requires 200 memory blocks
[(total number of sources+total number of
targets)*2]=Session buffer blocks
100*2=200
 Next, based on default settings determine that DTM
buffer pool size can be changed to 15,000,000 or can Optimizing the System
change the default buffer block size to 54,000
Session buffer blocks=0.9*(DTM buffer pool • Improve Network Speed
size)/(default buffer block size)*(number of partitions)  If flat file is there as a source or a target, move the file onto
200=0.9*14222222/64000*1 the Informatica Server System to improve Performance. When
Or you store flat file on a machine, other than a Informatica
200=0.9*12000000/54000*1 Server, session performance is dependent on the performance
of the network connections. Moving the files onto the
• Increasing the Cache sizes Informatica Server system and adding disk space might
If the allocated data or index cache size is not large enough to improve Performance
store the data, the Informatica Server stores the data in a  In the Case of using relational sources or target databases, try
temporary disk file as it processes the session data. Each time the to minimize the number of network hops between the source
Informatica Server pages to the temporary file, Performance slows and the target databases and the Informatica Server.
 Moving the target database onto the Informatica Server
• Increasing the Commit Interval system might improve the Informatica Server Performance
The smaller the Commit Interval, the more often the Informatica
Server writes to the target database, and slower the overall • Using Multiple PowerCenter Servers
performance Running multiple PowerCenter Servers on separate systems
against the same repository. Distributing the session load to the
The Commit interval does not affect the Performance of Bulk separate PowerCenter Server systems increases performance
load sessions. When the session is in Bulk mode, the target
database Commits as necessary, ignoring Commit commands • Using additional CPUs
sent by the Informatica Server

• Turning Off Session Recovery


The Informatica Server writes recovery information in the
OPB_SRVR_RECOVERY table during each commit. This can
decrease the Performance, hence the session Recovery should be
turned off in the Informatica Server Setup

• Disabling Decimal Arithmetic


• Reducing Error Tracing

You might also like