DataStage best practices - Job design tips

A few design tips for building clean and effective jobs in IBM DataStage:

- Avoid propagating unnecessary metadata between stages. Use the Modify stage to drop columns that downstream stages do not need. Remember that Modify drops metadata only when explicitly told to via its DROP (or KEEP) specifications; see the sketch after this group of tips.
- Use Modify, Filter, Aggregator, Column Generator, etc. stages instead of the Transformer stage only if the anticipated volumes are high and performance becomes a problem; otherwise use the Transformer, since it is much easier to code a Transformer than a Modify stage.
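
As a rough illustration, the Modify stage specifications for dropping or keeping columns might look like this (the column names are invented for the example). Use DROP to remove the listed columns and pass everything else through, or KEEP to pass only the listed columns and drop the rest:

    DROP audit_ts, batch_id
    KEEP cust_id, cust_name, region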

- Turn off Runtime Column Propagation (RCP) wherever it is not required.


- One of the most common mistakes developers make is choosing between the Join, Lookup, and Merge stages without doing a volumetric analysis first. Estimate the volumes and then decide: Lookup holds its reference data in memory, so it suits small reference sets, while Join and Merge handle large volumes but expect sorted inputs.
- Add reject links wherever you need to reprocess rejected records or where you suspect considerable data loss may happen. At a minimum, keep reject links on Sequential File stages and on database stages that write data.
- Use an ORDER BY clause when a database stage feeds a Join stage; the intention is to use the database's power for sorting instead of DataStage resources. When you use ORDER BY, place a Sort stage between the database stage and the Join stage with its key mode set to the 'don't sort, previously sorted' option, so DataStage knows the data is already sorted; see the SQL sketch following this list.
- Use Sort stages instead of Remove Duplicates stages where you can: the Sort stage offers more grouping options and can generate sort/key-change indicator columns.
- One of the most frequent problems developers hit is lookup failures caused by the pad character that DataStage appends when converting strings of lower precision (length) to higher precision. Decide on the APT_STRING_PADCHAR and APT_CONFIG_FILE parameters from the beginning. Ideally, APT_STRING_PADCHAR should be set to 0x00 (the C/C++ string terminator) and the configuration file should use the maximum number of nodes available; see the sketch following this list.
- Data partitioning is a very important part of parallel job design. It is advisable to leave partitioning set to 'Auto' unless you are comfortable choosing partitioning methods yourself, since DataStage stages are designed to perform in the required way with Auto partitioning.
- While doing outer joins, you can make use of dummy variables (columns) just for null checking instead of fetching an explicit column from the table; see the derivation sketch following this list.
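
For the ORDER BY tip above, a minimal sketch of the user-defined SQL in the database stage (table and column names are invented for illustration):

    SELECT cust_id, cust_name, region
    FROM customers
    ORDER BY cust_id

On the link into the Join stage, add a Sort stage with cust_id as the key and its key mode set to 'Don't Sort (Previously Sorted)', so DataStage trusts the database ordering instead of re-sorting the data.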
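For the pad-character tip, a sketch of the environment variable settings, added as job or project parameters (the configuration file path is a placeholder):

    APT_STRING_PADCHAR=0x0
    APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/2node.apt

and a configuration file along these lines, with one node block per available node (the hostname and resource paths are placeholders):

    {
      node "node1"
      {
        fastname "etlhost"
        pools ""
        resource disk "/ds/data" {pools ""}
        resource scratchdisk "/ds/scratch" {pools ""}
      }
      node "node2"
      {
        fastname "etlhost"
        pools ""
        resource disk "/ds/data" {pools ""}
        resource scratchdisk "/ds/scratch" {pools ""}
      }
    }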
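For the outer-join tip, one way to sketch the dummy-column technique (stage and column names are invented): add a constant column, say DUMMY_IND = 'Y', on the reference link (for example with a Column Generator) before a left outer Join. After the join, a Transformer derivation can then detect unmatched rows:

    If IsNull(lnk_join.DUMMY_IND) Then 'no match' Else 'match'

Since DUMMY_IND is constant on every reference row, it is null after the join only when no reference row matched, so the null check does not depend on the nullability of a real table column.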
