B. How Can We Run Same Job in 1 Day 2 Times?: 1. What Is Meta Data? Explain? Where It Is Used?



1. What is metadata? Explain. Where is it used?
A. Metadata means column definitions, that is, the description of columns such as eno, ename, sal and comm.

2. How can we run the same job twice in one day?

A. Yes, we can run one job twice a day; there is an option for this in DataStage Director –

3. What is the difference between DataStage and DataStage TX?

A. DataStage TX is an EAI (enterprise application integration) tool. It is entirely different from DataStage PX, which is an ETL tool.
Using it, we can convert data from different formats into a desired format.
It does not deal with extraction, transformation and loading.

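4. What is pre-load in a hashed file?

A. In the Hashed File stage, pre-load refers to the option that reads the hashed file into memory so that reference lookups run faster. It should not be confused with an initial load, which is the first load into a database; loading only later modifications is called an incremental load.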

5. What is the exact difference between parallel jobs and server jobs?

A. The main differences are:

1) Parallel (PX) jobs are faster than server jobs
2) Platform independence
3) Increased security
4) Partitioning


6. Which partitioning is used in Join, Merge and Lookup?

A. Round robin partitioning.

The Join stage follows the Modulus partitioning method. Merge follows the same partitioning method, as well as the Auto partitioning method. Lookup follows the Entire partitioning method.

7. Which type of partitioning does the Remove Duplicates stage follow?
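
A. Key-based partitioning, typically hash partitioning.

Hash partitioning on the key field is the best method for removing duplicates: every record with the same key value lands in the same partition, so duplicates can be found within that partition.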
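
To illustrate why hash partitioning suits duplicate removal, here is a minimal Python sketch. It is not DataStage code; the function names, the sample rows and the use of CRC32 as the hash are assumptions made for the example.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key field."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's hash() for strings
        index = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def remove_duplicates(partition, key):
    """Keep the first row seen for each key value within one partition."""
    seen = set()
    unique = []
    for row in partition:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical sample data: eno 1 appears twice
rows = [
    {"eno": 1, "ename": "A"},
    {"eno": 2, "ename": "B"},
    {"eno": 1, "ename": "A"},
]
partitions = hash_partition(rows, "eno", num_partitions=4)
deduped = [
    row
    for part in partitions.values()
    for row in remove_duplicates(part, "eno")
]
```

Because both rows with eno 1 hash to the same partition, each partition can be de-duplicated on its own without comparing rows across partitions.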

8. How do you run a job from the command prompt in UNIX?

Answer: Use the dsjob command:


dsjob -run -jobstatus projectname jobname
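
If the job has to be started from a script rather than typed by hand, the same command can be wrapped as in the sketch below. It assumes dsjob is on the PATH of the machine where it runs; the function names are hypothetical, and the project and job names are the placeholders from the command above.

```python
import subprocess


def build_dsjob_command(project, job):
    """Build the argument list for: dsjob -run -jobstatus <project> <job>."""
    return ["dsjob", "-run", "-jobstatus", project, job]


def run_job(project, job):
    """Run the job and return the exit code reported by dsjob.

    Assumption: dsjob is installed and on PATH. With -jobstatus, dsjob
    waits for the job to finish before returning.
    """
    completed = subprocess.run(build_dsjob_command(project, job))
    return completed.returncode


# Show the command that would be executed (does not call dsjob)
print(" ".join(build_dsjob_command("projectname", "jobname")))
```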

9. How do you call procedures in DataStage?

Use the Stored Procedure stage.

10. How do you remove duplicates without using the Remove Duplicates stage?

Answer: In the target, make the column a key column and run the job.

If the target is a sequential file this will not work. For that case you have to use the Aggregator stage for the

Answered By: debasis    Date: 12/8/2007

In the parallel edition, use the Sort stage and, in the Properties tab, set Allow Duplicates = False.
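
The idea behind the Sort stage approach (sort on the key, then keep one row per key) can be sketched in Python as follows. This only illustrates the technique, not what the stage does internally; the function name and sample rows are made up.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_dedupe(rows, key):
    """Sort rows on the key, then keep the first row of each key group."""
    ordered = sorted(rows, key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample data: eno 1 appears twice
rows = [
    {"eno": 2, "ename": "B"},
    {"eno": 1, "ename": "A"},
    {"eno": 1, "ename": "A"},
]
unique_rows = sort_and_dedupe(rows, "eno")
```

Sorting first matters because groupby only groups adjacent rows; after the sort, all rows with the same key sit next to each other.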

11. What are environment variables? What is their use?

Answer: Basically, an environment variable is a predefined variable that we can use while creating a DataStage job. We can set it at either project level or job level. Once we set a specific variable, that variable is available to the project or job.

We can also define new environment variables. For that we can go to DataStage Administrator.

I hope this is clear. For further details, refer to the DataStage Administrator guide.

12. What is the difference between macros and prompts?

Answer: Macro: a macro is a set of instructions that can run applications. Example: a macro can open your catalog, select a report (for instance), convert it to another format and export it to any specified location, provided the code (program) is written to do so.
Prompt: a prompt specifies the manner in which data in the reports is to be displayed. A prompt can be defined either at the catalog level or during report generation.

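13. What are modulus and splitting in a dynamic hashed file?

Answer: The modulus is the number of groups in the hashed file. In a dynamic hashed file, groups are split as the file fills up, and the modulus grows accordingly. The modulus size can be increased by contacting your UNIX admin.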

14. Functionality of Link Partitioner and Link Collector?

Answer: Server jobs mainly execute in a sequential fashion. The IPC stage, together with the Link Partitioner and Link Collector, simulates the parallel mode of execution in server jobs.

Link Partitioner: it receives data on a single input link and diverts the data to a maximum of 64 output links. The data on each link is processed by the same stage and has the same metadata.

Link Collector: it collects the data from up to 64 input links, merges it into a single data flow and loads it to the target.

Both are active stages. The design and mode of execution of server jobs have to be decided by the designer.
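
As a rough illustration of the two stages, the Python sketch below splits one flow of rows across several links and merges them back. Round-robin is assumed for both the split and the merge, and the function names are hypothetical; only the 64-link limit comes from the answer above.

```python
def link_partitioner(rows, num_links):
    """Divert rows from one input onto num_links outputs, round-robin."""
    if not 1 <= num_links <= 64:
        raise ValueError("num_links must be between 1 and 64")
    links = [[] for _ in range(num_links)]
    for i, row in enumerate(rows):
        links[i % num_links].append(row)
    return links


def link_collector(links):
    """Merge rows from several input links into one flow, round-robin."""
    merged = []
    iterators = [iter(link) for link in links]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                merged.append(next(it))
                still_active.append(it)
            except StopIteration:
                pass  # this link is exhausted; drop it from the rotation
        iterators = still_active
    return merged


rows = list(range(10))
links = link_partitioner(rows, 3)
collected = link_collector(links)
```

In a real job the stages between the partitioner and the collector would process each link; here the links are passed through unchanged to keep the sketch short.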

15. What does a config file in Parallel Extender consist of?

Answer: The config file consists of the following.
a) The number of processes or nodes.
b) The actual disk storage locations.
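
To make the two items concrete, the sketch below generates text in the general layout of a parallel configuration file: one entry per node, each with its disk and scratch disk locations. The layout is written from memory and should be checked against the product documentation; the node names, host name and paths are hypothetical.

```python
def render_config(nodes):
    """Render a list of node descriptions as configuration file text."""
    lines = ["{"]
    for node in nodes:
        lines.append(f'  node "{node["name"]}"')
        lines.append("  {")
        lines.append(f'    fastname "{node["fastname"]}"')
        lines.append('    pools ""')
        lines.append(f'    resource disk "{node["disk"]}" {{pools ""}}')
        lines.append(f'    resource scratchdisk "{node["scratch"]}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-node setup on a single host
nodes = [
    {"name": "node1", "fastname": "etlhost", "disk": "/data/ds1", "scratch": "/scratch/ds1"},
    {"name": "node2", "fastname": "etlhost", "disk": "/data/ds2", "scratch": "/scratch/ds2"},
]
config_text = render_config(nodes)
print(config_text)
```

The number of node entries determines the degree of parallelism, so changing the file changes how a job is partitioned without changing the job design.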

16. Types of parallel processing?

Answer: Parallel processing is broadly classified into 2 types.
a) SMP - Symmetric Multiprocessing.
b) MPP - Massively Parallel Processing.

In PX, the types of parallel processing are: 1. pipeline parallelism, 2. partition parallelism.
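
The two PX styles can be pictured with a small Python sketch. It runs in a single thread, so it shows only how the data flows, not real concurrency; the stage functions and the doubling transformation are invented for the example.

```python
def extract(rows):
    """First pipeline stage: hand rows downstream one at a time."""
    for row in rows:
        yield row


def transform(rows):
    """Second stage: starts working as soon as the first row arrives."""
    for row in rows:
        yield row * 2


def load(rows):
    """Final stage: consume the rows."""
    return list(rows)


def run_pipeline(rows):
    """Pipeline style: rows stream through the stages without waiting
    for an upstream stage to finish the whole data set."""
    return load(transform(extract(rows)))


def run_partitioned(rows, num_partitions):
    """Partition style: split the data and run the same pipeline on each part."""
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    return [run_pipeline(part) for part in partitions]


piped = run_pipeline([1, 2, 3])
partitioned = run_partitioned([1, 2, 3, 4], 2)
```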

17. Containers: usage and types?

Answer: A container is a collection of stages used for the purpose of reusability. There are 2 types of containers.
a) Local container: job specific.
b) Shared container: used in any job within a project.
There are two types of shared container:
1. Server shared container. Used in server jobs (can also be used in parallel jobs).
2. Parallel shared container. Used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel

18. Differentiate database data and data warehouse data?

Answer: By database, one means OLTP (Online Transaction Processing). This can be the source systems or the ODS (Operational Data Store), which contains the transactional data.

Database data follows a normalization process, whereas data warehouse data follows a denormalization process.

19. What is the default cache size? How do you change the cache size if needed?
Answer: The default read cache size is 128 MB. We can increase it by going into DataStage Administrator, selecting the Tunables tab and specifying the cache size there.
Regards, jagan

20. What are stage variables, derivations and constraints?

Answer: Stage variable - An intermediate processing variable that retains its value while rows are read and does not pass the value directly into a target column.

Derivation - An expression that specifies the value to be passed on to the target column.

Constraint - A condition that is either true or false and that controls the flow of data down a link.

21. What is the flow of loading data into fact and dimension tables?
Answer: Here is the sequence for loading a data warehouse.

1. The source data is first loaded into the staging area, where data cleansing takes place.

2. The data from the staging area is then loaded into the dimensions/lookups.

3. Finally, the fact tables are loaded from the corresponding source tables in the staging area.
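
The ordering matters because fact rows refer to dimension rows. The Python sketch below shows one common way this plays out, assuming (this is not stated in the answer above) that each dimension row gets a generated surrogate key that the fact load then looks up. The table and column names are hypothetical.

```python
def load_dimension(staged_rows, natural_key):
    """Step 2: assign a surrogate key to each distinct natural key value."""
    dimension = {}
    for row in staged_rows:
        value = row[natural_key]
        if value not in dimension:
            dimension[value] = len(dimension) + 1
    return dimension


def load_fact(staged_rows, dimension, natural_key, measure):
    """Step 3: replace the natural key with the dimension's surrogate key."""
    facts = []
    for row in staged_rows:
        facts.append({"dim_key": dimension[row[natural_key]], measure: row[measure]})
    return facts


# Step 1 (already done here): cleansed rows sitting in the staging area
staging = [
    {"ename": "SMITH", "sal": 800},
    {"ename": "ALLEN", "sal": 1600},
    {"ename": "SMITH", "sal": 300},
]
dim = load_dimension(staging, "ename")
fact = load_fact(staging, dim, "ename", "sal")
```

If the fact load ran first, the lookup in load_fact would fail because the dimension keys would not exist yet.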

22. Dimensional modelling types along with their significance

Answer: Data modeling

1) E-R diagrams

2) Dimensional modeling
2.a) Logical modeling
2.b) Physical modeling

23. What are the difficulties faced in using DataStage? Or, what are the constraints in using DataStage?

Answer: 1) What if the number of lookups is large?

2) What happens if, for some reason, the job aborts while loading the data?

24. What is the order of execution done internally in the Transformer, with the stage editor having input links on the left-hand side and output links on the right?

Answer: Stage variables, then constraints, then column derivations (expressions).

Stage variables run first, constraints run second, and column derivations run last.
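
The order can be pictured with a small Python sketch of what happens to one input row. It is an analogy, not Transformer code; the running total, the salary threshold and the column names are invented for the example.

```python
def transform_row(row, state):
    """Process one input row in the order the Transformer uses."""
    # 1. Stage variables: evaluated first; they keep their value between rows
    running_total = state.get("running_total", 0) + row["sal"]
    state["running_total"] = running_total

    # 2. Constraint: decides whether the row goes down the output link
    if not row["sal"] > 1000:
        return None

    # 3. Column derivations: compute the output column values
    return {"ENAME": row["ename"].upper(), "TOTAL_SO_FAR": running_total}


state = {}
rows = [{"ename": "smith", "sal": 800}, {"ename": "allen", "sal": 1600}]
output = [
    out
    for out in (transform_row(r, state) for r in rows)
    if out is not None
]
```

The first row fails the constraint but still updates the stage variable, which is why the second row's TOTAL_SO_FAR includes both salaries.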
