1. What is metadata? Explain. Where is it used?
A. Metadata is data that describes data: here, the column definitions (column names, data types, and lengths), for example eno, ename, sal, comm.
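As a rough illustration only (not DataStage's internal format), column metadata can be pictured as one small record per column; the field names below are invented for the sketch.

from dataclasses import dataclass

# Hypothetical sketch of column metadata, similar in spirit to a
# DataStage table definition; field names are illustrative only.
@dataclass
class ColumnDef:
    name: str       # column name, e.g. "eno"
    data_type: str  # e.g. "Integer", "VarChar"
    length: int     # declared length / precision
    nullable: bool  # whether NULLs are allowed

table_definition = [
    ColumnDef("eno",   "Integer", 10, False),
    ColumnDef("ename", "VarChar", 30, True),
    ColumnDef("sal",   "Decimal", 8,  True),
    ColumnDef("comm",  "Decimal", 8,  True),
]

for col in table_definition:
    print(col.name, col.data_type, col.length, col.nullable)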
a. It is an EAI (Enterprise Application Integration) tool, and it is entirely different from DataStage PX (an ETL tool).
Using it, we can convert data from different formats into a desired format.
It does not deal with extraction, transformation, and loading.
A. Pre-load means the initial load, i.e. the first load of data into the database. Any modifications loaded after that are called an incremental load.
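A minimal sketch of the idea in plain Python (the "target" dictionary and the row layout are invented for illustration):

# Hypothetical sketch contrasting an initial (pre-) load with an
# incremental load; "target" stands in for the warehouse table.
target = {}  # key -> row

def initial_load(source_rows):
    """First load: the target starts empty and receives everything."""
    for row in source_rows:
        target[row["id"]] = row

def incremental_load(changed_rows):
    """Later loads: only new or modified rows are applied."""
    for row in changed_rows:
        target[row["id"]] = row  # insert new keys, update existing ones

initial_load([{"id": 1, "sal": 1000}, {"id": 2, "sal": 2000}])
incremental_load([{"id": 2, "sal": 2500}, {"id": 3, "sal": 3000}])
print(target)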
The Join stage follows the Modulus partitioning method. The Merge stage follows the Same partitioning method as well as the Auto partitioning method. The Lookup stage follows the Entire partitioning method.
7. Which type of partitioning does the Remove Duplicates stage follow?
Key-based partitioning.
Hash partitioning.
Hash-by-field (hash) partitioning is the best method for removing duplicates, because it assigns hash key values so that rows with the same key land in the same partition.
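A rough sketch of why key/hash partitioning suits duplicate removal (plain Python, not DataStage; the key name eno is just an example): rows with equal keys hash to the same partition, so each partition can drop its duplicates independently.

# Illustrative sketch of hash (key-based) partitioning for duplicate
# removal: equal keys always land in the same partition, so duplicates
# can be dropped locally on each partition without cross-partition checks.
NUM_PARTITIONS = 4

def hash_partition(rows, key):
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[hash(row[key]) % NUM_PARTITIONS].append(row)
    return partitions

def remove_duplicates(partition, key):
    seen, unique = set(), []
    for row in partition:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

rows = [{"eno": 1}, {"eno": 2}, {"eno": 1}, {"eno": 3}, {"eno": 2}]
deduped = []
for part in hash_partition(rows, "eno"):
    deduped.extend(remove_duplicates(part, "eno"))
print(deduped)  # each eno appears exactly once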
10. How do you remove duplicates without using the Remove Duplicates stage?
Answer: In the target, make the column a key column and run the job.
If the target is a sequential file this will not work; in that case use an Aggregator stage instead.
In the parallel edition, you can use a Sort stage and, in its properties tab, set Allow Duplicates = False.
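A minimal sketch of the sort-based approach (analogous to a Sort stage with Allow Duplicates = False; plain Python, not DataStage):

# Illustrative sketch of sort-based duplicate removal: after sorting on
# the key, duplicate rows are adjacent, so only the first row per key
# needs to be kept.
def sort_dedup(rows, key):
    rows = sorted(rows, key=lambda r: r[key])
    out, previous = [], object()
    for row in rows:
        if row[key] != previous:
            out.append(row)
            previous = row[key]
    return out

rows = [{"eno": 3}, {"eno": 1}, {"eno": 3}, {"eno": 2}, {"eno": 1}]
print(sort_dedup(rows, "eno"))  # one row per eno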
We can also define a new environment variable; for that, go to the DataStage Administrator.
Answer: Macro - a macro is a set of instructions that can run applications. For example, a macro can open your catalog, select a report, convert it to another format, and export it to any specified location, provided the code (program) is written to do so. Prompt - a prompt specifies the manner in which data in the reports is to be displayed. A prompt can be defined either at the catalog level or during report generation.
Answer: The modulus size can be increased by contacting your UNIX administrator.
Answer: Server jobs mainly execute in a sequential fashion; the IPC stage, as well as the Link Partitioner and Link Collector stages, can simulate a parallel mode of execution for server jobs running on a single CPU.
Link Partitioner: receives data on a single input link and diverts it to a maximum of 64 output links, all carrying the same metadata.
Link Collector: collects the data from up to 64 input links, merges it into a single data flow, and loads it to the target.
Both are active stages, and the design and mode of execution of server jobs has to be decided by the designer.
In PX, the types of parallel processing are: 1. Pipeline processing 2. Partitioning (partition parallelism).
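A toy sketch of the Link Partitioner / Link Collector idea in plain Python (the 64-link maximum comes from the answer above; everything else, including the round-robin split, is purely illustrative):

# Toy sketch: split one input stream across N output "links", process
# each link, then collect the links back into a single flow.
MAX_LINKS = 64

def link_partitioner(rows, n_links):
    n_links = min(n_links, MAX_LINKS)
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)   # round-robin distribution
    return links

def link_collector(links):
    collected = []
    for link in links:
        collected.extend(link)           # merge back into one data flow
    return collected

rows = list(range(10))
links = link_partitioner(rows, 4)
processed = [[r * 10 for r in link] for link in links]  # per-link work
print(link_collector(processed))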
Answer: A container is a collection of stages used for the purpose of reusability. There are 2 types of containers:
a) Local container: job specific.
b) Shared container: can be used in any job within a project.
There are two types of shared container:
1. Server shared container: used in server jobs (can also be used in parallel jobs).
2. Parallel shared container: used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel job).
Answer: By database, one means an OLTP (On-Line Transaction Processing) system. This can be the source systems or the ODS (Operational Data Store), which contains the transactional data.
Database data follows the normalization process, whereas data warehouse data follows the denormalization process.
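To make the contrast concrete, a small illustrative sketch (all table and column names invented) of the same order data held normalized, as an OLTP system would store it, and denormalized into wide rows, as a warehouse reporting table might:

# Normalized (OLTP-style): customer attributes stored once, joined by key.
customers = {101: {"name": "Asha", "city": "Pune"}}
orders = [
    {"order_id": 1, "cust_id": 101, "amount": 250},
    {"order_id": 2, "cust_id": 101, "amount": 400},
]

# Denormalized (warehouse-style): customer attributes repeated per row,
# avoiding joins at query time at the cost of redundancy.
denormalized = [
    {"order_id": o["order_id"],
     "cust_name": customers[o["cust_id"]]["name"],
     "cust_city": customers[o["cust_id"]]["city"],
     "amount": o["amount"]}
    for o in orders
]
print(denormalized)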
19. What is the default cache size? How do you change the cache size if needed?
Answer: The default read cache size is 128 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab, and specifying the cache size there.
Constraint - a condition that evaluates to either true or false and specifies the flow of data along a link.
21. What is the flow of loading data into fact & dimensional tables?
Answer: Here is the sequence of loading a data warehouse.
1. The source data is first loaded into the staging area, where data cleansing takes place.
2. The dimension tables are loaded from the cleansed data in the staging area.
3. Finally, the fact tables are loaded from the corresponding source tables in the staging area.
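A hedged sketch of that sequence in plain Python (all table and column names are invented; in practice each step would be its own DataStage job):

# Illustrative sketch of the load sequence: staging -> dimensions -> facts.
# Surrogate keys are generated while loading dimensions and looked up
# while loading the fact table.
staging = [
    {"cust": "Asha", "product": "Pen", "qty": 3},
    {"cust": "Ravi", "product": "Book", "qty": 1},
]

dim_customer, dim_product, fact_sales = {}, {}, []

def load_dimension(dim, value):
    """Insert the value if new and return its surrogate key."""
    if value not in dim:
        dim[value] = len(dim) + 1
    return dim[value]

# 2. Load dimensions from the cleansed staging data.
for row in staging:
    load_dimension(dim_customer, row["cust"])
    load_dimension(dim_product, row["product"])

# 3. Load the fact table, resolving surrogate keys from the dimensions.
for row in staging:
    fact_sales.append({
        "cust_key": dim_customer[row["cust"]],
        "prod_key": dim_product[row["product"]],
        "qty": row["qty"],
    })
print(fact_sales)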
2) Dimensional modeling
2.a) Logical modeling
2.b) Physical modeling
23. What are the difficulties faced in using DataStage? Or, what are the constraints in using DataStage?
2) What will happen if the job aborts due to some reason while loading the data?
24. What is the order of execution inside the Transformer stage, whose stage editor has input links on the left-hand side and output links on the right?
Stage variables are evaluated first, then the link constraints, and finally the column derivations.
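A small sketch of that per-row evaluation order (pure illustration in Python, not DataStage syntax; names and logic are invented):

# Illustration of the per-row order in a Transformer-like stage:
# stage variables first, then each output link's constraint, then that
# link's column derivations.
def process_row(row, stage_vars):
    # 1. Stage variables are evaluated first, top to bottom.
    stage_vars["full_name"] = row["fname"] + " " + row["lname"]
    stage_vars["is_high_sal"] = row["sal"] > 5000

    outputs = {}
    # 2. The constraint on each output link is evaluated next.
    if stage_vars["is_high_sal"]:           # constraint for link "high"
        # 3. Column derivations run only if the constraint passes.
        outputs["high"] = {"name": stage_vars["full_name"], "sal": row["sal"]}
    else:                                   # constraint for link "low"
        outputs["low"] = {"name": stage_vars["full_name"], "sal": row["sal"]}
    return outputs

print(process_row({"fname": "A", "lname": "B", "sal": 6000}, {}))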