Difference Between Join Stage and Look Up Stage in Datastage

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 13

Difference between Join Stage and Look Up Stage in Datastage

Join stage and Look Up stage have some different input requirements.

Based on the Requirement we use these Stages which is good for the performance.

We need to see weather we get good performance by using any stage in datastage.

And the stage supports the required inputs are not.

Lets say Join Stage= J.S And Look Up = M.S

J.S - The input names of the Join Stage are Left tables , Right Tables and Intermediate Tables. That means we call the left one as a Left table and right

one as a Right table and remaining tables between these tables are call it as Intermediate tables. ( That can be any number of tables in between )

L.S - The input names of the Look Up stage are Primary Tables and Reference Tables. That means First table will be considered as a Master tables

and remaining any number of tables are considered as a Update tables.

J.S - We can perform four types of Joins in Join Stage. That means it supports all the four types of Joins. They are

Inner Join

Left Outer Join

Right Outer Join

Full Outer Join

L.S - We can perform only two types of Joins in Look Stage. That means it supports two types of Joins here. And they are

Inner Join
And Left Outer Join

J.S - The Input requirements of Join stage are

There will be a N-Inputs ( In the case of Left, Inner, Right Outer Joins)

There will be a 2 Inputs ( In the case of Full Outer Join)

And there will be a 1 Output link and

there will be no reject links in Join stage.

L.S - The input requirements of Look Up Stage are as follows

There will be a N-Inputs ( In the Case of Normal Stage)

2 Inputs (In the Case of Sparse Look Up )

1 Output

And 1 Reject Link.

J.S - And Coming to Memory type. This is light memory Usage

L.S - It is a Heavy Memory Usage

J.S - Key Column Names should be Same. That is Primary record should be same with Secondary Records

L.S - Key column names Optional.

It should be same in the case of Sparse Look Up.

The Inner Join Type are as follows

J.S - Primary Records Should match with all secondary

i.e

J.S - Primary Records should match with all secondary.


The Input requirements with respect to Sorting are as follows.

J.S - In Join Stage Primary records and Secondary records should be sorted when coming( i.e data sorting is mandatory).

L.S - In Look Up stage it is Optional. That is all the primary and secondary records no need to be sorted.

And Treatment of Unmateched Records will be as follows

J.S - OK for the Primary and Secondary Records if the data is Unmatched records.

L.s - Ok for the Primary and we get warning if secondary records are unmatched.

Difference between Join Stage and Merge Stage in


Datastage

Join stage and Merge stage have some different input requirements.

Based on the Requirement we use these Stages which is good for the performance.

We need to see weather we get good performance by using any stage in datastage.

And the stage supports the required inputs are not.

Lets say Join Stage= J.S And Merge Stage = M.S

J.S - The input names of the Join Stage are Left tables , Right Tables and Intermediate Tables. That
means we call the left one as a Left table and right one as a Right table and remaining tables
between these tables are call it as Intermediate tables. ( That can be any number of tables in
between )

M.S - The input names of the Merge Stage are Master Tables and Update Tables. That means First
table will be considered as a Master tables and remaining any number of tables are considered as a
Update tables.

J.S - We can perform four types of Joins in Join Stage. That means it supports all the four types of
Joins. They are
Inner Join
Left Outer Join
Right Outer Join
Full Outer Join

M.S - We can perform only two types of Joins in Merge Stage. That means it supports two types of
Joins here. And they are
Inner Join
And Left Outer Join

J.S - The Input requirements of Join stage are

There will be a N-Inputs ( In the case of Left, Inner, Right Outer Joins)
There will be a 2 Inputs ( In the case of Full Outer Join)

And there will be a 1 Output link and


there will be no reject links in Join stage.

M.S - The input requirements of Merge Stage are as follows

There will be a N-Inputs


1 Output
And N-1 Reject Links.

J.S - And Coming to Memory type. This is light memory Usage

M.S - It is also a Light Memory Usage

Only LookUp Stage is considered as a Heavy Memory usage

J.S - Key Column Names should be Same. That is Primary record should be same with Secondary
Records

M.S - Key column names should be same here too. That is Primary records should be same with
Secondary Records.

The Inner Join Type are as follows

J.S - Primary Records Should match with all secondary

i.e

M.S - Primary Records should match with any secondary.

The Input requirements with respect to Sorting are as follows.

J.S - In Join Stage Primary records and Secondary records should be sorted when coming( i.e data
sorting is mandatory).

M.S - It was same with Merge stage also. That is all the primary and secondary records should be
Sorted ( i.e Mandatory)

And Treatment of Unmateched Records will be as follows


J.S - OK for the Primary and Secondary Records if the data is Unmatched records.

M.s - We get Warning message if the Primary records are Unmatched

And it is Ok for the secondary records.

Multiple Join stages to join three tables

If we have three tables to join and we don't have same key column in all the tables to

join the tables using one join stage.

In this case we can use Multiple join stages to join the tables.

You can take sample data as below

soft_com_1

e_id,e_name,e_job,dept_no

001,james,developer,10

002,merlin,tester,20

003,jonathan,developer,10

004,morgan,tester,20

005,mary,tester,20

soft_com_2

dept_no,d_name,loc_id

10,developer,200

20,tester,300

soft_com_3
loc_id,add_1,add_2

10,melbourne,victoria

20,brisbane,queensland

Take Job Design as below

Read and load the data in three sequential files.

In first Join stage ,

Go to Properties ----Select Key column as Deptno

and you can select Join type = Inner

Drag and drop the required columns in Output

Click Ok

In Second Join Stage

Go to Properties ---- Select Key column as loc_id

and you can select Join type = Inner

Drag and Drop the required columns in the output


Click ok

Give file name to the Target file, That's it

Compile and Run the Job

Join Stage Without Common Key Column

If we like to join the tables using Join stage , we need to have common key

columns in those tables. But some times we get the data without common key column.

In that case we can use column generator to create common column in both the

tables.

You can take Job Design as

Read and load the data in Seq. Files

Go to Column Generator to create column and sample data.

In properties select name to create.


and Drag and Drop the columns into the target

Now Go to the Join Stage and select Key column which we have created( You can give

any name, based on business requirement you can give understandable name)

In Output Drag and Drop all required columns

Give File name to Target File. Than

Compile and Run the Job.

Sample Tables You can take as below

Table1

e_id,e_name,e_loc

100,andi,chicago

200,borny,Indiana

300,Tommy,NewYork

Table2

Bizno,Job

20,clerk

30,salesman

Inner Join in Join Stage with example


If we have a Source data as below

xyz1 (Table 1 )

e_id,e_name,e_add

1,tim,la

2,sam,wsn

3,kim,mex

4,lin,ind

5,elina,chc

xyz2 (Table 2 )

e_id,address

1,los angeles

2,washington

3,mexico

4,indiana

5,chicago

We need the output as a

e_id, e_name,address

1,tim,los angeles

2,sam,washington

3,kim,meixico

4,lin,indiana

5,elina,chicago
Take job design as below

Read and Load the both the sourc tables in seq. files

And go to Join stage properties

Select Key column as e_id

JOIN Type = Inner

In Out put Column Drag and Drop Required Columns to go to output file and click ok.

Give file name for Target dataset and then


Compile and Run the Job . You will get the Required Output in the Target File.

What To Choose Join Stage or Lookup stage in Datastage

How to choose the stages.

Join stage or Lookup stage

We need to be careful when selecting the stages. We need to think about the

performance of the Job before selecting the stages. Time is more precious to the

clients. That's why we need to get the Job for very less time. We need to try

our best to get good performance to the Job.

Both the stages Join stage and Look up stage performs same thing. That is they

combine the tables we have. But why Lookup stage has been introduced.

Look Up Stage have some extra benefits which will not come with the Join stage.

Look up stage doest not required the data to be sorted. Sorting is mandatory with

the Join stage. In Look Up stage the columns with different column names can be

joined as well where it is not possible in the Join stage. That means Join stage,

the column name must be similar.

A Look Up Stage supports reject links , if our required demands reject links we
can’t go with Join stage. Because Join stage doesn’t supports Reject Links. And

Lookup stage has an option to fail the Job if the look up fails. It will be useful

when the look up stage is expected to be successful.

Look up stage keeps the reference data into the memory which yields better

performance for smaller volume of data. If you have large amount of data, you

need to go with Join stage.

You might also like