1. How can we see the data in a Dataset?
Ans: A. We can view the dataset through the DataStage client called Designer: go to the Tools menu -> click Data Set Management -> select the dataset you want to view -> click OK -> click the Show Data window button (the cube-shaped icon) -> click OK.
B. From the command line, to see all the data: orchadmin dump <dataset_name>. To see specific fields: orchadmin dump -name -field <field_name> -field <field_name> <dataset_name>. But before that the DataStage environment should be set, for example as below (this may change based on the configuration):
export PATH
. /opt/IBM/InformationServer/Server/DSEngine/dsenv
LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH

2. What is the usage of DataStage with materialized views? HP2
Ans: A. Materialized views are similar to normal views, but the difference is that materialized views are stored in the database and are refreshed periodically to pick up new data.
B. That answer is correct for an Oracle view, but the question is how we can use materialized views in DataStage. In my view there is no need for a materialized view in DataStage; it is used at reporting time (in OLAP).

3. What is RCP? TCS2
Ans: RCP is known as Run-time Column Propagation. While a job runs, the columns may change from one stage to another, and at the same time we may be carrying unnecessary columns into stages that do not need to process them. By enabling RCP we can propagate columns at run time and load only the required columns into the target database.
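For the command-line option in answer B above, a minimal sketch of sourcing the DataStage environment and dumping a dataset with orchadmin; the install path, field names and dataset path are placeholders and will differ per configuration.

# Source the DataStage engine environment (path is an assumption; adjust to your install)
. /opt/IBM/InformationServer/Server/DSEngine/dsenv
export LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH
export PATH=$APT_ORCHHOME/bin:$PATH   # so orchadmin can be found; location may vary

# Dump every record of the dataset to stdout
orchadmin dump /data/work/customer.ds

# Dump only selected fields (field and dataset names are examples)
orchadmin dump -name -field cust_id -field cust_name /data/work/customer.ds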
1. Scenario: the source data is
eno: 100
ename: suresh
amount2: 2000
This is our source data; I would like to display it like this:
eno ename esal acct
100 suresh 10000 sbi1
100 suresh 10000 sbi2
amount 1000 2000
Ans: In server jobs three types of update actions are available:
Update existing rows only
Update existing rows or insert new rows
Insert new rows or update existing rows
In parallel jobs two types of update actions are available: Update and Upsert.

4. What is the new version of DataStage? What is the difference between the new version and version 7.5?
Ans: A. The new version of DataStage is 8.0 and it supports QualityStage, Profile Stage, etc.
B. The newer version of DataStage is IBM WebSphere DataStage 8.1.0 PX.

5. How can you execute an SQL query through UNIX? What is the primary key for a dimension table? What is the primary key for a fact table? TCS3
Ans: The primary key of a dimension table is its own primary key. The primary key of a fact table is the foreign key derived from the dimension table.

6. How do you find the location of the APT_CONFIG_FILE?
Ans: A. You can find it in the job properties -> Parameters tab; below it there is an Add Environment Variable option, and you will find it under the Parallel section.
B. Please follow these steps: DataStage Administrator --> Project Properties --> Environment setup.

7. How can we move a DataStage job from the development to the testing environment with the help of a DataStage job using UNIX commands?
Ans:

8. Can we see the data in a fixed-width file? How can you change the datatype of fixed-width files? Infosys1
Ans: In the Sequential File stage there is a field-width option where we can change the datatype and length.

9. What is a staging area? What is a stage variable?
2. This is my source:
source: id, name
target: id, name
100, murty
100, madan
We have three duplicate records for the id column; how can we get the source record?
100, madan
100, saran
Ans: A staging area is a temporary storage location. In a real-time environment we get the data in the form of files from the client, so we need to store it in a temporary location and perform all the validations (data validations, field validations and file validations) before we load it into the warehouse.
A stage variable holds an expression assigned in the Transformer stage and can be reused, but only within that Transformer stage; that is why it is called a stage variable.
Taking the name column as the key, we can get the source record.
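Question 5 above also asks how an SQL query can be executed through UNIX, which the answer does not cover; a common approach is to call the database client from a shell script. The sketch below assumes an Oracle client (sqlplus) is installed, and the user, password, TNS alias and table are placeholders.

# Run a query from UNIX using sqlplus and a here-document
sqlplus -s scott/tiger@ORCL <<EOF
SET PAGESIZE 0 FEEDBACK OFF
SELECT COUNT(*) FROM emp;
EXIT;
EOF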
10. Hi, I have a scenario like this:
11. How can I get the target table like that without using a Transformer stage?
Ans: Use the Surrogate Key Generator stage.

12. How can we run the same job twice in one day? IBM4
Ans: Multiple instances; this is a good concept available in DataStage parallel jobs. Through this concept we can run a physical job more than once at the same time. Check the option "Allow multiple instance" in the job properties.

13. How do you call routines in stages? ME2
Ans: There is a Routine activity in a job sequence through which you can call your routine. In Designer, routines can be called from the Transformer stage.

14. When will we go for data elements? TCS3
Ans: Data elements are nothing but the datatypes of a particular column. If the source datatype does not match the target column, that is when we use them. They are available in DataStage Manager.

15. I have a source like:
deptno, sal
1,2000
2,3000
3,4000
1,2300
4,5000
5,1100
I want the target like:
target1: deptno, sal
1,2000
3,4000
4,5000
target2:
2,3000
1,2300
5,1100
without using a Transformer stage.
Ans: From the source, copy the data into two sets using a Copy or Peek stage, then for each copy use a Remove Duplicates stage, with the duplicate-to-retain option set to first in one and last in the other; you will get the desired output.
Alternatively, first generate a surrogate key (e.g. rownum), then use a Filter stage with the condition where rownum % 2 = 0 for one dataset and send the rejects to the other dataset. I think this will work.

16. How do you find out the number of records imported from a source file?
Ans: The job monitor is available in DataStage Director. Job monitoring will tell you, for a particular job, how many records are being processed or have been processed at each stage inside the job.

17. I have a scenario like this: seq file --> transformer --> sort stage --> dataset. In this job which partitioning technique are you using and why? CTS2
Ans: /// Hash partitioning or modulus partitioning; when using sort operations it is always better to go for key-based partitioning techniques.
/// If you partition on an integer-type field like deptno, go for modulus; otherwise hash is better performance-wise.

18. What is the exact difference between parallel jobs and server jobs? IBM6
Ans:
Server jobs use symmetric multiprocessing (SMP) only; parallel jobs support both massively parallel processing (MPP) and symmetric multiprocessing (SMP).

19. Explain the Change Apply stage.
Ans: In order to understand Change Apply, you should have knowledge of the Change Capture stage. It is better to go through the DataStage manuals; they will give you a clear idea with examples. If you don't have the docs, send your id and I will forward them to you.
/// Change Capture takes two input links and one output link. It returns a change code along with the changed records on the output link; it returns both the source data and the updated records, and based on the change code we can evaluate the data. Change code 1 indicates the record is not in the first file; 2 indicates the record is not in the second file.

20. What is PX? IBM1
Ans: PX stands for Parallel Extender. It is the engine that executes the job on parallel nodes.

21. What are the advantages of a snowflake schema, and when is it used? IBM2
Ans: This can be used when the project data has a large number of divisions along with sub-divisions in one particular part of a department, and a large amount of duplicated data is used for the project development, e.g. pharma projects. The main disadvantage of a snowflake schema is its normalized format, so performance is poor and it is complex.
22. What performance settings have you implemented in your project?
Ans: //// 1. In derivations, instead of calling routines, implement stage variables. 2. While using the ODBC stage, adjust the rows-per-transaction settings. 3. Use user-defined queries while extracting the data from the source. 4. Minimize null values. 5. Minimize unused columns. 6. Minimize warnings. 7. Don't use more than 7 lookups in a job; if exceeded, take another Transformer. 8. Don't use more than 20 stages in a job. 9. Before loading data, drop indexes and re-create them afterwards. 10. Work with the DBA/Admin. 11. Tune project-level settings in DataStage Administrator.
///// 1) Use user-defined queries for extracting data from databases. 2) Use native database stages rather than ODBC. 3) Use an SQL join statement rather than a Lookup stage if the data volume is small. 4) Tune the SQL query (the SELECT statement used to extract the data from the source). 5) Use fewer Oracle stages, Transformers and Lookup stages; if you have a large number of stages, divide the job into multiple jobs. 6) If you are using server jobs, use IPC, Partitioner and Collector stages. 7) If using parallel jobs, set the environment variable APT_DUMP_SCORE to TRUE; this gives a job report showing which operators DataStage inserted (sorts, needed or unneeded partitioners and collectors), so you can manually change the partitioning methods.
/// Most of the performance tuning steps depend on the job design.

23. What UNIX questions come up for DataStage? TCS1
Ans:
1. How to kill a process: kill
2. Process status: ps
3. Regular expressions: grep
4. Shell script execution: ./scriptname
5. Running jobs: dsjob -run
6. Listing files: ls -lrt
A few examples are sketched below.
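The commands named in the answer to question 23 can be tried directly on the DataStage server; the following is a minimal illustration, where the process id, search pattern, script name, paths and job name are placeholders (dsjob may need its full DSEngine/bin path or the dsenv file sourced first).

ps -ef | grep dsjob              # process status, filtered with grep
kill -9 12345                    # kill a process by pid (12345 is a placeholder)
grep -i "error" job_run.log      # regular-expression search in a log file
./load_customers.sh              # execute a shell script in the current directory
ls -lrt /data/landing            # list files, newest last
dsjob -run MyProject MyJob       # run a DataStage job from the command line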
24. How do you load dimension data and fact data? Which is first? 2
Ans: A) 1. First load the data into the dimension tables from the source tables. 2. Load the data into the fact table from the source by putting a lookup onto the dimension tables (if it is a factless fact table you need to load only dimension data; otherwise load the measures from the source as well).
B) First we load data into the dimension tables, then we load into the fact table. Dimensions and facts have a parent-child, many-to-one relationship.

25. What is a fast changing dimension? TCS3
Ans: /// The entities in a dimension which change rapidly form a rapidly (fast) changing dimension; the best example is ATM machine transactions.
/// Usually almost all projects have a customer dimension, which is a fast growing dimension. A fast changing dimension is one that changes frequently.

26. What is the use of the invocation id? IBM2
Ans: /// It is required when we have a multi-instance job.
/// It is an ID that we specify for our identification while running a job in multiple instances, one unique ID for each instance of the job. When we enable a job to run in multiple instances, the DataStage engine throws up a window where we need to specify the invocation ID for that instance. When we check the log in Director, we can identify the job by its name followed by its invocation ID.

27. How can you join a flat file and Oracle as sources? IBM4
Ans: /// First convert the data from the flat file into Oracle; now we have two Oracle sources, and we can easily join them using a Join stage.
/// First populate the flat file data to the Oracle database. Two methods: 1. write a join query in the source Oracle stage and link to the target; 2. take two Oracle sources s1 and s2, use a Join stage and link to the target. Or the direct method: 3. take the flat file and the Oracle source, use a Join stage and link to the target.
/// Use a Sequential File stage (flat file) and an Oracle (Dynamic RDBMS) stage to extract the data from the sources, then join the data using a Join stage.

28. At source level I have 40 columns; I want only 20 columns at the target. What are the various ways to get this? IBM4
Ans: /// Use a Modify stage and drop the columns that are not needed, or use a Transformer and map only the columns that are needed (make sure that RCP is not enabled).
/// In the Transformer stage select which columns you want and drag only those columns from the input link to the output link.
/// 1. Based on the requirement, if you need 20 columns you have to select only those 20 columns in the database query. 2. If the source is a file, use a Transformer stage and populate only the required 20 columns. 3. Whether the source is a database or a file, you can use a Copy stage instead of a Transformer or Modify stage.

29. Today 1000 records were updated, tomorrow 500 records are updated; how do you find that? Wipro2
Ans: /// By using SCD1 or SCD2.
/// As a developer you cannot find it unless you have a CREATE_TS column in the table. It is always best practice to have such a column, which records the timestamp. DBAs can find out in several ways (tracing etc.) but it is not a straightforward way.

30. ename like ibm, tcs, hcl: we need to display only those records. How?
Ans: /// Using a Filter stage or a Transformer.
/// If the source is Oracle or any RDBMS, use the 'where clause' option in the 'available properties to add' tab; otherwise it can be done using a Filter stage or a Transformer.

31. When will we use a connected lookup and an unconnected lookup? Wipro3
Ans: /// This question relates to Informatica, not DataStage.
/// In DataStage the same can be explained as a normal (connected) lookup and a sparse (unconnected) lookup. Use a sparse lookup when the reference data volume is huge and the primary volume is very small; you pass the input record's key columns to be matched against the reference table in the where clause of the reference extraction query.

32. How can we see the data in a dataset? IBM7
Ans: /// We cannot see the data in a Data Set (.ds file) through UNIX, Editplus etc., but we can view the data by connecting a Peek stage to the Data Set stage in a job: go to Director --> job log --> Peek stage (view the .ds file data).
/// We cannot see the data in the Data Set (.ds files) through UNIX, Editplus etc. 1. But we can view the data by connecting a Peek stage to the Data Set stage in a job (Director --> job log --> Peek stage). Or 2. connect a Sequential File stage as the target of the Data Set stage; in the Sequential File stage --> Input tab --> Properties, give the target file path. For example:
(/local/data1/edw/temp/DataSetOp.txt) and run the job. The new text file gets loaded with data from the .ds file; thus the data will be in a readable format in the text file.
/// Right-click on the Data Set stage and select the View Data option.
/// In Designer, Tools -> Data Set Management -> browse for the dataset you want to view -> a window with the nodes on which the dataset is distributed gets displayed -> select one of the nodes and click the "Show Data Window" option that appears at the top left of the window. By doing this you can see the data on each node.

33. How do you convert rows into columns?
Ans: By using the Pivot stage we convert rows into columns in DataStage. If we want to do this in Informatica we use the Normalizer transformation.

34. How can you handle null values in the Transformer stage?
Ans: /// Null handling in the Transformer works two ways: 1. identifying the null values, 2. removing the null values. 1: by using the null handling functions. 2: by using constraints we can remove the rows with null values in the Transformer.
/// In the Transformer we have null handling functions; by using NullToZero we can handle nulls.

35. Is it possible to load two tables' data into one sequential file? If possible, how? Please share.
Ans: /// A sequential file has no storage limit, so it is possible; a hashed file, however, is not.
/// Yes, you can join two tables and load the data into a sequential file: take the two tables, use a Join stage to join the data based on the key, and then load the data into the sequential file.

36. Which is the best stage in DataStage parallel jobs to use for a full outer join, and why?
Ans: /// Use the Join stage rather than the Lookup or Merge stages. The Join stage supports all the joins: inner join, left outer join, right outer join and full outer join; Lookup supports only inner and left outer.
/// The Join stage is the best, because the data comes from the buffer area.

37. How can you remove duplicates in a file using UNIX?
Ans: /// sort -u filename sorts the data and removes the duplicate lines from the output.
/// You can use uniq to remove duplicates from a file, but it only removes identical consecutive records. For example, if you have the records 1 vishwa, 1 Shruti, 2 Ravi, 3 Naveen, 1 vishwa and you run uniq on them, nothing is removed because the duplicate 'vishwa' lines are not adjacent. So perform a sort before using uniq, so that duplicates become consecutive and are removed.
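A small sketch of both approaches from the answer above, assuming a plain text file named emp.txt:

# Remove duplicate lines in one step (the output is sorted as a side effect)
sort -u emp.txt > emp_dedup.txt

# Equivalent two-step form: sort first so duplicates become adjacent, then uniq
sort emp.txt | uniq > emp_dedup.txt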
38. I have two jobs. I want to run job B only after job A has run 3 times. How can I achieve this through DataStage?
Ans: By using a sequencer we can handle this scenario: put job A in a loop for 3 iterations, and after successful completion of the 3 runs, run job B.

39. In what format will you get the source file, fixed length or delimited? What is the symbol of the delimiter?
Ans: If the source is a file, the records are in either fixed-length or delimited format. The delimiter symbol is some special character that separates the fields.
Delimited example:
Eid,Ename,Sal
1,aaa,2000
2,bbb,3000
Fixed-length example:
Eid Ename Sal
1 aaa 1000
2 bbb 2000

40. How do you set a default value for a column if the column value is NULL?
Ans: /// If IsNull(link_name.column_name) Then 'default_value' Else link_name.column_name
/// To make it simpler, use the NullToValue() or NullToEmpty() function.

41. When will you go for a Dataset and when for a Fileset?
Ans: /// A Dataset and a Fileset are almost the same. A Dataset is tool dependent and a Fileset is OS dependent (UNIX).
A Dataset has no restriction on the amount of data it can hold, whereas a Fileset has limits on the data.

42. Can we do half of a project in parallel jobs and half of the project in server jobs?
Ans: Yes you can, as long as your client is OK with it. We design jobs to finish certain tasks in ETL; if your task has less data and more complex transformations then use server jobs, and vice versa.

43. What are the stages mostly used in real-time scenarios?
Ans: These are mostly used in real-time scenarios.
Database stages: 1. Oracle Enterprise 2. Teradata 3. DB2 4. ODBC
File stages: 1. Sequential File 2. Data Set 3. File Set 4. Hashed File
Processing stages: 1. Transformer 2. Sort 3. Modify 4. Copy 5. Funnel 6. Lookup 7. Surrogate Key 8. Join 9. Aggregator 10. Merge
Debugging stages: 1. Head 2. Tail 3. Peek

44. Please tell me the difference between 7.5 and 8.0.
Ans: The major differences between 7.5 and 8.0 are:
1. The range lookup was introduced in 8.0.
2. The Manager is combined with the Designer in 8.0.
3. Parameter sets were introduced.
4. SCD stages were introduced in 8.0.
5. Changes to the surrogate key handling in 8.0.

45. Can a fact table contain textual information?
Ans: /// No. A fact table contains foreign keys for maintaining relationships and measures for analyzing the business by year/half-year/quarter.
/// We can put textual information in, but it is not useful; a fact table actually contains numeric values for calculating measures like sales, profit/loss, revenue, date/time, production, stock etc., plus foreign keys for maintaining relationships so the business can be analyzed by week/month/quarter/year.
/// In a factless fact table we can have textual information.

46. Tell me the main advantage of stage variables. Any project-level hints?
Ans: The main advantage of stage variables is to reduce the repetition of code across the required columns in the Transformer. For example, if you have 10 lines of code to be used in the derivations of several columns, you declare a stage variable once, assign those 10 lines of code to that variable, and then place the stage variable wherever it is needed in the column derivations of the Transformer stage. The scope of these variables is limited to the stage in which they are created.

47. What is a node map constraint?
Ans: /// A node map constraint is a property check in the stage. The job's processing can be restricted at stage level with this check. It provides a drop-down list of available nodes, and by selecting a particular node the stage processes only on that node. For example, if the incoming data is being processed in parallel mode, then in the next stage, if the node map constraint is checked, from there on the processing takes place in sequential mode.
/// A node map constraint is a property check in the stage. The parallel execution of the job can be restricted to a particular node or nodes at stage level with this check. It provides a drop-down list of available nodes, and by selecting particular nodes the stage executes only on those nodes. For example, if the incoming data is being processed in parallel mode on 4 nodes, and in the next stage the node map constraint is checked with only 1 node selected, then the entire data is processed in sequential mode, i.e. on that particular node.
If the incoming data is processed on 4 nodes and the node map constraint is checked with 2 nodes selected, then the data is distributed across those 2 nodes and the stage runs on them.

48. How many nodes are supported by one CPU in parallel jobs?
Ans: /// One CPU can support about 1.5 nodes; if you have a dual core it supports up to 3 nodes only.
/// Here a node is nothing but a processor; parallel jobs support uniprocessor, SMP, MPP and clustered systems, so the number of nodes supported depends upon the hardware architecture.

49. How do you read a sequential file from job control?
Ans: /// Whenever you want to run more than one job as a single task you have to select batch processing, and batch processing is selected with the help of job control.
/// You can read a sequential file from a batch job. In the batch job --> job control, use the functions OpenSeq #pathname# (this opens the sequential file) and ReadSeq; check the syntax for these in the help menu.

50. Why do we use parameters instead of hard-coding in DataStage?
Ans: /// For security purposes.
/// Not only security: suppose I want to supply different values; going into the design and changing the values is not convenient for the user. Instead, using parameters and giving the values at run time is flexible.

51. What is the difference between the Lookup stage reject link and the Merge stage reject link in DataStage parallel jobs, in terms of output?
Ans: The Lookup stage reject link captures unmatched primary entries, and the Merge stage reject links capture unmatched secondary entries.

52. How do you view the log file for a particular job?
Ans: We can view the log of a job in DataStage Director. For example, you are in the DataStage Designer window and you are running the job: go to the Tools menu and click "Run Director". Now you can see DataStage Director; click the Log button on the toolbar, and there you can view the log of your job.

53. What are fact tables and dimension tables? Give an example assuming one table.
Ans: A fact table is a table that contains the measures of interest. A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up; it contains only the textual attributes.
54. Which memory is used by Lookup and Join?
Ans:

55. How do you unlock a locked job in DataStage 8.0?
Ans: /// DataStage Director > Job > Cleanup Resources > under the PID/job names list select the required job > click Log Out to unlock the job. If that does not work, go to DataStage Administrator > Session Management > Active Sessions; in session management you can find the locked jobs and select them.
/// From the DataStage web console remove the session for the locked job; it will unlock the job.
/// Tools > Reporting Console > Administration > Session Management > Active Sessions: in session management you can find the locked jobs and select them.

56. How can you remove the duplicates in a sequential file?
Ans: /// In the source Sequential File stage go to Output -> Properties and, under Options, set Reject Mode = Output. This gives the duplicate records on the output. This works; in my present project we are using the same approach.
/// Go to the Sequential File stage ---> select the Filter property ---> set Filter = sort | uniq. It works properly.

57. How can we send even and odd records from a sequential file to two different targets?
Ans: In a Transformer, specify the constraints Mod(@INROWNUM,2) = 1 for odd records and Mod(@INROWNUM,2) = 0 for even records.

58. What is the UNIX script to run a job? Please mention the commands we use often.
Ans: /// By using dsjob -run -param <param name> -mode -wait projectname jobname; here dsjob is the command and -run is the keyword.
/// /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -wait -jobstatus -param <Parameter_Name1>=<Parameter_Value1> -param <Parameter_Name2>=<Parameter_Value2> <Project_Name> <Job_Name>. Sometimes you can use dsjob instead of the full path /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob.
/// UNIX command to run a DataStage job: dsjob -run -jobstatus projectname jobname
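Putting the pieces of the answer to question 58 together, a minimal wrapper script might look like the sketch below; the install path, project name, job name and parameter are placeholders and depend on the environment.

#!/bin/sh
# Run a DataStage job from UNIX and report its status
DSHOME=/opt/IBM/InformationServer/Server/DSEngine   # adjust to your install
$DSHOME/bin/dsjob -run -wait -jobstatus \
    -param LoadDate=2024-01-31 \
    MyProject MyJob
echo "dsjob returned status code $?"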
59. How do you define stage variables in the Transformer stage?
Ans: Stage variables are variables that can be declared and used inside the Transformer stage; they are accessible only from the Transformer stage in which they are declared. Any stage variables you declare are shown in a table in the right pane of the links area.
Defining a stage variable: select Insert New Stage Variable from the stage variable shortcut menu. A new variable is added to the table showing the stage variables in the links area; the default name is StageVar with the default datatype VarChar(255). These properties can be edited using the Transformer stage properties dialog box. To open the Transformer stage properties box: click the Stage Properties button on the Transformer toolbar, or select Stage Properties from the shortcut menu, or select Stage Variable Properties from the stage variable shortcut menu. The properties box contains the variable name, initial value, SQL data type, precision, scale and an optional description. Edit the variable as required and click OK.

60. What are the common errors in DataStage?
Ans: //// 1) NLS warnings. 2) A null value populated into a non-nullable column. 3) Metadata mismatch: check the target database datatype and length. 4) Datatype mismatch: the wrong datatype was used. 5) Parallel loading not enabled (this is a fatal error in parallel jobs if the database does not have parallel loading enabled, e.g. Oracle 10g).
//// Common errors occur while extracting, transforming and loading the data into the target: 1. datatype mismatches (e.g. a non-numeric field in a numeric field); 2. null values in not-null fields; 3. field sizes differing; 4. datatype sizes differing between source and target.
//// The following are general errors that I faced: 1. column mismatch or invalid number of columns; 2. file opening errors; 3. parameters not defined or not given a correct value; 4. mutex errors; 5. process time-outs.

61. Can a File Set support an input and an output link at the same time?
Ans: No, it can have either one input link or one output link, just like the Data Set stage.

62. How do you call a shell script / batch file from DataStage?
Ans: You can invoke a shell script through Job Properties --> Before-job subroutine ---> ExecSH (or by using the Execute Command stage in a sequence).

63. How did you reconcile the source with the target?
Ans: /// By using the DSGetLinkInfo function we can trace how many records were extracted from the source and how many records were loaded into the target.
/// 1. Use the Show Performance Statistics option while running your job; this gives you the records read from the source, written to the output link and written to the reject link. 2. If you want the job statistics in a file, call DSJobReport with the argument "2;directorypath" in the after-job subroutine of the job for which the statistics are required. This creates the job log in text format in the directory path specified, giving you the number of records read, written and rejected. You could write a shell script to format it if required.

64. Parallel jobs run on cluster machines; server jobs run on SMP and MPP. What do we mean by cluster machines, SMP and MPP?
Ans: SMP (symmetric multiprocessing): all processors use the same memory, disk and memory buffers; here processors will be waiting for disk I/O.
MPP (massively parallel processing): here each processor has its own memory and storage devices; physically all these processors are embedded in a single cabinet and communicate through shared memory buses.
Cluster machines: same as MPP, but physically the machines are housed in different cabinets and communicate through high-speed networks.

65. What is the usage of DataStage with materialized views?
Ans: Materialized views are similar to normal views, but the difference is that materialized views are stored in the database and are refreshed periodically to pick up new data.

66. How can we do null handling in sequential files?
Ans: While importing the data we can specify Nullable = Yes on the Define tab, or on the Format tab we can specify the final delimiter as 'null'.
//// Go to the Format tab in the Sequential File stage, select the Field Defaults option, then select the Null Field Value option at the bottom right and provide the value.

67. How do you remove duplicates in the Transformer stage by using stage variables? Give one example.
Ans: //// Stage variable derivations (evaluated in order): stage_variable1 <- stage_variable3; stage_variable2 <- if column = stage_variable1 then 0 else 1; stage_variable3 <- column. Put stage_variable2 as the constraint on the target link.
//// Using the hash partitioning technique we can bring the duplicate data (based on the key columns) into one partition; then in the stage constraints filter the data with @INROWNUM = 1. This removes duplicates in the Transformer stage.

68. What is isolation level and when do you use it?
Ans: It specifies the transaction isolation levels that provide the necessary consistency and concurrency control between the transactions in the job and other transactions, for optimal performance. Because Oracle does not prevent other transactions from modifying the data read by a query, that data may be changed by other transactions between two executions of the query; thus a transaction that executes a given query twice may experience both non-repeatable reads and phantoms. Use one of the following transaction isolation levels:
Read Committed: takes exclusive locks on modified data and sharable locks on all other data. Each query executed by a transaction sees only data that was committed before the query (not the transaction) began. Oracle queries never read dirty, that is uncommitted, data. This is the default.
Serializable: takes exclusive locks on modified data and sharable locks on all other data. It sees only those changes committed when the transaction began, plus those made by the transaction itself through INSERT, UPDATE and DELETE statements. Serializable transactions do not experience non-repeatable reads or phantoms.
Read-only: sees only those changes that were committed when the transaction began. This level does not permit INSERT, UPDATE and DELETE statements.

69. How can you find the maximum salary by using the Remove Duplicates stage?
Ans: Everyone knows the Remove Duplicates stage removes duplicates, but you can also find the maximum salary with it: use salary as the key, sort it in descending order, and set the Duplicate To Retain option to First.

70. Without using the Funnel stage, how do you populate the data from different sources into a single target?
Ans: //// Better to use the Join stage, because the Join stage can retrieve records from different stages and accepts many inputs with one output link, so this stage satisfies your condition. Merge and Lookup produce reject links, and since you need one target, Join satisfies the requirement.
//// We can populate the data from several sources into the target without a Funnel stage by using the Sequential File stage. Let me explain: in the Sequential File stage we have a property called "File", so first you give one file name and load the data.
Next time, in the same Sequential File stage, add the "File" property again on the right side: click on it and it asks for another file name; give the other file name (do not reload the data). In the same way you can give however many files you have. Finally, when you run the job, the data is automatically appended.

71. Please tell me what troubleshooting looks like in DataStage.
Ans: A DataStage job fails with error ORA-12154 when accessing an Oracle database via the Oracle run-time client. The following are typical errors, but multiple variations exist:
Error while trying to retrieve text for error ORA-12154
APT_OraReadOperator: connect failed.
ORA-12154: TNS:could not resolve the connect identifier.
ORA-12154: TNS:could not resolve service name
Resolving the problem:
"Error while trying to retrieve text for error ORA-12154" means that not only did the connection to the Oracle database fail, but Oracle was also unable to retrieve the text of error message ORA-12154. This usually indicates that either the DataStage userid running the job does not have read access to the Oracle run-time client files, or the environment variable ORACLE_HOME is not defined.
The two "ORA-12154: TNS:could not resolve ..." errors indicate that the connection identifier or service name specified in the DataStage job (or in the ORACLE_SID environment variable or ODBC definition) was not known to either the Oracle client or the Oracle server. The connection/service identifiers known to the Oracle run-time client are defined in the tnsnames.ora file, $ORACLE_HOME/network/admin/tnsnames.ora. Verify that the identifier specified for the failing Oracle connection has been defined in tnsnames.ora. If it is correctly defined, then next verify that the ORACLE_HOME environment variable is correctly defined, and that the tnsnames.ora file has the correct read permissions.
If the above items are configured correctly, also check the listener.log on the Oracle server to confirm that the service id (or the database it maps to) is known to the Oracle server.
Setting up the environment variables required to use the Oracle run-time client: the Oracle client requires that the following environment variables be defined. These should be set in the dsenv file in the DataStage DSEngine directory.
ORACLE_HOME=/home/oracle
LIBPATH=$LIBPATH:$ORACLE_HOME/lib:
PATH=$PATH:$ORACLE_HOME/bin
Change the path defined for ORACLE_HOME to the correct path for your system; ORACLE_HOME should be set to the absolute path of the Oracle home directory, which is the directory level directly above the lib and bin directories. Please also note that the name of the library path environment variable varies with different operating systems:
AIX - use LIBPATH
Solaris - use LD_LIBRARY_PATH
HP-UX - use SHLIB_PATH
Linux - use LD_LIBRARY_PATH
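A few shell checks that follow the troubleshooting steps above; the TNS alias ORCL is a placeholder, and tnsping is only available if the Oracle client installation ships it.

# Confirm ORACLE_HOME is set and tnsnames.ora is readable
echo "$ORACLE_HOME"
ls -l "$ORACLE_HOME/network/admin/tnsnames.ora"

# Look for the failing connect identifier in tnsnames.ora (ORCL is a placeholder)
grep -i "ORCL" "$ORACLE_HOME/network/admin/tnsnames.ora"

# Test name resolution to the listener, if tnsping is installed
"$ORACLE_HOME/bin/tnsping" ORCL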
//// Hash-key:- Here data will be populated insuch a way that realated data will stay together. range of primery ker records will be populated to one partition. eg.)all primary records in this month will be populated to on e partiotion. modulus:-partitioned data will provide some information.in the sense customers related to one store will populated to one partiotion. ///// If the key field is numeric use modules else hash partition.as per performance tuning.
73. I am running a job with 1000 records. If the job aborts after loading 400 records into the target, I want the next run to load from record 401 onwards. How will we do it? This scenario is not for a sequence job; it is within the job itself, e.g. Seq File --> Transformer --> Dataset.
Ans: //// Using a save point can prevent this problem.
///// Using the option called "clean up on failure".
///// By using a Lookup stage we can get the answer: there are two tables, the 1000-record (source) table and the 400-record (target) table. Take the source table as the primary input and the 400-record table as the reference input to the Lookup, and load only the records that find no match: source --> lookup --> target.

74. I have a source like:
balance, drawtime
20000, 8.30
50000, 10.20
3000, 4.00
I want the target like this:
balance, drawtime
20000, 20.30
50000, 22.20
3000, 16.00
Ans: if drawtime < 13 then drawtime = drawtime + 12 else drawtime
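To sanity-check the derivation in question 74 outside DataStage, a quick awk sketch over a comma-delimited file; the file name, header row and column order are assumptions.

# Shift times below 13 into the afternoon by adding 12 hours
# Input assumed comma-delimited with a header row: balance,drawtime
awk -F',' 'BEGIN{OFS=","} NR==1 {print; next} {if ($2+0 < 13) $2 = sprintf("%.2f", $2 + 12); print}' balances.csv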
75. What is the difference between the ODBC and Oracle stages?
Ans: The ODBC stage can be used to connect to different databases like Oracle, DB2, etc.; it can connect to any database with ODBC drivers, whereas the Oracle stage connects only to Oracle databases. If you want to connect to an Oracle database and you use the Oracle stage, it will be faster and better. The ODBC stage does the same but it is slower and generic; I suggest using the ODBC stage where there is no specific plug-in to connect to a database.

76. In my source I have 10 records but I want 11 records in the target. How can I achieve this in server jobs?
Ans: Using the Pivot stage you can get those target records; otherwise, in the Transformer stage you can also add one record in the output columns and implement the derivation in that cell. I think this is very useful for your question.

77. What is a time dimension, and how do you populate a time dimension?
Ans: Every DWH has a time dimension; you can load the time dimension through a PL/SQL script.

78. How do you move a project from development to UAT?
Ans: By using DataStage Manager we can move the project from Dev to UAT: through DataStage Manager, export the project to your local machine in .dsx format (project.dsx) from the DEV server, then import the same .dsx (project.dsx) into the UAT server using DataStage Manager.
-> What is your project architecture?
Ans: Bottom-up architecture.

79. Why do we use third-party tools in DataStage?
Ans: Firstly, a C++ compiler (the GCC C++ compiler) to compile generated code in the UNIX environment. Secondly, Tivoli critical alerts for monitoring jobs: if any job fails (in maintained projects) it automatically fires a notification mail to the support person.

80. What is the purpose of the debugging stages? Where will we use them in real time?
Ans: To pick up sample data and to test the job.

81. In the Aggregator stage, when finding the sum for an entire group of columns, it displays the result in binary format. How can I solve this problem?
Ans: Binary format is not supported in the Aggregator stage.

82. Where do we use the Column Generator stage in a real-time scenario?
Ans: The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed; the new data set is then output. Say we have two files X and Y and we are doing a funnel: the X file has only 2 columns and the Y file has 3 columns.
When doing a funnel, the metadata should be the same; to achieve this, add one column to the X file and then use the Column Generator stage to populate some mock data in that third column.

83. What is version control, and how can I apply it in DataStage?
Ans: Version control is the process of tracking the changes made to a particular job or process, along with who made the change and why. It also maintains references to the problems that have been fixed and the enhancements made to the process. DataStage 7.5 comes with a separate installable for version control; versions from DataStage 8.0 onwards do not have this tool. Version control can also be done by exporting the jobs to .dsx or .xml and importing the .dsx or .xml files when required.

84. What is the difference between the Switch and Filter stages?
Ans: /// Switch consists of 1 input link, up to 128 output links and one reject link; Filter consists of 1 input link, any number of output links and one reject link. In Switch we can operate on a single column, but in Filter we can operate on multiple columns.
//// Filter stage: multiple condition operators (<, >, =, <=, >=); takes one input link and gives one or more output links plus one reject link. Switch stage: only the equality operator; takes one input link and gives one output link.

85. How do you transfer a file from one system to another system in UNIX? Which command is to be used?
Ans: Through scp (secure copy); the ftp and sftp commands can also be used. A small sketch follows.
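A minimal sketch of the transfer commands mentioned in question 85; the host name, user and paths are placeholders.

# Copy a file to a remote host with scp
scp /data/out/sales.txt etluser@testserver:/data/in/

# Non-interactive transfer with sftp, feeding commands on stdin
sftp etluser@testserver <<EOF
put /data/out/sales.txt /data/in/sales.txt
bye
EOF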
86. A flat file contains 200 records. I want to load the first 50 records the first time the job runs, the second 50 records the second time, and so on. How do you develop the job? Please give the steps.
Ans: Design the job like this:
1. Read records from the input flat file and enable the Row Number Column option in the file stage; it generates a unique number for each record in the file.
2. Use a Filter stage and write conditions like this:
a. rownumbercolumn <= 50 (on the 1st link, to load the records into the target file/database);
b. rownumbercolumn > 50 (on the 2nd link, to load the remaining records into a file with the same name as the input file, in overwrite mode).
So the first time your job runs, the first 50 records are loaded into the target and at the same time the input file is overwritten with the remaining records, i.e. 51 to 200. The second time your job runs, the next 50 records (51-100) are loaded into the target and the input file is overwritten with records 101 to 200. And so on; 50 records are loaded into the target in each run.

87. What are an initial load and an incremental load?
Ans: The only difference is that the initial-load jobs were set to first truncate the tables and then do a complete load, while the incremental load was set to insert new rows and update existing rows for the dimension tables. The fact jobs were the same (truncate and complete load).

88. What is force compile?
Ans: For parallel jobs there is also a force compile option. The compilation of parallel jobs is by default optimized such that Transformer stages only get recompiled if they have changed since the last compilation. The force compile option overrides this and causes all Transformer stages in the job to be compiled. To select this option choose File > Force Compile.

89. How will you implement a surrogate key in an SCD using the Surrogate Key Generator, so that the value of the surrogate key does not repeat even if the job is compiled repeatedly?
Ans: //// This type of problem is solved in DataStage 8.0.1 by using the new Surrogate Key Generator stage. Whenever we use a version below 8 we face this problem with surrogate keys; to overcome it, take the help of routines: pass the maximum value of the surrogate key through the routine, so that whenever we run the job again the routine passes the max value and the sequence is generated from that number onwards.
/// If you want the SCD stage to generate new surrogate keys by using a key source that you created with a Surrogate Key Generator stage, you must use the NextSurrogateKey function to derive the surrogate key column.
/// Pass the last key (the last primary key or the last value in the sequence) as a parameter.
90. If I make any changes in the parallel job, do I need to implement the changes in the sequencer job, or will the changes be reflected automatically?
Ans: /// Once the modifications are done, simply compile the job and run the sequencer; there is no need to make any change in the sequencer.
/// The sequence does not contain the underlying code of the job; it picks up the code from the job whose reference is given in the sequence.
/// What changes would you propose to make in the sequence? It only contains the job name and the parameters. As long as you don't make any changes to the job name or the parameters used by the job you don't need to care about the sequence, but if you change those you need to recompile the sequence.

91. What are the environment settings for DataStage while working on parallel jobs?
Ans: Mainly we need 3 environment variables to run a DataStage PX job: 1. APT_CONFIG_FILE 2. APT_RECORD_COUNT 3. APT_DUMP_SCORE

92. I have one table with one column; in this column I have three rows like 1, 1, 2. I want to populate the target so that the first two rows become one row and the remaining row becomes another row. For example, with COLUMN_NAME values SHIVA, RAMU, MADHU, I want SHIVA and RAMU in one row and MADHU in another row. If anyone knows, please tell me how it is possible.
Ans: seqfile -> sort -> transformer -> removeduplicate -> target.
Load the file with col1 values 1, 1, 2.
In the Sort stage: sort key = col1, allow duplicates = true, create key change column = true.
In the Transformer create a stage variable: if keychange = 1 then col1 else stagevariablename : col1. Drag col1 into the Transformer
and in the derivation area put only the stage variable. In the Remove Duplicates stage set key = col1 and select "last" as the record to retain; we will get the answer: col1 values 11 and 2.

93. In a run of 20 jobs one job is very slow, and because of it the entire run is slow. How can you find out which job is slow?
Ans: By using job monitoring: it displays all the jobs with their state and link-level performance, so we can find which job is running slowly.

94. What is a message handler?
Ans: We can use message handling to suppress warnings in the job log.

95. Can you please tell me how you can call shell scripting / UNIX commands in a job sequence?
Ans: /// By using the Routine activity you can access UNIX/shell scripts.
/// (Explanation) There are two scenarios where you might want to call a script.
Scenario 1 (a dependency exists between the script and a job): a job has to be executed first, then the script has to run, and only upon completion of the script execution should the second job be invoked. In this case develop a sequencer job where the first Job activity invokes the first job, then an Execute Command activity calls the script you want to invoke by typing "sh <script name>" in the command property of the activity, and then another Job activity calls the second job.
Scenario 2 (the script and the job are independent): in this case, right in your parallel job, say job1, under job properties you can find the "After-job subroutine" where you select "ExecSH" and pass the script name you would like to execute. By doing this, once job1's execution completes, the script gets invoked. The job succeeding job1, say job2, does not wait for the execution of the script.

96. 1. How do you read multiple files from the Sequential File stage?
2. If a file doesn't arrive or doesn't exist in the Sequential File stage, how do you handle this?
3. What do you do before taking data from the source to the staging area?
4. I have a Remove Duplicates stage and a Transformer stage; what will you do to optimize the performance of the job?
Ans: //// 1) The read method should be set to File Pattern for reading multiple files (for reading a single file it should be Specific File).
//// 1. With the read method Specific File(s) you can also take multiple files.
//// Ans 1: We can use Read Method = Specific File(s) and give the full path of each file one by one; alternatively, use Read Method = File Pattern and specify a wild card.
Ans 2: We can control this using the Missing File Mode option; the values are OK (to skip the file and continue) and Error (to abort the job).
Ans 3: It is important to make sure the metadata matches the records. It is better to reject bad records and collect them on a reject link; this is controlled by the Reject Mode option, whose values are Continue, Fail and Output. Output collects the rejected records on a reject link.

97. How many jobs in total were created in your project?
Ans: It depends upon the duration; suppose you mention something like 12-15 months, then around 52 jobs.

98. What is your data mart size and DWH size?
Ans: No one knows the exact size of the DWH, but you can say it should be in terabytes, and the data mart size is around 6 GB.

99. How is your project secured?
Ans: It is taken care of under Admin.

100. Using dsjob we can run only one job at a time; how can you run multiple jobs at a time in UNIX?
Ans: Use the cron command and the at command to run multiple jobs at a time.
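One way to act on the answer to question 100 is to launch several dsjob commands in the background, or schedule a wrapper script with cron; a minimal sketch, with the install path, project, job names, script path and schedule as placeholders.

#!/bin/sh
# Start two DataStage jobs in parallel from one script
DSBIN=/opt/IBM/InformationServer/Server/DSEngine/bin   # adjust to your install
$DSBIN/dsjob -run -wait -jobstatus MyProject LoadCustomers &
$DSBIN/dsjob -run -wait -jobstatus MyProject LoadOrders &
wait    # block until both background jobs finish
echo "both jobs finished"

# Example crontab entry (placeholder schedule): run the wrapper every night at 1 AM
# 0 1 * * * /home/etluser/scripts/run_nightly_loads.sh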