Informatica Interview Questions and Answers - Cool Interview
In a scenario I have col1, col2, col3 with rows (1, x, y) and (2, a, b), and I want the output in the
form col1, col2 with rows (1, x), (1, y), (2, a), (2, b). What is the procedure?
Use a Normalizer transformation:
create two ports -
set the first port's Occurs = 1
and the second port's Occurs = 2;
two output ports are created -
connect them to the target.
On day one I load 10 rows into my target, and on the next day I get 10 more rows to be
added to my target, out of which 5 are updated rows. How can I send them to the target?
How can I insert and update the records?
What is the method of loading 5 flat files having the same structure into a single target,
and which transformations can I use?
Use a Union transformation;
otherwise, write the file paths of all five files into one file and use that list file in the session
properties as an indirect source.
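For the indirect option, the list file you point the session at simply contains the path of each flat file, one per line, and the source filetype is set to Indirect in the session properties. A minimal sketch (the file names below are only illustrative):

filelist.txt:
/data/src/sales_region1.dat
/data/src/sales_region2.dat
/data/src/sales_region3.dat
/data/src/sales_region4.dat
/data/src/sales_region5.dat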
You can also use a Normalizer transformation; it will normalize the records.
There are 3 types of schemas:
1. Star schema
2. Snowflake schema
3. Galaxy schema
When we create the target as a flat file and the source as Oracle, how can I specify the first
row as column names in the flat file?
Use a pre-session SQL statement, but this is a hardcoding method: if you change the column names
or put extra columns in the flat file, you will have to change the insert statement.
You can also achieve this by changing the setting in the Informatica server configuration to
output the column headings. The only disadvantage of this is that it will be applied to all the
files generated by this server.
1. Can you explain one critical mapping? 2. Regarding performance, which one is better:
a connected Lookup transformation or an unconnected one?
If you need to calculate a value for all the rows, or for most of the rows coming out of the
source, then go for a connected lookup.
If the value is needed only for some rows, an unconnected lookup called from an expression is the
better choice. For example, suppose we have to get the value for a field 'customer' either from the
ORDER table or from the CUSTOMER_DATA table, on the basis of a rule such as: take the name from
CUSTOMER_DATA when a matching record exists; otherwise customer = order.customer_name.
3. Pass only the input/output ports you actually need through the transformation, i.e., reduce the
number of input and output ports.
The Update Strategy transformation is used to drive the data to be inserted, updated or deleted
depending upon some condition. You can do this at session level too, but there you cannot define
any condition. For example, if you want to do both update and insert in one mapping, you will
create two flows and make one insert and one update, depending upon some condition. Refer to
Update Strategy in the Transformation Guide for more information.
I have used it in a case where I wanted to insert and update records in the same mapping.
Write an override SQL query and adjust the ports as per the SQL query.
That is, by writing a SQL override and specifying the joins in the SQL override.
In an update strategy, which target gives more performance: a table or a flat file? Why?
A flat file: loading, sorting and merging operations will be faster, as there is no index concept
and the data is in ASCII mode.
When you add a relational or a flat file source definition to a mapping, you need to connect it
to a Source Qualifier transformation. The Source Qualifier represents the rows that the
Informatica Server reads when it executes a session.
The basic purpose of a Source Qualifier is to convert database-specific data types into
Informatica-specific (native) data types, so that data can be integrated easily.
If you drag three heterogeneous sources and populate them to the target without any join, you end
up with a Cartesian product. Without a join, not only heterogeneous sources but even homogeneous
sources show the same problem.
If you do not want to use joins at the Source Qualifier level, you can add the joins separately.
In a Source Qualifier we can join tables from the same database only.
How can you work with a remote database in Informatica? Did you work directly by using
remote connections?
Configure an FTP connection with the connection details:
IP address
User authentication
1. Database partitioning
2. Round-robin
3. Pass-through
4. Hash-key partitioning
All of these are applicable for relational targets. For flat file targets, everything except
database partitioning is applicable.
Informatica supports N-way partitioning. You can just specify the name of the target file and
create the partitions; the rest will be taken care of by the Informatica session.
For UNIX shell users, enclose the parameter file name in single quotes:
-paramfile '$PMRootDir/myfile.txt'
For Windows command prompt users, the parameter file name cannot have beginning or
trailing spaces. If the name includes spaces, enclose the file name in double quotes, for example:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another
machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where
the variable is defined expands the server variable.
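As an illustration of how the option fits into a full command (the service, domain, folder and workflow names below are made up, and the exact flag names vary between PowerCenter versions), a pmcmd call with a parameter file typically looks like:

pmcmd startworkflow -sv IntSvc_Dev -d Domain_Dev -u Administrator -p SecretPwd -f MyFolder -paramfile '$PMRootDir/myfile.txt' wf_load_customers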
You cannot. If you want to start a batch that resides in another batch, create a new independent
batch and copy the necessary sessions into the new batch.
In addition, you can copy the workflow from the Repository Manager. This will automatically
copy the mapping, associated sources, targets and session to the target folder.
Yes, it is possible. For copying a session to a folder in the same repository, or to another
folder in a different repository, we can use the Repository Manager (which is a client-side tool).
Simply by dragging the session to the target destination, the session will be copied.
How does the Informatica server increase session performance through partitioning
the source?
For relational sources the Informatica server creates multiple connections, one for each partition
of a single source, and extracts a separate range of data for each connection. The Informatica
server reads multiple partitions of a single source concurrently. Similarly, for loading, the
Informatica server creates multiple connections to the target and loads partitions of data
concurrently.
The Informatica server can achieve high performance by partitioning the pipeline and performing
the extract, transformation and load for each partition in parallel.
Which tool do you use to create and manage sessions and batches, and to monitor and
stop the Informatica server?
The Workflow Manager (Server Manager in older versions) is used to create and manage sessions and
batches, and the Workflow Monitor is used to monitor and stop them.
Informatica is an ETL tool; you cannot build reports from it, but you can generate a metadata
report, which is not going to be used for business analysis.
How can you recognise whether or not the newly added rows in the source get
inserted into the target?
If it is a Type 2 dimension the above answer is fine, but if you want details of all the
insert and update statements you need to use the session log file, with tracing configured to
verbose.
You will get a complete picture of which records were inserted and which were not.
1. Version number
2. Flag
3. Date
What are the mappings that we use for a slowly changing dimension table?
I want complete information on slowly changing dimensions, and also a project on slowly
changing dimensions in Informatica.
We can use the following Mapping for slowly Changing dimension table.
• Expression
• Lookup
• Filter
• Sequence Generator
• Update Strategy
Type 2 (full history): maintained by Version number, Flag, or Date.
Type 3: keeps only the current and previous values.
Update as Insert:
This option specifies that all the update records from the source are to be flagged as inserts in
the target. In other words, instead of updating the records in the target, they are inserted as new
records.
Update else Insert:
This option enables Informatica to flag the records either for update, if they are old, or for
insert, if they are new records from the source.
What is Datadriven?
The Informatica Server follows instructions coded into Update Strategy transformations within
the session mapping to determine how to flag rows for insert, delete, update, or reject.
If the mapping for the session contains an Update Strategy transformation, this field is
marked Data Driven by default
DATA DRIVEN
The model you choose constitutes your update strategy, how to handle changes to existing
rows. In PowerCenter and PowerMart, you set your update strategy at two different levels:
Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all rows in the same way (for example, treat all rows as inserts), or use
instructions coded into the session mapping to flag rows for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag
rows for insert, delete, update, or reject.
It is not necessary that both should follow a primary key/foreign key relationship. If any
relationship exists, it will help from a performance point of view.
The Joiner transformation supports the following join types, which you set in the Properties
tab:
Normal (Default)
Master Outer
Detail Outer
Full Outer
A target load order group is the collection of source qualifiers, transformations, and targets
linked together in a mapping.
Join data originating from the same source database. You can join two or more tables with
primary-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data. If you include a filter condition,
the Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join. If you include a user-defined join, the
Informatica Server replaces the join information specified by the metadata in the SQL query.
Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an
ORDER BY clause to the default SQL query.
Select only distinct values from the source. If you choose Select Distinct, the Informatica
Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica Server to
read source data. For example, you might use a custom query to perform aggregate
calculations or execute a stored procedure.
Why do we use the Stored Procedure transformation?
To run a stored procedure in the database from within a mapping, for example to check the status
of the target database before loading, to drop and recreate indexes, or to perform a specialized
calculation.
Input
Output
Input Group
The Designer copies property information from the input ports of the input group to create a
set of output ports for each output group.
Output Groups
There are two types of output groups:
User-defined groups
Default group
You cannot modify or delete output ports or their properties.
The port based on which you want to generate the rank is known as the rank port; the generated
values are known as the rank index.
When the Informatica server runs in Unicode data movement mode, it uses the sort order
configured in the session properties.
We can run the Informatica server either in Unicode data movement mode or ASCII data movement
mode.
Unicode mode: in this mode the Informatica server sorts the data as per the sort order configured
in the session.
ASCII mode: in this mode the Informatica server sorts the data as per the binary order.
Static cache: you cannot insert into or update the cache.
Dynamic cache: you can insert rows into the cache as you pass them to the target.
The Informatica server returns a value from the lookup table or cache when the condition is
true. When the condition is not true, the Informatica server returns the default value for
connected transformations and NULL for unconnected transformations.
The Informatica server inserts rows into the cache when the condition is false. This indicates
that the row is not in the cache or the target table; you can pass these rows to the target table.
Cache
1. Static cache
2. Dynamic cache
3. Persistent cache
Connected lookup: receives input values directly from the pipeline.
Unconnected lookup: receives input values from the result of a :LKP expression in another
transformation.
Using an unconnected lookup we can access data from a relational table which is not a source in
the mapping. For example, suppose the source contains only Empno, but we want Empname also in the
mapping. Then, instead of adding another table which contains Empname as a source, we can look up
the table and get the Empname into the target.
Specifies the directory used to cache master records and the index to these records. By
default, the cached files are created in a directory specified by the server variable
$PMCacheDir. If you override the directory, make sure the directory exists and contains
enough disk space for the cache files. The directory can be a mapped or mounted drive.
Normal (Default)
Master Outer
Detail Outer
Full Outer
Now we can use a joiner even if the data is coming from the same source.
When you run a workflow that uses an Aggregator transformation, the Informatica Server
creates index and data caches in memory to process the transformation. If the Informatica
Server requires more space, it stores overflow values in cache files.
Can you use the mapping parameters or variables created in one mapping in another
mapping?
No. You might want to use a workflow parameter/variable if you want it to be visible to
other mappings/sessions.
Start value = Current value (when the session starts the execution of the underlying
mapping).
Start value <> Current value (while the session is in progress and the variable value
changes on one or more occasions).
The current value at the end of the session is nothing but the start value for the subsequent run
of the same session.
Source definitions. Definitions of database objects (tables, views, synonyms) or files that
provide source data.
Target definitions. Definitions of database objects or files that contain the target data.
Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing
business logic that you build into the transformation. These are the instructions that the
Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the
Informatica Server moves data. A workflow is a set of instructions that describes how and
when to run tasks related to extracting, transforming, and loading data. A session is a type of
task that you can put in a workflow. Each session corresponds to a single mapping.
Transformations can be active or passive. An active transformation can change the number of
rows that pass through it, such as a Filter transformation that removes rows that do not meet
the filter condition.
A passive transformation does not change the number of rows that pass through it, such as
an Expression transformation that performs a calculation on data and passes all rows through
the transformation.
A mapplet should have a Mapplet Input transformation, which receives input values, and an
Output transformation, which passes the final modified data back to the mapping.
When the mapplet is displayed within the mapping, only the input and output ports are displayed,
so that the internal logic is hidden from the end user's point of view.
Which transformation do you need while using COBOL sources as source
definitions?
The Normalizer transformation, which is used to normalize the data, since COBOL sources often
consist of denormalized data.
What is a transformation?
How many ways can you update a relational source definition, and what are they?
Two ways:
1. Edit the definition
2. Reimport the definition
Where should you place the flat file to import the flat file definition into the Designer?
There is no such restriction on where to place the source file. From a performance point of view
it is better to place the file in the server's local source (SrcFiles) folder; if you need the
path, check the server properties available in the Workflow Manager.
It does not mean we should not place it in any other folder; if we place it in the server's source
folder, it will be selected by default at the time of session creation.
Data cleansing is a two step process including DETECTION and then CORRECTION of errors in
a data set.
I am providing the answer which I have taken from the Informatica 7.1.1 manual.
Ans: While running a workflow, the PowerCenter Server uses the Load Manager process and
the Data Transformation Manager (DTM) process to run the workflow and carry out workflow
tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the
following tasks:
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks query
conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract,transform,
and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
The Informatica server connects to source and target data using native or ODBC drivers;
it also connects to the repository for running sessions and for retrieving metadata information.
source ------> informatica server ---------> target
                          REPOSITORY
How can you read rejected or bad data from the bad file and reload it to the target?
Correct the rejected data and send it to the target relational tables using the reject loader
utility. Find the rejected data by using the column indicator and the row indicator.
Manages session and batch scheduling: when you start the Informatica server, the Load Manager
launches and queries the repository for a list of sessions configured to run on the Informatica
server. When you configure a session, the Load Manager maintains the list of sessions and session
start times. When you start a session, the Load Manager fetches the session information from the
repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a session, the Load Manager
locks the session in the repository. Locking prevents you from starting the session again and
again.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the
parameter file and verifies that the session-level parameters are declared in the file.
Verifies permissions and privileges: when the session starts, the Load Manager checks whether or
not the user has the privileges to run the session.
Creating log files: the Load Manager creates a log file containing the status of the session.
The Load Manager also sends the failure mails in case of failure in execution of the subsequent
DTM process.
How can you create or import a flat file definition into the Warehouse Designer?
You can create a flat file definition in the Warehouse Designer: in the Warehouse Designer, create
a new target and select the type as flat file. You can then enter the various columns for that
created target by editing its properties. Once the target is created, save it. You can also import
it from the Mapping Designer.
In a relational data model, for normalization purposes, year lookup, quarter lookup, month
lookup, and week lookup are not merged into a single table. In dimensional data modeling
(star schema), these tables would be merged into a single table called the TIME DIMENSION,
for performance and for slicing data.
This dimension helps to find the sales done on a daily, weekly, monthly and yearly basis. We
can have a trend analysis by comparing this year's sales with the previous year's, or this week's
sales with the previous week's.
In a STAR schema there is no relation between any two dimension tables, whereas in a
SNOWFLAKE schema there is a possible relation between the dimension tables.
In a mapping we can use any number of transformations, depending on the project requirements and
on the transformations needed for that particular mapping.
Normal load: a normal load writes information to the database log file, so that if any recovery is
needed it will be helpful. When the source file is a text file and you are loading data into a
table, in such cases you should use normal load only, else the session will fail.
Bulk mode: a bulk load does not write information to the database log file, so if any recovery is
needed we cannot do anything.
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes
that are unrelated to any particular dimension. The junk dimension is simply a structure that
provides a convenient place to store the junk attributes. A good example would be a trade
fact in a company that brokers equity trades.
1) Unless you connect an output port of the Source Qualifier to another transformation or to the
target, there is no way it will include that field in the generated query.
How do you get the first 100 rows from a flat file into the target?
1. Use the test load option if you want to use it for testing.
Summary filter: we can apply it to a group of records that contain common values.
Detail filter: we can apply it to each and every record in a database.
Materialized views are schema objects that can be used to summarize, precompute, replicate
and distribute data, e.g. to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of a query in
a separate schema object, unlike an ordinary view, which does not take up any storage space
or contain any data.
Compare Data Warehousing Top-Down approach with Bottom-up approach
Top down
ODS-->ETL-->Datawarehouse-->Datamart-->OLAP
Bottom up
ODS-->ETL-->Datamart-->Datawarehouse-->OLAP
Discuss which is better among incremental load, normal load and bulk load.
It depends on the requirement. Otherwise, an incremental load can be better, as it takes
only the data which is not already present in the target.
Unconnected:
The unconnected Stored Procedure transformation is not connected directly to the flow of the
mapping. It either runs before or after the session, or is called by an expression in another
transformation in the mapping.
connected:
The flow of data through a mapping in connected mode also passes through the Stored
Procedure transformation. All data entering the transformation through the input ports affects
the stored procedure. You should use a connected Stored Procedure transformation when you
need data from an input port sent as an input parameter to the stored procedure, or the
results of a stored procedure sent as an output parameter to another transformation.
3. Grid: servers working on different operating systems can coexist in the same server grid.
7. Version control.
8. Data profiling.
What are the differences between Informatica PowerCenter versions 6.2 and 7.1,
and also between versions 6.2 and 5.1?
The main difference between Informatica 5.1 and 6.1 is that in 6.1 they introduced a new component
called the Repository Server, and in place of the Server Manager (5.1) they introduced the Workflow
Manager and the Workflow Monitor.
What's the difference between the Informatica PowerCenter server, the repository server and
the repository?
The repository is a database in which all Informatica components are stored in the form of tables.
The repository server controls the repository and maintains data integrity and consistency across
the repository when multiple users use Informatica. The PowerCenter server (Informatica server) is
responsible for the execution of the components (sessions) stored in the repository.
A staging area in a DW is used as a temporary space to hold all the records from the source
systems. So, more or less, it should be an exact replica of the source systems, except for the
load strategy, where we use truncate-and-reload options.
So create it using the same layout as in your source tables, or using the Generate SQL option in
the Warehouse Designer tab.
Filter transformation filters the rows that are not flagged and passes the flagged rows to the Update
strategy transformation
In a filter expression we want to compare one date field with a db2 system field
CURRENT DATE. Our Syntax: datefield = CURRENT DATE (we didn't define it by
ports, its a system field ), but this is not valid (PMParser: Missing Operator).. Can
someone help us. Thanks
The DB2 date format is "yyyymmdd", whereas SYSDATE in Oracle gives "dd-mm-yy", so conversion of the
DB2 date format to the local database date format is compulsory; otherwise you will get that type
of error.
Briefly explain the versioning concept in PowerCenter 7.1.
When you create a version of a folder referenced by shortcuts, all shortcuts continue to reference their
original object in the original version. They do not automatically update to the current folder version.
For example, if you have a shortcut to a source definition in the Marketing folder, version 1.0.0, then
you create a new folder version, 1.5.0, the shortcut continues to point to the source definition in version
1.0.0.
Maintaining versions of shared folders can result in shortcuts pointing to different versions of the folder.
Though shortcuts to different versions do not affect the server, they might prove more difficult to
maintain. To avoid this, you can recreate shortcuts pointing to earlier versions, but this solution is not
practical for much-used objects. Therefore, when possible, do not version folders referenced by
shortcuts.
It is possible to join two or more tables by using a Source Qualifier, provided the tables have a
relationship.
When you drag and drop the tables you will get a Source Qualifier for each table. Delete all the
Source Qualifiers and add one common Source Qualifier for all of them. Right-click on the Source
Qualifier and click Edit; on the Properties tab you will find the SQL Query attribute, in which
you can write your SQL.
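For example, after adding the common Source Qualifier for two related tables (the EMP/DEPT tables and columns below are only illustrative), the SQL Query property could hold a join like the following; note that the column order in the SELECT list has to match the order of the Source Qualifier ports:

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DEPTNO, DEPT.DNAME
FROM EMP, DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO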
The best way to find bottlenecks is to write to a flat file and see where the bottleneck is.
Yes, we can use Informatica for cleansing data. Sometimes we use staging tables for cleansing the
data, depending on performance; otherwise we can use an Expression transformation to cleanse the
data.
For example, a field X may have some values and others with NULL, and be assigned to a target
field which is a NOT NULL column; inside an expression we can assign a space or some constant
value to avoid session failure.
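As a small sketch of that idea (the port name X is hypothetical), the expression for the output port could look like:

IIF( ISNULL(X), ' ', X )

so that a NULL coming from the source is replaced with a space before it reaches the NOT NULL target column.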
If the input data is in one format and the target needs another format, we can change the format
in an expression.
We can also assign default values in the target to represent a complete set of data in the target.
It depends upon the requirement only. If you have a database with good processing power, you can
create the aggregation table or view at the database level; otherwise it is better to use
Informatica. Here I am explaining why we might use Informatica.
Whatever it may be, Informatica is a third-party tool, so it will take more time to process the
aggregation compared to the database. But Informatica has an option called "incremental
aggregation", which will help you update the existing aggregate values with the new values; there
is no need to process the entire data again and again, as long as nobody has deleted the cache
files. If that happens, the total aggregation needs to be executed in Informatica again.
How do we estimate the depth of the session scheduling queue? Where do we set
the number of maximum concurrent sessions that Informatica can run at a given
time?
You set the maximum number of concurrent sessions in the Informatica server configuration; by
default it is 10, and you can set it to any number.
It depends upon the Informatica version we are using: Informatica 6 supports only 32 partitions,
whereas Informatica 7 supports 64 partitions.
Suppose a session is configured with a commit interval of 10,000 rows and the source has
50,000 rows. Explain the commit points for source-based commit and target-based
commit. Assume appropriate values wherever required.
Source-based commit commits the data into the target based on the commit interval, so for every
10,000 source rows it commits into the target.
Target-based commit commits the data into the target based on the buffer size of the target, i.e.,
it commits the data whenever the buffer fills. Let us assume that the buffer holds 6,000 rows;
then for every 6,000 rows it commits the data.
Whether rows were updated or inserted is known only by checking the target file or table.
What is the procedure to write a query to list the top three salaries of
employees?
The following query finds the top three salaries (the correlated subquery counts how many distinct
salaries are higher than the current row's):
select * from emp e where 3 > (select count(distinct sal) from emp where sal > e.sal);
Which objects are required by the Debugger to create a valid debug session?
The source, target, lookups and expressions should be available, and at least one breakpoint
should be set for the Debugger to debug your session.
What is the limit on the number of sources and targets you can have in a mapping?
As per my knowledge there is no such restriction on the number of sources or targets used inside a
mapping.
The real question is: if you make N tables participate in processing at a time, what is the
position of your database? From an organizational point of view it is never encouraged to use N
number of tables at a time; it reduces database and Informatica server performance.
If you have a well-defined source you can use a connected lookup; if the source is not well
defined or comes from a different database, you can go for an unconnected lookup.
In dimensional modeling, a star schema has a single fact table surrounded by a group of dimension
tables comprised of de-normalized data, while a snowflake schema has a single fact table
surrounded by a group of dimension tables comprised of normalized data.
The star schema (sometimes referenced as a star join schema) is the simplest data warehouse
schema, consisting of a single "fact table" with a compound primary key, with one segment for each
"dimension" and with additional columns of additive, numeric facts. The star schema makes
multi-dimensional database (MDDB) functionality possible using a traditional relational database.
Because relational databases are the most common data management system in organizations today,
implementing multi-dimensional views of data using a relational database is very appealing. Even
if you are using a specific MDDB solution, its sources are likely relational databases. Another
reason for using a star schema is its ease of understanding. Fact tables in a star schema are
mostly in third normal form (3NF), but dimension tables are in de-normalized second normal form
(2NF). If you want to normalize the dimension tables, they look like snowflakes (see snowflake
schema) and the same problems of relational databases arise: you need complex queries and business
users cannot easily understand the meaning of data. Although query performance may be improved by
advanced DBMS technology and hardware, highly normalized tables make reporting difficult and
applications complex.
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of
star schema. It is called a snowflake schema because the diagram of the schema resembles a
snowflake. Snowflake schemas normalize dimensions to eliminate redundancy; that is, the dimension
data has been grouped into multiple tables instead of one large table. For example, a product
dimension table in a star schema might be normalized into a products table, a product-category
table, and a product-manufacturer table in a snowflake schema. While this saves space, it
increases the number of dimension tables and requires more foreign key joins. The result is more
complex queries and reduced query performance.
You can use nested IIF statements to test multiple conditions. The following example tests for various
conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3,
BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following
shows how you can use DECODE instead of IIF:
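One way to write the same logic with DECODE, mirroring the IIF example above (a sketch, not the only possible form):

DECODE( TRUE,
    SALES > 0 AND SALES < 50, SALARY1,
    SALES >= 50 AND SALES < 100, SALARY2,
    SALES >= 100 AND SALES < 200, SALARY3,
    SALES >= 200, BONUS,
    0 )

Here DECODE compares the value TRUE against each condition in turn and returns the matching result, with 0 as the default when SALES is zero or negative.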
What are variable ports, and list two situations when they can be used?
We have mainly three kinds of ports: input, output and variable. An input port represents data
flowing into the transformation; an output port is used when data is mapped to the next
transformation; a variable port is used when mathematical calculations or intermediate results are
required. If there is anything to add, I will be more than happy if you can share it.
How does the server recognise the source and target databases?
By using an ODBC connection if the source is relational; if it is a flat file, an FTP connection.
We specify the connections for both sources and targets in the properties of the session.
How do you retrieve the records from a rejected file? Explain with syntax or an example.
During the execution of the workflow all the rejected rows will be stored in bad files (under the
directory where your Informatica server is installed, e.g. C:\Program Files\Informatica PowerCenter
7.1\Server). These bad files can be imported as a flat file source, and then through a direct
mapping we can load these records in the desired format.
If the two tables are relational, then you can use the lookup SQL override option to join the two
tables in the lookup properties. You cannot join a flat file and a relational table.
E.g.: the default lookup query will be select <lookup table column names> from <lookup table>. You
can extend this query: add the column names of the second table with the table qualifier and a
WHERE clause. If you want to use an ORDER BY, then put -- at the end of your ORDER BY.
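A rough sketch of such an override (CUSTOMER and REGION, and their columns, are made-up names; the aliases are assumed to match the lookup ports):

SELECT CUSTOMER.CUST_ID AS CUST_ID,
       CUSTOMER.CUST_NAME AS CUST_NAME,
       REGION.REGION_NAME AS REGION_NAME
FROM CUSTOMER, REGION
WHERE CUSTOMER.REGION_ID = REGION.REGION_ID
ORDER BY CUSTOMER.CUST_ID --

The trailing -- comments out the ORDER BY clause that the server normally appends to the lookup query, as described above.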
Based on the requirements for your fact table, choose the sources and data and transform them
based on your business needs. For the fact table you need a primary key, so use a Sequence
Generator transformation to generate a unique key and pipe it to the target (fact) table together
with the foreign keys from the source tables.
How do you delete duplicate rows in a flat file source? Is there any option in Informatica?
Use a Sorter transformation; it has a "Distinct" option, so make use of it.
In the Designer you will find the Mapping Parameters and Variables option, and you can assign a
value to them in the Designer. Coming to their use: suppose you are doing incremental extractions
daily and your source system contains a day column. Every day you would have to go to that mapping
and change the day so that the particular data is extracted; doing that would be a layman's work.
There comes the concept of mapping parameters and variables: once you assign a value to a mapping
variable, it can change between sessions.
In the concept of mapping parameters and variables, the variable value is saved to the repository
after the completion of the session, and the next time you run the session the server takes the
saved variable value from the repository and starts assigning the next value after the saved one.
For example, I ran a session and at the end it stored a value of 50 to the repository. The next
time I run the session, it should start with the value of 70, not with the value of 51. How do I
do this?
start --------> session
Right-click on the session and you will get a menu; in that, go to Persistent Values. There you
will find the last value stored in the repository for the mapping variable. Remove it and put your
desired one, then run the session; I hope your task will be done.
but 9i allows
You can use an Aggregator after an Update Strategy. The problem will be that once you perform the
update strategy, say you had flagged some rows to be deleted and you had performed an Aggregator
transformation on all rows, say using the SUM function, then the deleted rows will be subtracted
in this Aggregator transformation.
Because in data warehousing historical data should be maintained. Maintaining historical data
means, for example, that one employee's details, like where he worked previously and where he is
working now, should all be maintained in one table. If you make the employee id the primary key,
it won't allow duplicate records with the same employee id. So, to maintain historical data, we go
for surrogate keys (for example an Oracle sequence for the key column); using surrogate keys we
can preserve the history.
So all the dimensions maintaining historical data are de-normalized, because a 'duplicate' entry
here does not mean an exact duplicate record: another record with the same employee number is
maintained in the table.
We can stop it using the pmcmd command, or in the Workflow Monitor right-click on that particular
session and select Stop. This will stop the current session and the sessions next to it.
How do you handle decimal places while importing a flat file into Informatica?
While importing the flat file definition, just specify the scale for a numeric data type. In the
mapping, the flat file source supports only the number datatype (no decimal and integer). The
Source Qualifier associated with that source will have the datatype decimal for that number port
of the source.
source -> number datatype port -> SQ -> decimal datatype. Integer is not supported; hence decimal
takes care of it.
If your workflow is running slowly in Informatica, where do you start troubleshooting
and what are the steps you follow?
When the workflow is running slowly you have to find out the bottlenecks,
in this order:
target
source
mapping
session
system
If you have four lookup tables in the workflow, how do you troubleshoot to improve
performance?
There are many ways to improve a mapping which has multiple lookups.
1) We can create an index on the lookup table if we have permissions (staging area).
2) Divide the lookup mapping into two: (a) dedicate one to inserts (source - target): these are
new rows, and only the new rows will come to the mapping, so the process will be fast; (b)
dedicate the second one to updates (source = target): these are existing rows, and only the rows
which already exist will come into the mapping.
Can anyone explain error handling in Informatica with examples, so that it will be
easy to explain the same in an interview?
Go to the session log file; there we will find the information regarding the errors encountered,
and the load summary.
So, by seeing the errors encountered during the session run, we can resolve them.
There is one file called the bad file, which generally has the format *.bad and contains the
records rejected by the Informatica server. There are two kinds of indicators, one for the type of
row and the other for the type of column. The row indicator signifies what operation was going to
take place (i.e. insert, delete, update, etc.). The column indicators contain information
regarding why the column has been rejected (such as violation of a not null constraint, value
error, overflow, etc.). If one rectifies the errors in the data present in the bad file and then
reloads the data into the target, the table will contain only valid data.
How do I import VSAM files from source to target? Do I need a special plug-in?
As far as my knowledge goes, use the PowerExchange tool to convert the VSAM file to Oracle tables,
then do the mapping as usual to the target table.
It changes the rows into columns and the columns into rows.
An IQD file is an Impromptu Query Definition. This file is mainly used in the Cognos Impromptu
tool: after creating an IMR (report) we save the IMR as an IQD file, which is used while creating
a cube in PowerPlay Transformer. In the data source type we select Impromptu Query Definition.
Sampling: just sample the data by sending a subset of the data from the source to the target.
Could anyone please tell me what steps are required for a Type 2 dimension/version data mapping?
How can we implement it?
1. Determine if the incoming row is 1) a new record, 2) an updated record, or 3) a record that
already exists in the table, using two Lookup transformations. Split the mapping into 3 separate
flows using a Router transformation.
2. For case 1), create a pipe that inserts all the rows into the table.
3. For case 2), create two pipes from the same source, one updating the old record and one
inserting the new.
Without using the Update Strategy and session options, how can we do the update of our
target table?
In the session properties you can set Treat Source Rows As to Insert or Update, and on the target
you can use options such as Update as Update, Update as Insert, or Update else Insert.
Both tables should have a primary key/foreign key relationship.
Both tables should be available in the same schema or the same database.
What is the best way to show metadata (number of rows at source, target and each
transformation level, error-related data) in a report format?
You can select these details from the repository tables; for example, you can use the view
REP_SESS_LOG to get this data.
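As a sketch only (the exact column names depend on the repository version, so treat them as assumptions to verify against your MX views), such a query could look like:

SELECT SUBJECT_AREA, WORKFLOW_NAME, SESSION_NAME,
       SUCCESSFUL_ROWS, FAILED_ROWS, FIRST_ERROR_MSG
FROM   REP_SESS_LOG
ORDER  BY SESSION_TIMESTAMP DESC;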
If you had to split the source-level key going into two separate tables, one as a
surrogate key and the other as a primary key, and since Informatica does not guarantee keys are
loaded properly (in order!) into those tables, what are the different ways you could
handle this type of situation?
Use a foreign key relationship.
This is not there in Informatica v7, but I have heard that it is included in the latest version,
8.0, where you can append to a flat file; it is about to ship to the market.
Partition points mark the thread boundaries in a source pipeline and divide the pipeline into
stages.
What are cost-based and rule-based approaches, and what is the difference?
Cost-based and rule-based approaches are optimization techniques used in relation to databases,
where we need to optimize a SQL query.
Basically, Oracle provides two types of optimizers (indeed three, but we use only these two
techniques, because the third has some disadvantages).
Whenever you process any SQL query in Oracle, what the Oracle engine internally does is read the
query and decide the best possible way of executing it. In this process, Oracle follows these
optimization techniques:
1. Cost-based optimizer (CBO): if a SQL query can be executed in two different ways (say path 1
and path 2 for the same query), the CBO calculates the cost of each path, analyzes which path has
the lower cost of execution, and then executes that path, so that it can optimize the query
execution.
2. Rule-based optimizer (RBO): this basically follows the rules which are needed for executing a
query. So, depending on the rules which are to be applied, the optimizer runs the query.
Use:
If the table you are trying to query has already been analyzed, then Oracle will go with the CBO.
For the first time, if the table is not analyzed, Oracle will go with a full table scan.
MicroStrategy is again a BI tool, which is HOLAP; you can create two-dimensional reports and also
cubes in it. It is basically a reporting tool, with a full range of reporting on the web as well
as in Windows.
Yes, sure: just right-click on the particular session and go to the recovery option.
or
First of all, meet your sponsors and make a BRD (business requirements document) about their
expectations from this data warehouse (the main aim comes from them). For example, they need a
customer billing process. Now go to the business management team: they can ask for metrics out of
the billing process for their use. Management people may want monthly usage, billing metrics,
sales organization and rate plan data to perform sales rep and channel performance analysis and
rate plan analysis. So your dimension tables can be: Customer (customer id, name, city, state,
etc.), Sales rep (sales rep number, name, id), Sales org (sales org id), Bill dimension (bill #,
bill date, number), Rate plan (rate plan code). And the fact table can be: Billing details
(bill #, customer id, minutes used, call details, etc.). You can follow a star or snowflake schema
in this case, depending upon the granularity of your data.
What is the difference between a cached lookup and an uncached lookup? Can I run the
mapping without starting the Informatica server?
The difference between a cached and an uncached lookup is this: when you configure the Lookup
transformation to cache the lookup, it stores all the lookup table data in the cache when the
first input record enters the Lookup transformation; in a cached lookup the SELECT statement
executes only once, and the values of the input records are compared with the values in the cache.
But in an uncached lookup, the SELECT statement executes for each input record entering the Lookup
transformation, and it has to connect to the database each time a new record enters.
Stop: if the session you want to stop is part of a batch, you must stop the batch;
if the batch is part of a nested batch, stop the outermost batch.
Abort:
You can issue the abort command; it is similar to the stop command except that it has a 60-second
timeout. If the server cannot finish processing and committing data within 60 seconds, it kills
the DTM process and terminates the session.
If you find your box running slower and slower over time, or not having enough memory to allocate new
sessions, then I suggest that ABORT not be used.
So then the question is: When I ask for a STOP, it takes forever. How do I get the session to stop fast?
well, first things first. STOP is a REQUEST to stop. It fires a request (equivalent to a control-c in
SQL*PLUS) to the source database, waits for the source database to clean up. The bigger the data in
the source query, the more time it takes to "roll-back" the source query, to maintain transaction
consistency in the source database. (ie: join of huge tables, big group by, big order by).
It then cleans up the buffers in memory by releasing the data (without writing to the target) but it WILL
run the data all the way through to the target buffers, never sending it to the target DB. The bigger the
session memory allocations, the longer it takes to clean up.
Then it fires a request to stop against the target DB, and waits for the target to roll-back. The higher
the commit point, the more data the target DB has to "roll-back".
If you use abort, be aware, you are choosing to "LOSE" memory on the server in which Informatica is
running (except AIX).
If you use ABORT and you then re-start the session, chances are, not only have you lost memory - but
now you have TWO competing queries on the source system after the same data, and you've locked out
any hope of performance in the source database. You're competing for resources with a defunct query
that's STILL rolling back.
Yes, it is possible: by using the pmcmd command you can run the group of sessions without using
the Workflow Manager.
If a session fails after loading 10,000 records into the target, how can you load the
records starting from the 10,001st record when you run the session the next time in
Informatica 6.1?
Running the session in recovery mode will work, but the target load type should be normal. If it
is bulk, then recovery won't work as expected.
To flag source records as INSERT, DELETE, UPDATE or REJECT for the target database. The default
flag is Insert. This is a must for incremental data loading.
What are mapping parameters and varibles in which situation we can use it
If we need to change certain attributes of a mapping after every time the session is run, it will be very
difficult to edit the mapping and then change the attribute. So we use mapping parameters and
variables and define the values in a parameter file. Then we could edit the parameter file to change the
attribute values. This makes the process simple.
Mapping parameter values remain constant. If we need to change the parameter value then we need to
edit the parameter file .
But value of mapping variables can be changed by using variable function. If we need to increment the
attribute value by 1 after every session run then we can use mapping variables .
In a mapping parameter we need to manually edit the attribute value in the parameter file after every
session run.
What is a worklet, what is the use of a worklet, and in which situation can we use it?
1) Timer, 2) Decision, 3) Command, 4) Event Wait, 5) Event Raise, 6) Email, etc.
What is the difference between a dimension table and a fact table, and what are the different
dimension tables and fact tables?
The fact table contains measurable data, fewer columns and many rows; the dimension table contains
textual descriptions of the data, many columns and fewer rows.
You should configure the mapping with the least number of transformations and expressions to do the
most amount of work possible. You should minimize the amount of data moved by deleting unnecessary
links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup
transformations), limit connected input/output or output ports. Limiting the number of connected
input/output or output ports reduces the amount of data the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping: