Ab Initio Questionnaire Final
Can we use Gather component to reduce data parallelism, and how does Gather differ from Merge component?
In what scenario is the record-required parameter in the Join component required?
Lookup functionality in Ab Initio. Briefly explain?
Explain the functionality of the Meta Pivot component?
What are the numeric results generated by the Parse function in a Read Raw component and how are they interpreted?
Explain Rollup component and Scan component?
Which component can we use to aggregate flow of records and then send them to another computer?
Can we use any of the Ab Initio Components to perform Database Operations?
Briefly explain about the ordered attribute?
Describe when to use Lookup instead of Join?
Briefly explain about the max-core parameter? What happens if the max-core parameter is set too low and if it is set too high?
Explain Fan-in, Fan-out and All-To-All flows?
What do you mean by Degree of Parallelism?
Explain Repartitioning?
What are the Departitioning components in Ab Initio?
If you need to do basic reformatting in parallel, which partition component do we use?
If you need to bring parallel data into a serial dataset in a particular order, what departition component would you use?
Explain the ordering of data in the departitioning components used in Ab Initio?
Which departitioning component is used to create an ordered serial flow?
Suggest some methods to avoid deadlocks during departitioning?
What is a Layout? Is it Serial or Parallel?
What is a serial layout and a Parallel Layout?
What are the different Layout options available in Ab Initio Components?
Which command do we use to create an empty multifile?
Explain the role of Host process and Agent process in running a Graph?
What do you mean by multifile and multidirectory?
Name few multifile commands?
What do you mean by a Layout Marker?
Name some of the components in Ab Initio used for testing and Validation?
What is the difference between is_valid, is_defined and is_blank functions?
What do you mean by a multistage component? Name any Multistage component?
Is Denormalize a multistage component? If yes, what are the different stages occurring in this component?
Explain the function of major key and minor key in a Sort within Groups component?
When is it a bad idea to replace Join with use of a lookup file in a Reformat component?
Explain the function of lookup_next?
Explain the function of lookup_local?
What is the purpose of using conditional components and how do you set the conditional components in GDE?
What is Abinitio?
Ab Initio is an ETL tool used for loading data from various source systems to a warehouse or a data mart. "Ab initio" means "from the beginning."
Functions of EME?
Enterprise Meta Environment (EME) is a high-performance, object-oriented storage system that inventories and manages various kinds of information associated with Ab Initio applications. It provides storage for all aspects of your data processing system, from design information to operations data. The EME also stores data formats and business rules, acting as the hub for data and definitions. Integrated metadata management provides a global, consolidated view of the structure and meaning of applications and data.
• Source control
  • Secure storage
  • Conflict management
  • Version history
  • Differencing
• Documentation
  • Ab Initio metadata
  • Non-Ab Initio metadata
• Analysis
  • Impact - Downstream
  • Dependency - Upstream
• Job status
  • Job completion
  • Tracking information
• Lifecycle management
  • Promotion: Dev to test to prod
  • Migration: Project to project
What is a Sandbox?
A sandbox is a collection of graphs and related files that are stored in a single directory tree, and
treated as a group for purposes of version control, navigation, and migration.
A sandbox can be a file system copy of a repository project (EME project)
Important Questions
Input File represents data records read as input to a graph from one or multiple serial files or
from a multifile.
Intermediate File represents one or multiple serial files or a multifile of intermediate results that a
graph writes during execution, and saves for your review after execution.
Lookup File represents one or multiple serial files or a multifile of data records small enough to
be held in main memory, letting a transform function retrieve records much more quickly than it
could retrieve them if they were stored on disk.
Output File represents data records written as output from a graph into one or multiple serial files
or a multifile.
SAS Input File represents data records read as input to a graph from a SAS dataset.
SAS Output File represents data records written as output from a graph into a SAS dataset.
Each database component, like all Ab Initio graph components, has a set of parameters that can be accessed by double-clicking the component (or right-clicking it and selecting Properties from the drop-down menu) to reach the component properties dialog box, and then clicking the Parameters tab.
The set of parameters offered for almost any given database component is dynamically
dependent on certain other values, namely:
The DBMS to which the component is connecting (i.e., the database specified by the dbms field
in the component's database configuration file)
(Secondarily, but only in some cases) the value specified in the component's dbms_interface parameter
For example, if you set up an Output Table component with a configuration file that specifies
teradata as its dbms, you will find that the component has many more parameters than an Output
Table set up for a dbms of, say, db2uss. Moreover, you will find that this parameter set changes
yet again if you go on to specify a ter_interface ("Teradata interface") of Multiload instead of
Tpump. Similar variations can be seen with other combinations of database component and
DBMS.
The parameter tables in the individual section for each component give listings of:
The non-database-specific parameters (i.e., the set of parameters which are always available for
the component, no matter which DBMS is specified in the configuration file)
The database-specific parameters in separate tables, one for each supported DBMS
match_required: Controls how input records for which no rows are returned from the select_sql statement are handled, but only when execute_on_miss is not specified:
True (the default): The input record is sent to the unused port, and the join_with_db transform function is not evaluated.
False: The join_with_db transform function is evaluated with a NULL value for query_result.
execute_on_miss
A statement executed when no rows are returned by the select_sql statement. The
execute_on_miss statement should be an INSERT (or possibly an UPDATE); after it is executed,
the select_sql is executed a second time. If no results are generated on the second attempt, the
input record is rejected. A database commit is by default performed after each execution of
execute_on_miss, but you can change this by setting the commit_number parameter.
do_query
If you define the optional transform function do_query, Join with DB calls this function for each input record first. If the function returns True, the normal transform processing is done: the select_sql statement is executed, and so on. If it returns False, no query is performed, and the join_with_db function is called with a value of NULL for query_result.
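A minimal do_query sketch (hedged; the field name cust_id is illustrative): skip the database query for records that cannot match.
out :: do_query(in) =
begin
  out :: in.cust_id != 0;  /* query the database only when a real key is present */
end;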
The value of maximum_matches determines the maximum number of returned rows that are
used by the transform. The transform function is evaluated once for each row returned, within
maximum_matches.
The default value is -1, which specifies that all returned rows be used.
Normalize generates multiple output records from each of its input records. You can directly
specify the number of output records for each input record, or the number of output records can
depend on some calculation.
You need to provide the length transform function, which tells Normalize how many times to execute the normalize transform function.
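A minimal sketch, assuming the input record carries a count field n and a vector field v (all names illustrative):
/* how many output records to produce for this input record */
out :: length(in) =
begin
  out :: in.n;
end;
/* called once per output record; index runs from 0 to length - 1 */
out :: normalize(in, index) =
begin
  out.id :: in.id;
  out.amount :: in.v[index].amount;
end;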
If you have not defined temporary_type, Normalize does not call the finalize function.
In addition, you have the option of defining a rollup operation in a Denormalize Sorted transform. This allows you to roll up and denormalize in the same operation. If you do this, you also need to define a temporary_type record to hold the data being rolled up.
The denormalize function is where the actual denormalization happens. Depending on the maximum size of the vector, you control whether each element is appended: as long as count is less than 20 (the fixed length of the vector defined in your output record format), update gets the value 1 (true) and elt is appended to the vector.
Types of Parallelism
Component Parallelism
A graph with multiple processes running simultaneously on separate data uses component
parallelism.
Data Parallelism
A graph that deals with data divided into segments and operates on each segment
simultaneously uses data parallelism. Nearly all commercial data processing tasks can use data
parallelism. To support this form of parallelism, Ab Initio software provides Partition components to
segment data, and Departition components to merge segmented data back together.
Pipeline Parallelism
A graph with multiple components running simultaneously on the same data uses pipeline
parallelism.
Each component in the pipeline continuously reads from upstream components, processes data,
and writes to downstream components. Since a downstream component can process records
previously written by an upstream component, both components can operate in parallel.
To make a serial file parallel, you need to partition the data using partitioning components.
To run a component in parallel, the layout must be of $MFS type.
ROLLUP, JOIN, SCAN, and SORT require data to be partitioned by the key.
When the key is a composite key (multiple fields), partitioning is required only on the high-level (leading) key field.
Normalize: When we want to create separate data records from a single record, we use this component. Data from mainframes generally comes in vectors, and we need Normalize to separate them into multiple records.
De-Normalize: When we want to create a single data record from one or more records. This has the reverse effect of Normalize.
Example of denormalize:
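A minimal sketch of the denormalize function just described (hedged: the field names are illustrative, and exact function signatures vary by Co>Operating System version); it appends elements to the output vector only while count is below the fixed vector length of 20:
out :: denormalize(temp, in, count) =
begin
  out.temp :: temp;          /* carry the temporary (rollup) state forward */
  out.update :: count < 20;  /* 1 (true) while the vector still has room */
  out.elt :: in.txn_amount;  /* the element appended to the vector */
end;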
Redefine Format: Use this component when you need to change the record layout.
De-partition:
Concatenate reads all records from one flow at a time, in port order, appending each flow after the previous one.
Gather reads records from all flows in arbitrary order and combines them.
Merge reads records from all flows and maintains the sorted order of the data.
Input Table:- Input Table unloads data records from a database into an Ab Initio graph, allowing
you to specify as the source either a database table, or an SQL statement that selects data
records from one or more tables.
Output Table:- Output Table loads data records from a graph into a database, letting you specify
the records' destination either directly as a single database table, or through an SQL statement
that inserts data records into one or more tables
Update Table:- Update Table executes UPDATE, INSERT or DELETE statements in embedded
SQL format to modify a table in a database, and writes status information to the log port.
RunSQL:- Executes SQL statements in a database.
Input File:- Represents one file, many files, or a multifile as an input to your graph.
Output File:- Represents one file, many files, or a multifile as an output from your graph.
Lookup File:- Lookup Files are components containing shared data. Use lookup files with the
DML lookup functions to access records according to a key.
Concatenate:- Appends multiple flow partitions of data records one after another.
Gather:- Combines data records from multiple flow partitions arbitrarily.
Merge:- Combines data records from multiple flow partitions that have been sorted according to
the key specifier, and maintains the sort order.
Run Program:- Executes a standard UNIX or Windows NT program.
Trash:- Ends a flow by discarding all input data records
Partition by Key:- Distributes data records to its output flow partitions according to key values.
Partition by Percentage:- Distributes a specified percentage of the total number of input data
records to each output flow.
Partition by RoundRobin:- Distributes data records evenly to each output flow in round-robin
fashion.
Use the Interleave component to reverse the effects of Partition by Round-robin.
Sort:- Orders your data according to a key specifier.
Dedup Sorted:- Separates one specified data record in each group of data records from the rest
of the records in the group.
Filter by Expression:- Filters data records according to a specified DML expression.
Join:- Performs inner, outer, and semi-joins with multiple flows of data records. It maximizes
performance by loading input data records into main memory.
Reformat:- Changes the record format of your data by dropping fields or by using DML
expressions to add fields, combine fields, or modify the data.
Rollup:- Generates data records that summarize groups of data records. Rollup in Memory
maximizes performance by keeping intermediate results in main memory.
Denormalize Sorted:- Consolidates groups of related records in the input to a single data record
with a vector field for each group, and optionally computes summary fields for each group.
It's the inverse of Normalize.
Normalize:- Generates multiple output data records from each input data record. Normalize can
separate a data record with a vector field into several individual records, each containing one
element of the vector.
Leading Records:- Copies a specified number of records from its in port to its out port, counting from the first record in the input file.
record
string(6) first_name;
string(10) last_name;
decimal(3) age;
date("YYYYMMDD") date_of_hire;
end
This format describes a record with four fields: first_name, last_name, age, and date_of_hire.
Each field has a different type.
Grouping and Ordering metadata (in key specifiers)
For data records that are input to a component, the description of the way input records are
ordered, partitioned, or grouped.
For data records which a component is to output, the description of the way the component
should order, partition, or group output records.
Computational metadata (transform functions in .xfr files)
A collection of rules and other logic that drive data transformation in Transform components.
union
field-type field-name;
[field-type field-name; ...]
end
Vector:- A vector type is a compound data object containing a sequence of elements, all of which are the same type. The element type can be any base type or another compound type. There are four kinds of DML vectors:
Fixed-size vectors
Delimited vectors
Data-sized vectors
Self-sized vectors
Record:- A record type is a compound data type containing a sequence of named fields, each of which can be a different DML base or compound type.
record
field-type field-name;
[field-type field-name…..]
end
Example
The following specifier describes a flow in which the first record has the data field store_name,
and all subsequent records have the data fields item_name, price, and quantity.
record
if (get_flow_state() == 0)
string(7) store_name;
else
begin
string(6) item_name;
decimal(7,2) price;
decimal(3,0) quantity;
end
double next_flow_state() = 1;
end
Syntax
type reinterpret_as ( type, x )
Arguments
type A type.
x An expression.
This function evaluates x and interprets it as type type, and returns the result.
Example
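A small sketch (the byte layout is illustrative): reinterpret the bytes of a fixed-format date string as a record of numeric fields.
reinterpret_as(
  record
    decimal(4) yy;
    decimal(2) mm;
    decimal(2) dd;
  end,
  "20240115")  /* yields yy = 2024, mm = 1, dd = 15 */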
A component determines the values that are passed to the transform function and interprets the
results of the transform function. Most commonly, transform functions express record reformatting
logic. In general, however, you can use transform functions to encapsulate a wide variety of
computational knowledge that you can use in data cleansing, record merging, and record
aggregation.
Just like a user-defined function in any other programming language, transform functions can call
other transform functions.
Syntax
The following is the syntax of a transform function:
The list of local variable definitions, if any, must precede the list of statements, if any. The list of
statements, if any, must appear before the list of rules.
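A sketch of that general form (names illustrative):
out :: compute_amount(in) =
begin
  let decimal(10) total = 0;        /* local variable definitions come first */
  total = in.price * in.quantity;   /* then statements */
  out.amount :: total;              /* then rules */
end;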
The following is an example of a transform function. It takes an input record with fields named a,
b, and c, and produces an output record with fields named x, y, and z using the rules shown.
out :: trans1(in) =
begin
out.x :: in.a;
out.y :: in.b + 1;
out.z :: in.c + 2;
end;
If a rule results in NULL, DML tries the rule with the next highest priority. If all rules result in
NULL, DML assigns to each output its default value or sets its value to NULL if it can hold NULL.
However, if not all outputs have a default value or can hold NULL, the transform function results
in an error.
Example 1
The following set of rules assigns the ssn field of the output variable master by trying several
different things, in sequence. First, DML attempts to copy the value from source1. If source1
results in NULL, then DML tries to copy the value from source2. If that results in NULL, then DML
copies the value 999999999.
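A sketch of those rules as DML priority rules, using the record names from the description:
master.ssn :1: source1.ssn;
master.ssn :2: source2.ssn;
master.ssn :3: 999999999;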
Example 2
The following set of rules computes the partcode and src fields of the output variable new by
following a sequence of rules.
If the expression try_partcode1(old.partcode) succeeds, the rule assigns the resulting value and
1, respectively, to the fields partcode and src of new.
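A hedged sketch of the first rule (assuming try_partcode1 returns NULL on failure, so DML falls through to the next rule; subsequent rules would follow the same multi-assignment pattern):
new.partcode, new.src :1: try_partcode1(old.partcode), 1;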
Example
type myrec =
record
  string(5) x = 'X';
  decimal(5) y = -1;
  string(10) z;
end;
What are Packages ? Which all components can make use of Packages?
Packages are a high-level view of a transformation, and contain included files, global variables,
named types, helper functions, and transformation functions. Packages are available for
Reformat and Join as well as the multistage transform components.
What are the values we give for the sorted-input parameter of an Aggregate component?
"In memory: Input need not be sorted" and
"Input must be sorted or grouped"
When set to "In memory: Input need not be sorted", Aggregate accepts ungrouped input and requires the use of the max-core parameter. When set to "Input must be sorted or grouped", Aggregate requires grouped input, and the max-core parameter is not available. The default is "Input must be sorted or grouped".
Explain the conditions on which data moves to First, New, and Old ports in
an Assign Keys component?
1. If the natural key value of the input record matches the natural key value of any record on the key port, Assign Keys:
Assigns the surrogate key value of the record on the key port to the surrogate key field of the input record
Sends the input record to the old port
2. If the natural key value of the input record does not match the natural key value of any record on the key port, and this is the first occurrence of this natural key value, Assign Keys:
Creates a new value for the surrogate key field of the input record
Sends the input record to both the new and the first port
3. If the natural key value of the input record does not match the natural key value of any record on the key port, and this is not the first occurrence of this natural key value, Assign Keys:
Assigns the surrogate key value from the record on the first port that has the same natural key value as the input record (see 2 above) to the surrogate key field of the input record
Sends the input record to the new port, but not the first port
The ramp parameter contains a float that represents a rate of reject events per record processed.
The limit parameter contains an integer that represents a number of reject events.
The component stops the execution of the graph if the number of reject events exceeds the result of the following formula:
limit + (ramp * number of records processed so far)
For example, with limit = 10 and ramp = 0.01, the graph tolerates up to 10 + 0.01 * 5000 = 60 rejects once 5,000 records have been processed.
Otherwise, Fuse reads one record from each still-unfinished input port and a NULL from each
finished input port.
Can we use Gather component to reduce data parallelism, and how does
Gather differ from Merge component ?
You can use Gather to:
Reduce data parallelism, by connecting a single fan-in flow to the in port
The Gather component is not key-based; the Merge component is key-based. With Gather, record ordering is unpredictable; with Merge, records remain in sorted order. Gather is used for unordered departitioning and repartitioning; Merge is used for creating an ordered serial flow.
The setting of the record-required parameter determines whether or not Join calls the transform
function for a particular combination of key values in the data records on the in ports.
Although a Lookup File appears in the graph without any connected flows, its contents are
accessible from other components in the same or later phases.
What are the numeric results generated by the Parse function in a Read
Raw component and how are they interpreted?
The Parse function returns a positive number, zero, or a negative number.
A positive number indicates that the function should be called again if there is remaining input.
Zero indicates that the function should not be called again and that any remaining input should be
discarded.
A negative number indicates that the parser cannot parse the data and the function should
terminate with an error.
Rollup generates data records that summarize groups of data records. Rollup gives you more control over record selection, grouping, and aggregation than Aggregate. Rollup requires grouped input.
Scan generates a series of cumulative summary records - such as year-to-date totals - for groups
of data records. Scan requires grouped input.
Which component can we use to aggregate flow of records and then send
them to another computer?
Departition components. This (fan-in) flow pattern is used to merge data divided into many segments back into a single segment, so other programs can access the data.
When you connect a component running in parallel to any component via a fan-in flow, the
number of partitions of the original component must be a multiple of the number of partitions of
the component receiving the fan-in flow. For example, you can connect a component running 9
ways parallel to a component running 3 ways parallel, but not to a component running 4 ways
parallel.
Fan-out flow:- Fan-out flows connect components with a small number of partitions to
components with a larger number of partitions. The most common use of fan-out is to connect
flows from partition components. This flow pattern is used to divide data into many segments for
performance improvements. When you connect a Partition component running in parallel to
another component running in parallel via a fan-out flow, the number of partitions of the
component receiving the fan-out flow must be a multiple of the number of partitions of the
Partition component. For example, you can connect a Partition component with 3 partitions via a
fan-out flow to a component with 9 partitions, but not to a component with 10 partitions.
All-to-All flow :- All-to-all flows typically connect components with different numbers of
partitions. Data from any of the upstream partitions is sent to any of the downstream partitions.
The most common use of all-to-all flows is to repartition data, as described below.
Explain Repartitioning?
To repartition data is to change the degree of parallelism or the grouping of partitioned data. For
instance, if you have divided skewed data, you can repartition using a Partition by Key connected
to a Gather with an All-to-all flow. If the data is sorted on one key field and you want to sort on
another, you can repartition with a Partition by Key connected to a Sort with an all-to-all flow.
If you need to bring parallel data into a serial dataset in a particular order, what departition component would you use?
Merge Component.
A layout is a list of host and directory locations, usually given by the URL of a multifile. If the
locations are not in a multifile, the layout is a list of URLs called a custom layout.
A program component's layout lists the hosts and directories in which the component runs. A
dataset component's layout lists the hosts and directories in which the data resides. Layouts are
set on the Properties Layout tab.
The layout defines the level of parallelism. Parallelism is achieved by partitioning data and
computation across processors.
Explain the role of Host process and Agent process in running a Graph?
Script is invoked, creating Host process. Host process in turn spawns Agent Process. Agent
processes create Component processes on each processing computer. Component processes
do their jobs. Component processes communicate directly with datasets and each other to move
data around. As each Component process finishes with its data, it exits with success status.
When all of an Agent’s Component processes exit, the Agent informs the Host process that those
components are finished. The Agent process then exits. When all Agents have exited, the Host
process informs the GDE that the job is complete. The Host process then exits.
Multifile:
A multifile is a parallel file that is composed of individual files on different disks and/or
nodes. The individual files are partitions of the multifile. Each multifile contains one control
partition (A control partition is the file in a multifile that contains the locations (URLs) of the multifile's data partitions) and one or more data partitions (A data partition is a file in a multifile that
contains data). Multifiles are stored in distributed directories called multidirectories.
The data in a multifile is usually divided across partitions by one of these methods:
Random or round robin partitioning
Partitioning based on ranges or functions, or
Replication or broadcast, in which each partition is an identical copy of the serial data.
Name some of the components in AB Initio used for testing and Validation?
Check Order
Compare Checksums
Compare Records
Compute Checksum
Generate Random Bytes
Generate Records
Validate Records
This function returns the value 1 if the given string contains only blank characters or is a zero-length string, and 0 otherwise. (Because every character in a zero-length string is a blank, this function returns 1 for zero-length strings.)
Syntax
int is_blank ( string s )
Arguments
s A string to test.
Examples
is_blank("") 1
is_blank("abc") 0
is_blank("ooo") 1
is_blank("ooo.ooo") 0
Syntax
int is_defined ( x )
Arguments
x An expression to test.
Examples
is_defined(123) 1
is_defined(NULL) 0
Syntax
int is_valid ( x )
Arguments
x An expression to test.
Note the distinction between being valid and being defined: a defined item has data, but a valid
item has valid data. For example, a decimal field filled with alphabetic characters is defined but
not valid.
DML calls is_valid field functions only when is_valid is called on the record, not every time it
reads or creates a data record of that type.
Examples (blanks shown as spaces)
is_valid(1) 1
is_valid(" a ") 1
is_valid(234.234) 1
is_valid((decimal(8))"   ") 0
is_valid((decimal(8))" a ") 0
Multiple pieces of information may be conveyed from stage to stage by having multiple fields in the temporary type.
Example: Rollup.
Yes. The stages in Denormalize Sorted are:
Input Selection
Initialization
Rollup
Denormalize
Finalization
Output Selection
Explain the function of major key and minor key in a Sort within Groups component?
Sort within Groups assumes input records are sorted according to the major-key parameter. Sort within Groups reads data records from all the flows connected to the in port until it either reaches the end of a group or reaches the number of bytes specified in the max-core parameter. When Sort within Groups reaches the end of a group, it sorts the records of that group according to the minor-key parameter and writes the sorted records to the out port.
What is the purpose of using conditional components and how do you set the conditional components in GDE?
The GDE supports conditional components, where a shell expression determines at runtime whether or not to include certain components.
To turn on this feature:
File -> Preferences -> "Parameters" section of the dialog
Check "Conditional Components"
The "Condition" tab then appears on all components.
The Condition expression is a shell expression that, if it evaluates to "0" or "false", causes the component to not exist. Any other value means the component will exist. Make sure the shell expression returns the string "0" or "false", not the numerical 0.
In a local parameter:
the value is saved as part of the graph
the value is the same every time the graph is run
In a formal parameter:
the value is supplied at run time, so it can differ from run to run
Whenever we plan to reuse the same record structure (DML) or transform functions (XFR), we can store them in .dml and .xfr files respectively. By doing this, we get re-usability.
When should I use in-memory Rollup or Join versus sorted input and a Sort
component?
Whenever the data to be sorted is small in volume, there is no need for a separate Sort component; the data can be sorted in memory by the Rollup or Join itself. When the data volume is large, it is better to sort the data first and then send it to the Rollup or Join component. For Join, if the key the inputs are sorted on and the join key are the same, set sorted-input to "Inputs must be sorted"; Join then accepts the sorted data even when it comes in small amounts. A Sort component first has to sort the data and then send it to Join.
What does the error message "straight flows may only connect ports
having equal depths" mean?
This error message appears when you have two components connected by a straight flow and they
are running at different levels of parallelism.
What does the error message "Trouble writing to socket: No space left on
device" mean?
Your work directory, $AB_WORK_DIR, is full.
NOTE: Any jobs running when AB_WORK_DIR fills up are unrecoverable.
What is layout?
The layout of a component is a list of hostname/pathname locations which defines two things:
Where the component runs
The number of ways parallel the component runs or the depth of parallelism of the component.
A component's layout is a list of hostname/pathname locations which specifies where (on what computer and in what directory) the component runs and the number of ways parallel the component
runs. For example, if you set your layout to refer to a 4-way multifile, the Co>Operating System starts
four copies of the same executable, resulting in four parallel processes. Each one of the processes
runs in one partition of the multifile.
What does the error message "too many open files" mean, and how do I fix
it?
The "too many open files" error messsage occurs most commonly because the value of the max-core
parameter of the Sort component is set too low. In these cases, increasing the value of the max-core
parameter solves the problem.
What does the error message "Failed to allocate <n> bytes" mean and how
do I fix it?
The "failed to allocate" type of error message is generated when an Ab Initio process has exceeded its
limit for some type of memory allocation.
What is the difference between API mode and utility mode in database
components?
API and utility are two possible interfaces to databases from Ab Initio software; their uses may differ
depending upon the database in question.
A sketch of a parse function for a Read Raw component: it reads a 2-byte block size, skips the next 2 bytes, copies the block's data, and repeats until input is exhausted, returning 0 to indicate completion.
begin
  let integer(2) blocksize = 0;
  let string(2) raw_blocksize = read_string(2);
  while (is_defined(raw_blocksize))
  begin
    blocksize = reinterpret_as(integer(2), raw_blocksize);
    read_string(2);
    copy_data(blocksize - 4);
    raw_blocksize = read_string(2);
  end
  out :: 0;
end;
Temporary work area: When executing a graph, Ab Initio uses temporary space, by default /var/abinitio. To change this, set the AB_WORK_DIR variable.
AIR Commands:
1. air ls /Projects/…/<project_name> : displays all objects in that project
2. air lock set /Projects/…/<object> : sets a lock on that object
3. air lock show -user <user_id> : shows all locks issued by the user
4. air lock release -object <object> : releases the lock on a particular object
5. air rm <object> : removes an object from the EME
6. air dependency-analysis : performs dependency analysis
Explain PSETs:
PSETs are required for generic graphs.
A PSET defines an instance of a generic graph, storing the run-time parameter values used at run time.
PSETs make dependency analysis possible on generic graphs.
PSETs also make it possible to track run-time statistics of generic graphs.
Common Functions:
is_valid(): Checks validity; the data has to match its datatype.
Example: is_valid((date("YYYY-MM-DD"))"2005-02-31") fails, as February has no 31st.
is_defined(): Checks whether the data element has a value.
is_defined("2005-02-31") succeeds.
is_error(): the reverse effect of is_defined().
FILE MANAGEMENT
Use the m_ prefix to perform Unix-style operations on multifiles, e.g.:
m_touch : creates an empty multifile
m_mv : moves a multifile from one place to another
m_rm : removes a multifile
What version of Ab Initio graph you are using? What flavor of Unix you are
running?
GDE 1.14, CO-OP 2.14, UNIX Solaris
Where do you save your files? In local machine or server?
Server
If an xfr file changes, how do you figure out which dependent graphs have
been changed? (Ans - search for that xfr in ksh scripts)
What is re-partition?
Repartition means changing the degree of parallelism
It is also used to change the grouping of records within the partitions of partitioned data.
What is sandbox?
It's a file-system copy of an EME project, and is the development area for a project.
How do you find the top 5 employees on the basis of their salary (using a graph only)?
(Ans: use Rollup and increment a variable)
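An alternative sketch using components only (next_in_sequence() counts per partition, so the flow must be serial at this point): sort descending on salary, then keep the first five records with a Filter by Expression whose select expression is:
next_in_sequence() <= 5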
If I have a project parameter set, and I also have a variable exported in the start script, what takes priority?
The variable exported in the start script.
To run a graph from the command line, deploy it and run the generated script:
cd my_sandbox/run
./my_graph.ksh
Or run the .mp (graph) file directly, using the air sandbox run command. For example:
air sandbox run my_graph.mp
1) autosys
2) wrapper script
3) start and end script at graph level
4) conditional dml
What are the various ways of unloading data from a table?
Unload using MFS set in the layout: opens x sessions, where x is the degree of parallelism. This is the fastest way if the table is not heavily partitioned.
Unload using ABLOCAL: set "Database default" in the layout and write an ablocal_expr in the parameters; this helps unload the data faster if the table is lightly partitioned. Best used for dimension tables.
Unload using SERIAL: the ordinary way of unloading; it is better to add Oracle's parallel hint in order to increase speed. The parallel hint should be based on the DBA's approval, considering the system's performance on a regular basis.
Broadcast, by contrast, sends the same input to multiple output flows, but it can also act like a partition component: serial to MFS, MFS to MFS, etc.
What is the max-core parameter?
It is the amount of memory needed to process a component. (Ab Initio decides the default value, or we can also change it based on the number of graphs running in parallel at the same time.)
How will you include a .xfr in a Reformat to use a function defined in the .xfr file?
include "~$XFR/xyz.xfr";
After this include statement, use the function just like any predefined Ab Initio function.
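A small sketch (the function make_name is illustrative). Suppose ~$XFR/xyz.xfr defines:
out :: make_name(first, last) =
begin
  out :: string_concat(first, " ", last);
end;
Then a Reformat transform can include and call it:
include "~$XFR/xyz.xfr";
out :: reformat(in) =
begin
  out.full_name :: make_name(in.first_name, in.last_name);
end;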
If max-core is set so high that the computer's swap space is exhausted, you can cause your own graph, and possibly other applications, or even the computer itself, to fail. This has the worst possible effect on performance.
To size max-core:
1. Figure out the maximum graph memory usage, excluding max-core components and lookup files.
2. Estimate the available physical memory.
3. Subtract the graph's required memory from the available memory.
4. Take 1/2 to 3/4 of the result as a safety margin.
5. Divide by the number of partitions to get max-core for components; max-core is per partition.
For example: 8 GB of physical memory minus 2 GB of other graph memory leaves 6 GB; half of that is 3 GB; divided across 4 partitions, roughly 750 MB of max-core budget remains per partition.
What is ABLOCAL and how can I use it to resolve failures when unloading
in parallel?
Answer
Some complex SQL statements contain grammar that is not recognized by the Ab Initio parser
when unloading in parallel. In this case you can use the ABLOCAL construct to prevent the
INPUT TABLE component from parsing the SQL (it will get passed through to the database). It
also specifies which table to use for the parallel clause.
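A hedged sketch of the usual form (the table name cust_detail is illustrative), placing the ABLOCAL clause in the SQL given to the INPUT TABLE component:
select * from cust_detail where ABLOCAL(cust_detail)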
What does the error message "Failed to allocate <n> bytes" mean?
This error message is generated when an Ab Initio process has exceeded its limit for some type
of memory allocation. The entire computer is out of swap space.
What does the error message "Too many open files" mean?
This error message occurs most commonly when the value of the max-core parameter of the
SORT component is set too low. In these cases, increasing the value of the max-core parameter
solves the problem.
What does the error message "Trouble writing to socket: No space left on
device" mean?
This error message means your work directory (AB_WORK_DIR) is full.
NOTE: Any jobs running when AB_WORK_DIR fills up are unrecoverable
What is layout?
The layout of a component is a list of hostname/pathname locations that defines two things:
Where the component runs
How many ways parallel the component runs (the depth of parallelism of the component)
What is the difference between m_rollback and m_cleanup and when would
I use them?
m_rollback has the same effect as an automatic rollback — using the jobname.rec file, it rolls
back a job to the last completed checkpoint, or to the beginning if the job has not completed any
checkpoints. The m_cleanup commands are used when the jobname.rec file doesn't exist and
you want to remove temporary files and directories left by failed jobs.
Can I access different databases from the same graph — for example,
unload from Oracle and load into Teradata?
Yes, you can access different databases from within the same graph. Each component that
references a database needs its own database configuration file.
Why does the number of digits vary between database and DML types?
For example, the database type NUMBER(18) for Oracle databases becomes decimal(19) in
DML.
Databases use different conventions than DML for specifying the number of significant digits. For
example, the database type NUMBER(18) in Oracle databases allows 18 digits plus a sign.
However, because DML's decimal type requires a separate character for the sign, this type
requires a 19-character decimal
How do I check the status of a graph in the end script and perform an
appropriate action based on that status (such as doing cleanup or
sending an e-mail)?
You can verify the status of the graph in the end script using the variable $mpjret. If the graph
succeeded, the value of the variable is 0; otherwise, it is non-zero.
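A minimal end-script sketch (the notification command and address are illustrative):
if [ $mpjret -eq 0 ]; then
  echo "Graph completed successfully"
else
  echo "Graph failed with status $mpjret" | mailx -s "Graph failed" admin@example.com
fi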
.air-sandbox-overrides
This file exists only if you are using version 1.11 or a later version of the GDE. It contains the
user's private values for any parameters in .air-project-parameters that have the Private Value
flag set. It has the same format as the .air-project-parameters file.
Having these values in a separate file keeps them from being checked in with the other
parameter values. It also allows you to change the values without having to lock the parameters
file.
When you edit a value (in the project parameters editor) for a parameter that has the Private
Value flag checked, the value is stored in the .air-sandbox-overrides file rather than the .air-
project-parameters file.
.air-sandbox-overrides-stdenv
This file exists only if you are using a version of the GDE earlier than version 1.11. This file
contains the user's personal overrides of any parameters in .air-project-parameters. It has the
same format as the .air-project-parameters file.
You add a parameter to this file by specifying ";U" in a parameter's Description cell in the
sandbox parameter editor.
.air-sandbox-overrides-default
This file exists only if you are using a version of the GDE earlier than version 1.11. This file
contains the default values of any parameter that has been overridden by the user. It has the
same format as the .air-project-parameters file. Do not edit this file.
Scripts
ab_project_setup.ksh
By dotting in this script you can set up a sandbox environment for the project outside the sandbox
(there is no need to use it within the sandbox and you should not use it there). This allows
deployed scripts to be run outside their sandbox.
The system configuration file is named abinitiorc, and is usually set up by the Co>Operating
System administrator.
When the Co>Operating System runs a process that needs a value for a configuration variable, it looks for that value in the following places, in the order listed: the process environment, the user configuration file (.abinitiorc in the user's home directory), and finally the system configuration file ($AB_HOME/config/abinitiorc).
Example of an abinitiorc file:
AB_WORK_DIR: /export/appl/abinitio/ab_work_dir
AB_UMASK : 002
AB_OUTPUT_FILE_UMASK : 002
AB_DOT_WORK_CREATION_MODE : 0775
AB_AGENT_COUNT : 1
AB_CONNECTION : ssh
AB_SSH_NEEDS_PASSWORD : false
AB_SSH : /usr/SYSADM/bin/sshi
MFS_MOUNT_POINTS:/export/appl
AB_AIR_ROOT://dsysadm-etl01/export/appl/abinitio/repository/repo
AB_HOME
Specifies the root directory of the Co>Operating System installation.
AB_JOB
Specifies the base name of the job file for the current job
If you have not specified a value for the AB_JOB configuration variable, the GDE bases the
default value for AB_JOB on the name of the graph when it generates the script for that graph.
This means that the default value for AB_JOB for a particular graph will always be the same
AB_JOB_PREFIX
To avoid problems with multiple instances of a graph being run concurrently in the same
directory, you can make the AB_JOB value unique by exporting the AB_JOB_PREFIX
configuration variable. For example, if the AB_JOB_PREFIX value is ABC, use the following
command:
export AB_JOB_PREFIX=ABC
AB_WORK_DIR
is a configuration variable whose value is the directory used for Ab Initio scratch space and for
maintaining information related to graph recovery.
AB_DATA_DIR
The configuration variable AB_DATA_DIR provides a directory for temporary data files that lack a specific, writable path in their layouts, and thus it overrides the behavior above.
Utility Commands
Catalog management commands
The Co>Operating System catalog management commands provide a means of working with
catalogs of lookup files. A catalog is a list of lookup files. Each lookup file has:
A unique name
A primary key
The URL of a data file
The URL of the record format associated with the data file
Cleanup commands
Example of m_cleanup_du
find /var/abinitio/ -name "*.[hn]lg" -user e3umjk -mtime +7 | xargs m_cleanup_du -j
Data commands
The Co>Operating System data utility commands are:
m_dump
This DML utility prints information about data records, their record formats, and the
evaluations of expressions
Printing particular records
This example prints five data records, numbers 5, 6, 7, 8, and 9:
$ m_dump xyz.dml xyz.dat -start 5 -end 9
m_eval
This DML utility evaluates DML expressions and displays their derived types. Use it to
quickly test and evaluate simple, multiple, cast, and other expressions that you want to
use in a graph.
Example
m_eval '(date("YYYYMMDD")) (today() - 10)'
m_eval '((decimal(5))42)' returns "   42" (the value right-justified in five characters with leading blanks)
Example of m_mkfs
m_mkfs //pluto.us.com/usr/ed/mfs3 \
//pear.us.com/p/mfs3-0 \
//pear.us.com/q/mfs3-1 \
//plum.us.com/D:/mfs3-2
Example of m_mkdir
m_mkdir //pluto.us.com/usr/ed/mfs3/cust
XXurl
Manipulates filenames, file paths, URLs, and parts of URLs in various ways that might be useful
for graph developers and users.
Other commands
Other Co>Operating System commands are:
m_cs : shows the names of the character sets.
m_password : uses AES encryption with a 256-bit key.
m_env : a general-purpose utility for obtaining information about Ab Initio configuration variable settings in an environment.
m_queue
m_view_errors
m_ftp
m_db commands
m_db convert
------------
Converts .cfg files to .dbc formats and .dbc to .cfg formats.
m_db convert -cfg <cfg-filename>
m_db convert -dbc <dbc-filename>
m_db create
------------
Creates the database table with DDL equivalent to the metadata-string.
If the table already exists, an error will be returned unless the -existence_ok flag is supplied.
m_db create <dbc-name> -dml '<metadata-string>' -table <tablename> [-existence_ok]
m_db create <dbc-name> -dml_file <dml-file> -table <tablename> [-existence_ok]
m_db create_commit_table
------------------------
Creates a table used internally by Ab Initio for checkpointing information.
m_db describe
-------------
m_db describe -configs
Lists brief descriptions of all dbc files found.
m_db describe -oledb_providers
Describes OLE DB Providers for Windows NT.
m_db describe -oledb_data_sources
Describes OLE DB data sources for Windows NT.
m_db describe -supported_databases
Describes databases supported by this package.
m_db env
--------
Prints the value of environment variables used by this database vendor.
m_db env <dbc-name>
m_db find
----------
Shows which dbc file will be used for this dbc file name.
m_db find <dbc-name>
m_db genctl
------------
Generates a load control file based on the DML supplied.
m_db genctl <dbc-name> [<options>] -dml '<metadata-string>' -table <tablename> -component
<name>
m_db genctl <dbc-name> -dml_file <dml-filename> -table <tablename> -component <name>
<options>: [-unique_identifier <uid>] [-dsname <dsname>]
<options>: [-utproc <utproc>]
** Valid component names: db-db2uss-load
m_db genddl
------------
Generates database table creation DDL equivalent to the metadata-string.
m_db genddl <dbc-name> -dml '<metadata-string>' -table <tablename>
m_db genddl <dbc-name> -dml_file <dml-filename> -table <tablename>
m_db gendml
-----------
Generates appropriate metadata for that table/insert/select or expression.
m_db gendml <dbc-name> [<options>] -table <tablename>
m_db gendml <dbc-name> [<options>] -select '<sql-select-statement>'
m_db gendml <dbc-name> [<options>] -insert '<sql-insert-statement>'
m_db gendml <dbc-name> [<options>] -tables <texpr> [<options2>]
m_db gendml <dbc-name> [<options>] -views <texpr> [<options2>]
m_db gendml <dbc-name> [<options>] -aliases <texpr> [<options2>]
m_db gendml <dbc-name> [<options>] -all_objects <texpr> [<options2>]
m_db load
----------
Loads data to a database table or insert statement from stdin or a file or string.
m_db load <dbc-name> -dml '<metadata-string>' -table <tablename> [-data <string> || -data_file
<filename>]
m_db load <dbc-name> -dml_file <dml-filename> -table <tablename> [-data <string> || -data_file
<filename>]
m_db load <dbc-name> -dml '<metadata-string>' -insert '<sql-insert-statement>' [-data <string> ||
-data_file <filename>]
m_db load <dbc-name> -dml_file <dml-filename> -insert '<sql-insert-statement>' [-data <string> ||
-data_file <filename>]
m_db print
-----------
Prints out the configuration settings for this config file.
m_db print <dbc-name> -value <tagname>
m_db print <dbc-name> -all [-verbose]
m_db run
-------------
Runs SQL against the database.
m_db run <dbc-name> -sql '<sql-string>'
m_db truncate
--------------
Truncates the given table.
m_db truncate <dbc-name> -table <tablename>
m_db unload
-------------
Unloads data from database table, select or expression to stdout.
m_db unload <dbc-name> [<options>] -table <tablename>
m_db unload <dbc-name> [<options>] -select '<sql-select-statement>'
m_db unload <dbc-name> [<options>] -tables <texpr> [<options2>]
m_db unload <dbc-name> [<options>] -views <texpr> [<options2>]
m_db unload <dbc-name> [<options>] -aliases <texpr> [<options2>]
m_db unload <dbc-name> [<options>] -all_objects <texpr> [<options2>]
Examples: m_db unload mydb.dbc -table 'fred.mytable'
m_db version
--------------
Tries to get the database version from the database if it
can, and if not, it will return the version from the .dbc file.
m_db version <dbc-name>
Paging (or having very little free physical memory, which means you're close to paging) is the result of phases that have too many components trying to run at once, or of too much data parallelism.
In case of low-volume processing, go serial. When this is true, the graph's startup time will be disproportionately large in relation to the actual run time. Can the application process more data per run? Maybe it could use READ MULTIPLE FILES, for example, to read many little files per run (ad hoc multifiles), instead of running so many times.
Bad placement of phase breaks: Wherever a phase break occurs in a graph, the data in
the flow is written to disk; it is then read back into memory at the beginning of the next
phase. For example, putting a phase break just before a FILTER BY EXPRESSION is
probably a bad idea: the size of the data is probably going to be reduced by the
component, so why write it all to disk just before that happens?
Too many sorts: The SORT component breaks pipeline parallelism and causes additional
disk I/O to happen.
Tracking Performance
Track graph performance through the GDE and system utilities.
In general, completely fixed format records take less CPU to process than variable length
records.
Drop fields that aren’t needed as soon as possible. This is often done “for free” in
transform components.
"Flatten" out conditional fields as soon as possible. Often, conditional fields are used to store multiple record types in a single format. Split these into separate processing streams as soon as possible. Join them back at the end of the graph, if required.
Join, Rollup, and Scan can operate either in-memory or on sorted data. If your data does not fit in memory, and you need to do multiple joins or rollups on the same key, it will be most efficient to sort once and set the rollups and joins to expect sorted input.
Best performance will be gained when components have enough max-core to avoid spilling to disk. If max-core cannot accommodate a component's memory needs, data will spill to disk; then you need disk space to accommodate the entire component memory requirement.
Use Rollup or Filter by Expression as soon as possible if they will reduce the number of
records being processed.
Join as early as possible if this will reduce the number of records being processed.
Join as late as possible if this will increase the number of records or the width of records
being processed.
When joining a very small dataset to a very large dataset, it may be more efficient to
broadcast the small dataset or use it as a lookup file rather than repartition and re-sort
the large dataset.
Lookup Files benefit from fixed record formats.
When creating lookup files from source data, drop any fields which aren’t keys or lookup
results.
Use lookup multifiles and lookup_local() for very large datasets (see the sketch after this list).
Reducing the number of components may save startup costs, but may sometimes make
a graph harder to understand.
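A hedged sketch of a keyed lookup inside a transform (the file label Customers and the field names are illustrative); lookup_local() has the same shape but searches only the current partition of a lookup multifile:
out.region :: if (lookup_count("Customers", in.cust_id) > 0)
  lookup("Customers", in.cust_id).region
else "UNKNOWN";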
What are various versions of GDE you have worked on and what
are the major differences?
GDE 1.12, 1.13, and 1.14, with Co>Op 2.12, 2.13, and 2.14.
1.12
Rollup capabilities were enhanced: you can now define local variables, statements, key_change, input_select, and output_select, and still use aggregation functions.
Replay
The GDE allows you to save the tracking information of a run to a file, and later play it back, optionally at a higher speed.
1.13
Enabling the Debugger allows you to insert breakpoints into your transform functions.
1.14
Checkin is simpler and faster. In GDE Version 1.14, translation of graphs for analysis
purposes is no longer performed within the GDE; the options for controlling analysis
have been removed.
Sandbox view
What are the common tools/components which you have created in Abinitio
Explain about OTS load graphs
Audit entries components (explain the job dependency and job audit table)
The following expression generates a unique, dense, ascending sequence number across all partitions (record n of partition p receives number_of_partitions()*(n-1) + p + 1, plus a date-based offset; temp_today_date_decimal is a variable defined elsewhere):
(number_of_partitions()*next_in_sequence() + this_partition()) - number_of_partitions() + 1 + temp_today_date_decimal;