Informatica
Introduction
• ETL, which stands for extract, transform and load, is a data integration
process that combines data from multiple data sources into a single,
consistent data store that is loaded into a data warehouse or other
target system.
• Extract
During data extraction, raw data is copied or exported from source locations to a staging area.
Data management teams can extract data from a variety of data sources, which can be structured
or unstructured.
• Transform
In the staging area, the raw data undergoes data processing. Here, the data is transformed and
consolidated for its intended analytical use case.
• Load
In this last step, the transformed data is moved from the staging area into a target data
warehouse. Typically, this involves an initial loading of all data, followed by periodic loading of
incremental data changes and, less often, full refreshes to erase and replace data in the
warehouse. For most organizations that use ETL, the process is automated, well-defined,
continuous and batch-driven. Typically, ETL takes place during off-hours when traffic on the source
systems and the data warehouse is at its lowest.
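The extract, transform and load steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Informatica itself works: the CSV feed, the cleansing rules, and the SQLite target table are all assumptions chosen just to make each stage concrete.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed standing in for a source system (illustrative only;
# a real pipeline would read from files, database tables, or APIs).
RAW_FEED = """id,name,amount
1, Alice ,100
2,Bob,not_a_number
1, Alice ,100
3,Carol,250
"""

def extract(feed):
    """Extract: copy raw rows from the source into a staging structure."""
    return list(csv.DictReader(io.StringIO(feed)))

def transform(rows):
    """Transform: validate, cleanse, and deduplicate the staged rows."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        if row["id"] in seen:
            continue  # drop duplicate records
        seen.add(row["id"])
        clean.append((int(row["id"]), row["name"].strip(), amount))
    return clean

def load(rows, conn):
    """Load: move the transformed rows into the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)"
    )
    # INSERT OR REPLACE lets repeated runs apply incremental changes
    # on top of the initial load, as described above.
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_FEED)), conn)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # prints 2
```

Of the four staged rows, one fails validation and one is a duplicate, so only two reach the target table; re-running the pipeline replaces rather than duplicates them, mirroring the periodic incremental loads described above.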
Informatica
• Informatica is a software development company that offers data
integration products, including tools for ETL, data masking, data
quality, data replication, data virtualization, master data management, etc.
The Informatica PowerCenter ETL/data integration tool is its most widely
used product, and in common usage the name "Informatica" refers
to the Informatica PowerCenter tool for ETL.
Components:
Informatica Power Center
Informatica MDM
Informatica IDQ
Informatica IDQ
• IDQ was great at helping us cleanse our data and find duplicate information.
• Address Doctor: automatically suggests and corrects the customer address based on the
global address database, thus ensuring the correct physical address of the customer.
• IDQ can be merged with PowerCenter quickly and adequately.
• It's high time Informatica integrated all tools like PowerCenter, IDQ, IDE, and
PowerExchange into one, which would simplify development and maintenance.
• Informatica Data Quality provides more accuracy, adaptability, and compatibility, and is
performance-oriented. Integration with other applications is easy to achieve.
• End-user experience improved and reporting needs increased with quality.

Informatica MDM
• Informatica Master Data Management is well suited for identifying the best version of the
truth for master data. It provides a means to easily set trust and validation of multiple
source systems to fine-tune the resultant master data. It is by far a leader in the master
data management space.
• Allows you to implement detailed business rules to apply to your data.
• Works well with any ETL tool, but especially well with Informatica PowerCenter.
• Address validation sometimes leads to incorrect results. This is my biggest issue with
the product.
• Mapping within the tool can be difficult, but that is slated for an upgrade in the next
version.
• Informatica MDM allowed for a faster stand-up of our Salesforce environment.
• Informatica MDM provided a means to keep our multiple source systems synchronized.

Informatica PowerCenter
• PowerCenter is the top-of-the-line tool for migrating data from one source to another. It
excels at filtering and modifying data before deploying it into a target environment,
usually for end-user use.
• The tool is excellent at pulling data from source systems, modifying it to fit a target
system, and then pushing the data accordingly.
• PowerCenter is very adept at applying data filters and 'business rules' to source data
before pushing the results to the end user.
• It is a good, efficient product. Lots of developers are easily available in the market
to perform the work.
• PowerCenter provides lots of features to implement mapping rules.
• Very easy to design, develop, and maintain ETL code.
• There is no real interface between PowerCenter and programs that manage the
encapsulation of passwords for system accounts.
Skills For Informatica
• ETL
• Data Warehousing
• Informatica PowerCenter
• Oracle
• SQL
• PL/SQL
• Unix Shell Scripting
• Perl/Python/Java
• Control-M/Autosys
• Databases - Teradata, Netezza, DB2, Oracle, SQL Server, Sybase etc.
• Big Data - Hadoop, Hive, Sqoop etc.