Lab Workbook-Informatica EDC Migrating To MSFT AZURE DWM
Hands on Lab
EDC UI:
Password: pocInfaAdmin@2018
Enterprise Data Catalog – Hands on Lab
You can sort the displayed search results by asset name, relevance, system attributes, or custom
attributes. You can use the search filters to narrow the results and view additional details for the
displayed assets. After you find the required asset, you can annotate and enrich it with custom
attributes.
1. Log in to Enterprise Data Catalog using the following credentials: Username: edc-user /
Password: pocInfaAdmin@2018
2. Search for “Web Customers”
3. Filter search results by “Asset Type: Table”
4. Click on “WEB_CUST”. This will take you to the overview page of the asset. You can see the
description and, if you have the permission, you can edit the description and the custom
attributes such as people, domains, classification. You will also notice sample columns of the
table.
The Columns section displays the following details for each column in the asset:
o Column name
o Business Title, which allows you to assign business glossary resources to the asset
o Data domains that a user has assigned to the column, that are inferred from profile
results, or that are inferred from similar assets.
o The percentage of null, distinct, and non-distinct values that are calculated from profile
results.
o The source data type if a data type is defined in the data source.
o Data types that are inferred from profile results.
1. In this view, you can see the percentages of NULL, DISTINCT, and NON_DISTINCT values in
each column.
2. Within the column view, you can get detailed information about Column Value Distribution, Data
Patterns, Inferred Data Types, and Auto-inferred Data Domains.
3. Drill down into any column, for example “ADDRESS”, to get more details about the column.
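The null, distinct, and non-distinct percentages shown for each column can be illustrated with a small Python sketch. It is not how Enterprise Data Catalog computes its profile results; it assumes that a value is "distinct" when it occurs exactly once among the non-null values and "non-distinct" when it occurs more than once. The function name and the sample data are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Return the percentage of null, distinct, and non-distinct values.

    Assumption (not taken from the product): a non-null value is "distinct"
    if it occurs exactly once, and "non-distinct" if it occurs more than once.
    """
    total = len(values)
    if total == 0:
        return {"null": 0.0, "distinct": 0.0, "non_distinct": 0.0}

    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)

    distinct = sum(1 for v in non_null if counts[v] == 1)
    non_distinct = len(non_null) - distinct
    null_count = total - len(non_null)

    return {
        "null": 100.0 * null_count / total,
        "distinct": 100.0 * distinct / total,
        "non_distinct": 100.0 * non_distinct / total,
    }


# Hypothetical sample of an ADDRESS column: 8 rows, 2 of them null.
sample = [
    "12 Main St", "4 Oak Ave", None, "12 Main St",
    "9 Elm Rd", None, "4 Oak Ave", "7 Pine Ct",
]
stats = profile_column(sample)
print(stats)
```

The three percentages always add up to 100 for a non-empty column, which is a quick sanity check when reading the column view.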
As you can see, you can view data content, which may include sensitive values. Access to data content can be
restricted through user privileges. The user you are logged in as has privileges to view data content. We have
created another user without privileges to view data content.
Log out of the Catalog and log in as user "no_access_to_sensitive_data" with password
"no_access_to_sensitive_data".
You get a clear message informing you that you have no access to value frequencies for this attribute because it
may contain sensitive data.
A data domain is a predefined or user-defined asset based on data values or a column or field name.
Some examples of data domains are Social Security number, account status, IP address, and UPC code.
Assigning a data domain to a column or field makes the asset easier to identify and understand.
You can organize data domains that apply to similar types of data in data domain groups. For example,
the Bank Account data domain group might contain data domains such as Account Status, Account
Number, and Credit Card Number.
Enterprise Data Catalog infers data domains for columns and fields based on profile results. It also infers
data domains for assets based on the data domains that are assigned to similar assets. You can accept
and reject assets for a data domain.
When you curate a data domain, you make the data domain more accurately reflect the type of data that
belongs in the data domain. Curating a data domain also makes Enterprise Data Catalog more accurate
when it infers data domains for similar assets.
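As a rough illustration of inferring a data domain from data values or from a column name, the following Python sketch applies hypothetical rules to a column. The rule set, the 80% conformance threshold, and all names are assumptions made for this example; they do not describe the product's actual inference logic.

```python
import re

# Hypothetical rules: each data domain has a value pattern and column-name hints.
DOMAIN_RULES = {
    "SSN": {
        "pattern": re.compile(r"\d{3}-\d{2}-\d{4}"),
        "name_hints": ("ssn", "social"),
    },
    "IPAddress": {
        "pattern": re.compile(r"(\d{1,3}\.){3}\d{1,3}"),
        "name_hints": ("ip",),
    },
}


def infer_domains(column_name, values, threshold=0.8):
    """Return (domain, conformance, name_match) for each inferred domain.

    A domain is inferred when the column name contains one of its hints as a
    whole token, or when at least `threshold` of the non-null values fully
    match its pattern. Both criteria are assumptions for illustration.
    """
    tokens = set(re.split(r"[^a-z0-9]+", column_name.lower()))
    non_null = [v for v in values if v is not None]

    inferred = []
    for domain, rule in DOMAIN_RULES.items():
        name_match = any(hint in tokens for hint in rule["name_hints"])
        if non_null:
            hits = sum(1 for v in non_null if rule["pattern"].fullmatch(v))
            conformance = hits / len(non_null)
        else:
            conformance = 0.0
        if name_match or conformance >= threshold:
            inferred.append((domain, round(conformance, 2), name_match))
    return inferred


# Inferred from values only: 4 of the 5 non-null values look like IP addresses.
by_value = infer_domains(
    "CLIENT_ADDR",
    ["10.0.0.1", "192.168.1.20", None, "n/a", "172.16.0.5", "10.1.1.1"],
)

# Inferred from the column name even though only half of the values conform.
by_name = infer_domains("SSN", ["123-45-6789", "bad"])
```

Accepting or rejecting an inferred assignment, as described above, would correspond to keeping or discarding entries from the returned list.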
5. Now let’s assign the correct data domain. To assign or change the Data Domain assignments, click the
7. Click “ok”
8. Now you will notice “CustomerID” is assigned as a data domain to CUSTOMER_ID
Lineage and impact describe the end-to-end flow of data for an asset. The data flow for an asset
has two components: the lineage and the impact.
Lineage describes the flow of data from the origins to an asset. Lineage shows you where the data for an
asset comes from and which assets affect the asset that you are studying. When you view an asset in a
lineage and impact diagram, the lineage includes the asset that you are viewing and all of the upstream
assets in the data flow.
Impact describes the flow of data from an asset to the destinations. Impact shows you where the data is
used and which assets might be affected if you change the asset that you are studying. When you view an
asset in a lineage and impact diagram, the impact includes the asset that you are viewing and all of the
downstream assets in the data flow.
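The two definitions above can be read as graph traversals: lineage follows the data flow backwards from an asset, and impact follows it forwards. The Python sketch below shows this on a small, hypothetical data flow. The asset names and the edge list are illustrative assumptions, not catalog content.

```python
from collections import deque

# Hypothetical data flow: each edge points from a source asset to its target.
EDGES = [
    ("flat_file", "oracle_staging"),
    ("oracle_staging", "azure_blob"),
    ("azure_blob", "azure_sql_dw"),
    ("oracle_staging", "report"),
]


def _reachable(start, neighbors):
    """Breadth-first search that returns `start` plus every reachable asset."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage(asset, edges):
    """The asset itself plus all upstream assets (where its data comes from)."""
    upstream = {}
    for source, target in edges:
        upstream.setdefault(target, []).append(source)
    return _reachable(asset, upstream)


def impact(asset, edges):
    """The asset itself plus all downstream assets (where its data is used)."""
    downstream = {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)
    return _reachable(asset, downstream)


blob_lineage = lineage("azure_blob", EDGES)
staging_impact = impact("oracle_staging", EDGES)
```

In this example, changing `oracle_staging` could affect both the Azure path and the report, which is the kind of question the impact side of the diagram answers.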
Objectives
Duration: 10 Minutes
5. Use the Lineage Sliders to add all levels in the lineage diagram.
6. Alternatively, you can also expand individual dotted links by clicking on the (+) icon on the link to
only expand a path.
8. In the middle of the lineage diagram, you will see customer order detail (you can hover over it).
Click the “Customer_Order_Details” table to see the lineage for the asset. Now click the plus icon
on “Customer Order Detail”, search for “Last”, and select the “NAME_LAST” column. Click
“OK”. This step expands all the columns in the tables in the lineage diagram that directly affect
this metric.
9. Click (X) to view the lineage diagram. Notice that the lineage diagram shows the column name
NAME_LAST from upstream to downstream applications.
10. Click the LASTNAME column in the resource HermesCRM. This takes you to the Lineage and
Impact diagram of the LASTNAME column.
11. Click the “Overview” tab to view value frequencies as well as a new feature, similar columns.
In an organization, a column name such as Last Name might exist across multiple data sources.
To identify the data sources that contain such columns, you can use column similarity in
Enterprise Data Catalog. It uses unsupervised clustering, a machine learning technique,
to identify similar columns. Enterprise Data Catalog performs unsupervised clustering across
multiple data sources based on several factors, such as data overlap, distinct value match,
pattern match, and name match. It then assigns an overall similarity score as well as the match
likelihood for each factor.
You can click on any of the columns to view metadata associated with it.
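To make the similarity factors more concrete, the Python sketch below scores one pair of columns on data overlap, pattern match, and name match, and combines them into an overall score. This is a simple pairwise scoring, not the unsupervised clustering that Enterprise Data Catalog performs; the formulas, the weights, and the sample columns are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Share of items that two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def value_pattern(value):
    """Reduce a value to a pattern: digits become 9, letters become A."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def similarity(name_a, values_a, name_b, values_b, weights=(0.4, 0.3, 0.3)):
    """Return (overall score, per-factor scores) for two columns.

    The weights are hypothetical and apply, in order, to data overlap,
    pattern match, and name match.
    """
    factors = {
        "data_overlap": jaccard(values_a, values_b),
        "pattern_match": jaccard(
            map(value_pattern, values_a), map(value_pattern, values_b)
        ),
        "name_match": SequenceMatcher(
            None, name_a.lower(), name_b.lower()
        ).ratio(),
    }
    score = sum(w * f for w, f in zip(weights, factors.values()))
    return score, factors


score, factors = similarity(
    "LASTNAME", ["Smith", "Jones", "Lee"],
    "NAME_LAST", ["Smith", "Lee", "Kim"],
)
print(round(score, 2), factors)
```

Reporting the per-factor scores next to the overall score mirrors the way the catalog shows a match likelihood for each factor.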
14. Click on icon to show transformation details, and extend lineage to get full lineage as
below
Each orange bubble represents detailed transformation information that comes from PowerCenter,
Informatica Cloud, Informatica Big Data Management, or even Cloudera Navigator.
The icon indicates that a transformation has been applied to the attributes. Hover over the icon
or click it to get more details on the functions used.
When you open an asset in the Relationships view, the selected asset appears at the center of the
Relationships view, and the related assets appear around the selected asset.
The Relationships view shows different circles that represent a specific asset or a group of assets.
By default, the selected asset is highlighted in blue and the related assets of the same type are
represented as small icons within the asset type circle.
The assets that you see in the Relationships view vary based on the selected asset type. For
example, if you select a table, the Relationships view displays related assets such as data domains,
business terms, reports, and synonyms.
1. Continuing from the previous lab, close the impact summary to return to the lineage diagram of
the “CUSTOMER_ORDER_DETAILS” report.
2. Click the “Hermes / CUSTOMER” table, which takes you to the lineage diagram of the table.
Optionally, you can click asset details to view metadata and statistics for the table.
3. Click Relationships.
The Relationships tab displays a diagram that shows how the selected data
asset is related to other data assets. You can see how the Hermes Customer table is associated with
other views, data domains, reports, and users.
Overview
In this lesson, you will learn how the Catalog automatically classifies data based on known domains. You
will also learn how you can annotate datasets to further classify data assets along multiple dimensions.
Objectives
• Domain Overview
1. Click the Settings icon on the top right bar, then select Application Configuration.
2. The Application Configuration dialog box appears. Navigate to the “System Attributes” tab
3. Search for “Datatype”.
4. For the Datatype attribute that originates from relational objects (“com.infa.ldm.relational”), enable
“Display in Search Results”.
7. Enable “Allow Searching”, “Allow Filtering”, and “Display in Search Results” for the custom
attributes “Application Source” and “Data Migration to Azure”.
Note: it may take a few minutes for the attributes to appear for objects.
You can annotate and enrich the required assets with custom attributes. Enriching the assets with
attributes makes them easier to discover. In this case, you are enriching the asset with
attributes so that it is easy to find when selecting data for migration to Azure DWH.
1. Search for “staging customer data from oracle data warehouse”.
2. On the asset overview page, scroll down to the Custom Attributes pane and click the Edit
icon (pencil icon).
8. In the search bar, type: “staging customer data from oracle data warehouse”
9. Click “show details”. Notice the results, based on the classification you identified:
Note: in most cases, you will have thousands of tables to migrate. To tag and identify
the assets for data migration efficiently, Enterprise Data Catalog provides a REST API to update objects
in bulk and retrieve objects with their associated tags. (This workbook does not include a hands-on lab
on the REST API; if you have questions, please reach out to the instructor.)
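The idea of tagging thousands of assets in bulk can be sketched in Python as follows. The sketch only prepares batched update payloads locally and makes no network calls. The payload shape, the asset identifiers, the attribute value "Yes", and the batch size are hypothetical and do not represent the Enterprise Data Catalog REST API schema; consult the product's REST API reference for the real endpoints and request format.

```python
def build_tag_batches(asset_ids, attribute, value, batch_size=100):
    """Split asset IDs into batches of update payloads for one custom attribute.

    Each payload is an illustrative dictionary, not a real request body.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    batches = []
    for start in range(0, len(asset_ids), batch_size):
        chunk = asset_ids[start:start + batch_size]
        batches.append(
            [{"id": asset_id, "attributes": {attribute: value}} for asset_id in chunk]
        )
    return batches


# Hypothetical identifiers for 250 tables selected for migration.
table_ids = [f"resource://staging/table_{n}" for n in range(250)]
batches = build_tag_batches(table_ids, "Data Migration to Azure", "Yes")
print([len(batch) for batch in batches])
```

Sending each batch in its own request would keep individual calls small and make it easier to retry a failed batch without repeating the whole update.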
In this optional lesson, you will learn how to use the new drill down lineage views in the Catalog to
visualize Azure ecosystem objects. You will use the same techniques in lesson #3, “Lineage and Impact
Analysis”. As we are in a lab environment, we have already ingested metadata from Azure and Informatica
Cloud Data Integration. (In the next lab, you will learn how to move data to the cloud using Intelligent Cloud
Services.)
Notice that the Azure dimension table was loaded from a flat file.
6. Use the Lineage Sliders to add all levels in the lineage diagram.
On expanding the lineage, you will notice that the flat file was loaded to an Oracle staging table,
then loaded to Azure Blob, and finally to an Azure SQL DW table.
Once you have migrated data into Azure SQL DW using Informatica Cloud, you can capture the
metadata and lineage in Enterprise Data Catalog. (In the next lab, you will learn how to move data
to the cloud using Intelligent Cloud Services.)
7. Click on icon to show transformation details, and extend lineage to get full lineage as
below
8. You will see an orange bubble next to the DIM_RETAIL_STORE table. Click the bubble to show
the detailed transformation used to move data from the Oracle table to Azure SQL DW.
10. Select ALL the columns. Click on “OK”. This step will expand all the columns in the tables in the
lineage diagram that directly affect this metric.
11. Close this detailed lineage diagram to return to the dimension table lineage.
You have now seen how metadata is captured from Azure and Informatica Cloud Data
Integration, and how the catalog builds lineage that an architect can use to track
data migration to Azure.