
ADF Workshop by Amit Navgire

19 July 2021 09:28 PM

Day 1
Day 2

Course Link: https://www.amitnavgire.com/azure-data-factory-5-days-workshop/
How to reach Amit Navgire: [email protected]
-------------------------------------------------------------

Day 1
Topics covered:
- Cloud basics
- Create a free Azure account
- Data Factory
○ Start Azure
○ Start ADF
○ Basics of ADF

Day 2
Topics covered:
- Continuation of Basics of ADF
○ More about Integration Runtime installation and configuration (first 1 hr)
○ Basics of Data Flow & Control Flow (just after the break)
- How to create Azure SQL Database
- How to create Azure Datalake
- First Task: copy data from database to datalake

-------------------------------------------------------------

Data Factory :-
- It is a data integration (ETL) service
- It orchestrates the overall workflow execution
- PaaS (Platform as a Service)

Start Azure :-
- Go to https://portal.azure.com; this is the Azure portal homepage.

- Search for a particular service using the search bar

- Create a resource for IaaS and PaaS services (100+ services available)


- Go to "Subscriptions" to check all the subscriptions.

- After changing your subscription to pay-as-you-go, go to "Cost
Management + Billing" to compare the cost for each subscription.
- If you are not using a resource, delete it; otherwise you pay for it
without any usage.

Start ADF :-
- Go to the Azure portal
- Create a resource for "Data Factories"
- Create a new ADF
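Not shown in the workshop, but for reference: the same factory can also be created from Python. Below is a minimal sketch assuming the azure-mgmt-datafactory SDK; the subscription ID, resource group, and factory name are placeholders.

```python
# Sketch: create a Data Factory with the azure-mgmt-datafactory SDK.
# <subscription-id>, <resource-group> and adf-workshop-df are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

credential = DefaultAzureCredential()      # picks up az login / env credentials
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

df = adf_client.factories.create_or_update(
    "<resource-group>",                    # an existing resource group
    "adf-workshop-df",                     # globally unique factory name
    Factory(location="eastus"))
print(df.provisioning_state)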

Basics of ADF:
- Pipelines
- Activities
- Linked Services
- Datasets
- Triggers
- Integration Runtime
- Control Flow
- Data Flow

Pipelines:
- A pipeline is a logical grouping of activities that together perform a task
- A data factory can have one or multiple pipelines
Activities:
- The activities in a pipeline define actions to perform on your data
- 3 types of activities:
○ Data movement activities
○ Data transformation activities
○ Control activities
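For reference, a minimal sketch (assuming the azure-mgmt-datafactory SDK and the adf_client from the earlier sketch) of a pipeline holding a single control activity; all names are illustrative:

```python
# Sketch: a pipeline is just a named collection of activities.
# Here the only activity is a Wait (a control activity).
from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

wait = WaitActivity(name="WaitTenSeconds", wait_time_in_seconds=10)
pipeline = PipelineResource(activities=[wait])

adf_client.pipelines.create_or_update(
    "<resource-group>", "adf-workshop-df", "DemoPipeline", pipeline)
```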
Linked Services:
- Linked services are much like connection strings, which define the
connection information needed for Data Factory to connect to external
resources
- Sample scenario: to copy data from Blob storage to a SQL database, you
create two linked services: Azure Storage and Azure SQL Database
- Everything can be parameterized
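A hedged sketch of the two linked services from the scenario above, again via the SDK; both connection strings are placeholders:

```python
# Sketch: two linked services for the Blob storage -> SQL database scenario.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString)

storage_ls = LinkedServiceResource(properties=AzureStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))

sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(
        value="Server=tcp:<server>.database.windows.net;Database=<db>;"
              "User ID=<user>;Password=<password>")))

adf_client.linked_services.create_or_update(
    "<resource-group>", "adf-workshop-df", "AzureStorageLS", storage_ls)
adf_client.linked_services.create_or_update(
    "<resource-group>", "adf-workshop-df", "AzureSqlLS", sql_ls)
```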
Datasets:
- Datasets identify data within different data stores, such as tables, files,
folders and documents
- Before you create a dataset, you must create a linked service to link your
data to the data factory
- The same dataset can't be used as both input and output in a pipeline
- A dataset points to a single data source
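A sketch of a dataset pointing at a single blob file through the storage linked service above; folder and file names are illustrative:

```python
# Sketch: a dataset that points at one blob file via the storage linked service.
from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, LinkedServiceReference)

blob_ds = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureStorageLS"),
    folder_path="input",
    file_name="emp.csv"))

adf_client.datasets.create_or_update(
    "<resource-group>", "adf-workshop-df", "BlobInputDataset", blob_ds)
```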
Triggers:
- Triggers are used to schedule the execution of pipelines
- Pipelines and triggers have a many-to-many relationship
○ multiple triggers can kick off a single pipeline, or a single trigger
can kick off multiple pipelines
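A sketch of a daily schedule trigger kicking off the demo pipeline; note that method names such as begin_start vary slightly between SDK versions:

```python
# Sketch: a daily schedule trigger that kicks off DemoPipeline.
from datetime import datetime, timezone
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference)

recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime(2021, 7, 20, tzinfo=timezone.utc), time_zone="UTC")

trigger = TriggerResource(properties=ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name="DemoPipeline"))]))

adf_client.triggers.create_or_update(
    "<resource-group>", "adf-workshop-df", "DailyTrigger", trigger)
adf_client.triggers.begin_start(           # a trigger must be started to fire
    "<resource-group>", "adf-workshop-df", "DailyTrigger").result()
```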
Integration Runtime:
- The Integration Runtime (IR) is the compute infrastructure used by ADF
- It is used when a linked service is created
- Available types of integration runtime:
○ Azure (default)
○ Self-hosted
○ Azure-SSIS
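A sketch of registering a self-hosted IR and fetching the authentication keys that the on-premises installer asks for during configuration:

```python
# Sketch: register a self-hosted IR and fetch its authentication keys,
# which you paste into the on-premises IR installer during setup.
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource, SelfHostedIntegrationRuntime)

ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(description="SHIR for on-prem data"))
adf_client.integration_runtimes.create_or_update(
    "<resource-group>", "adf-workshop-df", "MySelfHostedIR", ir)

keys = adf_client.integration_runtimes.list_auth_keys(
    "<resource-group>", "adf-workshop-df", "MySelfHostedIR")
print(keys.auth_key1)
```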
Control Flow:
- It orchestrates a set of control activities within a pipeline, such as
the Lookup activity, ForEach activity, etc.
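A sketch of a control-flow pipeline using a ForEach activity over a pipeline parameter; the inner Wait activity is a stand-in for any real work:

```python
# Sketch: a ForEach control activity iterating over a pipeline parameter.
from azure.mgmt.datafactory.models import (
    PipelineResource, ForEachActivity, WaitActivity,
    Expression, ParameterSpecification)

for_each = ForEachActivity(
    name="ForEachFile",
    items=Expression(value="@pipeline().parameters.fileNames"),
    activities=[WaitActivity(name="WaitPerItem", wait_time_in_seconds=1)])

pipeline = PipelineResource(
    parameters={"fileNames": ParameterSpecification(type="Array")},
    activities=[for_each])
adf_client.pipelines.create_or_update(
    "<resource-group>", "adf-workshop-df", "ControlFlowDemo", pipeline)
```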
Data Flow:
- Data flows allow data engineers to develop graphical data transformation
logic without writing code
- Data flows are executed as activities within Azure Data Factory pipelines
on scaled-out, ADF-managed Apache Spark clusters.

How to create an Azure SQL Database?

Steps:
- Go to the Azure portal
- Create a resource for "SQL Database"

- Give a username & password that you can remember.

- Click "Configure database"

- Database is created.

- Click "Set server firewall"


- Click "Query Editor"

- You can access the database using "Query editor"

- You can also access it using SQL Server Management Studio (SSMS)
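These portal steps can also be scripted. A hedged sketch with the azure-mgmt-sql SDK; the server name, password, and IP address are placeholders:

```python
# Sketch: create a logical SQL server, a database, and a firewall rule
# (the portal's "Set server firewall" step) with azure-mgmt-sql.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Server, Database, FirewallRule

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

sql_client.servers.begin_create_or_update(
    "<resource-group>", "adf-workshop-sqlsrv",
    Server(location="eastus",
           administrator_login="adfadmin",        # username you can remember
           administrator_login_password="<password>")).result()

sql_client.databases.begin_create_or_update(
    "<resource-group>", "adf-workshop-sqlsrv", "workshopdb",
    Database(location="eastus")).result()

sql_client.firewall_rules.create_or_update(
    "<resource-group>", "adf-workshop-sqlsrv", "AllowMyIP",
    FirewallRule(start_ip_address="<my-ip>", end_ip_address="<my-ip>"))
```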

How to create an Azure Data Lake?

Notes on ADL:
- Data is stored as blobs in Azure Data Lake
- Schema-on-read
Steps:
- Search for "storage accounts"

- Go to "Review + create"
- Click "Review + create" and then "Create" once validation passed.
- Click "Go to resource" once deployment is completed.

- Go to "Storage Explorer"
- Right-click on "CONTAINERS" and click "Create file system"

- Upload files manually using Microsoft Azure Storage Explorer
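The file-system creation and upload can also be done from Python. A sketch assuming the azure-storage-file-datalake package; account name, key, container, and file names are placeholders:

```python
# Sketch: create a file system (container) and upload a file to ADLS Gen2.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key>")

fs = service.create_file_system(file_system="raw")   # like "Create file system"
fs.create_directory("input")
file_client = fs.get_file_client("input/emp.csv")

with open("emp.csv", "rb") as f:                     # local file to upload
    file_client.upload_data(f.read(), overwrite=True)
```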

First Task: copy data from database to datalake
- Create a database table and insert records
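A sketch of this step with pyodbc; the emp table, server, and credentials are illustrative:

```python
# Sketch: create the source table and insert sample rows with pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<server>.database.windows.net,1433;"
    "DATABASE=workshopdb;UID=adfadmin;PWD=<password>")
cur = conn.cursor()

cur.execute("CREATE TABLE emp (id INT PRIMARY KEY, name NVARCHAR(50))")
cur.execute("INSERT INTO emp (id, name) VALUES (?, ?)", 1, "Alice")
cur.execute("INSERT INTO emp (id, name) VALUES (?, ?)", 2, "Bob")
conn.commit()
conn.close()
```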

- No transformation is applied here, so the "Copy data" activity is used
in ADF.
- Provide appropriate "Name"

- Specify source
○ Specify "Source dataset" --> Click "+ New" to create new dataset
○ Search and select "Azure SQL Database"

○ Set properties -->
▪ Specify "Name"
▪ Create a new linked service

▪ Select "Table name" and press OK

- Source configuration is complete.
- Note:
○ "Use query" specifies whether to unload the full table, a query
output, or a SQL stored procedure output.
○ If the source table has any partitions, specify them using "Partition
option".
○ Use "Preview data" to check sample records.

- Specify sink
○ Specify "Source dataset" --> Click "+ New" to create new dataset
○ Search and select "Azure Data Lake Storage Gen 2"
○ Select file format

○ Set properties
▪ Specify "Name"
▪ Create a new linked service

▪ Set properties for where to store the output data

○ Can set "Block size", "Max rows per file", etc.


○ Specify "file extension"

- Can specify "Mapping"

- Click "Publish all"

- Once publishing is complete, we can run the pipeline.
○ Run in debug mode using "Debug"
○ Run using "Add trigger" and monitor the run under Triggers
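For reference, the whole copy task expressed through the SDK: a Copy activity wired from an Azure SQL dataset to a delimited-text dataset in the lake, then a programmatic run. The dataset names are assumptions standing in for the datasets created in the steps above:

```python
# Sketch: the first task as code, then publish and run it.
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, AzureSqlSource,
    DelimitedTextSink, AzureBlobFSWriteSettings, DelimitedTextWriteSettings)

copy = CopyActivity(
    name="CopyDbToLake",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="SqlSourceDataset")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="LakeSinkDataset")],
    source=AzureSqlSource(),                          # full-table unload
    sink=DelimitedTextSink(
        store_settings=AzureBlobFSWriteSettings(),
        format_settings=DelimitedTextWriteSettings(file_extension=".csv")))

adf_client.pipelines.create_or_update(
    "<resource-group>", "adf-workshop-df", "CopyDbToLakePipeline",
    PipelineResource(activities=[copy]))

run = adf_client.pipelines.create_run(                # like a manual trigger run
    "<resource-group>", "adf-workshop-df", "CopyDbToLakePipeline", parameters={})
status = adf_client.pipeline_runs.get(
    "<resource-group>", "adf-workshop-df", run.run_id)
print(status.status)
```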
