
I have a copy activity built in Azure Data Factory V2, where the source is an SFTP folder containing several XML files and the sink is an Azure PostgreSQL database.

I have successfully used the copy activity for small files (20 MB), but I have three large XML files of 3 GB, 4.5 GB, and 18 GB.

  • For files of this size, which settings should I choose? How many DIUs?
  • Is the choice of source data store relevant? That is, would using Amazon S3 or Azure Blob Storage be better than SFTP? (I ask because just copying the data is taking too long.)

1 Answer


A single copy activity can take advantage of scalable compute resources.

When using the Azure integration runtime (IR), you can specify up to 256 data integration units (DIUs) for each copy activity, in a serverless manner; a sketch of the DIU setting follows the list below. When using a self-hosted IR, you can take either of the following approaches:

  • Manually scale up the machine.

  • Scale out to multiple machines (up to 4 nodes), and a single copy activity will partition its file set across all nodes.
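For illustration only, a minimal copy activity definition with an explicit DIU setting might look like the following sketch. The dataset names and the value 32 are placeholders rather than recommendations; if you omit dataIntegrationUnits, the service picks a value automatically.

```json
{
    "name": "CopyXmlToPostgres",
    "type": "Copy",
    "inputs": [ { "referenceName": "SftpXmlDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "PostgresTableDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "XmlSource" },
        "sink": { "type": "AzurePostgreSqlSink" },
        "dataIntegrationUnits": 32
    }
}
```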

A single copy activity reads from and writes to the data store using multiple threads in parallel.
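The degree of that parallelism can be tuned with the parallelCopies property, which sits in the same typeProperties block as dataIntegrationUnits. A fragment with purely illustrative values (by default the service chooses parallelCopies based on the source/sink pair):

```json
"typeProperties": {
    "source": { "type": "XmlSource" },
    "sink": { "type": "AzurePostgreSqlSink" },
    "dataIntegrationUnits": 32,
    "parallelCopies": 8
}
```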

You can also run multiple copy activities in parallel by using control flow constructs such as ForEach.
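As a sketch of that pattern (the pipeline parameter xmlFiles, the dataset names, and the batch count are all hypothetical), a ForEach activity can fan out one copy activity per file; inside the loop you would typically parameterize the inner dataset and reference the current file with @item():

```json
{
    "name": "CopyEachXmlFile",
    "type": "ForEach",
    "typeProperties": {
        "items": { "value": "@pipeline().parameters.xmlFiles", "type": "Expression" },
        "isSequential": false,
        "batchCount": 3,
        "activities": [
            {
                "name": "CopyOneFile",
                "type": "Copy",
                "inputs": [ { "referenceName": "SftpXmlDataset", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "PostgresTableDataset", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "XmlSource" },
                    "sink": { "type": "AzurePostgreSqlSink" }
                }
            }
        ]
    }
}
```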

For more information, see the following articles about solution templates:

Copy files from multiple containers

Migrate data from Amazon S3 to ADLS Gen2

Bulk copy with a control table
