2

I'm having an issue in that a PowerShell script takes 10 times as long as a batch file to move files in AWS S3.

I have an existing batch file script that moves files from one S3 bucket to another. It takes about 30 seconds to move 1000 files.

The script looks like this:

aws s3 mv s3://bucket/folder/ s3://bucket/%destKey%/%subDestKey%/ --recursive --include "*.json" --profile etl

I'd rather do this in PowerShell as I'd like to apply a lot more logic and I'm more comfortable in PowerShell.

My PowerShell script to do the same thing looks like this:

$files = Get-S3Object -BucketName bucket |
    where { $_.Key -like "*.json" -and $_.Key -notlike "inprogress*" }

foreach ($file in $files) {
    Copy-S3Object -BucketName bucket -Key $file.Key -DestinationKey "$date/$($file.Key)" -DestinationBucket newbucket
    Remove-S3Object -BucketName bucket -Key $file.Key -Force
}

However, in PowerShell this script takes about 300 seconds to move 1000 files. Has anyone else had this same experience? Hopefully the answer is that I'm taking the wrong approach here, as I'd love to be able to use PowerShell for this task!

2 Answers

1

There are two reasons for the performance difference here:

  • PowerShell runs the S3 cmdlets in a single thread
  • You are copying each file in series

AWS CLI is much faster because it uses multiple threads (up to 10 by default), and so is doing multiple simultaneous operations.

You can speed things up by changing your script to use the -parallel option of foreach, which is only available inside a PowerShell workflow, and limiting the number of concurrent operations with -throttlelimit.

The foreach would then look like this:

foreach -parallel -throttlelimit 10 ($file in $files) {
    Copy-S3Object -BucketName bucket -Key $file.Key -DestinationKey "$date/$($file.Key)" -DestinationBucket newbucket
    Remove-S3Object -BucketName bucket -Key $file.Key -Force
}

Depending on your system, Windows may limit you to only 5 parallel processes, but this should still give you a reasonable speed-up.

2
  • Yeah, I think you are right. Unfortunately, to run the foreach in parallel you have to use a workflow, and the overhead of that is almost as bad as not bothering, since I'm only moving small files. Thanks for the answer though!
    – Harrison
    Commented Oct 11, 2018 at 12:54
  • In PowerShell 7.x+ you can use ForEach-Object -Parallel to parallelize the execution without the need for PowerShell Workflows. See devblogs.microsoft.com/powershell/…
    Commented Jan 25, 2021 at 17:20
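The ForEach-Object -Parallel suggestion from the comment above could look roughly like the sketch below. It is not taken from the question or the answers: the function name Move-S3JsonInParallel and its parameter names are hypothetical, KeyPrefix stands in for the $date variable (whose definition the question does not show), and the throttle limit of 10 is only chosen to mirror the AWS CLI default mentioned in the answer. It assumes PowerShell 7.0 or later with the AWS Tools for PowerShell installed.

```powershell
# Sketch only: needs PowerShell 7.0+ and the AWS Tools for PowerShell.
# Move-S3JsonInParallel and its parameter names are hypothetical.
function Move-S3JsonInParallel {
    param(
        [Parameter(Mandatory)] [string] $SourceBucket,
        [Parameter(Mandatory)] [string] $DestinationBucket,
        [Parameter(Mandatory)] [string] $KeyPrefix,   # plays the role of $date in the question
        [int] $ThrottleLimit = 10                     # assumption: mirrors the AWS CLI default
    )

    Get-S3Object -BucketName $SourceBucket |
        Where-Object { $_.Key -like '*.json' -and $_.Key -notlike 'inprogress*' } |
        ForEach-Object -Parallel {
            # Each iteration runs in a separate runspace, so variables from the
            # calling scope have to be read through the $using: scope modifier.
            $source = $using:SourceBucket
            $target = $using:DestinationBucket
            $prefix = $using:KeyPrefix

            # Same copy-then-delete pair as in the question, applied to the current object ($_).
            Copy-S3Object -BucketName $source -Key $_.Key -DestinationKey "$prefix/$($_.Key)" -DestinationBucket $target
            Remove-S3Object -BucketName $source -Key $_.Key -Force
        } -ThrottleLimit $ThrottleLimit
}
```

It would be called as Move-S3JsonInParallel -SourceBucket bucket -DestinationBucket newbucket -KeyPrefix $date. Without -ThrottleLimit, ForEach-Object -Parallel runs 5 script blocks at a time. Credentials set only for the current session may not be visible inside the parallel runspaces, so passing -ProfileName to the S3 cmdlets may be needed. Whether this gets close to the 30 seconds of the batch file would have to be measured.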
0

My guess is that aws s3 does the move over a single HTTPS connection that is reused for all the files.

On the other hand, each PowerShell Copy-S3Object and Remove-S3Object call opens a new HTTPS connection, does the SSL handshake, etc. That is a lot of overhead if you have to do it 1000 times.

That's my guess :)
