The OP's original question
Are there any issues with using async/await in a forEach loop? ...
was covered to an extent in @Bergi's selected answer,
which showed how to process in serial and in parallel. However there are other issues noted with parallelism -
- Order -- @chharvey notes that -
For example if a really small file finishes reading before a really large file, it will be logged first, even if the small file comes after the large file in the files array.
- Possibly opening too many files at once -- A comment by Bergi under another answer
It is also not good to open thousands of files at once to read them concurrently. One always has to do an assessment whether a sequential, parallel, or mixed approach is better.
So let's address these issues showing actual code that is brief and concise, and does not use third party libraries. Something easy to cut, paste, and modify.
Reading in parallel (all at once), printing in serial (as early as possible per file).
The easiest improvement is to perform full parallelism as in @Bergi's answer, but making a small change so that each file is printed as soon as possible while preserving order.
async function printFiles2() {
const readProms = (await getFilePaths()).map((file) =>
fs.readFile(file, "utf8")
);
await Promise.all([
await Promise.all(readProms), // branch 1
(async () => { // branch 2
for (const p of readProms) console.log(await p);
})(),
]);
}
Above, two separate branches are run concurrently.
- branch 1: Reading in parallel, all at once,
- branch 2: Reading in serial to force order, but waiting no longer than necessary
That was easy.
Reading in parallel with a concurrency limit, printing in serial (as early as possible per file).
A "concurrency limit" means that no more than N
files will ever being read at the same time.
Like a store that only allows in so many customers at a time (at least during COVID).
First a helper function is introduced -
function bootablePromise(kickMe: () => Promise<any>) {
let resolve: (value: unknown) => void = () => {};
const promise = new Promise((res) => { resolve = res; });
const boot = () => { resolve(kickMe()); };
return { promise, boot };
}
The function bootablePromise(kickMe:() => Promise<any>)
takes a
function kickMe
as an argument to start a task (in our case readFile
) but is not started immediately.
bootablePromise
returns a couple of properties
promise
of type Promise
boot
of type function ()=>void
promise
has two stages in life
- Being a promise to start a task
- Being a promise complete a task it has already started.
promise
transitions from the first to the second state when boot()
is called.
bootablePromise
is used in printFiles
--
async function printFiles4() {
const files = await getFilePaths();
const boots: (() => void)[] = [];
const set: Set<Promise<{ pidx: number }>> = new Set<Promise<any>>();
const bootableProms = files.map((file,pidx) => {
const { promise, boot } = bootablePromise(() => fs.readFile(file, "utf8"));
boots.push(boot);
set.add(promise.then(() => ({ pidx })));
return promise;
});
const concurLimit = 2;
await Promise.all([
(async () => { // branch 1
let idx = 0;
boots.slice(0, concurLimit).forEach((b) => { b(); idx++; });
while (idx<boots.length) {
const { pidx } = await Promise.race([...set]);
set.delete([...set][pidx]);
boots[idx++]();
}
})(),
(async () => { // branch 2
for (const p of bootableProms) console.log(await p);
})(),
]);
}
As before there are two branches
- branch 1: For running and handling concurrency.
- branch 2: For printing
The difference now is the no more than concurLimit
Promises are allowed to run concurrently.
The important variables are
boots
: The array of functions to call to force its corresponding Promise to transition. It is used only in branch 1.
set
: There are Promises in a random access container so that they can be easily removed once fulfilled. This container is used only in branch 1.
bootableProms
: These are the same Promises as initially in set
, but it is an array not a set, and the array is never changed. It is used only in branch 2.
Running with a mock fs.readFile
that takes times as follows (filename vs. time in ms).
const timeTable = {
"1": 600,
"2": 500,
"3": 400,
"4": 300,
"5": 200,
"6": 100,
};
test run times such as this are seen, showing the concurrency is working --
[1]0--0.601
[2]0--0.502
[3]0.503--0.904
[4]0.608--0.908
[5]0.905--1.105
[6]0.905--1.005
Available as executable in the typescript playground sandbox
for i = ...
never fails.