Semaphore slowdown #97545

Closed

gvanrossum opened this issue Sep 25, 2022 · 3 comments

Labels: performance (Performance or resource usage), topic-asyncio, type-bug (An unexpected behavior, bug, or error)

Comments

@gvanrossum (Member) commented Sep 25, 2022

This started in #93222 (comment).

Here are two of my posts there, repeated:


The test program spawns oodles of trivial tasks and tries to rate-limit them by making each task acquire a semaphore first. The semaphore allows 50 tasks at a time. With Python 3.11rc2, on my Mac it does nearly 25,000 iterations per second. With the new code it does about 900, or about 27 times slower.

The first 50 tasks take the easy path through acquire(); every following task gets put in the queue first. What makes it so slow is that once 50 tasks have acquired the semaphore in a single event loop iteration, they will also all release it in a single iteration -- but the new algorithm only wakes up the first waiting task, and the next 49 release() calls do nothing to the queue. So from then on we only wake up one task per loop iteration.

Of course, it's easy to complain when the broken code is fast. :-) But I think we can do better.

(UPDATE: The same code without using a semaphore can do over 200,000 loops/sec. Same if I keep the semaphore but set the initial value to 50,000. If I also remove the sleep(0) from the task it can do over 400,000 loops/sec. A loop that doesn't create tasks but calls sleep(0) and bumps the counter runs around 50,000.)
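
The test program itself is not reproduced in this thread; the following is a minimal sketch of that style of benchmark. The task count, the timing harness, and all names here are assumptions for illustration, not the original script:

```python
import asyncio
import time

# Hypothetical reconstruction of the benchmark described above: spawn many
# trivial tasks, each rate-limited by a semaphore that admits 50 at a time.
NUM_TASKS = 10_000
CONCURRENCY = 50

async def worker(sem: asyncio.Semaphore) -> None:
    async with sem:
        await asyncio.sleep(0)  # the sleep(0) mentioned in the update above

async def main() -> None:
    sem = asyncio.Semaphore(CONCURRENCY)
    start = time.perf_counter()
    await asyncio.gather(*(worker(sem) for _ in range(NUM_TASKS)))
    elapsed = time.perf_counter() - start
    print(f"{NUM_TASKS / elapsed:,.0f} iterations/sec")

asyncio.run(main())
```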


Actually the behavior is guaranteed. Under "Scheduling Callbacks" in the asyncio docs I read:

> Callbacks are called in the order in which they are registered.

So we can definitely rely on this. (Sorry I hadn't mentioned this earlier, I wasn't actually aware of the guarantee, just of how the code works.)
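
A small self-contained check of that guarantee; it uses only the public asyncio API, and the names are mine:

```python
import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    order: list[int] = []
    for i in range(3):
        loop.call_soon(order.append, i)  # register three callbacks in order
    await asyncio.sleep(0)               # yield so the ready queue drains
    assert order == [0, 1, 2], order     # they ran in registration order
    print("callbacks ran in order:", order)

asyncio.run(main())
```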

Something that doesn't affect correctness but may affect performance is that the event loop goes through a cycle:

  • wait for I/O events (*)
  • register callbacks for I/O events
  • call all callbacks that are registered and ready at this point (I/O and otherwise)
  • go back to top

(*) The timeout for the I/O wait is zero if there are callbacks that are immediately ready, otherwise the time until the first callback scheduled for a particular time in the future (call_later()).

The I/O wait has expensive fixed overhead, so we want to call as many callbacks in a single iteration as possible.
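
For concreteness, here is a toy model of that cycle. It is loosely patterned on asyncio's BaseEventLoop._run_once(), but it is not the actual implementation, and every name in it is invented:

```python
import collections
import selectors
import time

class MiniLoop:
    """Toy model of the cycle sketched above; not asyncio's real loop."""

    def __init__(self) -> None:
        self.selector = selectors.DefaultSelector()
        self.ready = collections.deque()  # callbacks runnable right now
        self.scheduled = []               # (when, callback), sorted by deadline
                                          # (the real loop uses a heap)

    def run_once(self) -> None:
        # (*) Timeout is zero if callbacks are already ready, otherwise the
        # time until the earliest call_later()-style deadline.
        if self.ready:
            timeout = 0
        elif self.scheduled:
            timeout = max(0, self.scheduled[0][0] - time.monotonic())
        else:
            timeout = None

        # Wait for I/O events. This select() has expensive fixed overhead,
        # which is why packing many callbacks into one iteration pays off.
        for key, _ in self.selector.select(timeout):
            self.ready.append(key.data)   # register callbacks for I/O events

        # Promote timer callbacks whose deadline has passed.
        now = time.monotonic()
        while self.scheduled and self.scheduled[0][0] <= now:
            self.ready.append(self.scheduled.pop(0)[1])

        # Run exactly the callbacks that are ready at this point; anything
        # they schedule waits for the next trip around the loop.
        for _ in range(len(self.ready)):
            self.ready.popleft()()
```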

Therefore I think it behooves us to make ready all futures for which we have room. I think it can work like the following abstract algorithm (a rough Python sketch follows the list):

  • Definitions:
    • Level: L = self._value
    • Waiters: W = self._waiters
    • Ready: R = [w for w in W if w.done() and not w.cancelled()]
    • Cancelled: C = [w for w in W if w.cancelled()]
    • Blocked: B = [w for w in W if not w.done()]
    • Note that R, C and B are views on W, not separate data structures
  • Invariant that should hold at all times:
    • L >= |R|
      (I.e., we should not promise more guests to seat than we have open tables)
  • Operations:
    • Equalize: while |B| > 0 and L > |R|: make the first item of B ready (move it to R)
    • Release: L++; Equalize
    • Acquire:
      • if L > 0 and |R| == 0 and |B| == 0: L--; return
      • create a future F, append it to B, await it
      • when awakened (with or without exception):
        • assertion: F should be in either R or C
        • remove F from W (hence from R or C)
        • if no exception caught: L--; Equalize; return
        • if CancelledError caught and F.cancelled(): return
        • if CancelledError caught and not F.cancelled(): Equalize; return
        • (other exceptions are not expected and will bubble out unhandled)
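
To make the abstract algorithm concrete, here is a rough Python sketch of it. The class and its names are invented for illustration; this is not the fix that landed in CPython. One deliberate deviation: where the pseudocode says "return" after catching a cancellation, the sketch re-raises, since a real semaphore must propagate the CancelledError:

```python
import asyncio
import collections

class SketchSemaphore:
    """Rough sketch of the abstract algorithm above (hypothetical)."""

    def __init__(self, value: int = 1) -> None:
        self._value = value                    # L: the level
        self._waiters = collections.deque()    # W; R, C, B are views of it

    def _equalize(self) -> None:
        # While blocked waiters exist and L > |R|, make the first
        # blocked waiter ready (move it from B to R).
        ready = sum(1 for w in self._waiters
                    if w.done() and not w.cancelled())   # |R|
        for w in self._waiters:
            if ready >= self._value:
                break
            if not w.done():
                w.set_result(None)
                ready += 1

    def release(self) -> None:
        self._value += 1                       # L++
        self._equalize()

    async def acquire(self) -> bool:
        # Fast path: L > 0 and |R| == |B| == 0 (cancelled waiters may remain).
        if self._value > 0 and all(w.cancelled() for w in self._waiters):
            self._value -= 1                   # L--
            return True
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)              # append F to B, await it
        try:
            await fut
        except asyncio.CancelledError:
            self._waiters.remove(fut)          # F is in R or C; drop it
            if not fut.cancelled():
                # F had already been made ready when the task was cancelled:
                # hand the promised slot to the next blocked waiter.
                self._equalize()
            raise
        self._waiters.remove(fut)              # F was in R
        self._value -= 1                       # L--
        self._equalize()
        return True
```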

@cykerway (Contributor) commented:
> Actually the behavior is guaranteed. Under "Scheduling Callbacks" I read:
>
> Callbacks are called in the order in which they are registered.
>
> So we can definitely rely on this. (Sorry I hadn't mentioned this earlier, I wasn't actually aware of the guarantee, just of how the code works.)

My question here is whether these callbacks are the same thing as what is used to wake up futures and resume tasks. Are we not using call_soon() to resume tasks? I don't know; I haven't read much of the asyncio source code. If you are sure the loop has the property we want, you may want to take a look at that PR. It has passed all tests so far and performs as fast as the old code.

@gvanrossum (Member, Author) commented:
Yeah, when a Future is made done, its callbacks are scheduled using call_soon. And the callbacks are what wakes up the tasks waiting for it.
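
A tiny demonstration of that mechanism, using only the public asyncio API: marking a Future done does not run its callbacks immediately; they are scheduled with call_soon() and run on the next pass through the ready queue.

```python
import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.add_done_callback(lambda f: print("callback ran"))
    fut.set_result(None)
    print("future is done, callback has not run yet")
    await asyncio.sleep(0)  # yield to the loop; the callback runs now

asyncio.run(main())
```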

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
@gvanrossum (Member, Author) commented:
Thanks!

Repository owner moved this from Todo to Done in asyncio Sep 26, 2022
miss-islington added a commit that referenced this issue Sep 26, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
miss-islington added a commit that referenced this issue Sep 27, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
pablogsal pushed a commit that referenced this issue Oct 24, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>