Semaphore slowdown #97545

Closed

gvanrossum opened this issue Sep 25, 2022 · 3 comments

Labels: performance (Performance or resource usage), topic-asyncio, type-bug (An unexpected behavior, bug, or error)

Comments

@gvanrossum (Member) commented Sep 25, 2022

This started in #93222 (comment).

Here are two of my posts there, repeated:


The test program spawns oodles of trivial tasks and tries to rate-limit them by making each task acquire a semaphore first. The semaphore allows 50 tasks at a time. With Python 3.11rc2, on my Mac it does nearly 25,000 iterations per second. With the new code it does about 900, or about 27 times slower.

The first 50 tasks take the easy path through acquire(); every following task gets put in the queue first. What makes it so slow is that once 50 tasks have acquired the semaphore in a single event loop iteration, they will also all release it in a single iteration -- but the new algorithm only wakes up the first waiting task, and the next 49 release() calls do nothing to the queue. So from then on we only wake up one task per loop iteration.

Of course, it's easy to complain when the broken code is fast. :-) But I think we can do better.

(UPDATE: The same code without using a semaphore can do over 200,000 loops/sec. Same if I keep the semaphore but set the initial value to 50,000. If I also remove the sleep(0) from the task it can do over 400,000 loops/sec. A loop that doesn't create tasks but calls sleep(0) and bumps the counter runs around 50,000.)
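
The test program itself is not reproduced in this thread; the following is a minimal sketch of that style of benchmark. The task count, the timing harness, and all names here are assumptions for illustration, not the original script:

```python
import asyncio
import time

# Hypothetical reconstruction of the benchmark described above: spawn many
# trivial tasks, each rate-limited by a semaphore that admits 50 at a time.
NUM_TASKS = 10_000
CONCURRENCY = 50

async def worker(sem: asyncio.Semaphore) -> None:
    async with sem:
        await asyncio.sleep(0)  # the sleep(0) mentioned in the update above

async def main() -> None:
    sem = asyncio.Semaphore(CONCURRENCY)
    start = time.perf_counter()
    await asyncio.gather(*(worker(sem) for _ in range(NUM_TASKS)))
    elapsed = time.perf_counter() - start
    print(f"{NUM_TASKS / elapsed:,.0f} iterations/sec")

asyncio.run(main())
```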


Actually the behavior is guaranteed. Under "Scheduling Callbacks" in the asyncio docs I read:

> Callbacks are called in the order in which they are registered.

So we can definitely rely on this. (Sorry I hadn't mentioned this earlier, I wasn't actually aware of the guarantee, just of how the code works.)
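
A small self-contained check of that guarantee; it uses only the public asyncio API, and the names are mine:

```python
import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    order: list[int] = []
    for i in range(3):
        loop.call_soon(order.append, i)  # register three callbacks in order
    await asyncio.sleep(0)               # yield so the ready queue drains
    assert order == [0, 1, 2], order     # they ran in registration order
    print("callbacks ran in order:", order)

asyncio.run(main())
```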

Something that doesn't affect correctness but may affect performance is that the event loop goes through a cycle:

  • wait for I/O events (*)
  • register callbacks for I/O events
  • call all callbacks that are registered and ready at this point (I/O and otherwise)
  • go back to top

(*) The timeout for the I/O wait is zero if there are callbacks that are immediately ready, otherwise the time until the first callback scheduled for a particular time in the future (call_later()).

The I/O wait has expensive fixed overhead, so we want to call as many callbacks in a single iteration as possible.
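
For concreteness, here is a toy model of that cycle. It is loosely patterned on asyncio's BaseEventLoop._run_once(), but it is not the actual implementation, and every name in it is invented:

```python
import collections
import selectors
import time

class MiniLoop:
    """Toy model of the cycle sketched above; not asyncio's real loop."""

    def __init__(self) -> None:
        self.selector = selectors.DefaultSelector()
        self.ready = collections.deque()  # callbacks runnable right now
        self.scheduled = []               # (when, callback), sorted by deadline
                                          # (the real loop uses a heap)

    def run_once(self) -> None:
        # (*) Timeout is zero if callbacks are already ready, otherwise the
        # time until the earliest call_later()-style deadline.
        if self.ready:
            timeout = 0
        elif self.scheduled:
            timeout = max(0, self.scheduled[0][0] - time.monotonic())
        else:
            timeout = None

        # Wait for I/O events. This select() has expensive fixed overhead,
        # which is why packing many callbacks into one iteration pays off.
        for key, _ in self.selector.select(timeout):
            self.ready.append(key.data)   # register callbacks for I/O events

        # Promote timer callbacks whose deadline has passed.
        now = time.monotonic()
        while self.scheduled and self.scheduled[0][0] <= now:
            self.ready.append(self.scheduled.pop(0)[1])

        # Run exactly the callbacks that are ready at this point; anything
        # they schedule waits for the next trip around the loop.
        for _ in range(len(self.ready)):
            self.ready.popleft()()
```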

Therefore I think it behooves us to make ready all futures for which we have room. I think it can work like the following abstract algorithm (a rough Python sketch follows the list):

  • Definitions:
    • Level: L = self._value
    • Waiters: W = self._waiters
    • Ready: R = [w for w in W if w.done() and not w.cancelled()]
    • Cancelled: C = [w for w in W if w.cancelled()]
    • Blocked: B = [w for w in W if not w.done()]
    • Note that R, C and B are views on W, not separate data structures
  • Invariant that should hold at all times:
    • L >= |R|
      (I.e., we should not promise more guests to seat than we have open tables)
  • Operations:
    • Equalize: while |B| > 0 and L > |R|: make the first item of B ready (move it to R)
    • Release: L++; Equalize
    • Acquire:
      • if L > 0 and |R| == 0 and |B| == 0: L--; return
      • create a future F, append it to B, await it
      • when awakened (with or without exception):
        • assertion: F should be in either R or C
        • remove F from W (hence from R or C)
        • if no exception caught: L--; Equalize; return
        • if CancelledError caught and F.cancelled(): return
        • if CancelledError caught and not F.cancelled(): Equalize; return
        • (other exceptions are not expected and will bubble out unhandled)
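
To make the abstract algorithm concrete, here is a rough Python sketch of it. The class and its names are invented for illustration; this is not the fix that landed in CPython. One deliberate deviation: where the pseudocode says "return" after catching a cancellation, the sketch re-raises, since a real semaphore must propagate the CancelledError:

```python
import asyncio
import collections

class SketchSemaphore:
    """Rough sketch of the abstract algorithm above (hypothetical)."""

    def __init__(self, value: int = 1) -> None:
        self._value = value                    # L: the level
        self._waiters = collections.deque()    # W; R, C, B are views of it

    def _equalize(self) -> None:
        # While blocked waiters exist and L > |R|, make the first
        # blocked waiter ready (move it from B to R).
        ready = sum(1 for w in self._waiters
                    if w.done() and not w.cancelled())   # |R|
        for w in self._waiters:
            if ready >= self._value:
                break
            if not w.done():
                w.set_result(None)
                ready += 1

    def release(self) -> None:
        self._value += 1                       # L++
        self._equalize()

    async def acquire(self) -> bool:
        # Fast path: L > 0 and |R| == |B| == 0 (cancelled waiters may remain).
        if self._value > 0 and all(w.cancelled() for w in self._waiters):
            self._value -= 1                   # L--
            return True
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)              # append F to B, await it
        try:
            await fut
        except asyncio.CancelledError:
            self._waiters.remove(fut)          # F is in R or C; drop it
            if not fut.cancelled():
                # F had already been made ready when the task was cancelled:
                # hand the promised slot to the next blocked waiter.
                self._equalize()
            raise
        self._waiters.remove(fut)              # F was in R
        self._value -= 1                       # L--
        self._equalize()
        return True
```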

@cykerway (Contributor) commented:
> Actually the behavior is guaranteed. Under "Scheduling Callbacks" I read:
>
> Callbacks are called in the order in which they are registered.
>
> So we can definitely rely on this. (Sorry I hadn't mentioned this earlier, I wasn't actually aware of the guarantee, just of how the code works.)

My question here is whether these callbacks are the same thing as what is used to wake up futures and resume tasks. Are we not using call_soon() to resume tasks? I don't know; I haven't read much of the asyncio source code. If you are sure the loop has the property we want, you may want to take a look at that PR. It has passed all tests so far and performs as fast as the old code.

@gvanrossum (Member, Author) commented:
Yeah, when a Future is made done, its callbacks are scheduled using call_soon. And the callbacks are what wakes up the tasks waiting for it.
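
A tiny demonstration of that mechanism, using only the public asyncio API: marking a Future done does not run its callbacks immediately; they are scheduled with call_soon() and run on the next pass through the ready queue.

```python
import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fut.add_done_callback(lambda f: print("callback ran"))
    fut.set_result(None)
    print("future is done, callback has not run yet")
    await asyncio.sleep(0)  # yield to the loop; the callback runs now

asyncio.run(main())
```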

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 26, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
@gvanrossum (Member, Author) commented:
Thanks!

Repository owner moved this from Todo to Done in asyncio Sep 26, 2022
miss-islington added a commit that referenced this issue Sep 26, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
miss-islington added a commit that referenced this issue Sep 27, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>
pablogsal pushed a commit that referenced this issue Oct 24, 2022
(cherry picked from commit 68c46ae)

Co-authored-by: Cyker Way <[email protected]>