Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce overhead to add/remove asyncio readers and writers #106527

Closed
bdraco opened this issue Jul 7, 2023 · 12 comments
Closed

Reduce overhead to add/remove asyncio readers and writers #106527

bdraco opened this issue Jul 7, 2023 · 12 comments
Labels
performance Performance or resource usage topic-asyncio type-feature A feature request or enhancement

Comments

@bdraco
Copy link
Contributor

bdraco commented Jul 7, 2023

Pitch

The current code path for adding and removing asyncio readers and writers always has to do a try except KeyError to add a new reader/writer

except KeyError:

For use cases where readers are added and removed frequently (hard to change the design without breaking changes) this adds up quickly.

Screenshot 2023-07-07 at 8 22 45 AM

Linked PRs

@gvanrossum
Copy link
Member

Interesting. Have you measured your alternative? Is it faster? I'm a little surprised that calling .get_map().get(key) is slower than calling .get_key(key) plus raising and catching an exception -- that would suggest a function call is faster than raising an catching an exception.

Maybe we would do better trying to make exception handling faster? (I guess this code is unusual because in the common case the exception gets raised, so our "zero-overhead exceptions" don't help, since it's only zero overhead when it doesn't get raised.)

@kumaraditya303 Thoughts? The PR seems well thought out.

@bdraco
Copy link
Contributor Author

bdraco commented Jul 7, 2023

Have you measured your alternative? Is it faster?

The flames disappear from the py-spy after the change. I also did a cProfile and its faster. This isn't a great comparison though as its doesn't quantify the actual performance improvement and only shows more of the real world impact. I'll see if I can put together something that benchmarks it

add_reader

@bdraco
Copy link
Contributor Author

bdraco commented Jul 7, 2023

original: 1.048221042030491
new: 0.626125167007558

I think this is close enough to benchmark the change:

import asyncio
import timeit

loop = asyncio.get_event_loop()

original_code = """
try:
    key = selector.get_key(1)
except KeyError:
    pass
"""

new_code = """
key = selector.get_map().get(1)
if key is None:
    pass
"""

original_time = timeit.timeit(
    original_code,
    number=1000000,
    globals={"selector": loop._selector},
)
new_time = timeit.timeit(
    new_code,
    number=1000000,
    globals={"selector": loop._selector},
)

print("original: %s" % original_time)
print("new: %s" % new_time)

@gvanrossum
Copy link
Member

Yeah, that’s acceptable.

@bdraco
Copy link
Contributor Author

bdraco commented Jul 8, 2023

_SelectorMapping.__getitem__ has the same pattern
Screenshot 2023-07-08 at 9 01 51 AM

probably something like this would offer the same speed up

except KeyError:

    def __getitem__(self, fileobj):
        fd = self._selector._fileobj_lookup(fileobj)
        key = self._selector._fd_to_key.get(fd)
        if key is None:
            raise KeyError("{!r} is not registered".format(fileobj))
        return key

I'll wait for the review of the linked PR and pending the outcome there will open another PR for this. In the mean time I'll push the above code to production and run it for a bit.

That helps a bit as it avoids one except catch but on the add_reader case the last KeyError has to raise

Screenshot 2023-07-08 at 9 10 07 AM

@bdraco
Copy link
Contributor Author

bdraco commented Jul 9, 2023

It also looks like the selectors could remove _key_from_fd and replace it with a simple self._fd_to_key.get() unless _key_from_fd is a defined API promise. That's roughly a 10% reduction of run time of the code in selectors.py when running an asyncio event loop with ~10000 calls to select per minute since it doesn't have the extra function depth overhead.

That should also likely be another PR. Probably worth doing since its called in the select loop and this was on a mostly idle system.

@@ -332,7 +332,7 @@ def select(self, timeout=None):
             if fd in w:
                 events |= EVENT_WRITE
 
-            key = self._key_from_fd(fd)
+            key = self._fd_to_key.get(fd)
             if key:
                 ready.append((key, events & key.events))
         return ready
@@ -422,7 +422,7 @@ def select(self, timeout=None):
             if event & ~self._EVENT_WRITE:
                 events |= EVENT_READ
 
-            key = self._key_from_fd(fd)
+            key = self._fd_to_key.get(fd)
             if key:
                 ready.append((key, events & key.events))
         return ready
@@ -475,7 +475,7 @@ def select(self, timeout=None):
                 if event & ~select.EPOLLOUT:
                     events |= EVENT_READ
 
-                key = self._key_from_fd(fd)
+                key = self._fd_to_key.get(fd)
                 if key:
                     ready.append((key, events & key.events))
             return ready
@@ -570,7 +570,7 @@ def select(self, timeout=None):
                 if flag == select.KQ_FILTER_WRITE:
                     events |= EVENT_WRITE
 
-                key = self._key_from_fd(fd)
+                key = self._fd_to_key.get(fd)
                 if key:
                     ready.append((key, events & key.events))
             return ready

rough profile (different points in time with similar workloads)
before: Screenshot 2023-07-08 at 3 32 38 PM
after Screenshot 2023-07-08 at 3 32 34 PM

@AlexWaygood
Copy link
Member

Maybe we would do better trying to make exception handling faster? (I guess this code is unusual because in the common case the exception gets raised, so our "zero-overhead exceptions" don't help, since it's only zero overhead when it doesn't get raised.)

I think that would also be worthwhile, FWIW. I was quite surprised how much switching to LBYL idioms sped things up in #103318, largely because of how slow exception handling is.

(N.B. I don't think longterm goals like that should impact whether this proposed change is accepted or not.)

@kumaraditya303
Copy link
Contributor

FWIW I think it is much better to optimize selectors module rather than asyncio in this case.

@bdraco
Copy link
Contributor Author

bdraco commented Jul 14, 2023

FWIW I think it is much better to optimize selectors module rather than asyncio in this case.

I agree that selectors are the primary source of the bottleneck I'm seeing. (nothing else is called 100000x per minute on busy systems besides _run_once)

More discussion on that in #106555

@bdraco
Copy link
Contributor Author

bdraco commented Jul 14, 2023

The linked #106528 can use the new get implementation #106665 which will avoids the exceptions in the happy path of _add_reader which I think is still valuable on its own when you have a situation where readers are added and removed serval times per second.

@bdraco
Copy link
Contributor Author

bdraco commented Jul 14, 2023

edit: sorry I screwed up and linked the wrong PR above. Should have been #106665

@kumaraditya303
Copy link
Contributor

Thanks for working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-asyncio type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

4 participants