Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-109653: Just import recursive_repr in dataclasses #109822

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

sobolevn
Copy link
Member

@sobolevn sobolevn commented Sep 25, 2023

Import time is unchanged. Before:

» pyperf timeit 'import dataclasses'
.....................
Mean +- std dev: 106 ns +- 2 ns

After:

» pyperf timeit 'import dataclasses'
.....................
Mean +- std dev: 105 ns +- 2 ns

But, we now drop two dependencies: functools and _thread and add just one: reprlib
https://github.com/python/cpython/blame/f2eaa92b0cc5a37a9e6010c7c6f5ad1a230ea49b/Lib/dataclasses.py#L248-L266

It was strange to have

# This function's logic is copied from "recursive_repr" function in
# reprlib module to avoid dependency.

But import two other heavier modules. The most interesting part is that functools always imports reprlib inside. See #109653 (comment)

@sobolevn
Copy link
Member Author

The windows failure is known and unrelated: #109739

@ericvsmith
Copy link
Member

I don't think this change is worth making. It seems mostly like code churn: there's no actual problem being solved.

@AA-Turner
Copy link
Member

Is there a difference if the functools import is deferred to within the vendored _recursive_repr function (& we remove the top-level)?

A

@sobolevn
Copy link
Member Author

sobolevn commented Sep 25, 2023

@AA-Turner it makes no difference, because inspect imports 80% of all stdlib, including functools :)

inspect is only used in two places:

But, it still does not improve the import time:

» pyperf timeit 'import dataclasses'
.....................
Mean +- std dev: 105 ns +- 1 ns

I also tried to defer inspect, but it is still the same :(

@DavidCEllis
Copy link
Contributor

...it makes no difference, because inspect imports 80% of all stdlib, including functools :)

inspect is only used in two places:

But, it still does not improve the import time:

» pyperf timeit 'import dataclasses'
.....................
Mean +- std dev: 105 ns +- 1 ns

I also tried to defer inspect, but it is still the same :(

@sobolevn
Are these pyperf timings actually outputting accurate data when it comes to imports? The numbers seem surprisingly low to me.

I get similar numbers (scale-wise) from pyperf but they don't match with other timings I've done.

pyperf timeit "import dataclasses" 1

.....................
Mean +- std dev: 84.6 ns +- 4.1 ns

python -Ximporttime -c "import dataclasses" (truncated to only show dataclasses):

import time: self [us] | cumulative | imported package
...
import time:       961 |      18112 | dataclasses

Which lines up roughly with the timings I get from hyperfine comparing launching python with no imports to importing dataclasses.

hyperfine -w10 -r100 -N 'python -c "pass"' 'python -c "import dataclasses"'

Benchmark 1: python -c "pass"
  Time (mean ± σ):      11.0 ms ±   0.6 ms    [User: 8.4 ms, System: 2.4 ms]
  Range (min … max):    10.4 ms …  13.0 ms    100 runs
 
Benchmark 2: python -c "import dataclasses"
  Time (mean ± σ):      28.5 ms ±   1.0 ms    [User: 23.8 ms, System: 4.5 ms]
  Range (min … max):    27.0 ms …  32.0 ms    100 runs
 
Summary
  'python -c "pass"' ran
    2.59 ± 0.17 times faster than 'python -c "import dataclasses"'

From previous experiments mentioned in #109653 (comment) I would expect deferring inspect to remove close to half of the import time. However, it's not actually very useful to do so as any use of @dataclass will end up importing inspect in order to get the annotations, so practically you wouldn't actually see any benefit from delaying the import unless none of the modules you use make any dataclasses.


Slightly more directly related, and also mentioned in #109653 (comment) the definition of _recursive_repr you have in dataclasses makes use of functools.wraps which uses functools.partial which in turn uses reprlib.recursive_repr on its own __repr__ so perhaps this change still makes sense from that angle?

Footnotes

  1. I also get surprisingly high values for pyperf timeit 'import sys' in comparison.

@AA-Turner
Copy link
Member

A change in the other direction (per reprlib's own implementation):

diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 84f8d68ce0..8579f6d605 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -4,7 +4,6 @@
 import types
 import inspect
 import keyword
-import functools
 import itertools
 import abc
 import _thread
@@ -252,7 +251,6 @@ def _recursive_repr(user_function):
     # call.
     repr_running = set()

-    @functools.wraps(user_function)
     def wrapper(self):
         key = id(self), _thread.get_ident()
         if key in repr_running:
@@ -263,6 +261,15 @@ def wrapper(self):
         finally:
             repr_running.discard(key)
         return result
+
+    wrapper.__module__ = getattr(user_function, '__module__')
+    wrapper.__doc__ = getattr(user_function, '__doc__')
+    wrapper.__name__ = getattr(user_function, '__name__')
+    wrapper.__qualname__ = getattr(user_function, '__qualname__')
+    wrapper.__annotations__ = getattr(user_function, '__annotations__', {})
+    wrapper.__type_params__ = getattr(user_function, '__type_params__', ())
+    wrapper.__wrapped__ = user_function
+
     return wrapper

 class InitVar:

A

@sobolevn
Copy link
Member Author

sobolevn commented Oct 1, 2023

@AA-Turner we still have _thread import (1 _thread vs 1 reprlib) and quite a lot of boilerplate code that can go out of sync with the reprlib.recursive_repr (it happened before #109819) for no real reason.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boilerplate code that can go out of sync with the reprlib.recursive_repr (it happened before #109819)

IMO, this is a good reason to merge this PR.

Copy link
Member

@ericvsmith ericvsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there's not a performance regression due to importing reprlib, I'm okay with this. I have not checked the performance.

@DavidCEllis
Copy link
Contributor

As long as there's not a performance regression due to importing reprlib, I'm okay with this. I have not checked the performance.

You're already importing reprlib indirectly via functools so this shouldn't change import time if that's a concern (functools will still be imported via inspect so you won't see an improvement by removing that direct import).

@ericvsmith
Copy link
Member

Agreed that reprlib is already imported (checked with 3.11).

@ericvsmith ericvsmith merged commit edc9d85 into python:main Mar 5, 2024
20 checks passed
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants