Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40799: Add _pydatetime module (Python datetime impl) #20472

Closed
wants to merge 3 commits into from
Closed

bpo-40799: Add _pydatetime module (Python datetime impl) #20472

wants to merge 3 commits into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented May 28, 2020

Move the Python implementation of the datetime module into a new
Lib/_pydatetime.py file to optimize "import datetime".

  • Define datetime.all in Lib/datetime.py.
  • _pydatetime.py defines name as 'datetime' for pickle.
  • TestModule.test_divide_and_round() now imports directly
    _pydatetime, since _divide_and_round() is no longer exported by the
    datetime module.
  • test_datetime also requires a fresh _pydatetime when loading
    pure_tests.

https://bugs.python.org/issue40799

@vstinner
Copy link
Member Author

vstinner commented Jun 8, 2020

@pganssle: This change makes "import decimal" 4.31x faster (814 us +- 32 us -> 189 us +- 4 us). In https://bugs.python.org/issue40799 you discussed the idea of converting datetime into a package. I'm not convinced by this idea, but I'm open for discussion :-) What do you think of this change? Is it to merge it?

@vstinner
Copy link
Member Author

This PR reduce names exposed in dir(datetime).

I just updated my PR to remove datetime.sys when the pure Python implementation is used.

The C implementation exposes datetime.datetime_CAPI capsule. Maybe it can be removed, but it should be done in a separated change.

Before:

C:
    dir 19 ['MAXYEAR', 'MINYEAR', '__all__', '__builtins__', '__cached__',
            '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__',
            'date', 'datetime', 'datetime_CAPI', 'sys', 'time', 'timedelta',
            'timezone', 'tzinfo']

Pure Python:
    dir 56 ['MAXYEAR', 'MINYEAR', '_DAYNAMES', '_DAYS_BEFORE_MONTH',
            '_DAYS_IN_MONTH', '_DI100Y', '_DI400Y', '_DI4Y', '_EPOCH',
            '_IsoCalendarDate', '_MAXORDINAL', '_MONTHNAMES', '__all__',
            '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
            '__name__', '__package__', '__spec__', '_build_struct_time',
            '_check_date_fields', '_check_time_fields', '_check_tzinfo_arg',
            '_check_tzname', '_check_utc_offset', '_cmp', '_cmperror',
            '_date_class', '_days_before_month', '_days_before_year',
            '_days_in_month', '_divide_and_round', '_format_offset',
            '_format_time', '_index', '_is_leap', '_isoweek1monday', '_math',
            '_ord2ymd', '_parse_hh_mm_ss_ff', '_parse_isoformat_date',
            '_parse_isoformat_time', '_time', '_time_class', '_tzinfo_class',
            '_wrap_strftime', '_ymd2ord', 'date', 'datetime', 'sys', 'time',
            'timedelta', 'timezone', 'tzinfo']

After:

C:

    dir 18 ['MAXYEAR', 'MINYEAR', '__all__', '__builtins__', '__cached__',
            '__doc__', '__file__', '__loader__', '__name__', '__package__',
            '__spec__', 'date', 'datetime', 'datetime_CAPI', 'time',
            'timedelta', 'timezone', 'tzinfo']

Pure Python:

    dir 17 ['MAXYEAR', 'MINYEAR', '__all__', '__builtins__', '__cached__',
            '__doc__', '__file__', '__loader__', '__name__', '__package__',
            '__spec__', 'date', 'datetime', 'time', 'timedelta',
            'timezone', 'tzinfo']

datetime.__all__ is not changed by this PR, it remains a list of 8 names for C and Python implementations:

('date', 'datetime', 'time', 'timedelta', 'timezone', 'tzinfo', 'MINYEAR', 'MAXYEAR')

@vstinner
Copy link
Member Author

@pganssle: I plan to merge this PR next Monday. It makes "import datetime" 4.3x faster by avoiding importing time, math, sys and operator modules (when the C extension is available).

This PR doesn't prevent to convert datetime.py into a package later.

Well, see https://bugs.python.org/issue40799 for the rationale ;-)

@vstinner
Copy link
Member Author

Ah, and this PR also fix https://bugs.python.org/issue40058 bug.

@pganssle
Copy link
Member

Please don't merge this next Monday without review.

@vstinner
Copy link
Member Author

Please don't merge this next Monday without review.

I pinged you 2 weeks ago and you didn't reply. Great if you can review it :-)

@pganssle
Copy link
Member

I pinged you 2 weeks ago and you didn't reply. Great if you can review it :-)

Yes, my review queue is very long and I don't have a lot of time to clear it out.

I think this is an important change and probably a good idea, but it's also a major change, one that, AFAICT, will break all automatic backporting for datetime.py until Python 3.9 is no longer supported. I don't think there's anything particularly urgent about this change, though. In fact, because it's going to break all backporting, it might make sense to do it only after the 3.9.1 release, since it will slow down the process of getting bugs fixed in Python 3.9.

Also:

This PR doesn't prevent to convert datetime.py into a package later.

It's true that it doesn't, but unless I'm mistaken, that will be another event that breaks backporting and makes it more complicated to navigate the project history. I'm not saying that these are show-stoppers, but given that datetime has had this structure for 10 years and that this change isn't even in response to any sort of any production issue from an end user, I think it would behoove us to take the time to get it right.

@vstinner
Copy link
Member Author

will break all automatic backporting for datetime.py until Python 3.9 is no longer supported

I tried to make a dummy change on top on my PR: a cherry-pick to 3.9 of this change without the rename change failed.

I rewrote this PR differently, and I ran the same test: the cherry-pick worked this time!

Maybe it's because this PR not only rename datetime.py, it also makes multiple changes _pydatetime.py at the same time.

@vstinner
Copy link
Member Author

Hum, it seems like depending on the number of changes in _pydatetime.py, git cherry-pick manages to detect the rename or not.

@vstinner
Copy link
Member Author

Between datetime.py file for datetime package (Lib/datetime/init.py), I have no preference. It seems like both options solve https://bugs.python.org/issue40799.

But I have concerns about moving Lib/_strptime.py into the package. See https://bugs.python.org/issue40799. Maybe _strptime.py can be move once these concerns are answered/fixed.

@pganssle
Copy link
Member

I rewrote this PR differently, and I ran the same test: the cherry-pick worked this time!

Good to know. If we can do this in a way that cherry-pick continues to work, I'd be a lot happier merging it as early in the cycle as possible.

I'm going to comment on the _strptime and folder vs. _pydatetime.py issues in the BPO ticket, so there's a centralized place for the larger design decisions.

@vstinner
Copy link
Member Author

Good to know. If we can do this in a way that cherry-pick continues to work, I'd be a lot happier merging it as early in the cycle as possible.

Sure, me too :-)

@facundobatista
Copy link
Member

I like this change, thanks for the work!!!

The conversation in python-dev ended in agreement with this change (overe having a package), so it looks we're green to go.

@vstinner one detail, though, shouldn't we keep the __name__ attribute in datetime.py?

Move the Python implementation of the datetime module into a new
Lib/_pydatetime.py file to optimize "import datetime".

* Define datetime.__all__ in Lib/datetime.py.
* _pydatetime.py defines __name__ as 'datetime' for pickle.
* TestModule.test_divide_and_round() now imports directly
  _pydatetime, since _divide_and_round() is no longer exported by the
  datetime module.
* test_datetime also requires a fresh _pydatetime when loading
  pure_tests.
@vstinner
Copy link
Member Author

vstinner commented Sep 4, 2020

I rebased the PR.

@vstinner
Copy link
Member Author

vstinner commented Sep 4, 2020

@vstinner one detail, though, shouldn't we keep the name attribute in datetime.py?

I don't understand your question. Would you mind to elaborate?

_pydatetime.py contains:

__name__ = 'datetime'   # For pickling

@vstinner
Copy link
Member Author

vstinner commented Sep 4, 2020

I'm not sure that git blame and git cherry-pick will work once the 3 commits of this PR will be squashed into a single commit.

Maybe I will need a GitHub admin to allow me to merge without squashing, only for this PR, to help Git to detect that the file is renamed.

Copy link
Member

@facundobatista facundobatista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

@facundobatista
Copy link
Member

@vstinner one detail, though, shouldn't we keep the name attribute in datetime.py?

I don't understand your question. Would you mind to elaborate?

_pydatetime.py contains:

__name__ = 'datetime'   # For pickling

I was on the impression that it needed to be in the "main .py" file, that somehow the "from ... import *" didn't apply to __name__, as it didn't with __doc__, but it looks I was wrong.

FTR, decimal (and it's dependant python/C implementations) follow the same patter here, so I deduce it's ok.

Thanks! again!

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for one change to the tests. Sorry for the long delay here.

I think maybe @pablogsal has permissions to do a rebase-merge or standard merge (to preserve the ability to do backports). Ideally we'd test this on some random branch using miss-islington if that can be arranged (though I'm not sure how easy that would be).

'_datetime', '_strptime'])
fast_tests = import_fresh_module(TESTS,
fresh=['datetime', '_datetime', '_strptime'],
blocked=['_pydatetime'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't leave comments for lines not directly affected by the PR, but I believe we now also need to add _pydatetime to the list below:

for modname in ['datetime', '_datetime', '_pydatetime', '_strptime']:
    sys.modules.pop(modname, None)

(At least until we understand why that was necessary in the first place...

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@vstinner
Copy link
Member Author

I lost track of this PR and it's now outdated (in conflict). I close it. A new PR should be created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants