Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate Stdlib Extension Modules #103092

Open
24 of 25 tasks
ericsnowcurrently opened this issue Mar 28, 2023 · 58 comments
Open
24 of 25 tasks

Isolate Stdlib Extension Modules #103092

ericsnowcurrently opened this issue Mar 28, 2023 · 58 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Mar 28, 2023

See PEP 687.

Currently most stdlib extension have been ported to multi-phase init. There are still a number of them to be ported, almost entirely non-builtin modules. Also, some that have already been ported still have global state that needs to be fixed.

(This is part of the effort to finish isolating multiple interpreters from each other. See gh-100227.)

High-Level Info

How to isolate modules: https://docs.python.org/3/howto/isolating-extensions.html (AKA PEP 630).

The full list of modules that need porting can be found with: ...

The full list of remaining (unsupported) global variables is:

A full analysis of the modules may be found at the bottom of this post.

(other info)

Previous Work

Related Links

TODO

Here is the list of modules that need attention, in a rough, best-effort priority order. Additional details (e.g. if there is an issue and/or PR) is found in the analysis table at the bottom.

The above does not include test modules. They don't need to be ported/isolated (except for a few which already have been).


Modules Analysis

module builtin Windows PEP 594 issue PR ported # static types # other global objects # other globals
_asyncio yes 2
_collections X (???) (branch) yes 7
_ctypes **NO** 37 6 4
_curses **NO** 1 2 4
_curses_panel yes 1
_datetime gh-71587 gh-102995 **NO** 7 10 1
_decimal **NO** 4 10 6
_elementtree yes 1
_io X gh-101819 gh-101520 **NO** 5
_lsprof yes 2
_multibytecodec yes 23
_pickle (???) (yes) **NO** 5
_socket **NO** 1 2 3
_ssl (???) (branch) yes 1
_tkinter **NO** 8 9
_tracemalloc X gh-101520 **NO** 6 7
array yes 1
faulthandler X gh-101509 yes 3+ ~22+
msvcrt Y **NO** ??? ??? ???
pyexpat yes 1
readline **NO** 9
winreg Y **NO** ??? ??? ???
winsound Y **NO** ??? ??? ???
test/example modules

These can be ported/isolated but don't have to be. They are the lowest priority.

module issue PR ported # static types # other global objects # other globals
xxmodule 3 1
xxsubtype 2
xxlimited_35 2
...
...

Linked PRs

@ericsnowcurrently ericsnowcurrently added type-feature A feature request or enhancement extension-modules C modules in the Modules dir 3.12 bugs and security fixes labels Mar 28, 2023
@ericsnowcurrently
Copy link
Member Author

Ideally we'll get most of this done for 3.12. FWIW, there isn't enough time if it's just me, so consider this a call out to whoever can help. 😄 Feel free to adjust the TODO list and (hidden) "analysis" table above. Also feel free to reach out to any others that might be interested in pitching in here. Thanks!!!

@erlend-aasland, @corona10, @kumaraditya303, etc.

@ericsnowcurrently
Copy link
Member Author

@terryjreedy (for _tkinter)

@erlend-aasland
Copy link
Contributor

I've got WIP branches for datetime, io, pickle, collections, and ssl. The three former are up as draft PRs.

(FYI, I have quite a bit of CPython time scheduled for next week.)

@ericsnowcurrently
Copy link
Member Author

@markshannon

@erlend-aasland
Copy link
Contributor

Regarding the freelists in _asyncio, Kumar and Guido had some thoughts over at #91375 (comment).

@erlend-aasland
Copy link
Contributor

Elementtree was isolated/ported in gh-92123.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Mar 28, 2023

_lsprof is also done, AFAICS. ... just noticed rotatingtree.c; sorry 'bout the noise.

@ericsnowcurrently
Copy link
Member Author

Thanks for the updates!

@terryjreedy
Copy link
Member

@serhiy-storchaka for _tkinter. I have nothing to do with it.

@erlend-aasland
Copy link
Contributor

AFAICS, array should be fine; array_reconstructor is in the module state already.

@corona10
Copy link
Member

I will try to find which modules that I can support for this issue :)

@erlend-aasland
Copy link
Contributor

I will try to find which modules that I can support for this issue :)

Feel free to pick up any of my draft PRs! :)

@CharlieZhao95
Copy link
Contributor

I tried submitting some PRs(Port _datatime module, Convert _pickle module to use heap types) for multi-phase initialization last year (delayed due to PEP 687 not being accepted at that time).
Maybe now that I can start something new! Can I try to work for _decimal module, if no one has claimed it yet.

@shihai1991
Copy link
Member

I tried submitting some PRs(Port _datatime module, Convert _pickle module to use heap types) for multi-phase initialization last year (delayed due to PEP 687 not being accepted at that time).

Looks like you can reopen your PRs :)

@erlend-aasland
Copy link
Contributor

Maybe now that I can start something new! Can I try to work for _decimal module, if no one has claimed it yet.

Sure you can! I started doing some work on _decimal some weeks ago, but it is very far from complete; feel free to start with that branch :) https://github.com/erlend-aasland/cpython/tree/isolate-decimal

@kumaraditya303
Copy link
Contributor

I volunteer to review the PRs as I have been doing, and leave _asyncio for me to handle.

kumaraditya303 added a commit that referenced this issue Apr 4, 2023
Co-authored-by: Mohamed Koubaa <[email protected]>
Co-authored-by: Kumar Aditya <[email protected]>
aisk added a commit to aisk/cpython that referenced this issue Apr 4, 2023
aisk added a commit to aisk/cpython that referenced this issue Apr 4, 2023
aisk added a commit to aisk/cpython that referenced this issue Apr 4, 2023
aisk added a commit to aisk/cpython that referenced this issue Apr 4, 2023
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
@neonene
Copy link
Contributor

neonene commented Mar 28, 2024

@erlend-aasland I'd like _datetime static types to be heap types to help #113601. Would you mind if I open an issue and take the type conversion part from your POC (without putting them into some states)?

@vstinner
Copy link
Member

@erlend-aasland I'd like _datetime static types to be heap types to help #113601. Would you mind if I open an issue and take the type conversion part from your POC (without putting them into some states)?

Converting static types to heap types is not the blocker issue for the datetime. The problem of datetime is its C API.

It reminds me the problem of the AST C API. We tried different things but we had multiple crashes. At the end, I moved AST types to the PyInterpreterState, they are initialized at the first access. This way, from any code path, you always meet the same types. Before, depending on how types were created, you might meet different types and so PyAST_Check() returned false which was very surprising.

I can help you if you open an issue. But please, first, move definition of these types to PyInterpreterState (put PyTypeObject* pointers there), in a new "datetime" state even if it's shared library.

@neonene
Copy link
Contributor

neonene commented Mar 29, 2024

Maybe already known, but reloading any module (e.g.-R) seems to be possibly unsafe:

import sys
from _decimal import Decimal as D1
del sys.modules['_decimal']
from _decimal import Decimal as D2

print(a := D1(1), b := D1(1), a == b, a is b)  # load
print(c := D2(1), d := D2(1), c == d, c is d)  # reload
print(a == c)
  • 3.12: Static Decimal type, single-phase-init
1 1 True False
1 1 True False
True
  • 3.13: Heap Decimal type, multi-phase init
1 1 True False
1 1 True False
False   # a and c have different module states

Indeed, PyInterpreterState should be effective for _datetime to be used in main and isolated sub-interpreters.

@encukou
Copy link
Member

encukou commented Mar 29, 2024

What do you mean by unsafe? If something's unsafe at the C level (e.g. undefined behaviour, crashes), that's a bug.

This behaviour is known; see https://docs.python.org/3/howto/isolating-extensions.html#surprising-edge-cases and the warning in https://docs.python.org/3/library/sys.html#sys.modules

@neonene
Copy link
Contributor

neonene commented Mar 29, 2024

What do you mean by unsafe?

Thanks. I suspect _datetime 's crashes with multi-phase init come from incorrect state accesses from binary (or ternaly) functions. I also think the init check in test_capi should be reconsidered. However, I agree to use PyInterpreterState if we need to check multiple module states. I have not fully investigated, though.

EDIT: The _datetime crash can be seen in the refleak test.

@neonene
Copy link
Contributor

neonene commented Mar 31, 2024

I've opened a _datetime issue: #117398

@ericsnowcurrently
Copy link
Member Author

@Yhg1s, could we get an exception for 3.13 like we got for 3.12, regarding porting stdlib extensions to multi-phase init? You let us land those changes up to beta 2.

This time around there are only two modules left to be done, and the PRs for both are close enough that I expect they could be wrapped up before beta 2 (end of May).

@neonene
Copy link
Contributor

neonene commented May 3, 2024

@vstinner

From #118357 (comment):

Which crash? This PR is related to issue #117398 which doesn't mention any crash.

I said at #103092 (comment) in this thread:

I'd like _datetime static types to be heap types to help #113601.

You replied:

I can help you if you open an issue.

Then, I opened the issue #117398. I think we are aware of #113601, whose original issue was raised a year ago and you said there (#100911 (comment)):

It will work as expected once more extensions will be converted to the multi-phase initialization API.

We know well that the porting resolves existing different issues derived from static types and single-phase init. My volunteer is following the generic approach that seems to be supported by active core devs, especially by you. Also, the recent crash case you reported at #116467 on _ctypes (actually ctypes) was actually fixed by multi-phase init.

@kumaraditya303 has not been in good health, right? And @erlend-aasland appears to be not working individually on this project, and to be exhausted: #110415 (comment).

As to _datetime C-API, there seems to be no technical reason to block the isolation. Even if it blocks, I do not understand why there was no attempt to unblock nor to investigate further.

Such change is too late for beta1. It's better to wait for the beginning of the 3.14 dev cycle.

How do you think of the situation?

In my experience, such change is likely to cause regression and need more time to be stabilized.

Again, how do you think of the delay? Maybe isolated _datetime could be stabilized even at this point if we tried to work without bias nor reluctance? Why we cannot stabilize _datetime in the regular way with many eye-balls on 3.13?

@vstinner
Copy link
Member

vstinner commented May 5, 2024

Please open a dedicated issue for the datetime crash.

@vstinner
Copy link
Member

vstinner commented May 5, 2024

@Yhg1s, could we get an exception for 3.13 like we got for 3.12, regarding porting stdlib extensions to multi-phase init? You let us land those changes up to beta 2.

I don't think that it's a good idea. Regressions are complex to debug/fix, and it's better to continue the work in Python 3.14 and learn how to work around Python 3.13 limited isolation.

Copy link
Member

@Yhg1s, could we get an exception for 3.13 like we got for 3.12, regarding porting stdlib extensions to multi-phase init? You let us land those changes up to beta 2.

This time around there are only two modules left to be done, and the PRs for both are close enough that I expect they could be wrapped up before beta 2 (end of May).

Which modules are we talking about? You referenced this from a _datetime change that didn't seem to actually switch to multi-phase init.

@neonene
Copy link
Contributor

neonene commented May 5, 2024

Please open a dedicated issue for the datetime crash.

Opened #118608, whose title is taken from the _ctypes crash issue.

@vstinner
Copy link
Member

vstinner commented May 6, 2024

Opened #118608, whose title is taken from the _ctypes crash issue.

Oh, Python 3.12 is also affected. We cannot isolate datetime in 3.12 to fix the issue. We will need a different fix for 3.12. So 3.13 can use the same fix.

@neonene
Copy link
Contributor

neonene commented May 6, 2024

We will need a different fix for 3.12.

Unfortunately, ctypes also crashes on 3.12:

@ericsnowcurrently
Copy link
Member Author

Which modules are we talking about? You referenced this from a _datetime change that didn't seem to actually switch to multi-phase init.

datetime and ctypes. Those are the only two left from the list at the top of this issue.

Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…eters (python#113434)

Enable imports of _elementtree module in sub-interpreters
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
)

Test the following features for _ctypes types:
- disallow instantiation
- inheritance (MRO)
- immutability
- type name

The following _ctypes types are tested:
- Array
- CField
- COMError
- PyCArrayType
- PyCFuncPtrType
- PyCPointerType
- PyCSimpleType
- PyCStructType
- Structure
- Union
- UnionType
- _CFuncPtr
- _Pointer
- _SimpleCData

Co-authored-by: Erlend E. Aasland <[email protected]>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests