Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour of IntFlag with a custom __new__ in Python 3.11.0. #101541

Closed
picnixz opened this issue Feb 3, 2023 · 10 comments
Closed

Unexpected behaviour of IntFlag with a custom __new__ in Python 3.11.0. #101541

picnixz opened this issue Feb 3, 2023 · 10 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@picnixz
Copy link
Contributor

picnixz commented Feb 3, 2023

Bug report

According to the enum documentation, it is possible to customize the enumeration value via a custom __new__ method and the enumeration member (e.g., by adding an attribute) via a custom __init__ method. However, the implementation of the enum.Flag class in 3.11.0 (and probably in 3.11.1) introduces some issues compared to the one in 3.10.3, especially in the case of an enum.IntFlag:

$ read -r -d '' code << EOM
from enum import IntFlag

class Flag(IntFlag):
    def __new__(cls, ch: str, *args):
        value = 1 << ord(ch)
        self = int.__new__(cls, value)
        self._value_ = value
        return self

    def __init__(self, _, *args):
        super().__init__()
        # do something with the positional arguments

    a = ('a', 'A')

print(repr(Flag.a ^ Flag.a))
EOM

$ python3.10 -c "$code"
<Flag.0: 0>

$ python3.11 -c "$code"
ValueError: 0 is not a valid Flag

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 16, in <module>
  File "/Lib/enum.py", line 1501, in __xor__
    return self.__class__(value ^ other)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/enum.py", line 695, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Lib/enum.py", line 1119, in __new__
    raise exc
  File "/Lib/enum.py", line 1096, in __new__
    result = cls._missing_(value)
             ^^^^^^^^^^^^^^^^^^^^
  File "/Lib/enum.py", line 1416, in _missing_
    pseudo_member = (__new__ or cls._member_type_.__new__)(cls, value)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __new__
TypeError: ord() expected string of length 1, but int found

This also results in the impossibility of writing Flag.a | i for i != 0 (for i = 0, it does work ! and this is confusing !), which IMHO is a regression compared to what was proposed in 3.10.3. It also clashes with the following assumption:

If a Flag operation is performed with an IntFlag member and:

  • the result is a valid IntFlag: an IntFlag is returned
  • result is not a valid IntFlag: the result depends on the FlagBoundary setting

Currently, the FlagBoundary for IntFlag is KEEP, so Flag.a | 12 is expected to be Flag.a|8|4 as in 3.10.3.

In order to avoid this issue, users need to write something like:

def __new__(cls, ch, *args):
    value = ch if isinstance(ch, int) else 1 << ord(ch)
    self = int.__new__(cls, value)
    self._value_ = value
    return self

Neverthless, this is only possible if __new__ converts an input U to an entirely different type V (enum member type) or if args is non-empty when declaring enumeration members. However, this fails in the following example:

class FlagFromChar(IntFlag):
    def __new__(cls, e: int):
        value = 1 << e
        self = int.__new__(cls, value)
        self._value_ = value
        return self

    a = ord('a')

# in Python 3.10.3
repr(FlagFromChar.a ^ FlagFromChar.a) == '<FlagFromChar.0: 0>'

# in Python 3.11.1
repr(FlagFromChar.a ^ FlagFromChar.a) == '<FlagFromChar: 1>'

Environment

  • CPython versions tested on: 3.10.3 and 3.11.0 (compiled from sources with GCC 7.5.0)
  • Operating System: openSUSE Leap 15.2 x86_64
  • Kernel: 5.3.18-lp152.106-default

Linked PRs

@picnixz picnixz added the type-bug An unexpected behavior, bug, or error label Feb 3, 2023
@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Feb 3, 2023
@ethanfurman
Copy link
Member

Thank you for the thorough bug report!

The issue is that in 3.11+, when a new (psuedo) member needs to be created, the original __new__ is called to do the work. This is in preparation for being able to handle any extra attributes (if desired) in a future version of Python.

So the immediate issue is that your __new__ is trying to take the ord() of the number 0, which of course fails. The fix is something like:

value = ch
if isinstance(value, str):
    value = ord(ch)

which will work in 3.11, 3.10, etc..

@picnixz
Copy link
Contributor Author

picnixz commented Feb 4, 2023

Thank you for your answser, but as I mentioned in my report, this would only work in some specific cases and would not handle the second case where the transformation maps int to int. I admit that this may be a poor design choice but the fact remains that FlagFromChar.a ^ FlagFromChar.a is clearly not what we expect.

As I said, this may be (partially) avoided by checking whether args has length 0 or not because positional arguments (if present) are only used when the enumeration member is constructed in the class definition (and not when pseudo-members are constructed on the fly).

A way to handle the new attributes in a future version of Python is to require some other special method to be overloaded instead of __new__, method which, upon creating the class, would be removed (or kept, I don't know which would be the most suitable).

EDIT: For instance, in order to distinguish an enumeration member declaration (e.g. "a = ord('a')") with the pseudo-member creation, we could write "a = (ord('a'), None)" and ignore the None in the __new__, but this would be very ugly.

@ethanfurman
Copy link
Member

First, my apologies for not noticing that you were already aware of the "workaround" (checking the type of the incoming value).

Second, when I try the FlagFromChar example, I get the same results in both 3.10 and 3.11 (which is to say, a zero value).

Thirdly (from your edit), if you are creating members directly from ints, and psuedo-members are being created directly from ints, why do you need to be able to tell the difference?

@picnixz
Copy link
Contributor Author

picnixz commented Feb 5, 2023

Thank you for your quick answer !

First, my apologies for not noticing that you were already aware of the "workaround" (checking the type of the incoming value).

Do not worry about that !

Second, when I try the FlagFromChar example, I get the same results in both 3.10 and 3.11 (which is to say, a zero value).

I said "3.11.1" but it's "3.11.0", sorry about that. I recompiled using 3.11.1 sources but I still had the same issue. I was wondering whether it's an issue on my side but I cannot find an online interpreter where I change the interpreter version to exactly match a desired version.

Thirdly (from your edit), if you are creating members directly from ints, and psuedo-members are being created directly from ints, why do you need to be able to tell the difference?

Assume the enumeration members are declared with an ordinal value (in my example, it would be e = ord('a') = 97) but that the enumeration value is 1 << e. Then, what I expect is that only the members defined in the class body (and not pseudo-members) would have such behaviour. When trying other corner cases, I encountered the following failed example (at least on my machine).

from enum import IntFlag

class FlagFromChar(IntFlag):
    def __new__(cls, c):
        value = 1 << c
        self = int.__new__(cls, value)
        self._value_ = value
        return self

    a = ord('a')

print(FlagFromChar.a | 1)

With Python 3.10.3, it does not abort but with 3.11.0 / 3.11.1, it fails as follows:

ValueError: 158456325028528675187087900673 is not a valid FlagFromChar

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 12, in <module>
  File "/cpython/dist/3.11.1/lib/python3.11/enum.py", line 1501, in __or__
    return self.__class__(value | other)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cpython/dist/3.11.1/lib/python3.11/enum.py", line 715, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/cpython/dist/3.11.1/lib/python3.11/enum.py", line 1139, in __new__
    raise exc
  File "/cpython/dist/3.11.1/lib/python3.11/enum.py", line 1116, in __new__
    result = cls._missing_(value)
             ^^^^^^^^^^^^^^^^^^^^
  File "/cpython/dist/3.11.1/lib/python3.11/enum.py", line 1436, in _missing_
    pseudo_member = (__new__ or cls._member_type_.__new__)(cls, value)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __new__
OverflowError: too many digits in integer

As we can see, FlagFromChar.a | 1 is correctly evaluated as an integer (1 << ord('a') | 1 == 158456325028528675187087900673), but unfortunately it cannot be converted to a pseudo-member since the input e to __new__ is now 158456325028528675187087900673 ! Since we are only dealing with integers, it is impossible to decide whether it's a real member or a pseudo-member being constructed. One possible fix for that would be:

class FlagFromChar(IntFlag):
    def __new__(cls, c, *args):
        # args = (None,) iff we are dealing with a real member
        value = (1 << c) if args else c
        self = int.__new__(cls, value)
        self._value_ = value
        return self

    a = (97, None)

In 3.10.3, pseudo-members (for IntFlag) were created by creating singleton pseudo-members 1 and by decomposing the input value into known flags and extra ones, namely when writing FlagFromChar.a | 1, we would have 1 as an extra flag. This is confirmed as follows:

>>> # Python 3.10.3
>>> FlagFromChar._value2member_map_
{158456325028528675187087900672: <Flag.a: 158456325028528675187087900672>}
>>> Flag.a | 1 
<Flag.a|1: 158456325028528675187087900673>
>>> FlagFromChar._value2member_map_
{158456325028528675187087900672: <Flag.a: 158456325028528675187087900672>, 1: <Flag.1: 1>, 158456325028528675187087900673: <Flag.a|1: 158456325028528675187087900673>}

In 3.11, the extra flags are not extracted since pseudo-members are directly created via __new__. In summary, for 3.10.3, the 1 above is a flag value whereas for 3.11, 1 is the flag "pre-value", i.e, an input to __new__ (I understand the purpose of not having those pseudo-members being stored because the space complexity for _value2member_map would increase).


This is somewhat entirely unrelated to this issue but when digging through the codebase and the documentation, I saw that the official documentation says the IntFlag class has the EJECT boundary but this is not the case (#93250). I don't know whether a doc-issue is already taking care of that.

Footnotes

  1. https://github.com/python/cpython/blob/3.10/Lib/enum.py#L948

@ethanfurman
Copy link
Member

Ah, thanks -- now I get it: __new__ is manipulating the incoming value, but for psuedo-members the value shoud be used as-is. Looks like 3.11.2 is scheduled for tomorrow -- I'll try to get a fix in, otherwise it'll be 3.11.3.

@carljm
Copy link
Member

carljm commented Feb 10, 2023

Looks like the fix and backport both landed? Closing as fixed, feel free to reopen if there is further work needed here beyond the merged PR.

@carljm carljm closed this as completed Feb 10, 2023
@ethanfurman
Copy link
Member

Reopening this issue.

The docs mentioned above had originally been written for Enum, and were not appropriately updated for Flag.

For Enum, the intention was that it was possible to supply several items on the member definition line, but not use all of them as the member value:

class Coordinate(bytes, Enum):
    PX = (0, 'P.X', 'km')

PX.value == 0

For Flags, that behavior is a little trickier for the user to implement because of the automatic creation of psuedo-members, but is still possible -- and it requires that whichever argument is used as the value, is not modified by __new__.

As an example of such behavior, here is an excerpt from a mixed aenum.Flag:

class TermColor(str, Flag):
    def __new__(cls, ...):
        ...
    #
    Bright = '1'          # ESC [ 1 m       # bright
    # foreground - 30s
    FG_Green = '32'           # ESC [ 32 m      # green
    #  background - 40s
    BG_Black = '40'           # ESC [ 30 m      # black

TermColor.FG_Green | TermColor.BG_Black
# <TermColor.FG_Green|BG_Black>

str(TermColor.FG_Green | TermColor.BG_Black)
# '\x1b[32;40m'

@ethanfurman ethanfurman reopened this Jun 15, 2023
@ethanfurman
Copy link
Member

Forgot to mention: I'll get a deprecation warning in 3.12.

@AlexWaygood AlexWaygood added 3.11 only security fixes 3.12 bugs and security fixes labels Jul 21, 2023
@picnixz
Copy link
Contributor Author

picnixz commented Jul 29, 2023

@ethanfurman

Forgot to mention: I'll get a deprecation warning in 3.12.

Which part would get a DeprecationWarning? does it mean that Flag classes would not support custom __new__ methods? if so, when is the API planned for removal?

@ethanfurman
Copy link
Member

I no longer remember what I was going to deprecate, so I won't. At some point in the future I hope to add capabilities to allow the TermColor example above, but it will not affect what is working now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants