Python 3.11 re.compile raises SRE code error for valid regex. #98740
Comments
I've bisected this to f703c96.
A simplified example is: re.compile('()()()()()()()(?(1)()?)')

It is caused by a fundamental flaw in the RE validation code, which checks whether the last word in the "then" branch of a conditional expression matches the JUMP opcode. JUMP was 16 in 3.10 and below and became 15 in 3.11. Unfortunately, the check also matches the argument of another opcode ("MARK 15"), which marks the end of the 8th capturing group.

The bug is not new. An even simpler example for 3.11 is: re.compile(r'()(?(1)\x0f?)') and for 3.10 and below: re.compile(r'()(?(1)\x10?)'). Whatever the value of the JUMP opcode, there is always an example that fails.

Fixing this issue will not be easy and may require changing the semantics of some opcodes.
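To check the three patterns quoted above on a given interpreter, a small script along these lines may help. It is only a sketch: the helper name try_compile is made up for illustration, and the outcome depends on the Python version (builds that include the fix are expected to compile all three patterns, while affected builds should report the error for at least one of them).

```python
import re
import sys

# Patterns quoted in the comment above.
PATTERNS = [
    '()()()()()()()(?(1)()?)',
    r'()(?(1)\x0f?)',
    r'()(?(1)\x10?)',
]


def try_compile(pattern):
    """Hypothetical helper: return 'ok' or a short description of the error."""
    try:
        re.compile(pattern)
    except (RuntimeError, re.error) as exc:
        # Affected builds raise RuntimeError("invalid SRE code") here.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


results = {pattern: try_compile(pattern) for pattern in PATTERNS}

print("Python", sys.version.split()[0])
for pattern, outcome in results.items():
    print(f"{pattern!r}: {outcome}")
```

Running it under several interpreter versions side by side is a quick way to see which of them are affected by each pattern.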
In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression. NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure.
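The confusion described here can be illustrated with a toy model. This is not the real _sre encoding: the value of MARK, the ARITY table, and both function names are assumptions made up for the sketch; only JUMP = 15 (for Python 3.11) comes from the thread. The point is that looking at a raw word cannot tell an opcode from an argument, whereas a validator that walks the code opcode by opcode knows which opcode came last.

```python
# Toy model only: values and layout are illustrative assumptions,
# not the actual opcode encoding used by Modules/_sre/sre.c.
JUMP = 15   # value quoted in the thread for Python 3.11
MARK = 17   # hypothetical value for this sketch

# Hypothetical number of argument words following each opcode.
ARITY = {MARK: 1, JUMP: 0}


def naive_ends_with_jump(words):
    """Flawed check: compares the last word to JUMP without knowing its role."""
    return words[-1] == JUMP


def walk_ends_with_jump(words):
    """Walk the code opcode by opcode and remember the last opcode seen."""
    last_op = None
    i = 0
    while i < len(words):
        last_op = words[i]
        i += 1 + ARITY[last_op]  # skip the opcode and its arguments
    return last_op == JUMP


mark_15 = [MARK, 15]            # "MARK 15": the argument equals JUMP
real_jump = [MARK, 0, JUMP]     # code that really ends with JUMP

print(naive_ends_with_jump(mark_15), walk_ends_with_jump(mark_15))
print(naive_ends_with_jump(real_jump), walk_ends_with_jump(real_jump))
```

In this toy, the naive check reports a trailing JUMP for both sequences, while the walking check reports it only for the second one. Having the validation function itself report "last op is JUMP", as the note above describes, follows the same idea.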
…onGH-98764) In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression. NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure. (cherry picked from commit e9ac890) Co-authored-by: Serhiy Storchaka <[email protected]>
…98764) (GH-99046) In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression. NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure. (cherry picked from commit e9ac890) Co-authored-by: Serhiy Storchaka <[email protected]>
Bug report
The following regex causes re.compile() to raise RuntimeError: invalid SRE code:
Your environment
Python 3.11
I've checked, and this has not been an issue in any previous Python interpreter version, starting from 3.6 (the oldest I've checked).
What's more, the regex is correctly recognized and does not cause any issues in other regexp implementations, e.g. the online tool https://regex101.com/
I've already asked about this on the mailing list and confirmed that this is a bug.
@serhiy-storchaka has confirmed that the cause of this bug has already been found.