When computing the anchors on the traceback, results may be wrong if unicode chars are used #99103

fabioz · 2022-11-04T18:05:39Z

Bug report

Consider the code below:

d = {
    "ó": {
        "á": {
            "í": {
                "theta": 1
            }
        }
    }
}

try:
    result = d["ó"]["á"]["í"]["beta"]
except:
    import traceback;traceback.print_exc()

The output provided is:

Traceback (most recent call last):
  File "W:\pydev.debugger\check\snippet2.py", line 12, in <module>
    result = d["ó"]["á"]["í"]["beta"]
             ~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'beta'

Notice that each additional non-ASCII character adds an extra `~`, so the underline runs longer than the expression it marks.

This seems to happen because, when computing the anchors in traceback._extract_caret_anchors_from_line_segment, the column offsets of the AST nodes generated by ast.parse appear to be byte offsets rather than character offsets.
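A minimal sketch of the mismatch, using a shortened expression rather than the one from the report. The `ast` documentation describes `col_offset` and `end_col_offset` as UTF-8 byte offsets, so they drift from character positions as soon as the line contains a multi-byte character. The variable names here are only illustrative.

```python
import ast

# One non-ASCII character ("ó" takes 2 bytes in UTF-8).
source = 'd["ó"]["beta"]'

outer = ast.parse(source).body[0].value   # Subscript: d["ó"]["beta"]
inner = outer.value                       # Subscript: d["ó"]

# Offset reported by the AST (counted in UTF-8 bytes).
byte_end = inner.end_col_offset

# The same position counted in characters.
char_end = len(source.encode("utf-8")[:byte_end].decode("utf-8"))

print(byte_end, char_end)  # 7 6
```

Using `byte_end` directly as a column would place the anchor one character too far to the right, which matches the one extra `~` per non-ASCII character seen above.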

Your environment

CPython versions tested on: 3.11.0
Operating system and architecture: Windows 10

PR: gh-99103: Normalize specialized traceback anchors against the current line #99145

PR: [3.11] gh-99103: Normalize specialized traceback anchors against the current line #99423


fabioz · 2022-11-04T18:27:17Z

Note: in another example it gets a bit worse and ends up throwing an internal failure:

#coding: utf-8

try:
    á = 1
    í = 2
    c = tuple
    
    result = á + í + c
except:
    import traceback;traceback.print_exc()

Gives me:

Traceback (most recent call last):
Traceback (most recent call last):
  File "W:\pydev.debugger\check\snippet2.py", line 8, in <module>
    result = á + í + c
             ~~~~~~~~^
TypeError: unsupported operand type(s) for +: 'int' and 'type'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "W:\pydev.debugger\check\snippet2.py", line 10, in <module>
    import traceback;traceback.print_exc()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 183, in print_exc
    print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 125, in print_exception
    te.print(file=file, chain=chain)
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 977, in print
    for line in self.format(chain=chain):
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 914, in format
    yield from _ctx.emit(exc.stack.format())
                         ^^^^^^^^^^^^^^^^^^
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 531, in format
    formatted_frame = self.format_frame_summary(frame_summary)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 478, in format_frame_summary
    colno = _byte_offset_to_character_offset(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\bin\Miniconda\envs\py311_tests\Lib\traceback.py", line 566, in _byte_offset_to_character_offset
    return len(as_utf8[:offset + 1].decode("utf-8"))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 13: unexpected end of data
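The error above is consistent with a UTF-8 byte string being sliced in the middle of a multi-byte character before it is decoded. A small sketch, assuming the offset 13 from the error message and the source line as shown in the traceback (the exact values passed inside traceback.py are an assumption here):

```python
line = "    result = á + í + c"
as_utf8 = line.encode("utf-8")

# "á" starts at byte 13 and occupies bytes 13 and 14 (0xC3 0xA1).
offset = 13

error = None
try:
    # The slice ends after the first byte of "á", so the decode fails.
    as_utf8[:offset + 1].decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The slice keeps the lead byte 0xC3 but drops its continuation byte, so the decoder hits an unexpected end of data.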

fabioz · 2022-11-04T19:29:00Z

Note: _byte_offset_to_character_offset could use the code below to compute the offset without the risk of failing in a decode operation:

# Smallest code points that need 2, 3, and 4 bytes in UTF-8.
_utf8_with_2_bytes = 0x80
_utf8_with_3_bytes = 0x800
_utf8_with_4_bytes = 0x10000


def _utf8_byte_offset_to_character_offset(s, offset):
    # Walk the string while counting the UTF-8 bytes consumed so far,
    # and stop at the character that contains the given byte offset.
    byte_offset = 0
    char_offset = 0
    for char_offset, character in enumerate(s):
        # Every character takes at least one byte.
        byte_offset += 1

        codepoint = ord(character)

        if codepoint >= _utf8_with_4_bytes:
            byte_offset += 3

        elif codepoint >= _utf8_with_3_bytes:
            byte_offset += 2

        elif codepoint >= _utf8_with_2_bytes:
            byte_offset += 1

        if byte_offset > offset:
            break

    # Make 1 based.
    char_offset += 1
    return char_offset
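A short usage sketch for the approach above. The function is restated in compact form (without the leading underscore and with the byte widths inlined) only so that the block runs on its own. The sample line and the offsets are taken from the failing example earlier in the thread.

```python
def utf8_byte_offset_to_character_offset(s, offset):
    # Same logic as the proposal above: count UTF-8 bytes per character
    # without ever decoding a partial byte sequence.
    byte_offset = 0
    char_offset = 0
    for char_offset, character in enumerate(s):
        codepoint = ord(character)
        if codepoint >= 0x10000:
            byte_offset += 4
        elif codepoint >= 0x800:
            byte_offset += 3
        elif codepoint >= 0x80:
            byte_offset += 2
        else:
            byte_offset += 1
        if byte_offset > offset:
            break
    return char_offset + 1  # 1-based, as in the proposal


line = "    result = á + í + c"

# Byte 13 is the first byte of "á" and byte 14 is its second byte.
results = {
    offset: utf8_byte_offset_to_character_offset(line, offset)
    for offset in (12, 13, 14, 15)
}
print(results)  # {12: 13, 13: 14, 14: 14, 15: 15}
```

Offsets 13 and 14 both fall inside "á" and map to the same character position, so an offset that lands in the middle of a multi-byte character no longer raises.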

mdboom · 2022-11-04T21:08:35Z

The second and third comments seem to be duplicates of #98744, which I confirmed is now fixed on main. The original issue still seems to exist on main, however.


…current line (#99423) [3.11] gh-99103: Normalize specialized traceback anchors against the current line (GH-99145) Automerge-Triggered-By: GH:isidentical. (cherry picked from commit 57be545) Co-authored-by: Batuhan Taskaya <[email protected]>

hauntsaninja · 2022-11-29T06:43:15Z

Looks like this has been fixed and backported, thank you for reporting!

fabioz added the type-bug label Nov 4, 2022

fabioz mentioned this issue Nov 4, 2022

PEP657 Column position of raised exceptions microsoft/debugpy#1099

Closed

pablogsal assigned isidentical Nov 4, 2022

mdboom added the topic-unicode label Nov 4, 2022

isidentical added a commit to isidentical/cpython that referenced this issue Nov 5, 2022

pythongh-99103: Normalize positions of specialized traceback anchors …

224b223

…against the current line

bedevere-bot mentioned this issue Nov 5, 2022

gh-99103: Normalize specialized traceback anchors against the current line #99145

Merged

isidentical added a commit to isidentical/cpython that referenced this issue Nov 12, 2022

pythongh-99103: Normalize positions of specialized traceback anchors …

f5af20e

…against the current line

miss-islington pushed a commit that referenced this issue Nov 12, 2022

gh-99103: Normalize specialized traceback anchors against the current…

57be545

… line (GH-99145) Automerge-Triggered-By: GH:isidentical

bedevere-bot mentioned this issue Nov 12, 2022

[3.11] gh-99103: Normalize specialized traceback anchors against the current line #99423

Merged

hauntsaninja closed this as completed Nov 29, 2022
