SVf_QUOTEDPREFIX inconsistently escapes byte values #22833

Open
guest20 opened this issue Dec 5, 2024 · 5 comments

guest20 commented Dec 5, 2024

I don't think any of the notes in the "new bug report" template on GitHub really apply to this issue. This issue is spun off from #20165 (comment).

In addition to not repeating the package name in this warning, can we also truncate (substr) the class name when it contains control characters or is longer than a "reasonable" class name? ("Reasonable" could come from the longest path the OS allows, for example.)

This would be nice in cases where a method is called on a string that isn't a class name, especially in webby cases like JPEGs or large JSON blobs:

my $jpg_binary = slurp glob "~/pics/kitten.jpg";
$jpg_binary->anything();

Can't locate object method "anything" via package "ÿØÿà^@^PJFIF^@^A^A^@..."
Can't locate object method "anything" via package "{"customer_id": 201394, "status": "active", renew_da..."

leonerd (Contributor) commented Dec 5, 2024

This is already quoted on recent Perl versions, because of SVf_QUOTEDPREFIX.

guest20 (Author) commented Dec 5, 2024

@leonerd I have to admit, I'm still on 5.20

leonerd (Contributor) commented Dec 5, 2024

> @leonerd I have to admit, I'm still on 5.20

Then, this is definitely fixed on later perls ;)

That said, there is something weird going on with the encoding. From some casual testing, it appears to escape values 0x80 to 0xBF using \x{DD} notation, but not values 0xC0 to 0xFF:

$ perl -E 'my $obj = join "", map { chr rand 256 } 0 .. 500; $obj->hello'
Can't locate object method "hello" via package "\x{ab}\x{b6}\24\177\x{8d}f�i<�\x{bf}�W\x{9e}6\x{b4}\x{9f} U\x{be}�9�h\1c}0\x{ab}\x{97};\x{a8}\x{98}<�V�a\x{8c}�\x{ac}%@\3\3�\31"..."X�\x{a7}\x{92}\3`.\177\x{97}\0211[\x{85}...

Looking at the output through xxd, we can see a pattern in the bytes:

$ perl -E 'my $obj = join "", map { chr rand 256 } 0 .. 500; $obj->hello' 2>&1 | xxd
00000000: 4361 6e27 7420 6c6f 6361 7465 206f 626a  Can't locate obj
00000010: 6563 7420 6d65 7468 6f64 2022 6865 6c6c  ect method "hell
00000020: 6f22 2076 6961 2070 6163 6b61 6765 2022  o" via package "
00000030: 78ef 5e66 4d68 5c78 7b39 627d e32f de5c  x.^fMh\x{9b}./.\
00000040: 3235 5c78 7b61 347d 4a5c 2223 5c78 7b39  25\x{a4}J\"#\x{9
00000050: 667d 5c34 605c 787b 6139 7d5c 787b 6266  f}\4`\x{a9}\x{bf
00000060: 7d60 5c32 3f5c 787b 6265 7d5c 33ca 5c6e  }`\2?\x{be}\3.\n
00000070: 5c78 7b62 667d 5c33 335c 787b 3866 7d5c  \x{bf}\33\x{8f}\
00000080: 787b 3962 7d20 5c78 7b62 337d 5c78 7b39  x{9b} \x{b3}\x{9
00000090: 337d 5c78 7b61 307d 365c 3331 ea7a 5c5c  3}\x{a0}6\31.z\\

(Admittedly, this was much easier to read on my terminal with colours.)

Various bytes appear literally, with values such as ef, e3, de, ca and ea, while others, such as 9b and a9, appear as \x{..} escapes. In the dump above, every escaped high byte falls in 0x80 to 0xBF and every literal one in 0xC0 to 0xFF.

(Screenshot attached: Screenshot_2024-12-05_19-14-15)

leonerd changed the title from "Can't locate object method "%s" via package %s produces very long messages when methods are called on non-package strings" to "SVf_QUOTEDPREFIX inconsistently escapes byte values" on Dec 5, 2024

leonerd (Contributor) commented Dec 5, 2024

I believe the responsible code is Perl_pv_escape() in dump.c, but at first glance nothing in it jumps out as the cause of this odd behaviour. A more careful look may be required.

demerphq self-assigned this Dec 6, 2024

demerphq (Collaborator) commented Dec 6, 2024

> A more careful look may be required.

I am looking into this.

The problem is that the code currently uses isWORDCHAR_uvchar() to decide whether a code point is printable, even when the string is a plain byte string (not UTF-8) and the code point is 128 or above. It should be using isWORDCHAR_A() instead.

I also noticed another minor issue: it appends ellipses at the end when it should not, as it is already inserting ellipses in the middle. I have a patch running through the test suite right now.
