Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault in Perl 5.21.8 while fuzzing Perl binary #14389

Comments

Copy link

Migrated from rt.perl.org#123539 (status was 'resolved')

Searchable as RT123539$

Copy link
Author

From @geeknik

I cloned the git repo on 01/02/2015 and built from source using the afl-gcc
compiler​:

CC=/path/to/afl-gcc ./Configure
AFL_HARDEN=1 make

This is perl 5, version 21, subversion 8 (v5.21.8 (v5.21.7-209-g4e27940))
built for x86_64-linux

Besides the above information, this version of Perl was compiled using all
defaults (enter was pressed for every question).

While fuzzing the new perl binary, I found a testcase that causes a
segfault. I've attached the file for inclusion in the bug report. Please
note that a file named = containing zero bytes was also created in a tmp
directory around 1 minute 5 seconds before the testcase was written to
disk. Not sure if this is related but I wanted to add the information just
in case it was important. Please advise if this is a security issue and if
you will handle the CVE request.

@​@​ VALGRIND OUTPUT @​@​

valgrind -q /home/geeknik/perl5/perl
./id\​:000000\,sig\​:11\,src\​:002461+016504\,op\​:splice\,rep\​:32
==54123== Use of uninitialised value of size 8
==54123== at 0x651E15​: S_regatom (regcomp.c​:12632)
==54123== by 0x656B49​: S_regpiece (regcomp.c​:10821)
==54123== by 0x65AB0D​: S_regbranch (regcomp.c​:10747)
==54123== by 0x66C9A5​: S_reg.constprop.9 (regcomp.c​:10497)
==54123== by 0x690D3B​: Perl_re_op_compile (regcomp.c​:6697)
==54123== by 0x4C1222​: Perl_pmruntime (op.c​:5629)
==54123== by 0x5CBEBC​: Perl_yyparse (perly.y​:1000)
==54123== by 0x4F3114​: perl_parse (perl.c​:2273)
==54123== by 0x42A92B​: main (perlmain.c​:114)
==54123==
==54123== Invalid write of size 1
==54123== at 0x64C878​: S_regatom (regcomp.c​:12484)
==54123== by 0x656B49​: S_regpiece (regcomp.c​:10821)
==54123== by 0x65AB0D​: S_regbranch (regcomp.c​:10747)
==54123== by 0x66C9A5​: S_reg.constprop.9 (regcomp.c​:10497)
==54123== by 0x693D3B​: Perl_re_op_compile (regcomp.c​:6895)
==54123== by 0x4C1222​: Perl_pmruntime (op.c​:5629)
==54123== by 0x5CBEBC​: Perl_yyparse (perly.y​:1000)
==54123== by 0x4F3114​: perl_parse (perl.c​:2273)
==54123== by 0x42A92B​: main (perlmain.c​:114)
==54123== Address 0x5eea96c is 0 bytes after a block of size 3,500 alloc'd
==54123== at 0x4C28BED​: malloc (vg_replace_malloc.c​:263)
==54123== by 0x6CEFBC​: Perl_safesysmalloc (util.c​:144)
==54123== by 0x692C7A​: Perl_re_op_compile (regcomp.c​:6746)
==54123== by 0x4C1222​: Perl_pmruntime (op.c​:5629)
==54123== by 0x5CBEBC​: Perl_yyparse (perly.y​:1000)
==54123== by 0x4F3114​: perl_parse (perl.c​:2273)
==54123== by 0x42A92B​: main (perlmain.c​:114)
==54123==
syntax error at ./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line
11, near "printr)"
Unknown regexp modifier "/b" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/G" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/e" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/r" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
panic​: reg_node overrun trying to emit 0, 5eea97c>=5eea96c at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 20.

@​@​ GDB OUTPUT @​@​

gdb /home/geeknik/perl5/perl core
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+​: GNU GPL version 3 or later <http​://gnu.org/licenses/gpl.html

This is free software​: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see​:
<http​://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/geeknik/perl5/perl...done.
[New LWP 12797]

warning​: Can't read pathname for load map​: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/geeknik/perl5/perl
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f1223db11f0 in _int_free (av=0x7f12240c0e40, p=0x238f260) at
malloc.c​:5002
5002 malloc.c​: No such file or directory.
(gdb) bt
#0 0x00007f1223db11f0 in _int_free (av=0x7f12240c0e40, p=0x238f260) at
malloc.c​:5002
#1 0x00007f1223db47bc in *__GI___libc_free (mem=<optimized out>) at
malloc.c​:3738
#2 0x00000000007bf7e8 in Perl_sv_clear (orig_sv=orig_sv@​entry=0x238bbf8)
at sv.c​:6743
#3 0x00000000007b75f8 in Perl_sv_free2 (sv=0x238bbf8, rc=<optimized out>)
at sv.c​:7018
#4 0x000000000043cc85 in S_SvREFCNT_dec (sv=<optimized out>) at
inline.h​:166
#5 Perl_op_clear (o=o@​entry=0x238df50) at op.c​:931
#6 0x000000000043d568 in Perl_op_free (o=0x238df50) at op.c​:781
#7 0x000000000043e119 in Perl_opslab_force_free (slab=0x2389980) at
op.c​:443
#8 0x00000000005f12c1 in Perl_cv_undef_flags (cv=0x2368328, flags=0) at
pad.c​:350
#9 0x00000000007bedc0 in Perl_sv_clear (orig_sv=orig_sv@​entry=0x2368328)
at sv.c​:6592
#10 0x00000000007b75f8 in Perl_sv_free2 (sv=0x2368328, rc=<optimized out>)
at sv.c​:7018
#11 0x00000000004e77b5 in S_SvREFCNT_dec (sv=<optimized out>) at
inline.h​:166
#12 perl_destruct (my_perl=<optimized out>) at perl.c​:780
#13 0x000000000042ab24 in main (argc=2, argv=0x7fff5c8b3a58,
env=0x7fff5c8b3a70) at perlmain.c​:127
#14 0x00007f1223d57ead in __libc_start_main (main=<optimized out>,
argc=<optimized out>,
  ubp_av=<optimized out>, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>,
  stack_end=0x7fff5c8b3a48) at libc-start.c​:244
#15 0x000000000042ac45 in _start ()
(gdb)

geeknik@​deb7fuzz​:~/perl5/utils$ ./perlbug -d


Flags​:
  category=core
  severity=low


This perlbug was built using Perl 5.21.8 - Fri Jan 2 19​:02​:59 CST 2015
It is being executed now by Perl 5.21.7 - Thu Dec 18 14​:34​:01 CST 2014.

Site configuration information for perl 5.21.7​:

Configured by geeknik at Thu Dec 18 14​:34​:01 CST 2014.

Summary of my perl5 (revision 5 version 21 subversion 7) configuration​:
  Commit id​: e9d2bd8
  Platform​:
  osname=linux, osvers=3.2.0-4-amd64, archname=x86_64-linux
  uname='linux deb7fuzz 3.2.0-4-amd64 #1 smp debian 3.2.63-2+deb7u2
x86_64 gnulinux '
  config_args=''
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=undef, usemultiplicity=undef
  use64bitint=define, use64bitall=define, uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='/home/geeknik/afl/afl-gcc', ccflags ='-fwrapv -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2',
  optimize='-O2',
  cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
  ccversion='', gccversion='4.7.2', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678,
doublekind=3
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16,
longdblkind=3
  ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='/home/geeknik/afl/afl-gcc', ldflags =' -fstack-protector
-L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lc -lpthread
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc -lpthread
  libc=libc-2.13.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.13'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'


@​INC for perl 5.21.7​:
  /usr/local/lib/perl5/site_perl/5.21.7/x86_64-linux
  /usr/local/lib/perl5/site_perl/5.21.7
  /usr/local/lib/perl5/5.21.7/x86_64-linux
  /usr/local/lib/perl5/5.21.7
  .


Environment for perl 5.21.7​:
  HOME=/home/geeknik
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=/usr/local/bin​:/usr/bin​:/bin​:/usr/local/games​:/usr/games
  PERL_BADLANG (unset)
  SHELL=/bin/bash

Copy link
Author

From @geeknik

crash.gz

Copy link
Author

From @jkeenan

On Sat Jan 03 11​:49​:13 2015, brian.carpenter@​gmail.com wrote​:

I cloned the git repo on 01/02/2015 and built from source using the afl-gcc
compiler​:

CC=/path/to/afl-gcc ./Configure
AFL_HARDEN=1 make

Can you describe how one obtains the afl-gcc compiler?

This is perl 5, version 21, subversion 8 (v5.21.8 (v5.21.7-209-g4e27940))
built for x86_64-linux

Besides the above information, this version of Perl was compiled using all
defaults (enter was pressed for every question).

While fuzzing the new perl binary, I found a testcase that causes a
segfault. I've attached the file for inclusion in the bug report. Please
note that a file named = containing zero bytes was also created in a tmp
directory around 1 minute 5 seconds before the testcase was written to
disk. Not sure if this is related but I wanted to add the information just
in case it was important. Please advise if this is a security issue and if
you will handle the CVE request.

I'm a bit puzzled about the test case, since, insofar as it is intended to be a perl program, it fails to compile. It reports a syntax error near 'printr)'. Can you clarify?

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

Copy link
Author

The RT System itself - Status changed from 'new' to 'open'

Copy link
Author

From @mauke

Am 03.01.2015 um 22​:54 schrieb James E Keenan via RT​:

On Sat Jan 03 11​:49​:13 2015, brian.carpenter@​gmail.com wrote​:

I cloned the git repo on 01/02/2015 and built from source using the afl-gcc
compiler​:

CC=/path/to/afl-gcc ./Configure
AFL_HARDEN=1 make

Can you describe how one obtains the afl-gcc compiler?

http​://lcamtuf.coredump.cx/afl/

This is perl 5, version 21, subversion 8 (v5.21.8 (v5.21.7-209-g4e27940))
built for x86_64-linux

Besides the above information, this version of Perl was compiled using all
defaults (enter was pressed for every question).

While fuzzing the new perl binary, I found a testcase that causes a
segfault. I've attached the file for inclusion in the bug report. Please
note that a file named = containing zero bytes was also created in a tmp
directory around 1 minute 5 seconds before the testcase was written to
disk. Not sure if this is related but I wanted to add the information just
in case it was important. Please advise if this is a security issue and if
you will handle the CVE request.

I'm a bit puzzled about the test case, since, insofar as it is intended to be a perl program, it fails to compile. It reports a syntax error near 'printr)'. Can you clarify?

As the original bug report says​:

syntax error at ./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32
line 11, near "printr)"
Unknown regexp modifier "/b" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/G" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/e" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
Unknown regexp modifier "/r" at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 17, at end of
line
panic​: reg_node overrun trying to emit 0, 5eea97c>=5eea96c at
./id​:000000,sig​:11,src​:002461+016504,op​:splice,rep​:32 line 20.

Perl panics/crashes while trying to handle a syntax error, apparently.

--
Lukas Mai <plokinom@​gmail.com>

Copy link
Author

From @geeknik

The original test case that I fed to AFL was this​:

#!/usr/local/bin/perl
print "Content-type​: text/html\n\n";
print "Hello World.\n";
print "Heres the form info​:<P>\n";
my($buffer);
my(@​pairs);
my($pair);
read(STDIN,$buffer,$ENV{'CONTENT_LENGTH'});
@​pairs = split(/&/, $buffer);
foreach $pair (@​pairs)
  {
  print "$pair<BR>\n"
  }
print "<P>Note that further parsing is\n";
print "necessary to turn the plus signs\n";
print "into spaces and get rid of some\n";
print "other web encoding.\n";

After about 25 million iterations, that script was turned into the test
case that I submitted. Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Copy link
Author

From @geeknik

The afl-gcc is available as part of American Fuzzy Lop which you can obtain
from here​: http​://lcamtuf.coredump.cx/afl/

Copy link
Author

From @cpansprout

On Sat Jan 03 14​:50​:18 2015, brian.carpenter@​gmail.com wrote​:

Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Yes, it should. I don’t get a crash when I run the script. I do get a panic message, though, in bleadperl and in 5.20.1. 5.18.3 just shows ‘normal’ errors, no panics. I’m going to run a bisect.

--

Father Chrysostomos

Copy link
Author

From @cpansprout

On Sat Jan 03 17​:35​:31 2015, sprout wrote​:

On Sat Jan 03 14​:50​:18 2015, brian.carpenter@​gmail.com wrote​:

Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Yes, it should. I don’t get a crash when I run the script. I do get
a panic message, though, in bleadperl and in 5.20.1. 5.18.3 just
shows ‘normal’ errors, no panics. I’m going to run a bisect.

31f05a3 is the first bad commit
commit 31f05a3
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Mon Jan 27 15​:35​:00 2014 -0700

  Work properly under UTF-8 LC_CTYPE locales
 
  This large (sorry, I couldn't figure out how to meaningfully split it
  up) commit causes Perl to fully support LC_CTYPE operations (case
  changing, character classification) in UTF-8 locales.
 
  As a side effect it resolves [perl #56820].

--

Father Chrysostomos

Copy link
Author

From @cpansprout

On Sat Jan 03 18​:00​:41 2015, sprout wrote​:

On Sat Jan 03 17​:35​:31 2015, sprout wrote​:

On Sat Jan 03 14​:50​:18 2015, brian.carpenter@​gmail.com wrote​:

Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Yes, it should. I don’t get a crash when I run the script. I do get
a panic message, though, in bleadperl and in 5.20.1. 5.18.3 just
shows ‘normal’ errors, no panics. I’m going to run a bisect.

31f05a3 is the first bad commit
commit 31f05a3
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Mon Jan 27 15​:35​:00 2014 -0700

Work properly under UTF\-8 LC\_CTYPE locales

This large \(sorry\, I couldn't figure out how to meaningfully split it
up\) commit causes Perl to fully support LC\_CTYPE operations \(case
changing\, character classification\) in UTF\-8 locales\.

As a side effect it resolves \[perl \#56820\]\.

Reduced case. This appears to have nothing to do with the syntax error.

/TffffffffffffTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT5TT
TTTTTTTTTTTTTTTTTTTTTTT3TTgTTTTTTTTTTTTTTTTTTTTT2TTTTTTTTTTTTTTTTTTTTTT
THHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHiHHHHHH
Hffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffff&ffff/il

The output I get is​:

panic​: reg_node overrun trying to emit 0, 7fcdf140cca8>=7fcdf140cca4 at /Users/sprout/Downloads/crash line 7.

With some variations of the above, I got malloc errors.

--

Father Chrysostomos

Copy link
Author

From @khwilliamson

On 01/03/2015 08​:39 PM, Father Chrysostomos via RT wrote​:

On Sat Jan 03 18​:00​:41 2015, sprout wrote​:

On Sat Jan 03 17​:35​:31 2015, sprout wrote​:

On Sat Jan 03 14​:50​:18 2015, brian.carpenter@​gmail.com wrote​:

Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Yes, it should. I don’t get a crash when I run the script. I do get
a panic message, though, in bleadperl and in 5.20.1. 5.18.3 just
shows ‘normal’ errors, no panics. I’m going to run a bisect.

31f05a3 is the first bad commit
commit 31f05a3
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Mon Jan 27 15​:35​:00 2014 -0700

 Work properly under UTF\-8 LC\_CTYPE locales

 This large \(sorry\, I couldn't figure out how to meaningfully split it
 up\) commit causes Perl to fully support LC\_CTYPE operations \(case
 changing\, character classification\) in UTF\-8 locales\.

 As a side effect it resolves \[perl \#56820\]\.

Reduced case. This appears to have nothing to do with the syntax error.

/TffffffffffffTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT5TT
TTTTTTTTTTTTTTTTTTTTTTT3TTgTTTTTTTTTTTTTTTTTTTTT2TTTTTTTTTTTTTTTTTTTTTT
THHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHiHHHHHH
Hffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffff&ffff/il

The output I get is​:

panic​: reg_node overrun trying to emit 0, 7fcdf140cca8>=7fcdf140cca4 at /Users/sprout/Downloads/crash line 7.

With some variations of the above, I got malloc errors.

I didn't get these errors, but I did see some valgrind issues, which the
attached patch resolves. Please try it out to see if it fixes your
problems.

Copy link
Author

From @khwilliamson

0003-Trial-patch-for-perl-123539-Segmentation-Fault-in-Pe.patch
From 29adfaedc3d0dec9ff288c7c40642e85018b57cf Mon Sep 17 00:00:00 2001
From: Karl Williamson <[email protected]>
Date: Sat, 3 Jan 2015 22:30:05 -0700
Subject: [PATCH 3/3] Trial patch for [perl #123539] Segmentation Fault in Perl
 5.21.8

---
 regcomp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/regcomp.c b/regcomp.c
index 78c614d..82d45e8 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12444,7 +12444,9 @@ tryagain:
                         && is_PROBLEMATIC_LOCALE_FOLD_cp(ender)))
                 {
                     if (UTF) {
-                        const STRLEN unilen = reguni(pRExC_state, ender, s);
+                        const STRLEN unilen = (SIZE_ONLY && ! FOLD)
+                                               ? UNISKIP(ender)
+                                               : (uvchr_to_utf8((U8*)s, ender) - (U8*)s);
                         if (unilen > 0) {
                            s   += unilen;
                            len += unilen;
@@ -12457,6 +12459,9 @@ tryagain:
                          * cancel out the increment that follows */
                         len--;
                     }
+                    else if (FOLD) {
+                        *(s++) = (char) ender;
+                    }
                     else {
                         REGC((char)ender, s++);
                     }
-- 
1.9.1

Copy link
Author

From @cpansprout

On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:

On 01/03/2015 08​:39 PM, Father Chrysostomos via RT wrote​:

On Sat Jan 03 18​:00​:41 2015, sprout wrote​:

On Sat Jan 03 17​:35​:31 2015, sprout wrote​:

On Sat Jan 03 14​:50​:18 2015, brian.carpenter@​gmail.com wrote​:

Shouldn't perl fail gracefully instead of just
crashing because of invalid input/characters/etc?

Yes, it should. I don’t get a crash when I run the script. I do
get
a panic message, though, in bleadperl and in 5.20.1. 5.18.3 just
shows ‘normal’ errors, no panics. I’m going to run a bisect.

31f05a3 is the first bad commit
commit 31f05a3
Author​: Karl Williamson <public@​khwilliamson.com>
Date​: Mon Jan 27 15​:35​:00 2014 -0700

Work properly under UTF-8 LC_CTYPE locales

This large (sorry, I couldn't figure out how to meaningfully split
it
up) commit causes Perl to fully support LC_CTYPE operations (case
changing, character classification) in UTF-8 locales.

As a side effect it resolves [perl #56820].

Reduced case. This appears to have nothing to do with the syntax
error.

/TffffffffffffTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT5TT
TTTTTTTTTTTTTTTTTTTTTTT3TTgTTTTTTTTTTTTTTTTTTTTT2TTTTTTTTTTTTTTTTTTTTTT
THHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHiHHHHHH
Hffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffff
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffff&ffff/il

The output I get is​:

panic​: reg_node overrun trying to emit 0, 7fcdf140cca8>=7fcdf140cca4
at /Users/sprout/Downloads/crash line 7.

With some variations of the above, I got malloc errors.

I didn't get these errors, but I did see some valgrind issues, which
the
attached patch resolves. Please try it out to see if it fixes your
problems.

Yes, it works for me, both with the reduced case and with the original script from this ticket. I have no idea how it works, though.

--

Father Chrysostomos

Copy link
Author

From @jkeenan

On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:

I didn't get these errors, but I did see some valgrind issues, which
the
attached patch resolves. Please try it out to see if it fixes your
problems.

The patch works for me as well. With it I get normal syntax errors as shown by the attachment.

Thank you very much.

--
James E Keenan (jkeenan@​cpan.org)

Copy link
Author

From @jkeenan

Bareword found where operator expected at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 21, near "HHHHHHHHHHHHHHHHHHHH"
  (Might be a runaway multi-line // string starting on line 17)
  (Missing semicolon on previous line?)
syntax error at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 11, near "printr)"
Unknown regexp modifier "/b" at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 17, at end of line
Unknown regexp modifier "/G" at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 17, at end of line
Unknown regexp modifier "/e" at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 17, at end of line
Unknown regexp modifier "/r" at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 17, at end of line
syntax error at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 21, near "HHHHHHHHHHHHHHHHHHHH"
Unrecognized character \x83; marked by <-- HERE after HHHHHHHHHH<-- HERE near column 22 at /home/jkeenan/learn/perl/p5p/noshebang-123539-crash line 21.

Copy link
Author

From @hvds

"Father Chrysostomos via RT" <perlbug-followup@​perl.org> wrote​:
:On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:
:> I didn't get these errors, but I did see some valgrind issues, which
:> the
:> attached patch resolves. Please try it out to see if it fixes your
:> problems.
:
:Yes, it works for me, both with the reduced case and with the original script from this ticket. I have no idea how it works, though.

I don't understand the fix either, but I was able to come up with a
somewhat shorter test case​:

% ./perl -we '/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbAAbbb/il'
panic​: reg_node overrun trying to emit 0, 1720b50>=1720b50 at -e line 1.
%

That's the first hit (len=130, pos1=125, pos2=126) from this search​:

#!./perl
my $lc = 'b';
my $uc = 'A'; # any of [AFHIJSTWY]
for my $len (2 .. 408) {
  for my $pos1 (0 .. $len - 2) {
  for my $pos2 ($pos1 + 1 .. $len - 1) {
  my $s = $lc x $len;
  substr($s, $_, 1) = $uc for ($pos1, $pos2);
  eval qq{/$s/il};
  die "$len​: $pos1, $pos2" if $@​;
} } }

I also don't know what's special about [AFHIJSTWY], but I assume they relate
somehow to the PROBLEMATIC_LOCALE_FOLD characters.

With Karl's suggested patch, the search completes without seeing any problems.

Hugo

Copy link
Author

From @demerphq

On 4 January 2015 at 14​:20, <hv@​crypt.org> wrote​:

"Father Chrysostomos via RT" <perlbug-followup@​perl.org> wrote​:
:On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:
:> I didn't get these errors, but I did see some valgrind issues, which
:> the
:> attached patch resolves. Please try it out to see if it fixes your
:> problems.
:
:Yes, it works for me, both with the reduced case and with the original script from this ticket. I have no idea how it works, though.

I don't understand the fix either,

Just to explain, I am pretty sure the key part of the patch is​:

- const STRLEN unilen = reguni(pRExC_state, ender, s);
+ const STRLEN unilen = (SIZE_ONLY && ! FOLD)
+ ? UNISKIP(ender)
+ :
(uvchr_to_utf8((U8*)s, ender) - (U8*)s);

The regex engine does two compile passes over the pattern (the first
is sometimes repeated so sometimes it is actually three passes), the
first pass is a sizing pass, which calculates/estimates an upper limit
on the size of the compiled regexp program that will result from the
pattern. Once this has been calculated the caller allocates the buffer
the program will be written into and then recompiles. During the first
pass the SIZE_ONLY flag is set, and this flag is used in all of the
compiler logic to decide what to do.

Prior to Karl's patch it looks like we would try to write to the
non-existent program buffer during the first path. With his patch it
does the right thing on the first patch.

Hugo​: bear in mind that the logic in study_chunk() fires AFTER the
final write pass. IOW, this is all happening well before the optimizer
fires. And indeed the multi-pass nature is exactly what I want to
change (someday).

cheers,
yves

Copy link
Author

From @hvds

demerphq <demerphq@​gmail.com> wrote​:
:On 4 January 2015 at 14​:20, <hv@​crypt.org> wrote​:
:> "Father Chrysostomos via RT" <perlbug-followup@​perl.org> wrote​:
:> :On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:
:> :> I didn't get these errors, but I did see some valgrind issues, which
:> :> the
:> :> attached patch resolves. Please try it out to see if it fixes your
:> :> problems.
:> :
:> :Yes, it works for me, both with the reduced case and with the original script from this ticket. I have no idea how it works, though.
:>
:> I don't understand the fix either,
:
:Just to explain, I am pretty sure the key part of the patch is​:
:
:- const STRLEN unilen = reguni(pRExC_state, ender, s);
:+ const STRLEN unilen = (SIZE_ONLY && ! FOLD)
:+ ? UNISKIP(ender)
:+ :
:(uvchr_to_utf8((U8*)s, ender) - (U8*)s);
:
:The regex engine does two compile passes over the pattern (the first
:is sometimes repeated so sometimes it is actually three passes), the
:first pass is a sizing pass, which calculates/estimates an upper limit
:on the size of the compiled regexp program that will result from the
:pattern. Once this has been calculated the caller allocates the buffer
:the program will be written into and then recompiles. During the first
:pass the SIZE_ONLY flag is set, and this flag is used in all of the
:compiler logic to decide what to do.
:
:Prior to Karl's patch it looks like we would try to write to the
:non-existent program buffer during the first path. With his patch it
:does the right thing on the first patch.

It's the other way round isn't it? reguni() will just return the size
when SIZE_ONLY, but write to s otherwise; the patch makes it also write
to s when SIZE_ONLY && !FOLD.

Looking further I see this, way further up​:
  /* In pass1, folded, we use a temporary buffer instead of the
  * actual node, as the node doesn't exist yet */
  s = (SIZE_ONLY && FOLD) ? foldbuf : STRING(ret);
.. at which point it starts to make a little more sense.

Hugo

Copy link
Author

From @demerphq

On 5 January 2015 at 17​:28, <hv@​crypt.org> wrote​:

demerphq <demerphq@​gmail.com> wrote​:
:On 4 January 2015 at 14​:20, <hv@​crypt.org> wrote​:
:> "Father Chrysostomos via RT" <perlbug-followup@​perl.org> wrote​:
:> :On Sat Jan 03 21​:33​:22 2015, public@​khwilliamson.com wrote​:
:> :> I didn't get these errors, but I did see some valgrind issues, which
:> :> the
:> :> attached patch resolves. Please try it out to see if it fixes your
:> :> problems.
:> :
:> :Yes, it works for me, both with the reduced case and with the original script from this ticket. I have no idea how it works, though.
:>
:> I don't understand the fix either,
:
:Just to explain, I am pretty sure the key part of the patch is​:
:
:- const STRLEN unilen = reguni(pRExC_state, ender, s);
:+ const STRLEN unilen = (SIZE_ONLY && ! FOLD)
:+ ? UNISKIP(ender)
:+ :
:(uvchr_to_utf8((U8*)s, ender) - (U8*)s);
:
:The regex engine does two compile passes over the pattern (the first
:is sometimes repeated so sometimes it is actually three passes), the
:first pass is a sizing pass, which calculates/estimates an upper limit
:on the size of the compiled regexp program that will result from the
:pattern. Once this has been calculated the caller allocates the buffer
:the program will be written into and then recompiles. During the first
:pass the SIZE_ONLY flag is set, and this flag is used in all of the
:compiler logic to decide what to do.
:
:Prior to Karl's patch it looks like we would try to write to the
:non-existent program buffer during the first path. With his patch it
:does the right thing on the first patch.

It's the other way round isn't it? reguni() will just return the size
when SIZE_ONLY, but write to s otherwise;

Yes, I missed that.

the patch makes it also write
to s when SIZE_ONLY && !FOLD.

Looking further I see this, way further up​:
/* In pass1, folded, we use a temporary buffer instead of the
* actual node, as the node doesn't exist yet */
s = (SIZE_ONLY && FOLD) ? foldbuf : STRING(ret);
.. at which point it starts to make a little more sense.

Indeed. Sorry for the misdirection.

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Copy link
Author

From @khwilliamson

Fixed in blead by
405dffc
--
Karl Williamson

Copy link
Author

@khwilliamson - Status changed from 'open' to 'pending release'

Copy link
Author

From @khwilliamson

On Tue Jan 06 14​:11​:00 2015, khw wrote​:

Fixed in blead by
405dffc

which is suitable for a maintenance release.

Here's the text of that commit message​:

This is a minimal patch suitable for a maintenance release. It extracts
the guts of reguni and REGC without the conditional they have. The next
commit will do some refactoring to avoid branching and to make things
clearer.

This bug is due to the current two pass structure of the Perl regular
expression compiler. The first pass attempts to do just enough work to
figure out how much space to malloc for the compiled pattern; and the
2nd pass actually fills in the details. One problem with this design is
that in many cases quite a bit of work is required to figure out the
size, and this work is thrown away and redone in the second pass.
Another problem is that it is easy to forget to do enough work in the
sizing pass, and that is what happened with the blamed commit. I
understand that there are plans (God speed) to change the compiler
design.

When not under /i matching, the size of a node that will match a
sequence of characters is just the number of bytes those characters take
up. We have an easy way to calculate the number of bytes any code point
will occupy in UTF-8, and it's just 1 byte per code point for non-UTF-8.
So in the sizing pass, we don't actually have to figure out the
representation of the characters. However under /i matching, we do.
First of all, matching of UTF-8 strings is done by replacing each
character of each string by its fold-case (function fc()) and then
comparing. This is required by the nature of full Unicode matching
which is not 1-1. If we do that replacement for the pattern at compile
time, we avoid having to do it over-and-over as pattern matching
backtracks at execution. And because fc(x) may not occupy the same
number of bytes as x, and there is no easy way to know that size without
actually doing the fc(), we have to do the fold in the sizing pass.
Now, there are relatively few folds where sizeof(fc(x)) != sizeof(x), so
we could construct an exception table for those few cases where it is,
and look up through that.

But there is another reason that we have to fold in the sizing pass.
And that is because of the potential for multi-character folds being
split across regnodes. The regular expression compiler generates
EXACTish regnodes for matching sequences of characters exactly or via
/i. The limit for how many bytes in a sequence such a node can match is
255 because the length is stored in a U8. If the pattern has a sequence
longer than that, it is split into two or more EXACTish nodes in a row.
(Actually, the compiler splits at a size much lower than that; I'm not
sure why, but then two adjoining nodes whose total sum length is at most
255 get joined later in the third, optimizing pass.) Now consider,
matching the character U+FB03 LATIN SMALL LIGATURE FFI. It matches the
sequence of the three characters "f f i". Because of the design of the
regex pattern matching code, if these characters are such that the first
one or two are at the end of one EXACTish node, and the final two or one
are in another EXACTish node, then U+FB03 wrongly would not match them.
Matches can't cross node boundaries. If the pattern were tweaked so all
three characters were in either the first or second node, then the match
would succeed. And that is what the compiler does. When it reaches the
node's size limit, and the final character is one that is a non-terminal
character in a multi-char fold, what's in the node is backed-off until
it ends with a character without this characteristic. This has to be
done in the sizing pass, as we are repacking the nodes, which can affect
the size of the pattern, and we have to know what the folds are in order
to determine all this.

(We don't fold non-UTF-8 patterns. This is for two reasons. One is
that one character, the U+00B5 MICRO SIGN, folds to above-Latin1, and if
we folded it, we would have to change the pattern into UTF-8, and that
would slow everything down. I've thought about adding a regnode type
for the much more common case of a sequence that doesn't have this
character in it, and which could hence be folded at compile time. But
I've not been able to justify this because of the 2nd reason, which is
folds in this range are simple enough to be handled by an array lookup,
so folding is fast at runtime.)

Then there is the complication of matching under locale rules. This bug
manifested itself only under /l matching. We can't fold at pattern
compile time, because the folding rules won't be known until runtime.
This isn't a problem for non-UTF-8 locales, as all folds are 1-1, and so
there never will be a multi-char fold. But there could be such folds in
a UTF-8 locale, so the regnodes have to be packed to work for that
eventuality. The blamed commit did not do that, and because this issue
doesn't arise unless there is a string long enough to trigger the
problem, this wasn't found until now. What is needed, and what this
commit does, is for the unfolded characters to be accumulated in both
passes. The code that looks for potential multi-char fold issues
handles both folded and unfolded-inputs, so will work.

Copy link
Author

From [Unknown Contact. See original ticket]

On Tue Jan 06 14​:11​:00 2015, khw wrote​:

Fixed in blead by
405dffc

which is suitable for a maintenance release.

Here's the text of that commit message​:

This is a minimal patch suitable for a maintenance release. It extracts
the guts of reguni and REGC without the conditional they have. The next
commit will do some refactoring to avoid branching and to make things
clearer.

This bug is due to the current two pass structure of the Perl regular
expression compiler. The first pass attempts to do just enough work to
figure out how much space to malloc for the compiled pattern; and the
2nd pass actually fills in the details. One problem with this design is
that in many cases quite a bit of work is required to figure out the
size, and this work is thrown away and redone in the second pass.
Another problem is that it is easy to forget to do enough work in the
sizing pass, and that is what happened with the blamed commit. I
understand that there are plans (God speed) to change the compiler
design.

When not under /i matching, the size of a node that will match a
sequence of characters is just the number of bytes those characters take
up. We have an easy way to calculate the number of bytes any code point
will occupy in UTF-8, and it's just 1 byte per code point for non-UTF-8.
So in the sizing pass, we don't actually have to figure out the
representation of the characters. However under /i matching, we do.
First of all, matching of UTF-8 strings is done by replacing each
character of each string by its fold-case (function fc()) and then
comparing. This is required by the nature of full Unicode matching
which is not 1-1. If we do that replacement for the pattern at compile
time, we avoid having to do it over-and-over as pattern matching
backtracks at execution. And because fc(x) may not occupy the same
number of bytes as x, and there is no easy way to know that size without
actually doing the fc(), we have to do the fold in the sizing pass.
Now, there are relatively few folds where sizeof(fc(x)) != sizeof(x), so
we could construct an exception table for those few cases where it is,
and look up through that.

But there is another reason that we have to fold in the sizing pass.
And that is because of the potential for multi-character folds being
split across regnodes. The regular expression compiler generates
EXACTish regnodes for matching sequences of characters exactly or via
/i. The limit for how many bytes in a sequence such a node can match is
255 because the length is stored in a U8. If the pattern has a sequence
longer than that, it is split into two or more EXACTish nodes in a row.
(Actually, the compiler splits at a size much lower than that; I'm not
sure why, but then two adjoining nodes whose total sum length is at most
255 get joined later in the third, optimizing pass.) Now consider,
matching the character U+FB03 LATIN SMALL LIGATURE FFI. It matches the
sequence of the three characters "f f i". Because of the design of the
regex pattern matching code, if these characters are such that the first
one or two are at the end of one EXACTish node, and the final two or one
are in another EXACTish node, then U+FB03 wrongly would not match them.
Matches can't cross node boundaries. If the pattern were tweaked so all
three characters were in either the first or second node, then the match
would succeed. And that is what the compiler does. When it reaches the
node's size limit, and the final character is one that is a non-terminal
character in a multi-char fold, what's in the node is backed-off until
it ends with a character without this characteristic. This has to be
done in the sizing pass, as we are repacking the nodes, which can affect
the size of the pattern, and we have to know what the folds are in order
to determine all this.

(We don't fold non-UTF-8 patterns. This is for two reasons. One is
that one character, the U+00B5 MICRO SIGN, folds to above-Latin1, and if
we folded it, we would have to change the pattern into UTF-8, and that
would slow everything down. I've thought about adding a regnode type
for the much more common case of a sequence that doesn't have this
character in it, and which could hence be folded at compile time. But
I've not been able to justify this because of the 2nd reason, which is
folds in this range are simple enough to be handled by an array lookup,
so folding is fast at runtime.)

Then there is the complication of matching under locale rules. This bug
manifested itself only under /l matching. We can't fold at pattern
compile time, because the folding rules won't be known until runtime.
This isn't a problem for non-UTF-8 locales, as all folds are 1-1, and so
there never will be a multi-char fold. But there could be such folds in
a UTF-8 locale, so the regnodes have to be packed to work for that
eventuality. The blamed commit did not do that, and because this issue
doesn't arise unless there is a string long enough to trigger the
problem, this wasn't found until now. What is needed, and what this
commit does, is for the unfolded characters to be accumulated in both
passes. The code that looks for potential multi-char fold issues
handles both folded and unfolded-inputs, so will work.

Copy link
Author

From @demerphq

On 6 January 2015 at 23​:16, Karl Williamson via RT
<perlbug-comment@​perl.org> wrote​:

On Tue Jan 06 14​:11​:00 2015, khw wrote​:

Fixed in blead by
405dffc

which is suitable for a maintenance release.

Here's the text of that commit message​:

This is a minimal patch suitable for a maintenance release. It extracts
the guts of reguni and REGC without the conditional they have. The next
commit will do some refactoring to avoid branching and to make things
clearer.

++

Nice explanation.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Copy link
Author

From @hvds

"Karl Williamson via RT" <perlbug-comment@​perl.org> wrote​:
:On Tue Jan 06 14​:11​:00 2015, khw wrote​:
:> Fixed in blead by
:> 405dffc
:
:which is suitable for a maintenance release.
:
:Here's the text of that commit message​:
[snip]

Thanks Karl, that's very clear.

So I assume we ended up splitting in pass 2 (or 3) resulting in a size
growth that should have been anticipated in pass 1 but wasn't, and this
is why the symptom was a "panic​: reg_node overrun". Is that right?

I think there'd be value in having a test for this that's more minimal
than the original fuzz-generated example​: I'm not sure if my shorter [1]
case is shortest, or best-suited, but it'll be easier (particularly with
some additional comments) for the next guy to make sense of if something
in this area breaks again in the future.

Hugo

[1] % ./perl -we '/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbAAbbb/il'
panic​: reg_node overrun trying to emit 0, 1720b50>=1720b50 at -e line 1.
%

Copy link
Author

From @khwilliamson

On 01/07/2015 02​:04 AM, hv@​crypt.org wrote​:

"Karl Williamson via RT" <perlbug-comment@​perl.org> wrote​:
:On Tue Jan 06 14​:11​:00 2015, khw wrote​:
:> Fixed in blead by
:> 405dffc
:
:which is suitable for a maintenance release.
:
:Here's the text of that commit message​:
[snip]

Thanks Karl, that's very clear.

So I assume we ended up splitting in pass 2 (or 3) resulting in a size
growth that should have been anticipated in pass 1 but wasn't, and this
is why the symptom was a "panic​: reg_node overrun". Is that right?

The problem was that it was reading uninitialized memory in pass 1.
That ended up giving a different result than when the memory was
properly initialized in pass2, so the nodes it was sized for were not
correct. I don't known how this translated into the overrun.

I think there'd be value in having a test for this that's more minimal
than the original fuzz-generated example​: I'm not sure if my shorter [1]
case is shortest, or best-suited, but it'll be easier (particularly with
some additional comments) for the next guy to make sense of if something
in this area breaks again in the future.

The reason I went with the test I did was that it had a bunch of f's in
a row, which I know off the top of my head are part of multi-char folds,
whereas yours didn't have any characters that I knew that about, so I
would have had to look it up. Probably the bug is triggered by just
exceeding the node size limit, depending on what's in memory at the
time. But, by being sure that it will have to execute the loop to back
off all the f's put some extra stress on the code, so I went with that.
  I thought the reference to the bug number in the .t was sufficient
comment if someone needed to understand the test. I didn't want to
really explain it there, as it is rather complicated. (It took me a
couple hours to write that commit message)

Patches welcome.

Hugo

[1] % ./perl -we '/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbAAbbb/il'
panic​: reg_node overrun trying to emit 0, 1720b50>=1720b50 at -e line 1.
%

Copy link
Author

From @hvds

Karl Williamson <public@​khwilliamson.com> wrote​:
:On 01/07/2015 02​:04 AM, hv@​crypt.org wrote​:
:> "Karl Williamson via RT" <perlbug-comment@​perl.org> wrote​:
:> :On Tue Jan 06 14​:11​:00 2015, khw wrote​:
:> :> Fixed in blead by
:> :> 405dffc
:> :
:> :which is suitable for a maintenance release.
:> :
:> :Here's the text of that commit message​:
:> [snip]
:>
:> Thanks Karl, that's very clear.
:>
:> So I assume we ended up splitting in pass 2 (or 3) resulting in a size
:> growth that should have been anticipated in pass 1 but wasn't, and this
:> is why the symptom was a "panic​: reg_node overrun". Is that right?
:
:The problem was that it was reading uninitialized memory in pass 1.

Ah, this is the bit I was missing; I understand now why reproducing it
(and testing it) usefully is hard.

Hugo

Copy link
Author

From @khwilliamson

Thanks for submitting this ticket

The issue should be resolved with the release today of Perl v5.22, available at http​://www.perl.org/get.html
If you find that the problem persists, feel free to reopen this ticket

--
Karl Williamson for the Perl 5 porters team

Copy link
Author

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant