Merge branch 'merge-pcre' into 10.0

Sergei Golubchik 2016-06-21 16:44:03 +02:00
commit b760a69e1a
31 changed files with 3593 additions and 2242 deletions


@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2015 University of Cambridge
Copyright (c) 1997-2016 University of Cambridge
All rights reserved
@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2010-2015 Zoltan Herczeg
Copyright(c) 2010-2016 Zoltan Herczeg
All rights reserved.
@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2009-2015 Zoltan Herczeg
Copyright(c) 2009-2016 Zoltan Herczeg
All rights reserved.


@ -65,6 +65,7 @@
# so it has been removed.
# 2013-10-08 PH got rid of the "source" command, which is a bash-ism (use ".")
# 2013-11-05 PH added support for PARENS_NEST_LIMIT
# 2016-03-01 PH applied Chris Wilson's patch for MSVC static build
PROJECT(PCRE C CXX)


@ -4,12 +4,104 @@ ChangeLog for PCRE
Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.
Version 8.39 14-June-2016
-------------------------
1. If PCRE_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre_compile()
misbehaved. This bug was found by the LLVM fuzzer.
2. Similar to the above, if an isolated \E was present between an item and its
qualifier when PCRE_AUTO_CALLOUT was set, pcre_compile() misbehaved. This
bug was found by the LLVM fuzzer.
3. Further to 8.38/46, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.
4. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.
5. Allow for up to 32-bit numbers in the ordin() function in pcregrep.
6. An empty \Q\E sequence between an item and its qualifier caused
pcre_compile() to misbehave when auto callouts were enabled. This bug was
found by the LLVM fuzzer.
7. If a pattern that was compiled with PCRE_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer.
8. A call of pcre_copy_named_substring() for a named substring whose number
was greater than the space in the ovector could cause a crash.
9. Yet another buffer overflow bug involved duplicate named groups with a
group that reset capture numbers (compare 8.38/7 below). Once again, I have
just allowed for more memory, even if not needed. (A proper fix is
implemented in PCRE2, but it involves a lot of refactoring.)
10. pcre_get_substring_list() crashed if the use of \K in a match caused the
start of the match to be earlier than the end.
11. Migrating appropriate PCRE2 JIT improvements to PCRE.
12. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcretest to generate incorrect output, and also to read
uninitialized memory (detected by ASAN or valgrind).
13. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).
14. And yet another buffer overflow bug involving duplicate named groups, this
time nested, with a nested back reference. Yet again, I have just allowed
for more memory, because anything more needs all the refactoring that has
been done for PCRE2. An example pattern that provoked this bug is:
/((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/ and the bug was
registered as CVE-2016-1283.
15. pcretest went into a loop if global matching was requested with an ovector
size less than 2. It now gives an error message. This bug was found by
afl-fuzz.
16. An invalid pattern fragment such as (?(?C)0 was not diagnosing an error
("assertion expected") when (?(?C) was not followed by an opening
parenthesis.
17. Fixed typo ("&&" for "&") in pcre_study(). Fortunately, this could not
actually affect anything, by sheer luck.
18. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation.
19. Modified the RunTest script to incorporate a valgrind suppressions file so
that certain errors, provoked by the SSE2 instruction set when JIT is used,
are ignored.
20. A racing condition is fixed in JIT reported by Mozilla.
21. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.
22. Minor code refactor to avoid "left shift of negative number" warning.
23. Fix typo causing compile error when 16- or 32-bit JIT is compiled without
UCP support.
24. Refactor to avoid compiler warnings in pcrecpp.cc.
25. Refactor to fix a typo in pcre_jit_test.c
26. Patch to support compiling pcrecpp.cc with Intel compiler.
Version 8.38 23-November-2015
-----------------------------
1. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
compile correct code, leading to undefined behaviour or an internally
detected error. This bug was discovered by the LLVM fuzzer.


@ -25,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2015 University of Cambridge
Copyright (c) 1997-2016 University of Cambridge
All rights reserved.
@ -36,7 +36,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2010-2015 Zoltan Herczeg
Copyright(c) 2010-2016 Zoltan Herczeg
All rights reserved.
@ -47,7 +47,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2009-2015 Zoltan Herczeg
Copyright(c) 2009-2016 Zoltan Herczeg
All rights reserved.


@ -1,6 +1,15 @@
News about PCRE releases
------------------------
Release 8.39 14-June-2016
-------------------------
Some appropriate PCRE2 JIT improvements have been retro-fitted to PCRE1. Apart
from that, this is another bug-fix release. Note that this library (now called
PCRE1) is now being maintained for bug fixes only. New projects are advised to
use the new PCRE2 libraries.
Release 8.38 23-November-2015
-----------------------------


@ -67,6 +67,15 @@ fi
./pcretest -C utf >/dev/null
utf8=$?
# We need valgrind suppressions when JIT is in use. (This isn't perfect because
# some tests are run with -no-jit, but as PCRE1 is in maintenance only, I have
# not bothered about that.)
./pcretest -C jit >/dev/null
if [ $? -eq 1 -a "$valgrind" != "" ] ; then
valgrind="$valgrind --suppressions=./testdata/valgrind-jit.supp"
fi
echo "Testing pcregrep main features"
echo "---------------------------- Test 1 ------------------------------" >testtrygrep


@ -178,6 +178,7 @@ nojit=
sim=
skip=
valgrind=
vjs=
# This is in case the caller has set aliases (as I do - PH)
unset cp ls mv rm
@ -357,6 +358,9 @@ $sim ./pcretest -C jit >/dev/null
jit=$?
if [ $jit -ne 0 -a "$nojit" != "yes" ] ; then
jitopt=-s+
if [ "$valgrind" != "" ] ; then
vjs="--suppressions=$testdata/valgrind-jit.supp"
fi
fi
# If no specific tests were requested, select all. Those that are not
@ -423,7 +427,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $do1 = yes ] ; then
echo $title1
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput1 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput1 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput1 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -441,7 +445,7 @@ fi
if [ $do2 = yes ] ; then
echo $title2 "(not UTF-$bits)"
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput2 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput2 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput2 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -504,7 +508,7 @@ if [ $do3 = yes ] ; then
if [ "$locale" != "" ] ; then
echo $title3 "(using '$locale' locale)"
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $infile testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $infile testtry
if [ $? = 0 ] ; then
if $cf $outfile testtry >teststdout || \
$cf $outfile2 testtry >teststdout || \
@ -540,7 +544,7 @@ if [ $do4 = yes ] ; then
echo " Skipped because UTF-$bits support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput4 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput4 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput4 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -560,7 +564,7 @@ if [ $do5 = yes ] ; then
echo " Skipped because UTF-$bits support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput5 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput5 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput5 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -580,7 +584,7 @@ if [ $do6 = yes ] ; then
echo " Skipped because Unicode property support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput6 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput6 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput6 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -602,7 +606,7 @@ if [ $do7 = yes ] ; then
echo " Skipped because Unicode property support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput7 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput7 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput7 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -698,7 +702,7 @@ if [ $do12 = yes ] ; then
if [ $jit -eq 0 -o "$nojit" = "yes" ] ; then
echo " Skipped because JIT is not available or not usable"
else
$sim $valgrind ./pcretest -q $bmode $testdata/testinput12 testtry
$sim $valgrind $vjs ./pcretest -q $bmode $testdata/testinput12 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput12 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -735,7 +739,7 @@ if [ "$do14" = yes ] ; then
cp -f $testdata/saved16 testsaved16
cp -f $testdata/saved32 testsaved32
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput14 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput14 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput14 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -759,7 +763,7 @@ if [ "$do15" = yes ] ; then
echo " Skipped because UTF-$bits support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput15 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput15 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput15 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -783,7 +787,7 @@ if [ $do16 = yes ] ; then
echo " Skipped because Unicode property support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput16 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput16 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput16 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -805,7 +809,7 @@ if [ $do17 = yes ] ; then
echo " Skipped when running 8-bit tests"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput17 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput17 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput17 testtry
if [ $? != 0 ] ; then exit 1; fi
@ -829,7 +833,7 @@ if [ $do18 = yes ] ; then
echo " Skipped because UTF-$bits support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput18 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput18 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput18-$bits testtry
if [ $? != 0 ] ; then exit 1; fi
@ -853,7 +857,7 @@ if [ $do19 = yes ] ; then
echo " Skipped because Unicode property support is not available"
else
for opt in "" "-s" $jitopt; do
$sim $valgrind ./pcretest -q $bmode $opt $testdata/testinput19 testtry
$sim $valgrind ${opt:+$vjs} ./pcretest -q $bmode $opt $testdata/testinput19 testtry
if [ $? = 0 ] ; then
$cf $testdata/testoutput19 testtry
if [ $? != 0 ] ; then exit 1; fi


@ -9,18 +9,18 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [38])
m4_define(pcre_minor, [39])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2015-11-23])
m4_define(pcre_date, [2016-06-14])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:6:2])
m4_define(libpcre16_version, [2:6:2])
m4_define(libpcre32_version, [0:6:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcre_version, [3:7:2])
m4_define(libpcre16_version, [2:7:2])
m4_define(libpcre32_version, [0:7:0])
m4_define(libpcreposix_version, [0:4:0])
m4_define(libpcrecpp_version, [0:1:0])
AC_PREREQ(2.57)
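
The libtool triples above use the `current:revision:age` convention named in the comment. As a rough illustration only, the sketch below shows how such a triple is conventionally turned into a shared-object version suffix on GNU/Linux targets, namely `(current - age).age.revision`. The helper name `linux_so_suffix` is hypothetical and is not part of the PCRE build; other platforms map the triple differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the suffix that libtool conventionally derives
   from current:revision:age on GNU/Linux, i.e. (current - age).age.revision.
   Returns the value of snprintf(), so a result >= buflen means truncation. */
static int linux_so_suffix(int current, int revision, int age,
                           char *buf, size_t buflen)
{
  return snprintf(buf, buflen, "%d.%d.%d", current - age, age, revision);
}
```

Under that convention, bumping only the revision field (as this hunk does) changes the last component of the suffix while leaving the interface numbers alone, which fits a bug-fix release.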


@ -315,9 +315,8 @@ documentation for details of how to do this. It is a non-standard way of
building PCRE, for use in environments that have limited stacks. Because of the
greater use of memory management, it runs more slowly. Separate functions are
provided so that special-purpose external code can be used for this case. When
used, these functions are always called in a stack-like manner (last obtained,
first freed), and always for memory blocks of the same size. There is a
discussion about PCRE's stack usage in the
used, these functions always allocate memory blocks of the same size. There is
a discussion about PCRE's stack usage in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.
</P>
@ -2913,9 +2912,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
Last updated: 09 February 2014
Last updated: 18 December 2015
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCREAPI 3 "09 February 2014" "PCRE 8.35"
.TH PCREAPI 3 "18 December 2015" "PCRE 8.39"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@ -273,9 +273,8 @@ documentation for details of how to do this. It is a non-standard way of
building PCRE, for use in environments that have limited stacks. Because of the
greater use of memory management, it runs more slowly. Separate functions are
provided so that special-purpose external code can be used for this case. When
used, these functions are always called in a stack-like manner (last obtained,
first freed), and always for memory blocks of the same size. There is a
discussion about PCRE's stack usage in the
used, these functions always allocate memory blocks of the same size. There is
a discussion about PCRE's stack usage in the
.\" HREF
\fBpcrestack\fP
.\"
@ -2914,6 +2913,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
Last updated: 09 February 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 18 December 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi


@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2016 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -485,7 +485,7 @@ static const char error_texts[] =
"lookbehind assertion is not fixed length\0"
"malformed number or name after (?(\0"
"conditional group contains more than two branches\0"
"assertion expected after (?(\0"
"assertion expected after (?( or (?(?C)\0"
"(?R or (?[+-]digits must be followed by )\0"
/* 30 */
"unknown POSIX class name\0"
@ -560,6 +560,7 @@ static const char error_texts[] =
/* 85 */
"parentheses are too deeply nested (stack check)\0"
"digits missing in \\x{} or \\o{}\0"
"regular expression is too complicated\0"
;
/* Table to identify digits and hex digits. This is used when compiling
@ -4566,6 +4567,10 @@ for (;; ptr++)
pcre_uint32 ec;
pcre_uchar mcbuffer[8];
/* Come here to restart the loop without advancing the pointer. */
REDO_LOOP:
/* Get next character in the pattern */
c = *ptr;
@ -4591,7 +4596,8 @@ for (;; ptr++)
if (code > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN) /* Check for overrun */
{
*errorcodeptr = ERR52;
*errorcodeptr = (code >= cd->start_workspace + cd->workspace_size)?
ERR52 : ERR87;
goto FAILED;
}
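
The hunk above splits one workspace check into two outcomes: ERR87 ("regular expression is too complicated") when the code pointer has entered the safety margin, and ERR52 when it has reached or passed the end of the workspace. The sketch below restates that decision with plain offsets. The function name, the enum, and the use of `size_t` offsets instead of pointers are assumptions made for illustration, not the library's code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the error codes chosen in the hunk above. */
enum ws_status { WS_OK = 0, WS_OVERRUN = 52, WS_TOO_COMPLICATED = 87 };

/* used   = bytes of workspace consumed so far (code - start_workspace)
   size   = total workspace size
   margin = safety margin kept free at the end of the workspace */
static enum ws_status check_workspace(size_t used, size_t size, size_t margin)
{
  if (margin > size) margin = size;        /* keep the subtraction in range */
  if (used <= size - margin) return WS_OK; /* still below the margin */
  return (used >= size) ? WS_OVERRUN : WS_TOO_COMPLICATED;
}
```

For example, with a 100-byte workspace and a 20-byte margin, 81 bytes used yields `WS_TOO_COMPLICATED`, while 100 bytes used yields `WS_OVERRUN`.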
@ -4645,9 +4651,10 @@ for (;; ptr++)
goto FAILED;
}
/* If in \Q...\E, check for the end; if not, we have a literal */
/* If in \Q...\E, check for the end; if not, we have a literal. Otherwise an
isolated \E is ignored. */
if (inescq && c != CHAR_NULL)
if (c != CHAR_NULL)
{
if (c == CHAR_BACKSLASH && ptr[1] == CHAR_E)
{
@ -4655,7 +4662,7 @@ for (;; ptr++)
ptr++;
continue;
}
else
else if (inescq)
{
if (previous_callout != NULL)
{
@ -4670,18 +4677,27 @@ for (;; ptr++)
}
goto NORMAL_CHAR;
}
/* Control does not reach here. */
/* Check for the start of a \Q...\E sequence. We must do this here rather
than later in case it is immediately followed by \E, which turns it into a
"do nothing" sequence. */
if (c == CHAR_BACKSLASH && ptr[1] == CHAR_Q)
{
inescq = TRUE;
ptr++;
continue;
}
}
/* In extended mode, skip white space and comments. We need a loop in order
to check for more white space and more comments after a comment. */
/* In extended mode, skip white space and comments. */
if ((options & PCRE_EXTENDED) != 0)
{
for (;;)
const pcre_uchar *wscptr = ptr;
while (MAX_255(c) && (cd->ctypes[c] & ctype_space) != 0) c = *(++ptr);
if (c == CHAR_NUMBER_SIGN)
{
while (MAX_255(c) && (cd->ctypes[c] & ctype_space) != 0) c = *(++ptr);
if (c != CHAR_NUMBER_SIGN) break;
ptr++;
while (*ptr != CHAR_NULL)
{
@ -4695,8 +4711,29 @@ for (;; ptr++)
if (utf) FORWARDCHAR(ptr);
#endif
}
c = *ptr; /* Either NULL or the char after a newline */
}
/* If we skipped any characters, restart the loop. Otherwise, we didn't see
a comment. */
if (ptr > wscptr) goto REDO_LOOP;
}
/* Skip over (?# comments. We need to do this here because we want to know if
the next thing is a quantifier, and these comments may come between an item
and its quantifier. */
if (c == CHAR_LEFT_PARENTHESIS && ptr[1] == CHAR_QUESTION_MARK &&
ptr[2] == CHAR_NUMBER_SIGN)
{
ptr += 3;
while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++;
if (*ptr == CHAR_NULL)
{
*errorcodeptr = ERR18;
goto FAILED;
}
continue;
}
/* See if the next thing is a quantifier. */
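
The hunk above moves the `(?#` comment skip ahead of the quantifier check, so that a comment sitting between an item and its quantifier (as in `A(?#comment)?B`) is consumed first. A minimal standalone sketch of that scan follows. It assumes a NUL-terminated 8-bit pattern, and `skip_inline_comment` is a hypothetical name; the real code works on `pcre_uchar` and reports ERR18 instead of returning NULL.

```c
#include <stddef.h>

/* If p points at "(?#", return a pointer to the character after the closing
   ')'. If no comment starts at p, return p unchanged. If the comment is not
   terminated before the end of the string, return NULL. */
static const char *skip_inline_comment(const char *p)
{
  if (p[0] != '(' || p[1] != '?' || p[2] != '#') return p;
  p += 3;
  while (*p != '\0' && *p != ')') p++;
  return (*p == '\0') ? NULL : p + 1;
}
```

Called on `"(?#note)?B"`, the returned pointer addresses the `?`, which is then the next thing the caller inspects when it looks for a quantifier.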
@ -4820,15 +4857,15 @@ for (;; ptr++)
if (STRNCMP_UC_C8(ptr+1, STRING_WEIRD_STARTWORD, 6) == 0)
{
nestptr = ptr + 7;
ptr = sub_start_of_word - 1;
continue;
ptr = sub_start_of_word;
goto REDO_LOOP;
}
if (STRNCMP_UC_C8(ptr+1, STRING_WEIRD_ENDWORD, 6) == 0)
{
nestptr = ptr + 7;
ptr = sub_end_of_word - 1;
continue;
ptr = sub_end_of_word;
goto REDO_LOOP;
}
/* Handle a real character class. */
@ -5046,20 +5083,22 @@ for (;; ptr++)
ptr = tempptr + 1;
continue;
/* For the other POSIX classes (ascii, xdigit) we are going to fall
through to the non-UCP case and build a bit map for characters with
code points less than 256. If we are in a negated POSIX class
within a non-negated overall class, characters with code points
greater than 255 must all match. In the special case where we have
not yet generated any xclass data, and this is the final item in
the overall class, we need do nothing: later on, the opcode
/* For the other POSIX classes (ascii, cntrl, xdigit) we are going
to fall through to the non-UCP case and build a bit map for
characters with code points less than 256. If we are in a negated
POSIX class, characters with code points greater than 255 must
either all match or all not match. In the special case where we
have not yet generated any xclass data, and this is the final item
in the overall class, we need do nothing: later on, the opcode
OP_NCLASS will be used to indicate that characters greater than 255
are acceptable. If we have already seen an xclass item or one may
follow (we have to assume that it might if this is not the end of
the class), explicitly match all wide codepoints. */
the class), explicitly list all wide codepoints, which will then
either not match or match, depending on whether the class is or is
not negated. */
default:
if (!negate_class && local_negate &&
if (local_negate &&
(xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
{
*class_uchardata++ = XCL_RANGE;
@ -6529,21 +6568,6 @@ for (;; ptr++)
case CHAR_LEFT_PARENTHESIS:
ptr++;
/* First deal with comments. Putting this code right at the start ensures
that comments have no bad side effects. */
if (ptr[0] == CHAR_QUESTION_MARK && ptr[1] == CHAR_NUMBER_SIGN)
{
ptr += 2;
while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++;
if (*ptr == CHAR_NULL)
{
*errorcodeptr = ERR18;
goto FAILED;
}
continue;
}
/* Now deal with various "verbs" that can be introduced by '*'. */
if (ptr[0] == CHAR_ASTERISK && (ptr[1] == ':'
@ -6604,8 +6628,21 @@ for (;; ptr++)
cd->had_accept = TRUE;
for (oc = cd->open_caps; oc != NULL; oc = oc->next)
{
*code++ = OP_CLOSE;
PUT2INC(code, 0, oc->number);
if (lengthptr != NULL)
{
#ifdef COMPILE_PCRE8
*lengthptr += 1 + IMM2_SIZE;
#elif defined COMPILE_PCRE16
*lengthptr += 2 + IMM2_SIZE;
#elif defined COMPILE_PCRE32
*lengthptr += 4 + IMM2_SIZE;
#endif
}
else
{
*code++ = OP_CLOSE;
PUT2INC(code, 0, oc->number);
}
}
setverb = *code++ =
(cd->assert_depth > 0)? OP_ASSERT_ACCEPT : OP_ACCEPT;
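
In the hunk above, the (*ACCEPT) handling stops writing OP_CLOSE items during the pre-compile pass and only adds their size to `*lengthptr`. The sketch below shows that "measure or emit" pattern in isolation for the 8-bit case. The opcode value, the 2-byte big-endian group number, and the function name are all assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical constants: a stand-in opcode and a 2-byte immediate. */
enum { DEMO_OP_CLOSE = 0x7f, DEMO_IMM2_SIZE = 2 };

/* When length is non-NULL we are in the measuring pass: count the bytes the
   item would need and write nothing. Otherwise emit the opcode followed by
   the group number, and return the advanced code pointer. */
static unsigned char *close_group(unsigned char *code, size_t *length,
                                  unsigned number)
{
  if (length != NULL)
    {
    *length += 1 + DEMO_IMM2_SIZE;
    return code;
    }
  *code++ = DEMO_OP_CLOSE;
  *code++ = (unsigned char)(number >> 8);
  *code++ = (unsigned char)(number & 0xff);
  return code;
}
```

Because nothing is written while measuring, deeply nested groups cannot push the code pointer past the end of a fixed-size scratch buffer during that pass.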
@ -6734,6 +6771,15 @@ for (;; ptr++)
for (i = 3;; i++) if (!IS_DIGIT(ptr[i])) break;
if (ptr[i] == CHAR_RIGHT_PARENTHESIS)
tempptr += i + 1;
/* tempptr should now be pointing to the opening parenthesis of the
assertion condition. */
if (*tempptr != CHAR_LEFT_PARENTHESIS)
{
*errorcodeptr = ERR28;
goto FAILED;
}
}
/* For conditions that are assertions, check the syntax, and then exit
@ -7258,7 +7304,7 @@ for (;; ptr++)
issue is fixed "properly" in PCRE2. As PCRE1 is now in maintenance
only mode, we finesse the bug by allowing more memory always. */
*lengthptr += 2 + 2*LINK_SIZE;
*lengthptr += 4 + 4*LINK_SIZE;
/* It is even worse than that. The current reference may be to an
existing named group with a different number (so apparently not
@ -7274,7 +7320,12 @@ for (;; ptr++)
so far in order to get the number. If the name is not found, leave
the value of recno as 0 for a forward reference. */
else
/* This patch (removing "else") fixes a problem when a reference is
to multiple identically named nested groups from within the nest.
Once again, it is not the "proper" fix, and it results in an
over-allocation of memory. */
/* else */
{
ng = cd->named_groups;
for (i = 0; i < cd->names_found; i++, ng++)
@ -7585,39 +7636,15 @@ for (;; ptr++)
newoptions = (options | set) & (~unset);
/* If the options ended with ')' this is not the start of a nested
group with option changes, so the options change at this level. If this
item is right at the start of the pattern, the options can be
abstracted and made external in the pre-compile phase, and ignored in
the compile phase. This can be helpful when matching -- for instance in
caseless checking of required bytes.
If the code pointer is not (cd->start_code + 1 + LINK_SIZE), we are
definitely *not* at the start of the pattern because something has been
compiled. In the pre-compile phase, however, the code pointer can have
that value after the start, because it gets reset as code is discarded
during the pre-compile. However, this can happen only at top level - if
we are within parentheses, the starting BRA will still be present. At
any parenthesis level, the length value can be used to test if anything
has been compiled at that level. Thus, a test for both these conditions
is necessary to ensure we correctly detect the start of the pattern in
both phases.
group with option changes, so the options change at this level.
If we are not at the pattern start, reset the greedy defaults and the
case value for firstchar and reqchar. */
if (*ptr == CHAR_RIGHT_PARENTHESIS)
{
if (code == cd->start_code + 1 + LINK_SIZE &&
(lengthptr == NULL || *lengthptr == 2 + 2*LINK_SIZE))
{
cd->external_options = newoptions;
}
else
{
greedy_default = ((newoptions & PCRE_UNGREEDY) != 0);
greedy_non_default = greedy_default ^ 1;
req_caseopt = ((newoptions & PCRE_CASELESS) != 0)? REQ_CASELESS:0;
}
greedy_default = ((newoptions & PCRE_UNGREEDY) != 0);
greedy_non_default = greedy_default ^ 1;
req_caseopt = ((newoptions & PCRE_CASELESS) != 0)? REQ_CASELESS:0;
/* Change options at this level, and pass them back for use
in subsequent branches. */
@ -7896,16 +7923,6 @@ for (;; ptr++)
c = ec;
else
{
if (escape == ESC_Q) /* Handle start of quoted string */
{
if (ptr[1] == CHAR_BACKSLASH && ptr[2] == CHAR_E)
ptr += 2; /* avoid empty string */
else inescq = TRUE;
continue;
}
if (escape == ESC_E) continue; /* Perl ignores an orphan \E */
/* For metasequences that actually match a character, we disable the
setting of a first character if it hasn't already been set. */


@ -250,6 +250,7 @@ Arguments:
code the compiled regex
stringname the name of the capturing substring
ovector the vector of matched substrings
stringcount number of captured substrings
Returns: the number of the first that is set,
or the number of the last one if none are set,
@ -258,13 +259,16 @@ Returns: the number of the first that is set,
#if defined COMPILE_PCRE8
static int
get_first_set(const pcre *code, const char *stringname, int *ovector)
get_first_set(const pcre *code, const char *stringname, int *ovector,
int stringcount)
#elif defined COMPILE_PCRE16
static int
get_first_set(const pcre16 *code, PCRE_SPTR16 stringname, int *ovector)
get_first_set(const pcre16 *code, PCRE_SPTR16 stringname, int *ovector,
int stringcount)
#elif defined COMPILE_PCRE32
static int
get_first_set(const pcre32 *code, PCRE_SPTR32 stringname, int *ovector)
get_first_set(const pcre32 *code, PCRE_SPTR32 stringname, int *ovector,
int stringcount)
#endif
{
const REAL_PCRE *re = (const REAL_PCRE *)code;
@ -295,7 +299,7 @@ if (entrysize <= 0) return entrysize;
for (entry = (pcre_uchar *)first; entry <= (pcre_uchar *)last; entry += entrysize)
{
int n = GET2(entry, 0);
if (ovector[n*2] >= 0) return n;
if (n < stringcount && ovector[n*2] >= 0) return n;
}
return GET2(entry, 0);
}
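
The added `n < stringcount` test above stops the loop from reading `ovector[n*2]` for a group number that lies beyond the pairs actually filled in. The sketch below isolates that guard. It is a simplification under stated assumptions: the candidate group numbers arrive as a plain array with at least one entry (the real code walks the name table with GET2), and `first_set_group` is a hypothetical name.

```c
/* ovector holds (start, end) offset pairs; only the first stringcount pairs
   are valid. candidates lists the group numbers that share one name.
   Returns the first candidate that is in range and set, or the last
   candidate number if none is. */
static int first_set_group(const int *candidates, int ncandidates,
                           const int *ovector, int stringcount)
{
  int i;
  for (i = 0; i < ncandidates; i++)
    {
    int n = candidates[i];
    if (n < stringcount && ovector[n*2] >= 0) return n;  /* range check first */
    }
  return candidates[ncandidates - 1];
}
```

The order of the two conditions matters: the range check has to come first so that an out-of-range `n` never indexes the vector.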
@ -402,7 +406,7 @@ pcre32_copy_named_substring(const pcre32 *code, PCRE_SPTR32 subject,
PCRE_UCHAR32 *buffer, int size)
#endif
{
int n = get_first_set(code, stringname, ovector);
int n = get_first_set(code, stringname, ovector, stringcount);
if (n <= 0) return n;
#if defined COMPILE_PCRE8
return pcre_copy_substring(subject, ovector, stringcount, n, buffer, size);
@ -457,7 +461,10 @@ pcre_uchar **stringlist;
pcre_uchar *p;
for (i = 0; i < double_count; i += 2)
size += sizeof(pcre_uchar *) + IN_UCHARS(ovector[i+1] - ovector[i] + 1);
{
size += sizeof(pcre_uchar *) + IN_UCHARS(1);
if (ovector[i+1] > ovector[i]) size += IN_UCHARS(ovector[i+1] - ovector[i]);
}
stringlist = (pcre_uchar **)(PUBL(malloc))(size);
if (stringlist == NULL) return PCRE_ERROR_NOMEMORY;
@ -473,7 +480,7 @@ p = (pcre_uchar *)(stringlist + stringcount + 1);
for (i = 0; i < double_count; i += 2)
{
int len = ovector[i+1] - ovector[i];
int len = (ovector[i+1] > ovector[i])? (ovector[i+1] - ovector[i]) : 0;
memcpy(p, subject + ovector[i], IN_UCHARS(len));
*stringlist++ = p;
p += len;
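
Both hunks above clamp a substring length to zero when the recorded end offset is before the start offset, which can happen when \K is used in a match. The sketch below shows the clamped length and the resulting size computation side by side. The helper names are hypothetical, and the initial `sizeof(char *)` for the terminating NULL pointer is an assumption about code outside the hunk.

```c
#include <stddef.h>

/* Length of one captured substring in code units, never negative. */
static size_t substring_len(int start, int end)
{
  return (end > start) ? (size_t)(end - start) : 0;
}

/* Bytes needed for a list of stringcount substrings: one pointer per string,
   the string itself plus a terminator, and a final NULL pointer. */
static size_t list_bytes(const int *ovector, int stringcount, size_t unit_size)
{
  size_t size = sizeof(char *);
  int i;
  for (i = 0; i < stringcount * 2; i += 2)
    size += sizeof(char *) +
            (substring_len(ovector[i], ovector[i+1]) + 1) * unit_size;
  return size;
}
```

Using the same clamped length for both the allocation and the later copy keeps the two in agreement, so an inverted pair contributes only its terminator.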
@ -619,7 +626,7 @@ pcre32_get_named_substring(const pcre32 *code, PCRE_SPTR32 subject,
PCRE_SPTR32 *stringptr)
#endif
{
int n = get_first_set(code, stringname, ovector);
int n = get_first_set(code, stringname, ovector, stringcount);
if (n <= 0) return n;
#if defined COMPILE_PCRE8
return pcre_get_substring(subject, ovector, stringcount, n, stringptr);


@ -7,7 +7,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2016 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -275,7 +275,7 @@ pcre.h(.in) and disable (comment out) this message. */
typedef pcre_uint16 pcre_uchar;
#define UCHAR_SHIFT (1)
#define IN_UCHARS(x) ((x) << UCHAR_SHIFT)
#define IN_UCHARS(x) ((x) * 2)
#define MAX_255(c) ((c) <= 255u)
#define TABLE_GET(c, table, default) (MAX_255(c)? ((table)[c]):(default))
@ -283,7 +283,7 @@ typedef pcre_uint16 pcre_uchar;
typedef pcre_uint32 pcre_uchar;
#define UCHAR_SHIFT (2)
#define IN_UCHARS(x) ((x) << UCHAR_SHIFT)
#define IN_UCHARS(x) ((x) * 4)
#define MAX_255(c) ((c) <= 255u)
#define TABLE_GET(c, table, default) (MAX_255(c)? ((table)[c]):(default))
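
The two IN_UCHARS changes above replace a left shift with a multiplication. This appears to correspond to the "left shift of negative number" warning mentioned in the ChangeLog, although the diff itself does not say so. In C, left-shifting a negative signed value is undefined behaviour, whereas multiplying it by a small constant is well defined as long as the result does not overflow. A minimal sketch using the 32-bit form of the macro, with a hypothetical wrapper function:

```c
/* 32-bit code units: four bytes per unit, as in the hunk above. */
#define IN_UCHARS(x) ((x) * 4)

/* Hypothetical wrapper: convert a (possibly negative) count of code units
   into a byte count. */
static int bytes_for_units(int units)
{
  return IN_UCHARS(units);
}
```

With the multiplying form, a negative difference between two offsets simply produces a negative byte count instead of invoking undefined behaviour.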
@ -2289,7 +2289,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERRCOUNT };
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */

File diff suppressed because it is too large


@ -242,13 +242,17 @@ static struct regression_test_case regression_test_cases[] = {
{ MA, 0, "a\\z", "aaa" },
{ MA, 0 | F_NOMATCH, "a\\z", "aab" },
/* Brackets. */
/* Brackets and alternatives. */
{ MUA, 0, "(ab|bb|cd)", "bacde" },
{ MUA, 0, "(?:ab|a)(bc|c)", "ababc" },
{ MUA, 0, "((ab|(cc))|(bb)|(?:cd|efg))", "abac" },
{ CMUA, 0, "((aB|(Cc))|(bB)|(?:cd|EFg))", "AcCe" },
{ MUA, 0, "((ab|(cc))|(bb)|(?:cd|ebg))", "acebebg" },
{ MUA, 0, "(?:(a)|(?:b))(cc|(?:d|e))(a|b)k", "accabdbbccbk" },
{ MUA, 0, "\xc7\x82|\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" },
{ MUA, 0, "=\xc7\x82|#\xc6\x82", "\xf1\x83\x82\x82=\xc7\x82\xc7\x83" },
{ MUA, 0, "\xc7\x82\xc7\x83|\xc6\x82\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" },
{ MUA, 0, "\xc6\x82\xc6\x82|\xc7\x83\xc7\x83|\xc8\x84\xc8\x84", "\xf1\x83\x82\x82\xc8\x84\xc8\x84" },
/* Greedy and non-greedy ? operators. */
{ MUA, 0, "(?:a)?a", "laab" },
@ -318,6 +322,14 @@ static struct regression_test_case regression_test_cases[] = {
{ CMUA, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" },
{ MUA, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" },
{ MUA, 0, "[^\xe1\xbd\xb8]{3,}?", "##\xe1\xbd\xb8#\xe1\xbd\xb8#\xc3\x89#\xe1\xbd\xb8" },
{ MUA, 0, "\\d+123", "987654321,01234" },
{ MUA, 0, "abcd*|\\w+xy", "aaaaa,abxyz" },
{ MUA, 0, "(?:abc|((?:amc|\\b\\w*xy)))", "aaaaa,abxyz" },
{ MUA, 0, "a(?R)|([a-z]++)#", ".abcd.abcd#."},
{ MUA, 0, "a(?R)|([a-z]++)#", ".abcd.mbcd#."},
{ MUA, 0, ".[ab]*.", "xx" },
{ MUA, 0, ".[ab]*a", "xxa" },
{ MUA, 0, ".[ab]?.", "xx" },
/* Bracket repeats with limit. */
{ MUA, 0, "(?:(ab){2}){5}M", "abababababababababababM" },
@ -574,6 +586,16 @@ static struct regression_test_case regression_test_cases[] = {
{ MUA, 0, "(?:(?=.)??[a-c])+m", "abacdcbacacdcaccam" },
{ MUA, 0, "((?!a)?(?!([^a]))?)+$", "acbab" },
{ MUA, 0, "((?!a)?\?(?!([^a]))?\?)+$", "acbab" },
{ MUA, 0, "a(?=(?C)\\B)b", "ab" },
{ MUA, 0, "a(?!(?C)\\B)bb|ab", "abb" },
{ MUA, 0, "a(?=\\b|(?C)\\B)b", "ab" },
{ MUA, 0, "a(?!\\b|(?C)\\B)bb|ab", "abb" },
{ MUA, 0, "c(?(?=(?C)\\B)ab|a)", "cab" },
{ MUA, 0, "c(?(?!(?C)\\B)ab|a)", "cab" },
{ MUA, 0, "c(?(?=\\b|(?C)\\B)ab|a)", "cab" },
{ MUA, 0, "c(?(?!\\b|(?C)\\B)ab|a)", "cab" },
{ MUA, 0, "a(?=)b", "ab" },
{ MUA, 0 | F_NOMATCH, "a(?!)b", "ab" },
/* Not empty, ACCEPT, FAIL */
{ MUA | PCRE_NOTEMPTY, 0 | F_NOMATCH, "a*", "bcx" },
@ -664,6 +686,7 @@ static struct regression_test_case regression_test_cases[] = {
{ PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 1, ".", "\r\n" },
{ PCRE_FIRSTLINE | PCRE_NEWLINE_LF | PCRE_DOTALL, 0 | F_NOMATCH, "ab.", "ab" },
{ MUA | PCRE_FIRSTLINE, 1 | F_NOMATCH, "^[a-d0-9]", "\nxx\nd" },
{ PCRE_NEWLINE_ANY | PCRE_FIRSTLINE | PCRE_DOTALL, 0, "....a", "012\n0a" },
/* Recurse. */
{ MUA, 0, "(a)(?1)", "aa" },
@ -798,6 +821,9 @@ static struct regression_test_case regression_test_cases[] = {
/* (*SKIP) verb. */
{ MUA, 0 | F_NOMATCH, "(?=a(*SKIP)b)ab|ad", "ad" },
{ MUA, 0, "(\\w+(*SKIP)#)", "abcd,xyz#," },
{ MUA, 0, "\\w+(*SKIP)#|mm", "abcd,xyz#," },
{ MUA, 0 | F_NOMATCH, "b+(?<=(*SKIP)#c)|b+", "#bbb" },
/* (*THEN) verb. */
{ MUA, 0, "((?:a(*THEN)|aab)(*THEN)c|a+)+m", "aabcaabcaabcaabcnacm" },
@ -1534,10 +1560,10 @@ static int regression_tests(void)
is_successful = 0;
}
#endif
#if defined SUPPORT_PCRE16 && defined SUPPORT_PCRE16
if (ovector16_1[i] != ovector16_2[i] || ovector16_1[i] != ovector16_1[i] || ovector16_1[i] != ovector16_2[i]) {
printf("\n16 and 16 bit: Ovector[%d] value differs(J16:%d,I16:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n",
i, ovector16_1[i], ovector16_2[i], ovector16_1[i], ovector16_2[i],
#if defined SUPPORT_PCRE16 && defined SUPPORT_PCRE32
if (ovector16_1[i] != ovector16_2[i] || ovector16_1[i] != ovector32_1[i] || ovector16_1[i] != ovector32_2[i]) {
printf("\n16 and 32 bit: Ovector[%d] value differs(J16:%d,I16:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n",
i, ovector16_1[i], ovector16_2[i], ovector32_1[i], ovector32_2[i],
total, current->pattern, current->input);
is_successful = 0;
}

View file

@ -1371,7 +1371,7 @@ do
for (c = 0; c < 16; c++) start_bits[c] |= map[c];
for (c = 128; c < 256; c++)
{
if ((map[c/8] && (1 << (c&7))) != 0)
if ((map[c/8] & (1 << (c&7))) != 0)
{
int d = (c >> 6) | 0xc0; /* Set bit for this starter */
start_bits[d/8] |= (1 << (d&7)); /* and then skip on to the */
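
Note on the hunk above: the one-character change from && to & changes what is tested. The logical form is true whenever the whole map byte is non-zero, so every character sharing that byte is treated as present; the bitwise form tests only the bit for c. A minimal sketch of the difference follows. The helper names are hypothetical and are not part of PCRE; only the two expressions are taken from the diff.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only; they are not PCRE functions. */

/* Old test: logical AND is true whenever the whole byte is non-zero. */
static int class_bit_logical(const unsigned char *map, int c)
{
return (map[c/8] && (1 << (c&7))) != 0;
}

/* New test: bitwise AND isolates the single bit for character c. */
static int class_bit_bitwise(const unsigned char *map, int c)
{
return (map[c/8] & (1 << (c&7))) != 0;
}

/* Print both results for one character so the difference is visible. */
static void show_class_bit(const unsigned char *map, int c)
{
printf("c=%d logical=%d bitwise=%d\n", c,
  class_bit_logical(map, c), class_bit_bitwise(map, c));
}
```

With a 32-byte map in which only bit 0 of byte 16 is set (character 128), the logical form also reports character 129 as present, while the bitwise form does not.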

View file

@ -66,7 +66,7 @@ Arg RE::no_arg((void*)NULL);
// inclusive test if we ever needed it. (Note that not only the
// __attribute__ syntax, but also __USER_LABEL_PREFIX__, are
// gnu-specific.)
#if defined(__GNUC__) && __GNUC__ >= 3 && defined(__ELF__)
#if defined(__GNUC__) && __GNUC__ >= 3 && defined(__ELF__) && !defined(__INTEL_COMPILER)
# define ULP_AS_STRING(x) ULP_AS_STRING_INTERNAL(x)
# define ULP_AS_STRING_INTERNAL(x) #x
# define USER_LABEL_PREFIX_STR ULP_AS_STRING(__USER_LABEL_PREFIX__)
@ -168,22 +168,22 @@ bool RE::FullMatch(const StringPiece& text,
const Arg& ptr16) const {
const Arg* args[kMaxArgs];
int n = 0;
if (&ptr1 == &no_arg) goto done; args[n++] = &ptr1;
if (&ptr2 == &no_arg) goto done; args[n++] = &ptr2;
if (&ptr3 == &no_arg) goto done; args[n++] = &ptr3;
if (&ptr4 == &no_arg) goto done; args[n++] = &ptr4;
if (&ptr5 == &no_arg) goto done; args[n++] = &ptr5;
if (&ptr6 == &no_arg) goto done; args[n++] = &ptr6;
if (&ptr7 == &no_arg) goto done; args[n++] = &ptr7;
if (&ptr8 == &no_arg) goto done; args[n++] = &ptr8;
if (&ptr9 == &no_arg) goto done; args[n++] = &ptr9;
if (&ptr10 == &no_arg) goto done; args[n++] = &ptr10;
if (&ptr11 == &no_arg) goto done; args[n++] = &ptr11;
if (&ptr12 == &no_arg) goto done; args[n++] = &ptr12;
if (&ptr13 == &no_arg) goto done; args[n++] = &ptr13;
if (&ptr14 == &no_arg) goto done; args[n++] = &ptr14;
if (&ptr15 == &no_arg) goto done; args[n++] = &ptr15;
if (&ptr16 == &no_arg) goto done; args[n++] = &ptr16;
if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1;
if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2;
if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3;
if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4;
if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5;
if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6;
if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7;
if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8;
if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9;
if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10;
if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11;
if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12;
if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13;
if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14;
if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15;
if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16;
done:
int consumed;
@ -210,22 +210,22 @@ bool RE::PartialMatch(const StringPiece& text,
const Arg& ptr16) const {
const Arg* args[kMaxArgs];
int n = 0;
if (&ptr1 == &no_arg) goto done; args[n++] = &ptr1;
if (&ptr2 == &no_arg) goto done; args[n++] = &ptr2;
if (&ptr3 == &no_arg) goto done; args[n++] = &ptr3;
if (&ptr4 == &no_arg) goto done; args[n++] = &ptr4;
if (&ptr5 == &no_arg) goto done; args[n++] = &ptr5;
if (&ptr6 == &no_arg) goto done; args[n++] = &ptr6;
if (&ptr7 == &no_arg) goto done; args[n++] = &ptr7;
if (&ptr8 == &no_arg) goto done; args[n++] = &ptr8;
if (&ptr9 == &no_arg) goto done; args[n++] = &ptr9;
if (&ptr10 == &no_arg) goto done; args[n++] = &ptr10;
if (&ptr11 == &no_arg) goto done; args[n++] = &ptr11;
if (&ptr12 == &no_arg) goto done; args[n++] = &ptr12;
if (&ptr13 == &no_arg) goto done; args[n++] = &ptr13;
if (&ptr14 == &no_arg) goto done; args[n++] = &ptr14;
if (&ptr15 == &no_arg) goto done; args[n++] = &ptr15;
if (&ptr16 == &no_arg) goto done; args[n++] = &ptr16;
if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1;
if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2;
if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3;
if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4;
if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5;
if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6;
if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7;
if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8;
if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9;
if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10;
if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11;
if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12;
if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13;
if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14;
if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15;
if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16;
done:
int consumed;
@ -252,22 +252,22 @@ bool RE::Consume(StringPiece* input,
const Arg& ptr16) const {
const Arg* args[kMaxArgs];
int n = 0;
if (&ptr1 == &no_arg) goto done; args[n++] = &ptr1;
if (&ptr2 == &no_arg) goto done; args[n++] = &ptr2;
if (&ptr3 == &no_arg) goto done; args[n++] = &ptr3;
if (&ptr4 == &no_arg) goto done; args[n++] = &ptr4;
if (&ptr5 == &no_arg) goto done; args[n++] = &ptr5;
if (&ptr6 == &no_arg) goto done; args[n++] = &ptr6;
if (&ptr7 == &no_arg) goto done; args[n++] = &ptr7;
if (&ptr8 == &no_arg) goto done; args[n++] = &ptr8;
if (&ptr9 == &no_arg) goto done; args[n++] = &ptr9;
if (&ptr10 == &no_arg) goto done; args[n++] = &ptr10;
if (&ptr11 == &no_arg) goto done; args[n++] = &ptr11;
if (&ptr12 == &no_arg) goto done; args[n++] = &ptr12;
if (&ptr13 == &no_arg) goto done; args[n++] = &ptr13;
if (&ptr14 == &no_arg) goto done; args[n++] = &ptr14;
if (&ptr15 == &no_arg) goto done; args[n++] = &ptr15;
if (&ptr16 == &no_arg) goto done; args[n++] = &ptr16;
if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1;
if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2;
if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3;
if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4;
if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5;
if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6;
if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7;
if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8;
if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9;
if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10;
if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11;
if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12;
if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13;
if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14;
if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15;
if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16;
done:
int consumed;
@ -300,22 +300,22 @@ bool RE::FindAndConsume(StringPiece* input,
const Arg& ptr16) const {
const Arg* args[kMaxArgs];
int n = 0;
if (&ptr1 == &no_arg) goto done; args[n++] = &ptr1;
if (&ptr2 == &no_arg) goto done; args[n++] = &ptr2;
if (&ptr3 == &no_arg) goto done; args[n++] = &ptr3;
if (&ptr4 == &no_arg) goto done; args[n++] = &ptr4;
if (&ptr5 == &no_arg) goto done; args[n++] = &ptr5;
if (&ptr6 == &no_arg) goto done; args[n++] = &ptr6;
if (&ptr7 == &no_arg) goto done; args[n++] = &ptr7;
if (&ptr8 == &no_arg) goto done; args[n++] = &ptr8;
if (&ptr9 == &no_arg) goto done; args[n++] = &ptr9;
if (&ptr10 == &no_arg) goto done; args[n++] = &ptr10;
if (&ptr11 == &no_arg) goto done; args[n++] = &ptr11;
if (&ptr12 == &no_arg) goto done; args[n++] = &ptr12;
if (&ptr13 == &no_arg) goto done; args[n++] = &ptr13;
if (&ptr14 == &no_arg) goto done; args[n++] = &ptr14;
if (&ptr15 == &no_arg) goto done; args[n++] = &ptr15;
if (&ptr16 == &no_arg) goto done; args[n++] = &ptr16;
if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1;
if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2;
if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3;
if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4;
if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5;
if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6;
if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7;
if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8;
if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9;
if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10;
if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11;
if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12;
if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13;
if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14;
if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15;
if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16;
done:
int consumed;

View file

@ -2437,7 +2437,7 @@ return options;
static char *
ordin(int n)
{
static char buffer[8];
static char buffer[14];
char *p = buffer;
sprintf(p, "%d", n);
while (*p != 0) p++;
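
Note on the hunk above: the ChangeLog entry for this change says it allows for up to 32-bit numbers in ordin(). Assuming a 32-bit int, the longest decimal rendering is "-2147483648" (11 characters); a two-letter suffix and the terminating NUL bring the total to 14 bytes, whereas the old 8-byte buffer only had room for five digits. The sketch below illustrates that sizing with a bounded write. It is a hypothetical stand-in, not pcregrep's ordin(): the function name, the caller-supplied buffer, and the suffix rules are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ordinal formatter; not pcregrep's ordin().
The suffix rules (11th, 12th, 13th handled specially) are an assumption. */
static const char *ordinal_sketch(int n, char *buffer, size_t size)
{
const char *suffix = "th";
int last_two = abs(n % 100);   /* n % 100 lies in -99..99, so abs() is safe */

if (last_two < 11 || last_two > 13)
  {
  switch (last_two % 10)
    {
    case 1: suffix = "st"; break;
    case 2: suffix = "nd"; break;
    case 3: suffix = "rd"; break;
    default: break;
    }
  }

/* snprintf never writes more than size bytes, including the NUL. */
snprintf(buffer, size, "%d%s", n, suffix);
return buffer;
}
```

Called with a 14-byte buffer, this holds the smallest 32-bit value plus its suffix (13 characters and the NUL) without truncation.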

View file

@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2016 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -173,7 +173,8 @@ static const int eint[] = {
REG_BADPAT, /* group name must start with a non-digit */
/* 85 */
REG_BADPAT, /* parentheses too deeply nested (stack check) */
REG_BADPAT /* missing digits in \x{} or \o{} */
REG_BADPAT, /* missing digits in \x{} or \o{} */
REG_BADPAT /* pattern too complicated */
};
/* Table of texts corresponding to POSIX error codes */
@ -364,6 +365,7 @@ start location rather than being passed as a PCRE "starting offset". */
if ((eflags & REG_STARTEND) != 0)
{
if (pmatch == NULL) return REG_INVARG;
so = pmatch[0].rm_so;
eo = pmatch[0].rm_eo;
}
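
Note on the hunk above: with REG_STARTEND set, the subject range is read from pmatch[0], so a NULL pmatch has to be rejected before it is dereferenced. The sketch below isolates that guard. Everything in it is a hypothetical stand-in (the sk_ names, the constant values, and the fallback to the whole subject); it is not the pcreposix implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without pcreposix.h;
the values are arbitrary and not those of the real REG_* constants. */
enum { SK_OK = 0, SK_INVARG = 1, SK_STARTEND = 0x04 };

typedef struct { int rm_so; int rm_eo; } sk_regmatch_t;

/* Decide which part of the subject to scan. Returns SK_INVARG when the
caller asks for SK_STARTEND but supplies no pmatch vector to read from. */
static int sk_subject_range(const sk_regmatch_t *pmatch, int eflags,
  int subject_length, int *so, int *eo)
{
if ((eflags & SK_STARTEND) != 0)
  {
  if (pmatch == NULL) return SK_INVARG;   /* the guard added in the diff */
  *so = pmatch[0].rm_so;
  *eo = pmatch[0].rm_eo;
  }
else
  {
  *so = 0;                                /* assumed default: whole subject */
  *eo = subject_length;
  }
return SK_OK;
}
```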

View file

@ -2250,7 +2250,7 @@ data is not zero. */
static int callout(pcre_callout_block *cb)
{
FILE *f = (first_callout | callout_extra)? outfile : NULL;
int i, pre_start, post_start, subject_length;
int i, current_position, pre_start, post_start, subject_length;
if (callout_extra)
{
@ -2280,14 +2280,19 @@ printed lengths of the substrings. */
if (f != NULL) fprintf(f, "--->");
/* If a lookbehind is involved, the current position may be earlier than the
match start. If so, use the match start instead. */
current_position = (cb->current_position >= cb->start_match)?
cb->current_position : cb->start_match;
PCHARS(pre_start, cb->subject, 0, cb->start_match, f);
PCHARS(post_start, cb->subject, cb->start_match,
cb->current_position - cb->start_match, f);
current_position - cb->start_match, f);
PCHARS(subject_length, cb->subject, 0, cb->subject_length, NULL);
PCHARSV(cb->subject, cb->current_position,
cb->subject_length - cb->current_position, f);
PCHARSV(cb->subject, current_position, cb->subject_length - current_position, f);
if (f != NULL) fprintf(f, "\n");
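
Note on the hunk above: without the clamp, a callout inside a lookbehind could produce a negative length for the segment printed after the match start. A minimal sketch of the arithmetic follows; the sk_ function names are hypothetical and only the conditional expression mirrors the diff.

```c
/* Hypothetical helpers; only the clamping expression is taken from the diff. */

/* Never report a position earlier than the start of the match. */
static int sk_clamped_position(int current_position, int start_match)
{
return (current_position >= start_match)? current_position : start_match;
}

/* Length of the segment shown between the match start and the current
position; the clamp keeps this from going negative. */
static int sk_post_start_length(int current_position, int start_match)
{
return sk_clamped_position(current_position, start_match) - start_match;
}
```

For example, with a match start of 3 and a callout position of 2, the unclamped length would be -1, while the clamped length is 0.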
@ -5612,6 +5617,12 @@ while (!done)
break;
}
if (use_size_offsets < 2)
{
fprintf(outfile, "Cannot do global matching with an ovector size < 2\n");
break;
}
/* If we have matched an empty string, first check to see if we are at
the end of the subject. If so, the /g loop is over. Otherwise, mimic what
Perl's /g options does. This turns out to be rather cunning. First we set
@ -5740,3 +5751,4 @@ return yield;
}
/* End of pcretest.c */

View file

@ -138,4 +138,6 @@ is required for these tests. --/
/.((?2)(?R)\1)()/B
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
/-- End of testinput11 --/

View file

@ -4217,4 +4217,30 @@ backtracking verbs. --/
/a[[:punct:]b]/BZ
/L(?#(|++<!(2)?/BZ
/L(?#(|++<!(2)?/BOZ
/L(?#(|++<!(2)?/BCZ
/L(?#(|++<!(2)?/BCOZ
/(A*)\E+/CBZ
/()\Q\E*]/BCZ
/(?<A>)(?J:(?<B>)(?<B>))(?<C>)/
\O\CC
/(?=a\K)/
ring bpattingbobnd $ 1,oern cou \rb\L
/(?<=((?C)0))/
9010
abcd
/((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/
/\N(?(?C)0?!.)*/
/-- End of testinput2 --/

View file

@ -1553,4 +1553,13 @@
\x{200}
\x{37e}
/[^[:^ascii:]\d]/8W
a
~
0
\a
\x{7f}
\x{389}
\x{20ac}
/-- End of testinput6 --/

View file

@ -853,4 +853,8 @@ of case for anything other than the ASCII letters. --/
/a[b[:punct:]]/8WBZ
/L(?#(|++<!(2)?/B8COZ
/L(?#(|++<!(2)?/B8WCZ
/-- End of testinput7 --/

View file

@ -231,7 +231,7 @@ Memory allocation (code space): 73
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 77
Memory allocation (code space): 93
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -765,4 +765,7 @@ Memory allocation (code space): 14
25 End
------------------------------------------------------------------
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: regular expression is too complicated at offset 490
/-- End of testinput11 --/

View file

@ -231,7 +231,7 @@ Memory allocation (code space): 155
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 157
Memory allocation (code space): 189
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -765,4 +765,7 @@ Memory allocation (code space): 28
25 End
------------------------------------------------------------------
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: missing ) at offset 509
/-- End of testinput11 --/

View file

@ -231,7 +231,7 @@ Memory allocation (code space): 45
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 50
Memory allocation (code space): 62
------------------------------------------------------------------
0 30 Bra
3 7 CBra 1
@ -765,4 +765,7 @@ Memory allocation (code space): 10
38 End
------------------------------------------------------------------
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: missing ) at offset 509
/-- End of testinput11 --/

View file

@ -419,7 +419,7 @@ Need char = '>'
/(?U)<.*>/I
Capturing subpattern count = 0
Options: ungreedy
No options
First char = '<'
Need char = '>'
abc<def>ghi<klm>nop
@ -443,7 +443,7 @@ Need char = '='
/(?U)={3,}?/I
Capturing subpattern count = 0
Options: ungreedy
No options
First char = '='
Need char = '='
abc========def
@ -477,7 +477,7 @@ Failed: lookbehind assertion is not fixed length at offset 12
/(?i)abc/I
Capturing subpattern count = 0
Options: caseless
No options
First char = 'a' (caseless)
Need char = 'c' (caseless)
@ -489,7 +489,7 @@ No need char
/(?i)^1234/I
Capturing subpattern count = 0
Options: anchored caseless
Options: anchored
No first char
No need char
@ -502,7 +502,7 @@ No need char
/(?s).*/I
Capturing subpattern count = 0
May match empty string
Options: anchored dotall
Options: anchored
No first char
No need char
@ -516,7 +516,7 @@ Starting chars: a b c d
/(?i)[abcd]/IS
Capturing subpattern count = 0
Options: caseless
No options
No first char
No need char
Subject length lower bound = 1
@ -524,7 +524,7 @@ Starting chars: A B C D a b c d
/(?m)[xy]|(b|c)/IS
Capturing subpattern count = 1
Options: multiline
No options
No first char
No need char
Subject length lower bound = 1
@ -538,7 +538,7 @@ No need char
/(?i)(^a|^b)/Im
Capturing subpattern count = 1
Options: caseless multiline
Options: multiline
First char at start or follows newline
No need char
@ -555,13 +555,13 @@ Failed: malformed number or name after (?( at offset 4
Failed: malformed number or name after (?( at offset 4
/(?(?i))/
Failed: assertion expected after (?( at offset 3
Failed: assertion expected after (?( or (?(?C) at offset 3
/(?(abc))/
Failed: reference to non-existent subpattern at offset 7
/(?(?<ab))/
Failed: assertion expected after (?( at offset 3
Failed: assertion expected after (?( or (?(?C) at offset 3
/((?s)blah)\s+\1/I
Capturing subpattern count = 1
@ -1179,7 +1179,7 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: anchored dotall
Options: anchored
No first char
No need char
@ -2735,7 +2735,7 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless extended
Options: extended
First char = 'a' (caseless)
Need char = 'c' (caseless)
@ -2748,7 +2748,7 @@ Need char = 'c' (caseless)
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless extended
Options: extended
First char = 'a' (caseless)
Need char = 'c' (caseless)
@ -3095,7 +3095,7 @@ Need char = 'b'
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: ungreedy
No options
First char = 'x'
Need char = 'b'
xaaaab
@ -3497,7 +3497,7 @@ Need char = 'c'
/(?i)[ab]/IS
Capturing subpattern count = 0
Options: caseless
No options
No first char
No need char
Subject length lower bound = 1
@ -6299,7 +6299,7 @@ Capturing subpattern count = 3
Named capturing subpatterns:
A 2
A 3
Options: anchored dupnames
Options: anchored
Duplicate name status changes
No first char
No need char
@ -7870,7 +7870,7 @@ No match
Failed: malformed number or name after (?( at offset 6
/(?(''))/
Failed: assertion expected after (?( at offset 4
Failed: assertion expected after (?( or (?(?C) at offset 4
/(?('R')stuff)/
Failed: reference to non-existent subpattern at offset 7
@ -14346,7 +14346,7 @@ No match
"((?2)+)((?1))"
"(?(?<E>.*!.*)?)"
Failed: assertion expected after (?( at offset 3
Failed: assertion expected after (?( or (?(?C) at offset 3
"X((?2)()*+){2}+"BZ
------------------------------------------------------------------
@ -14574,4 +14574,100 @@ No match
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/BZ
------------------------------------------------------------------
Bra
L?+
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/BOZ
------------------------------------------------------------------
Bra
L?
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/BCZ
------------------------------------------------------------------
Bra
Callout 255 0 14
L?+
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/BCOZ
------------------------------------------------------------------
Bra
Callout 255 0 14
L?
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/(A*)\E+/CBZ
------------------------------------------------------------------
Bra
Callout 255 0 7
SCBra 1
Callout 255 1 2
A*
Callout 255 3 0
KetRmax
Callout 255 7 0
Ket
End
------------------------------------------------------------------
/()\Q\E*]/BCZ
------------------------------------------------------------------
Bra
Callout 255 0 7
Brazero
SCBra 1
Callout 255 1 0
KetRmax
Callout 255 7 1
]
Callout 255 8 0
Ket
End
------------------------------------------------------------------
/(?<A>)(?J:(?<B>)(?<B>))(?<C>)/
\O\CC
Matched, but too many substrings
copy substring C failed -7
/(?=a\K)/
ring bpattingbobnd $ 1,oern cou \rb\L
Start of matched string is beyond its end - displaying from end to start.
0: a
0L
/(?<=((?C)0))/
9010
--->9010
0 ^ 0
0 ^ 0
0:
1: 0
abcd
--->abcd
0 ^ 0
0 ^ 0
0 ^ 0
0 ^ 0
No match
/((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/
/\N(?(?C)0?!.)*/
Failed: assertion expected after (?( or (?(?C) at offset 4
/-- End of testinput2 --/

View file

@ -2557,4 +2557,20 @@ No match
\x{37e}
0: \x{37e}
/[^[:^ascii:]\d]/8W
a
0: a
~
0: ~
0
No match
\a
0: \x{07}
\x{7f}
0: \x{7f}
\x{389}
No match
\x{20ac}
No match
/-- End of testinput6 --/
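
Note on the expected output above: the results are consistent with reading [^[:^ascii:]\d] as "an ASCII character that is not a digit", since the negated class excludes everything that is non-ASCII or a digit. The predicate below restates that reading for the listed subjects. It is a hypothetical illustration, not PCRE code, and it only covers ASCII digits because any non-ASCII digit is already excluded by the first condition.

```c
/* Hypothetical predicate restating the class; not part of PCRE. */
static int sk_ascii_non_digit(unsigned int c)
{
return c < 128 && !(c >= '0' && c <= '9');
}
```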

View file

@ -2348,4 +2348,24 @@ No match
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/B8COZ
------------------------------------------------------------------
Bra
Callout 255 0 14
L?
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/B8WCZ
------------------------------------------------------------------
Bra
Callout 255 0 14
L?+
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/-- End of testinput7 --/