Merge branch 'merge-pcre' into 10.0

Sergei Golubchik 2015-05-04 22:25:57 +02:00
commit 0b4f5060bb
41 changed files with 1695 additions and 768 deletions

View file

@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2015 University of Cambridge
All rights reserved
@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2010-2014 Zoltan Herczeg
Copyright(c) 2010-2015 Zoltan Herczeg
All rights reserved.
@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2009-2014 Zoltan Herczeg
Copyright(c) 2009-2015 Zoltan Herczeg
All rights reserved.

View file

@ -1,6 +1,173 @@
ChangeLog for PCRE
------------------
Version 8.37 28-April-2015
--------------------------
1. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges
for those parentheses to be closed with whatever has been captured so far.
However, it was failing to mark any other groups between the highest
capture so far and the current group as "unset". Thus, the ovector for
those groups contained whatever was previously there. An example is the
pattern /(x)|((*ACCEPT))/ when matched against "abcd".
2. If an assertion condition was quantified with a minimum of zero (an odd
thing to do, but it happened), SIGSEGV or other misbehaviour could occur.
3. If a pattern in pcretest input had the P (POSIX) modifier followed by an
unrecognized modifier, a crash could occur.
4. An attempt to do global matching in pcretest with a zero-length ovector
caused a crash.
5. Fixed a memory leak during matching that could occur for a subpattern
subroutine call (recursive or otherwise) if the number of captured groups
that had to be saved was greater than ten.
6. Catch a bad opcode during auto-possessification after compiling a bad UTF
string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad
UTF with NO_UTF_CHECK is documented as having an undefined outcome.
7. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.
8. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a
capture having happened, such as, for example in the pattern
/^(?:(a)|b)(?(1)A|B)/, is another kind of back reference, but it was not
setting the highest backreference number. This mattered only if pcre_exec()
was called with an ovector that was too small to hold the capture, and there
was no other kind of back reference (a situation which is probably quite
rare). The effect of the bug was that the condition was always treated as
FALSE when the capture could not be consulted, leading to incorrect
behaviour by pcre_exec(). This bug has been fixed.
9. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.
10. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at study time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is,
zero).
11. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had
unlimited repetition and could match an empty string, a segfault was
likely. The pattern (?(?=0)?)+ is an example that caused this. Perl allows
assertions to be quantified, but not if they are being used as conditions,
so the above pattern is faulted by Perl. PCRE has now been changed so that
it also rejects such patterns.
12. A possessive capturing group such as (a)*+ with a minimum repeat of zero
failed to allow the zero-repeat case if pcre2_exec() was called with an
ovector too small to capture the group.
13. Fixed two bugs in pcretest that were discovered by fuzzing and reported by
Red Hat Product Security:
(a) A crash if /K and /F were both set with the option to save the compiled
pattern.
(b) Another crash if the option to print captured substrings in a callout
was combined with setting a null ovector, for example \O\C+ as a subject
string.
14. A pattern such as "((?2){0,1999}())?", which has a group containing a
forward reference repeated a large (but limited) number of times within a
repeated outer group that has a zero minimum quantifier, caused incorrect
code to be compiled, leading to the error "internal error:
previously-checked referenced subpattern not found" when an incorrect
memory address was read. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs and given the CVE number
CVE-2015-2325.
23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs, and given the CVE
number CVE-2015-2326.
24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix
existing and future issues, size computation is eliminated from the code,
and replaced by on-demand memory allocation.
25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with
more than one other case, caused incorrect behaviour when compiled in UTF
mode. In that example, the range a-j was left out of the class.
26. Fix JIT compilation of conditional blocks whose assertion
is converted to (*FAIL). E.g: /(?(?!))/.
27. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.
28. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.
29. For some types of pattern, for example /Z*(|d*){216}/, the auto-
possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization.
30. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored
because \S ensures they are all in the class. The code for doing this was
interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered
by the LLVM fuzzer.
31. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.
32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled,
leading to buffer overflow or other errors. This bug was discovered by the
LLVM fuzzer.
33. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after
(?(?< for the ! or = that would indicate a lookbehind assertion. This bug
was discovered by the LLVM fuzzer.
34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was
discovered by the LLVM fuzzer.
35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.
36. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcregrep loop.
37. There was a similar problem to 36 in pcretest for global matches.
38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over
the repeated \X did not stop, but carried on past the start of the subject,
causing reference to random memory and/or a segfault. There were also some
other cases where backtracking after \C could crash. This set of bugs was
discovered by the LLVM fuzzer.
39. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern,
for example, /((?2){73}(?2))((?1))/. A better mutual recursion detection
method has been implemented. This infelicity was discovered by the LLVM
fuzzer.
40. Static linking against the PCRE library using the pkg-config module was
failing on missing pthread symbols.
Version 8.36 26-September-2014
------------------------------

View file

@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language.
Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.
directory, is distributed under the same terms as the software itself. The data
in the testdata directory is not copyrighted and is in the public domain.
The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions, and a
@ -24,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2014 University of Cambridge
Copyright (c) 1997-2015 University of Cambridge
All rights reserved.
@ -35,7 +36,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2010-2014 Zoltan Herczeg
Copyright(c) 2010-2015 Zoltan Herczeg
All rights reserved.
@ -46,7 +47,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Email domain: freemail.hu
Copyright(c) 2009-2014 Zoltan Herczeg
Copyright(c) 2009-2015 Zoltan Herczeg
All rights reserved.

View file

@ -1,6 +1,14 @@
News about PCRE releases
------------------------
Release 8.37 28-April-2015
--------------------------
This is a bug-fix release. Note that this library (now called PCRE1) is now being
maintained for bug fixes only. New projects are advised to use the new PCRE2
libraries.
Release 8.36 26-September-2014
------------------------------

View file

@ -1,6 +1,14 @@
Building PCRE without using autotools
-------------------------------------
NOTE: This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the first
release of a new API, known as PCRE2, with release numbers starting at 10.00
and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries
(now called PCRE1) are still being maintained for bug fixes, but there will be
no new development. New projects are advised to use the new PCRE2 libraries.
This document contains the following sections:
General
@ -761,4 +769,4 @@ There is also a mirror here:
http://www.vsoft-software.com/downloads.html
==========================
Last Updated: 14 May 2013
Last Updated: 10 February 2015

View file

@ -1,7 +1,16 @@
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
The latest release of PCRE is always available in three alternative formats
NOTE: This set of files relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
The latest release of PCRE1 is always available in three alternative formats
from:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
@ -990,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 24 October 2014
Last updated: 10 February 2015

View file

@ -506,6 +506,11 @@ echo "---------------------------- Test 106 -----------------------------" >>tes
(cd $srcdir; echo "a" | $valgrind $pcregrep -M "|a" ) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 107 -----------------------------" >>testtrygrep
echo "a" >testtemp1grep
echo "aaaaa" >>testtemp1grep
(cd $srcdir; $valgrind $pcregrep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
# Now compare the results.

View file

@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [36])
m4_define(pcre_minor, [37])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2014-09-26])
m4_define(pcre_date, [2015-04-28])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:4:2])
m4_define(libpcre16_version, [2:4:2])
m4_define(libpcre32_version, [0:4:0])
m4_define(libpcre_version, [3:5:2])
m4_define(libpcre16_version, [2:5:2])
m4_define(libpcre32_version, [0:5:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])

View file

@ -1,6 +1,14 @@
Building PCRE without using autotools
-------------------------------------
NOTE: This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the first
release of a new API, known as PCRE2, with release numbers starting at 10.00
and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries
(now called PCRE1) are still being maintained for bug fixes, but there will be
no new development. New projects are advised to use the new PCRE2 libraries.
This document contains the following sections:
General
@ -761,4 +769,4 @@ There is also a mirror here:
http://www.vsoft-software.com/downloads.html
==========================
Last Updated: 14 May 2013
Last Updated: 10 February 2015

View file

@ -1,7 +1,16 @@
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
The latest release of PCRE is always available in three alternative formats
NOTE: This set of files relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
The latest release of PCRE1 is always available in three alternative formats
from:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
@ -990,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 24 October 2014
Last updated: 10 February 2015

View file

@ -13,13 +13,24 @@ from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
<li><a name="TOC2" href="#SEC2">SECURITY CONSIDERATIONS</a>
<li><a name="TOC3" href="#SEC3">USER DOCUMENTATION</a>
<li><a name="TOC4" href="#SEC4">AUTHOR</a>
<li><a name="TOC5" href="#SEC5">REVISION</a>
<li><a name="TOC1" href="#SEC1">PLEASE TAKE NOTE</a>
<li><a name="TOC2" href="#SEC2">INTRODUCTION</a>
<li><a name="TOC3" href="#SEC3">SECURITY CONSIDERATIONS</a>
<li><a name="TOC4" href="#SEC4">USER DOCUMENTATION</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
<br><a name="SEC1" href="#TOC1">PLEASE TAKE NOTE</a><br>
<P>
This document relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
</P>
<br><a name="SEC2" href="#TOC1">INTRODUCTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
@ -115,7 +126,7 @@ clashes. In some environments, it is possible to control which external symbols
are exported when a shared library is built, and in these cases the
undocumented symbols are not exported.
</P>
<br><a name="SEC2" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
<br><a name="SEC3" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
<P>
If you are using PCRE in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
@ -149,7 +160,7 @@ against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
</P>
<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
<br><a name="SEC4" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
@ -188,7 +199,7 @@ follows:
In the "man" and HTML formats, there is also a short page for each C library
function, listing its arguments and results.
</P>
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@ -202,11 +213,11 @@ Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my two initials, followed by the
two digits 10, at the domain cam.ac.uk.
</P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 January 2014
Last updated: 10 February 2015
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.

View file

@ -1,6 +1,18 @@
.TH PCRE 3 "08 January 2014" "PCRE 8.35"
.TH PCRE 3 "10 February 2015" "PCRE 8.37"
.SH NAME
PCRE - Perl-compatible regular expressions
PCRE - Perl-compatible regular expressions (original API)
.SH "PLEASE TAKE NOTE"
.rs
.sp
This document relates to PCRE releases that use the original API,
with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers starting at
10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
libraries (now called PCRE1) are still being maintained for bug fixes, but
there will be no new development. New projects are advised to use the new PCRE2
libraries.
.
.
.SH INTRODUCTION
.rs
.sp
@ -213,6 +225,6 @@ two digits 10, at the domain cam.ac.uk.
.rs
.sp
.nf
Last updated: 08 January 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 10 February 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi

View file

@ -13,7 +13,18 @@ PCRE(3) Library Functions Manual PCRE(3)
NAME
PCRE - Perl-compatible regular expressions
PCRE - Perl-compatible regular expressions (original API)
PLEASE TAKE NOTE
This document relates to PCRE releases that use the original API, with
library names libpcre, libpcre16, and libpcre32. January 2015 saw the
first release of a new API, known as PCRE2, with release numbers start-
ing at 10.00 and library names libpcre2-8, libpcre2-16, and
libpcre2-32. The old libraries (now called PCRE1) are still being main-
tained for bug fixes, but there will be no new development. New
projects are advised to use the new PCRE2 libraries.
INTRODUCTION
@ -179,8 +190,8 @@ AUTHOR
REVISION
Last updated: 08 January 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 10 February 2015
Copyright (c) 1997-2015 University of Cambridge.
------------------------------------------------------------------------------

View file

@ -1704,6 +1704,7 @@ Arguments:
utf TRUE in UTF-8 / UTF-16 / UTF-32 mode
atend TRUE if called when the pattern is complete
cd the "compile data" structure
recurses chain of recurse_check to catch mutual recursion
Returns: the fixed length,
or -1 if there is no fixed length,
@ -1713,10 +1714,11 @@ Returns: the fixed length,
*/
static int
find_fixedlength(pcre_uchar *code, BOOL utf, BOOL atend, compile_data *cd)
find_fixedlength(pcre_uchar *code, BOOL utf, BOOL atend, compile_data *cd,
recurse_check *recurses)
{
int length = -1;
recurse_check this_recurse;
register int branchlength = 0;
register pcre_uchar *cc = code + 1 + LINK_SIZE;
@ -1741,7 +1743,8 @@ for (;;)
case OP_ONCE:
case OP_ONCE_NC:
case OP_COND:
d = find_fixedlength(cc + ((op == OP_CBRA)? IMM2_SIZE : 0), utf, atend, cd);
d = find_fixedlength(cc + ((op == OP_CBRA)? IMM2_SIZE : 0), utf, atend, cd,
recurses);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -1775,7 +1778,15 @@ for (;;)
cs = ce = (pcre_uchar *)cd->start_code + GET(cc, 1); /* Start subpattern */
do ce += GET(ce, 1); while (*ce == OP_ALT); /* End subpattern */
if (cc > cs && cc < ce) return -1; /* Recursion */
d = find_fixedlength(cs + IMM2_SIZE, utf, atend, cd);
else /* Check for mutual recursion */
{
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) return -1; /* Mutual recursion */
}
this_recurse.prev = recurses;
this_recurse.group = cs;
d = find_fixedlength(cs + IMM2_SIZE, utf, atend, cd, &this_recurse);
if (d < 0) return d;
branchlength += d;
cc += 1 + LINK_SIZE;
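
Note on the hunk above: the new recurses argument is a linked list built on the C stack, with one node per subroutine call currently being followed. Before descending into a group, the chain is searched for that group, and a hit means mutual recursion. The sketch below reproduces only that idea on a simplified model. The group type and the fixed_length function are hypothetical stand-ins, not PCRE code; only the shape of recurse_check follows the diff.

```c
#include <stddef.h>

/* Hypothetical stand-in for a compiled group: it contributes a fixed length
   of its own and may call one other group as a subroutine. */
typedef struct group {
  const struct group *calls;   /* NULL means no subroutine call */
  int own_length;
} group;

/* Same shape as the recurse_check structure in the diff. */
typedef struct recurse_check {
  struct recurse_check *prev;
  const group *grp;
} recurse_check;

/* Returns the fixed length of g, or -1 if following its subroutine calls
   leads back to a group that is already being measured. */
static int fixed_length(const group *g, recurse_check *recurses)
{
  recurse_check this_recurse;
  recurse_check *r;
  int d;

  /* Walk the chain of active calls looking for this group. */
  for (r = recurses; r != NULL; r = r->prev)
    if (r->grp == g) return -1;          /* direct or mutual recursion */

  if (g->calls == NULL) return g->own_length;

  /* Push this group onto the chain for the duration of the nested call. */
  this_recurse.prev = recurses;
  this_recurse.grp = g;
  d = fixed_length(g->calls, &this_recurse);
  return (d < 0) ? d : g->own_length + d;
}
```

Because each node lives in the caller's stack frame, the chain needs no allocation and unwinds automatically when the call returns.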
@ -2129,32 +2140,60 @@ for (;;)
{
case OP_CHAR:
case OP_CHARI:
case OP_NOT:
case OP_NOTI:
case OP_EXACT:
case OP_EXACTI:
case OP_NOTEXACT:
case OP_NOTEXACTI:
case OP_UPTO:
case OP_UPTOI:
case OP_NOTUPTO:
case OP_NOTUPTOI:
case OP_MINUPTO:
case OP_MINUPTOI:
case OP_NOTMINUPTO:
case OP_NOTMINUPTOI:
case OP_POSUPTO:
case OP_POSUPTOI:
case OP_NOTPOSUPTO:
case OP_NOTPOSUPTOI:
case OP_STAR:
case OP_STARI:
case OP_NOTSTAR:
case OP_NOTSTARI:
case OP_MINSTAR:
case OP_MINSTARI:
case OP_NOTMINSTAR:
case OP_NOTMINSTARI:
case OP_POSSTAR:
case OP_POSSTARI:
case OP_NOTPOSSTAR:
case OP_NOTPOSSTARI:
case OP_PLUS:
case OP_PLUSI:
case OP_NOTPLUS:
case OP_NOTPLUSI:
case OP_MINPLUS:
case OP_MINPLUSI:
case OP_NOTMINPLUS:
case OP_NOTMINPLUSI:
case OP_POSPLUS:
case OP_POSPLUSI:
case OP_NOTPOSPLUS:
case OP_NOTPOSPLUSI:
case OP_QUERY:
case OP_QUERYI:
case OP_NOTQUERY:
case OP_NOTQUERYI:
case OP_MINQUERY:
case OP_MINQUERYI:
case OP_NOTMINQUERY:
case OP_NOTMINQUERYI:
case OP_POSQUERY:
case OP_POSQUERYI:
case OP_NOTPOSQUERY:
case OP_NOTPOSQUERYI:
if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]);
break;
}
@ -2334,11 +2373,6 @@ Arguments:
Returns: TRUE if what is matched could be empty
*/
typedef struct recurse_check {
struct recurse_check *prev;
const pcre_uchar *group;
} recurse_check;
static BOOL
could_be_empty_branch(const pcre_uchar *code, const pcre_uchar *endcode,
BOOL utf, compile_data *cd, recurse_check *recurses)
@ -2469,8 +2503,8 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
empty_branch = FALSE;
do
{
if (!empty_branch && could_be_empty_branch(code, endcode, utf, cd, NULL))
empty_branch = TRUE;
if (!empty_branch && could_be_empty_branch(code, endcode, utf, cd,
recurses)) empty_branch = TRUE;
code += GET(code, 1);
}
while (*code == OP_ALT);
@ -3065,7 +3099,7 @@ Returns: TRUE if the auto-possessification is possible
static BOOL
compare_opcodes(const pcre_uchar *code, BOOL utf, const compile_data *cd,
const pcre_uint32 *base_list, const pcre_uchar *base_end)
const pcre_uint32 *base_list, const pcre_uchar *base_end, int *rec_limit)
{
pcre_uchar c;
pcre_uint32 list[8];
@ -3082,6 +3116,9 @@ pcre_uint32 chr;
BOOL accepted, invert_bits;
BOOL entered_a_group = FALSE;
if (*rec_limit == 0) return FALSE;
--(*rec_limit);
/* Note: the base_list[1] contains whether the current opcode has greedy
(represented by a non-zero value) quantifier. This is different from
other character type lists, which store here that the character iterator
@ -3152,7 +3189,8 @@ for(;;)
while (*next_code == OP_ALT)
{
if (!compare_opcodes(code, utf, cd, base_list, base_end)) return FALSE;
if (!compare_opcodes(code, utf, cd, base_list, base_end, rec_limit))
return FALSE;
code = next_code + 1 + LINK_SIZE;
next_code += GET(next_code, 1);
}
@ -3172,7 +3210,7 @@ for(;;)
/* The bracket content will be checked by the
OP_BRA/OP_CBRA case above. */
next_code += 1 + LINK_SIZE;
if (!compare_opcodes(next_code, utf, cd, base_list, base_end))
if (!compare_opcodes(next_code, utf, cd, base_list, base_end, rec_limit))
return FALSE;
code += PRIV(OP_lengths)[c];
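
Note on the rec_limit hunks above: the limit is passed by pointer and, as far as these hunks show, only ever decremented, so it caps the total number of compare_opcodes() calls made for one start point and not just the nesting depth. A minimal sketch of that pattern follows, using a hypothetical binary tree in place of compiled pattern code; node and visit_all are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical binary tree standing in for nested pattern alternatives. */
typedef struct node {
  const struct node *left;
  const struct node *right;
} node;

/* Returns 1 if the whole tree was visited within the budget, 0 if the budget
   ran out first. *budget is shared by every call and never restored, so it
   bounds the total amount of work. */
static int visit_all(const node *n, int *budget)
{
  if (n == NULL) return 1;
  if (*budget == 0) return 0;    /* give up; the caller skips the optimization */
  --(*budget);
  if (!visit_all(n->left, budget)) return 0;
  return visit_all(n->right, budget);
}
```

Returning "not possible" when the budget runs out is safe here because the caller merely forgoes an optimization.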
@ -3605,11 +3643,20 @@ register pcre_uchar c;
const pcre_uchar *end;
pcre_uchar *repeat_opcode;
pcre_uint32 list[8];
int rec_limit;
for (;;)
{
c = *code;
/* When a pattern with bad UTF-8 encoding is compiled with NO_UTF_CHECK,
it may compile without complaining, but may get into a loop here if the code
pointer points to a bad value. This is, of course, a documented possibility,
when NO_UTF_CHECK is set, so it isn't a bug, but we can detect this case and
just give up on this optimization. */
if (c >= OP_TABLE_LENGTH) return;
if (c >= OP_STAR && c <= OP_TYPEPOSUPTO)
{
c -= get_repeat_base(c) - OP_STAR;
@ -3617,7 +3664,8 @@ for (;;)
get_chr_property_list(code, utf, cd->fcc, list) : NULL;
list[1] = c == OP_STAR || c == OP_PLUS || c == OP_QUERY || c == OP_UPTO;
if (end != NULL && compare_opcodes(end, utf, cd, list, end))
rec_limit = 1000;
if (end != NULL && compare_opcodes(end, utf, cd, list, end, &rec_limit))
{
switch(c)
{
@ -3673,7 +3721,8 @@ for (;;)
list[1] = (c & 1) == 0;
if (compare_opcodes(end, utf, cd, list, end))
rec_limit = 1000;
if (compare_opcodes(end, utf, cd, list, end, &rec_limit))
{
switch (c)
{
@ -3947,14 +3996,14 @@ Arguments:
adjust the amount by which the group is to be moved
utf TRUE in UTF-8 / UTF-16 / UTF-32 mode
cd contains pointers to tables etc.
save_hwm the hwm forward reference pointer at the start of the group
save_hwm_offset the hwm forward reference offset at the start of the group
Returns: nothing
*/
static void
adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd,
pcre_uchar *save_hwm)
size_t save_hwm_offset)
{
pcre_uchar *ptr = group;
@ -3966,7 +4015,8 @@ while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
/* See if this recursion is on the forward reference list. If so, adjust the
reference. */
for (hc = save_hwm; hc < cd->hwm; hc += LINK_SIZE)
for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
hc += LINK_SIZE)
{
offset = (int)GET(hc, 0);
if (cd->start_code + offset == ptr + 1)
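
Note on the save_hwm to save_hwm_offset change: a raw pointer into the workspace can be left dangling when expand_workspace() moves the buffer, whereas an offset from the start stays valid. The sketch below shows the general technique on a hypothetical growable buffer. The workspace type and the ws_expand and ws_duplicate functions are invented for illustration and are not the PCRE implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable workspace; size must be non-zero. */
typedef struct workspace {
  unsigned char *start;
  size_t used;
  size_t size;
} workspace;

/* Doubles the buffer. Returns 0 on success, -1 on allocation failure.
   Any raw pointer into ws->start may be invalid after this call. */
static int ws_expand(workspace *ws)
{
  size_t newsize = ws->size * 2;
  unsigned char *p = realloc(ws->start, newsize);
  if (p == NULL) return -1;
  ws->start = p;
  ws->size = newsize;
  return 0;
}

/* Appends a copy of the bytes in [from_offset, to_offset) to the end of the
   workspace, growing it as needed. Requires to_offset <= ws->used. Positions
   are passed as offsets so that they survive any reallocation. */
static int ws_duplicate(workspace *ws, size_t from_offset, size_t to_offset)
{
  size_t n = to_offset - from_offset;
  while (ws->used + n > ws->size)
    if (ws_expand(ws) != 0) return -1;
  /* Pointers are formed only after the last possible move of the buffer. */
  memcpy(ws->start + ws->used, ws->start + from_offset, n);
  ws->used += n;
  return 0;
}
```

The design choice is the same as in the diff: convert an offset to a pointer at the point of use, never before a call that might reallocate.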
@ -4171,7 +4221,11 @@ if ((options & PCRE_CASELESS) != 0)
range. Otherwise, use a recursive call to add the additional range. */
else if (oc < start && od >= start - 1) start = oc; /* Extend downwards */
else if (od > end && oc <= end + 1) end = od; /* Extend upwards */
else if (od > end && oc <= end + 1)
{
end = od; /* Extend upwards */
if (end > classbits_end) classbits_end = (end <= 0xff ? end : 0xff);
}
else n8 += add_to_class(classbits, uchardptr, options, cd, oc, od);
}
}
@ -4411,7 +4465,7 @@ const pcre_uchar *tempptr;
const pcre_uchar *nestptr = NULL;
pcre_uchar *previous = NULL;
pcre_uchar *previous_callout = NULL;
pcre_uchar *save_hwm = NULL;
size_t save_hwm_offset = 0;
pcre_uint8 classbits[32];
/* We can fish out the UTF-8 setting once and for all into a BOOL, but we
@ -5470,6 +5524,12 @@ for (;; ptr++)
PUT(previous, 1, (int)(code - previous));
break; /* End of class handling */
}
/* Even though any XCLASS list is now discarded, we must allow for
its memory. */
if (lengthptr != NULL)
*lengthptr += (int)(class_uchardata - class_uchardata_base);
#endif
/* If there are no characters > 255, or they are all to be included or
@ -5870,6 +5930,7 @@ for (;; ptr++)
{
register int i;
int len = (int)(code - previous);
size_t base_hwm_offset = save_hwm_offset;
pcre_uchar *bralink = NULL;
pcre_uchar *brazeroptr = NULL;
@ -5924,7 +5985,7 @@ for (;; ptr++)
if (repeat_max <= 1) /* Covers 0, 1, and unlimited */
{
*code = OP_END;
adjust_recurse(previous, 1, utf, cd, save_hwm);
adjust_recurse(previous, 1, utf, cd, save_hwm_offset);
memmove(previous + 1, previous, IN_UCHARS(len));
code++;
if (repeat_max == 0)
@ -5948,7 +6009,7 @@ for (;; ptr++)
{
int offset;
*code = OP_END;
adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm);
adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm_offset);
memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len));
code += 2 + LINK_SIZE;
*previous++ = OP_BRAZERO + repeat_type;
@ -6011,26 +6072,25 @@ for (;; ptr++)
for (i = 1; i < repeat_min; i++)
{
pcre_uchar *hc;
pcre_uchar *this_hwm = cd->hwm;
size_t this_hwm_offset = cd->hwm - cd->start_workspace;
memcpy(code, previous, IN_UCHARS(len));
while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
WORK_SIZE_SAFETY_MARGIN -
(this_hwm_offset - base_hwm_offset))
{
size_t save_offset = save_hwm - cd->start_workspace;
size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
this_hwm = (pcre_uchar *)cd->start_workspace + this_offset;
}
for (hc = save_hwm; hc < this_hwm; hc += LINK_SIZE)
for (hc = (pcre_uchar *)cd->start_workspace + base_hwm_offset;
hc < (pcre_uchar *)cd->start_workspace + this_hwm_offset;
hc += LINK_SIZE)
{
PUT(cd->hwm, 0, GET(hc, 0) + len);
cd->hwm += LINK_SIZE;
}
save_hwm = this_hwm;
base_hwm_offset = this_hwm_offset;
code += len;
}
}
@ -6075,7 +6135,7 @@ for (;; ptr++)
else for (i = repeat_max - 1; i >= 0; i--)
{
pcre_uchar *hc;
pcre_uchar *this_hwm = cd->hwm;
size_t this_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = OP_BRAZERO + repeat_type;
@ -6097,22 +6157,21 @@ for (;; ptr++)
copying them. */
while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
WORK_SIZE_SAFETY_MARGIN -
(this_hwm_offset - base_hwm_offset))
{
size_t save_offset = save_hwm - cd->start_workspace;
size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
this_hwm = (pcre_uchar *)cd->start_workspace + this_offset;
}
for (hc = save_hwm; hc < this_hwm; hc += LINK_SIZE)
for (hc = (pcre_uchar *)cd->start_workspace + base_hwm_offset;
hc < (pcre_uchar *)cd->start_workspace + this_hwm_offset;
hc += LINK_SIZE)
{
PUT(cd->hwm, 0, GET(hc, 0) + len + ((i != 0)? 2+LINK_SIZE : 1));
cd->hwm += LINK_SIZE;
}
save_hwm = this_hwm;
base_hwm_offset = this_hwm_offset;
code += len;
}
@ -6208,7 +6267,7 @@ for (;; ptr++)
{
int nlen = (int)(code - bracode);
*code = OP_END;
adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm);
adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen));
code += 1 + LINK_SIZE;
nlen += 1 + LINK_SIZE;
@ -6342,7 +6401,7 @@ for (;; ptr++)
else
{
*code = OP_END;
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm);
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
code += 1 + LINK_SIZE;
len += 1 + LINK_SIZE;
@ -6391,7 +6450,7 @@ for (;; ptr++)
default:
*code = OP_END;
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm);
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
code += 1 + LINK_SIZE;
len += 1 + LINK_SIZE;
@ -6420,15 +6479,25 @@ for (;; ptr++)
parenthesis forms. */
case CHAR_LEFT_PARENTHESIS:
newoptions = options;
skipbytes = 0;
bravalue = OP_CBRA;
save_hwm = cd->hwm;
reset_bracount = FALSE;
/* First deal with various "verbs" that can be introduced by '*'. */
ptr++;
/* First deal with comments. Putting this code right at the start ensures
that comments have no bad side effects. */
if (ptr[0] == CHAR_QUESTION_MARK && ptr[1] == CHAR_NUMBER_SIGN)
{
ptr += 2;
while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++;
if (*ptr == CHAR_NULL)
{
*errorcodeptr = ERR18;
goto FAILED;
}
continue;
}
/* Now deal with various "verbs" that can be introduced by '*'. */
if (ptr[0] == CHAR_ASTERISK && (ptr[1] == ':'
|| (MAX_255(ptr[1]) && ((cd->ctypes[ptr[1]] & ctype_letter) != 0))))
{
@ -6549,10 +6618,18 @@ for (;; ptr++)
goto FAILED;
}
/* Initialize for "real" parentheses */
newoptions = options;
skipbytes = 0;
bravalue = OP_CBRA;
save_hwm_offset = cd->hwm - cd->start_workspace;
reset_bracount = FALSE;
/* Deal with the extended parentheses; all are introduced by '?', and the
appearance of any of them means that this is not a capturing group. */
else if (*ptr == CHAR_QUESTION_MARK)
if (*ptr == CHAR_QUESTION_MARK)
{
int i, set, unset, namelen;
int *optset;
@ -6561,17 +6638,6 @@ for (;; ptr++)
switch (*(++ptr))
{
case CHAR_NUMBER_SIGN: /* Comment; skip to ket */
ptr++;
while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++;
if (*ptr == CHAR_NULL)
{
*errorcodeptr = ERR18;
goto FAILED;
}
continue;
/* ------------------------------------------------------------ */
case CHAR_VERTICAL_LINE: /* Reset capture count for each branch */
reset_bracount = TRUE;
@ -6620,8 +6686,13 @@ for (;; ptr++)
if (tempptr[1] == CHAR_QUESTION_MARK &&
(tempptr[2] == CHAR_EQUALS_SIGN ||
tempptr[2] == CHAR_EXCLAMATION_MARK ||
tempptr[2] == CHAR_LESS_THAN_SIGN))
(tempptr[2] == CHAR_LESS_THAN_SIGN &&
(tempptr[3] == CHAR_EQUALS_SIGN ||
tempptr[3] == CHAR_EXCLAMATION_MARK))))
{
cd->iscondassert = TRUE;
break;
}
/* Other conditions use OP_CREF/OP_DNCREF/OP_RREF/OP_DNRREF, and all
need to skip at least 1+IMM2_SIZE bytes at the start of the group. */
@ -6698,8 +6769,7 @@ for (;; ptr++)
ptr++;
}
namelen = (int)(ptr - name);
if (lengthptr != NULL && (options & PCRE_DUPNAMES) != 0)
*lengthptr += IMM2_SIZE;
if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
}
/* Check the terminator */
@ -6735,6 +6805,7 @@ for (;; ptr++)
goto FAILED;
}
PUT2(code, 2+LINK_SIZE, recno);
if (recno > cd->top_backref) cd->top_backref = recno;
break;
}
@ -6757,6 +6828,7 @@ for (;; ptr++)
int offset = i++;
int count = 1;
recno = GET2(slot, 0); /* Number from first found */
if (recno > cd->top_backref) cd->top_backref = recno;
for (; i < cd->names_found; i++)
{
slot += cd->name_entry_size;
@ -7114,11 +7186,11 @@ for (;; ptr++)
if (!is_recurse) cd->namedrefcount++;
/* If duplicate names are permitted, we have to allow for a named
reference to a duplicated name (this cannot be determined until the
second pass). This needs an extra 16-bit data item. */
/* We have to allow for a named reference to a duplicated name (this
cannot be determined until the second pass). This needs an extra
16-bit data item. */
if ((options & PCRE_DUPNAMES) != 0) *lengthptr += IMM2_SIZE;
*lengthptr += IMM2_SIZE;
}
/* In the real compile, search the name table. We check the name
@ -7475,12 +7547,22 @@ for (;; ptr++)
goto FAILED;
}
/* Assertions used not to be repeatable, but this was changed for Perl
compatibility, so all kinds can now be repeated. We copy code into a
/* All assertions used not to be repeatable, but this was changed for Perl
compatibility. All kinds can now be repeated except for assertions that are
conditions (Perl also forbids these to be repeated). We copy code into a
non-register variable (tempcode) in order to be able to pass its address
because some compilers complain otherwise. */
because some compilers complain otherwise. At the start of a conditional
group whose condition is an assertion, cd->iscondassert is set. We unset it
here so as to allow assertions later in the group to be quantified. */
if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NOT &&
cd->iscondassert)
{
previous = NULL;
cd->iscondassert = FALSE;
}
else previous = code;
previous = code; /* For handling repetition */
*code = bravalue;
tempcode = code;
tempreqvary = cd->req_varyopt; /* Save value before bracket */
@ -7727,7 +7809,7 @@ for (;; ptr++)
const pcre_uchar *p;
pcre_uint32 cf;
save_hwm = cd->hwm; /* Normally this is set when '(' is read */
save_hwm_offset = cd->hwm - cd->start_workspace; /* Normally this is set when '(' is read */
terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE;
@ -8054,6 +8136,7 @@ int length;
unsigned int orig_bracount;
unsigned int max_bracount;
branch_chain bc;
size_t save_hwm_offset;
/* If set, call the external function that checks for stack availability. */
@ -8071,6 +8154,8 @@ bc.current_branch = code;
firstchar = reqchar = 0;
firstcharflags = reqcharflags = REQ_UNSET;
save_hwm_offset = cd->hwm - cd->start_workspace;
/* Accumulate the length for use in the pre-compile phase. Start with the
length of the BRA and KET and any extra bytes that are required at the
beginning. We accumulate in a local variable to save frequent testing of
@ -8212,7 +8297,7 @@ for (;;)
int fixed_length;
*code = OP_END;
fixed_length = find_fixedlength(last_branch, (options & PCRE_UTF8) != 0,
FALSE, cd);
FALSE, cd, NULL);
DPRINTF(("fixed length = %d\n", fixed_length));
if (fixed_length == -3)
{
@ -8273,7 +8358,7 @@ for (;;)
{
*code = OP_END;
adjust_recurse(start_bracket, 1 + LINK_SIZE,
(options & PCRE_UTF8) != 0, cd, cd->hwm);
(options & PCRE_UTF8) != 0, cd, save_hwm_offset);
memmove(start_bracket + 1 + LINK_SIZE, start_bracket,
IN_UCHARS(code - start_bracket));
*start_bracket = OP_ONCE;
@ -8497,6 +8582,7 @@ do {
case OP_RREF:
case OP_DNRREF:
case OP_DEF:
case OP_FAIL:
return FALSE;
default: /* Assertion */
@ -9081,6 +9167,7 @@ cd->dupnames = FALSE;
cd->namedrefcount = 0;
cd->start_code = cworkspace;
cd->hwm = cworkspace;
cd->iscondassert = FALSE;
cd->start_workspace = cworkspace;
cd->workspace_size = COMPILE_WORK_SIZE;
cd->named_groups = named_groups;
@ -9118,13 +9205,6 @@ if (length > MAX_PATTERN_SIZE)
goto PCRE_EARLY_ERROR_RETURN;
}
/* If there are groups with duplicate names and there are also references by
name, we must allow for the possibility of named references to duplicated
groups. These require an extra data item each. */
if (cd->dupnames && cd->namedrefcount > 0)
length += cd->namedrefcount * IMM2_SIZE * sizeof(pcre_uchar);
/* Compute the size of the data block for storing the compiled pattern. Integer
overflow should no longer be possible because nowadays we limit the maximum
value of cd->names_found and cd->name_entry_size. */
@ -9183,6 +9263,7 @@ cd->name_table = (pcre_uchar *)re + re->name_table_offset;
codestart = cd->name_table + re->name_entry_size * re->name_count;
cd->start_code = codestart;
cd->hwm = (pcre_uchar *)(cd->start_workspace);
cd->iscondassert = FALSE;
cd->req_varyopt = 0;
cd->had_accept = FALSE;
cd->had_pruneorskip = FALSE;
@ -9319,7 +9400,7 @@ if (cd->check_lookbehind)
int end_op = *be;
*be = OP_END;
fixed_length = find_fixedlength(cc, (re->options & PCRE_UTF8) != 0, TRUE,
cd);
cd, NULL);
*be = end_op;
DPRINTF(("fixed length = %d\n", fixed_length));
if (fixed_length < 0)
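
The save_hwm to save_hwm_offset changes in the hunks above replace pointers into the compile workspace with offsets from cd->start_workspace. The old loops already recomputed save_hwm and this_hwm after every expand_workspace() call, which suggests the buffer can move when it grows. Below is a minimal sketch of the offset-based approach. The `workspace` type and the `ws_*` helpers are hypothetical names made up for illustration, not PCRE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable compile workspace. */
typedef struct {
  unsigned char *start;  /* base address; may change whenever the buffer grows */
  size_t size;           /* current capacity in bytes */
  size_t used;           /* bytes in use, i.e. the high water mark as an offset */
} workspace;

/* Allocate the initial buffer. Returns 0 on success, -1 on failure. */
static int ws_init(workspace *ws, size_t size)
{
  assert(size > 0);
  ws->start = malloc(size);
  ws->size = size;
  ws->used = 0;
  return (ws->start == NULL) ? -1 : 0;
}

/* Double the capacity. realloc() is free to move the block, so any raw
pointer into the old buffer must be treated as invalid afterwards. */
static int ws_expand(workspace *ws)
{
  unsigned char *p = realloc(ws->start, ws->size * 2);
  if (p == NULL) return -1;
  ws->start = p;
  ws->size *= 2;
  return 0;
}

/* Append n bytes, growing as needed. Returns the OFFSET of the new data,
or (size_t)-1 on allocation failure. */
static size_t ws_append(workspace *ws, const void *data, size_t n)
{
  size_t at;
  while (ws->size - ws->used < n)
    if (ws_expand(ws) != 0) return (size_t)-1;
  at = ws->used;
  memcpy(ws->start + at, data, n);
  ws->used += n;
  return at;
}

/* Turn a saved offset back into a pointer. The result is only valid until
the next call that may grow the buffer. */
static unsigned char *ws_at(const workspace *ws, size_t offset)
{
  assert(offset <= ws->used);
  return ws->start + offset;
}
```

A caller keeps the value returned by ws_append() and calls ws_at() each time it needs a pointer, instead of caching the pointer across appends. This mirrors how the new loops rebuild `hc` from cd->start_workspace plus base_hwm_offset.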

View file

@ -2736,9 +2736,10 @@ for (;;)
condcode == OP_DNRREF)
return PCRE_ERROR_DFA_UCOND;
/* The DEFINE condition is always false */
/* The DEFINE condition is always false, and the assertion (?!) is
converted to OP_FAIL. */
if (condcode == OP_DEF)
if (condcode == OP_DEF || condcode == OP_FAIL)
{ ADD_ACTIVE(state_offset + codelink + LINK_SIZE + 1, 0); }
/* The only supported version of OP_RREF is for the value RREF_ANY,

View file

@ -1136,93 +1136,81 @@ for (;;)
printf("\n");
#endif
if (offset < md->offset_max)
if (offset >= md->offset_max) goto POSSESSIVE_NON_CAPTURE;
matched_once = FALSE;
code_offset = (int)(ecode - md->start_code);
save_offset1 = md->offset_vector[offset];
save_offset2 = md->offset_vector[offset+1];
save_offset3 = md->offset_vector[md->offset_end - number];
save_capture_last = md->capture_last;
DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3));
/* Each time round the loop, save the current subject position for use
when the group matches. For MATCH_MATCH, the group has matched, so we
restart it with a new subject starting position, remembering that we had
at least one match. For MATCH_NOMATCH, carry on with the alternatives, as
usual. If we haven't matched any alternatives in any iteration, check to
see if a previous iteration matched. If so, the group has matched;
continue from afterwards. Otherwise it has failed; restore the previous
capture values before returning NOMATCH. */
for (;;)
{
matched_once = FALSE;
code_offset = (int)(ecode - md->start_code);
save_offset1 = md->offset_vector[offset];
save_offset2 = md->offset_vector[offset+1];
save_offset3 = md->offset_vector[md->offset_end - number];
save_capture_last = md->capture_last;
DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3));
/* Each time round the loop, save the current subject position for use
when the group matches. For MATCH_MATCH, the group has matched, so we
restart it with a new subject starting position, remembering that we had
at least one match. For MATCH_NOMATCH, carry on with the alternatives, as
usual. If we haven't matched any alternatives in any iteration, check to
see if a previous iteration matched. If so, the group has matched;
continue from afterwards. Otherwise it has failed; restore the previous
capture values before returning NOMATCH. */
for (;;)
md->offset_vector[md->offset_end - number] =
(int)(eptr - md->start_subject);
if (op >= OP_SBRA) md->match_function_type = MATCH_CBEGROUP;
RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md,
eptrb, RM63);
if (rrc == MATCH_KETRPOS)
{
md->offset_vector[md->offset_end - number] =
(int)(eptr - md->start_subject);
if (op >= OP_SBRA) md->match_function_type = MATCH_CBEGROUP;
RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md,
eptrb, RM63);
if (rrc == MATCH_KETRPOS)
offset_top = md->end_offset_top;
ecode = md->start_code + code_offset;
save_capture_last = md->capture_last;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K changed it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
offset_top = md->end_offset_top;
ecode = md->start_code + code_offset;
save_capture_last = md->capture_last;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K changed it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue;
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
/* See comment in the code for capturing groups above about handling
THEN. */
if (rrc == MATCH_THEN)
{
next = ecode + GET(ecode,1);
if (md->start_match_ptr < next &&
(*ecode == OP_ALT || *next == OP_ALT))
rrc = MATCH_NOMATCH;
}
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
md->capture_last = save_capture_last;
ecode += GET(ecode, 1);
if (*ecode != OP_ALT) break;
eptr = md->end_match_ptr;
continue;
}
if (!matched_once)
/* See comment in the code for capturing groups above about handling
THEN. */
if (rrc == MATCH_THEN)
{
md->offset_vector[offset] = save_offset1;
md->offset_vector[offset+1] = save_offset2;
md->offset_vector[md->offset_end - number] = save_offset3;
next = ecode + GET(ecode,1);
if (md->start_match_ptr < next &&
(*ecode == OP_ALT || *next == OP_ALT))
rrc = MATCH_NOMATCH;
}
if (allow_zero || matched_once)
{
ecode += 1 + LINK_SIZE;
break;
}
RRETURN(MATCH_NOMATCH);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
md->capture_last = save_capture_last;
ecode += GET(ecode, 1);
if (*ecode != OP_ALT) break;
}
/* FALL THROUGH ... Insufficient room for saving captured contents. Treat
as a non-capturing bracket. */
if (!matched_once)
{
md->offset_vector[offset] = save_offset1;
md->offset_vector[offset+1] = save_offset2;
md->offset_vector[md->offset_end - number] = save_offset3;
}
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
if (allow_zero || matched_once)
{
ecode += 1 + LINK_SIZE;
break;
}
DPRINTF(("insufficient capture room: treat as non-capturing\n"));
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
/* VVVVVVVVVVVVVVVVVVVVVVVVV */
RRETURN(MATCH_NOMATCH);
/* Non-capturing possessive bracket with unlimited repeat. We come here
from BRAZERO with allow_zero = TRUE. The code is similar to the above,
@ -1388,6 +1376,7 @@ for (;;)
break;
case OP_DEF: /* DEFINE - always false */
case OP_FAIL: /* From optimized (?!) condition */
break;
/* The condition is an assertion. Call match() to evaluate it - setting
@ -1404,8 +1393,11 @@ for (;;)
condition = TRUE;
/* Advance ecode past the assertion to the start of the first branch,
but adjust it so that the general choosing code below works. */
but adjust it so that the general choosing code below works. If the
assertion has a quantifier that allows zero repeats we must skip over
the BRAZERO. This is a lunatic thing to do, but somebody did! */
if (*ecode == OP_BRAZERO) ecode++;
ecode += GET(ecode, 1);
while (*ecode == OP_ALT) ecode += GET(ecode, 1);
ecode += 1 + LINK_SIZE - PRIV(OP_lengths)[condcode];
@ -1474,7 +1466,18 @@ for (;;)
md->offset_vector[offset] =
md->offset_vector[md->offset_end - number];
md->offset_vector[offset+1] = (int)(eptr - md->start_subject);
if (offset_top <= offset) offset_top = offset + 2;
/* If this group is at or above the current highwater mark, ensure that
any groups between the current high water mark and this group are marked
unset and then update the high water mark. */
if (offset >= offset_top)
{
register int *iptr = md->offset_vector + offset_top;
register int *iend = md->offset_vector + offset;
while (iptr < iend) *iptr++ = -1;
offset_top = offset + 2;
}
}
ecode += 1 + IMM2_SIZE;
break;
@ -1826,7 +1829,11 @@ for (;;)
are defined in a range that can be tested for. */
if (rrc >= MATCH_BACKTRACK_MIN && rrc <= MATCH_BACKTRACK_MAX)
{
if (new_recursive.offset_save != stacksave)
(PUBL(free))(new_recursive.offset_save);
RRETURN(MATCH_NOMATCH);
}
/* Any return code other than NOMATCH is an error. */
@ -3476,7 +3483,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM23);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
#ifdef SUPPORT_UCP
@ -3897,7 +3904,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM30);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
@ -4032,7 +4039,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM34);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
@ -5603,7 +5610,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM44);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
@ -5645,12 +5652,17 @@ for (;;)
if (possessive) continue; /* No backtracking */
/* We use <= pp rather than == pp to detect the start of the run while
backtracking because the use of \C in UTF mode can cause BACKCHAR to
move back past pp. This is just palliative; the use of \C in UTF mode
is fraught with danger. */
for(;;)
{
int lgb, rgb;
PCRE_PUCHAR fptr;
if (eptr == pp) goto TAIL_RECURSE; /* At start of char run */
if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */
RMATCH(eptr, ecode, offset_top, md, eptrb, RM45);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
@ -5668,7 +5680,7 @@ for (;;)
for (;;)
{
if (eptr == pp) goto TAIL_RECURSE; /* At start of char run */
if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */
fptr = eptr - 1;
if (!utf) c = *fptr; else
{
@ -5918,7 +5930,7 @@ for (;;)
if (possessive) continue; /* No backtracking */
for(;;)
{
if (eptr == pp) goto TAIL_RECURSE;
if (eptr <= pp) goto TAIL_RECURSE;
RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--;
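
The OP_CLOSE hunk above (the one that replaces `if (offset_top <= offset) offset_top = offset + 2;`) fills the gap between the old high water mark and the newly closed group with -1, so that skipped groups read as unset rather than holding stale data. A self-contained sketch of that step follows. The function name `set_capture` and its parameters are assumptions for illustration; only the fill logic follows the diff.

```c
#include <assert.h>

/* Record capture group `number` as spanning [start, end) in an offset vector
laid out as (start, end) pairs, pair 0 being the whole match. `offset_top` is
the index one past the highest pair set so far. Returns the new offset_top. */
static int set_capture(int *ovector, int ovecsize, int offset_top,
  int number, int start, int end)
{
  int offset = number * 2;
  assert(number >= 0 && offset + 1 < ovecsize);

  ovector[offset] = start;
  ovector[offset + 1] = end;

  /* If this group is at or above the high water mark, mark every slot
  between the old mark and this group as unset before raising the mark. */
  if (offset >= offset_top)
  {
    int i;
    for (i = offset_top; i < offset; i++) ovector[i] = -1;
    offset_top = offset + 2;
  }
  return offset_top;
}
```

With an ovector that still holds values from an earlier match, setting group 3 right after group 0 leaves groups 1 and 2 as -1,-1, which is the "<unset>" result the new testoutput1 entry for the (*ACCEPT) pattern expects.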

View file

@ -2446,6 +2446,7 @@ typedef struct compile_data {
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
BOOL dupnames; /* Duplicate names exist */
BOOL iscondassert; /* Next assert is a condition */
int nltype; /* Newline type */
int nllen; /* Newline string length */
pcre_uchar nl[4]; /* Newline string when fixed length */
@ -2459,6 +2460,13 @@ typedef struct branch_chain {
pcre_uchar *current_branch;
} branch_chain;
/* Structure for mutual recursion detection. */
typedef struct recurse_check {
struct recurse_check *prev;
const pcre_uchar *group;
} recurse_check;
/* Structure for items in a linked list that represents an explicit recursive
call within the pattern; used by pcre_exec(). */

File diff suppressed because it is too large

View file

@ -51,8 +51,6 @@ POSSIBILITY OF SUCH DAMAGE.
#include "pcre_internal.h"
#define PCRE_BUG 0x80000000
/*
Letter characters:
\xe6\x92\xad = 0x64ad = 25773 (kanji)
@ -69,6 +67,9 @@ POSSIBILITY OF SUCH DAMAGE.
\xc3\x89 = 0xc9 = 201 (E')
\xc3\xa1 = 0xe1 = 225 (a')
\xc3\x81 = 0xc1 = 193 (A')
\x53 = 0x53 = S
\x73 = 0x73 = s
\xc5\xbf = 0x17f = 383 (long S)
\xc8\xba = 0x23a = 570
\xe2\xb1\xa5 = 0x2c65 = 11365
\xe1\xbd\xb8 = 0x1f78 = 8056
@ -78,6 +79,10 @@ POSSIBILITY OF SUCH DAMAGE.
\xc7\x84 = 0x1c4 = 452
\xc7\x85 = 0x1c5 = 453
\xc7\x86 = 0x1c6 = 454
Caseless sets:
ucp_Armenian - \x{531}-\x{556} -> \x{561}-\x{586}
ucp_Coptic - \x{2c80}-\x{2ce3} -> caseless: XOR 0x1
ucp_Latin - \x{ff21}-\x{ff3a} -> \x{ff41}-\x{ff5a}
Mark property:
\xcc\x8d = 0x30d = 781
@ -626,6 +631,9 @@ static struct regression_test_case regression_test_cases[] = {
{ MUA, 0, "(?P<Name>a)?(?P<Name2>b)?(?(Name)c|d)+?dd", "bcabcacdb bdddd" },
{ MUA, 0, "(?P<Name>a)?(?P<Name2>b)?(?(Name)c|d)+l", "ababccddabdbccd abcccl" },
{ MUA, 0, "((?:a|aa)(?(1)aaa))x", "aax" },
{ MUA, 0, "(?(?!)a|b)", "ab" },
{ MUA, 0, "(?(?!)a)", "ab" },
{ MUA, 0 | F_NOMATCH, "(?(?!)a|b)", "ac" },
/* Set start of match. */
{ MUA, 0, "(?:\\Ka)*aaaab", "aaaaaaaa aaaaaaabb" },
@ -944,7 +952,7 @@ static void setstack16(pcre16_extra *extra)
pcre16_assign_jit_stack(extra, callback16, getstack16());
}
#endif /* SUPPORT_PCRE8 */
#endif /* SUPPORT_PCRE16 */
#ifdef SUPPORT_PCRE32
static pcre32_jit_stack *stack32;
@ -967,7 +975,7 @@ static void setstack32(pcre32_extra *extra)
pcre32_assign_jit_stack(extra, callback32, getstack32());
}
#endif /* SUPPORT_PCRE8 */
#endif /* SUPPORT_PCRE32 */
#ifdef SUPPORT_PCRE16
@ -1177,7 +1185,7 @@ static int regression_tests(void)
#elif defined SUPPORT_PCRE16
pcre16_config(PCRE_CONFIG_UTF16, &utf);
pcre16_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp);
#elif defined SUPPORT_PCRE16
#elif defined SUPPORT_PCRE32
pcre32_config(PCRE_CONFIG_UTF32, &utf);
pcre32_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp);
#endif

View file

@ -70,7 +70,7 @@ Arguments:
code pointer to start of group (the bracket)
startcode pointer to start of the whole pattern's code
options the compiling options
int RECURSE depth
recurses chain of recurse_check to catch mutual recursion
Returns: the minimum length
-1 if \C in UTF-8 mode or (*ACCEPT) was encountered
@ -80,12 +80,13 @@ Returns: the minimum length
static int
find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
const pcre_uchar *startcode, int options, int recurse_depth)
const pcre_uchar *startcode, int options, recurse_check *recurses)
{
int length = -1;
/* PCRE_UTF16 has the same value as PCRE_UTF8. */
BOOL utf = (options & PCRE_UTF8) != 0;
BOOL had_recurse = FALSE;
recurse_check this_recurse;
register int branchlength = 0;
register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;
@ -130,7 +131,7 @@ for (;;)
case OP_SBRAPOS:
case OP_ONCE:
case OP_ONCE_NC:
d = find_minlength(re, cc, startcode, options, recurse_depth);
d = find_minlength(re, cc, startcode, options, recurses);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -393,7 +394,7 @@ for (;;)
ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(slot, 0));
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
if (cc > cs && cc < ce) /* Simple recursion */
{
d = 0;
had_recurse = TRUE;
@ -401,8 +402,22 @@ for (;;)
}
else
{
int dd = find_minlength(re, cs, startcode, options, recurse_depth);
if (dd < d) d = dd;
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
{
d = 0;
had_recurse = TRUE;
break;
}
else
{
int dd;
this_recurse.prev = recurses;
this_recurse.group = cs;
dd = find_minlength(re, cs, startcode, options, &this_recurse);
if (dd < d) d = dd;
}
}
slot += re->name_entry_size;
}
@ -418,14 +433,26 @@ for (;;)
ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(cc, 1));
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
if (cc > cs && cc < ce) /* Simple recursion */
{
d = 0;
had_recurse = TRUE;
}
else
{
d = find_minlength(re, cs, startcode, options, recurse_depth);
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
{
d = 0;
had_recurse = TRUE;
}
else
{
this_recurse.prev = recurses;
this_recurse.group = cs;
d = find_minlength(re, cs, startcode, options, &this_recurse);
}
}
}
else d = 0;
@ -474,12 +501,21 @@ for (;;)
case OP_RECURSE:
cs = ce = (pcre_uchar *)startcode + GET(cc, 1);
do ce += GET(ce, 1); while (*ce == OP_ALT);
if ((cc > cs && cc < ce) || recurse_depth > 10)
if (cc > cs && cc < ce) /* Simple recursion */
had_recurse = TRUE;
else
{
branchlength += find_minlength(re, cs, startcode, options,
recurse_depth + 1);
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break;
if (r != NULL) /* Mutual recursion */
had_recurse = TRUE;
else
{
this_recurse.prev = recurses;
this_recurse.group = cs;
branchlength += find_minlength(re, cs, startcode, options,
&this_recurse);
}
}
cc += 1 + LINK_SIZE;
break;
@ -1503,7 +1539,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&
/* Find the minimum length of subject string. */
switch(min = find_minlength(re, code, code, re->options, 0))
switch(min = find_minlength(re, code, code, re->options, NULL))
{
case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
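
The find_minlength() hunks above replace the fixed `recurse_depth > 10` cutoff with a chain of recurse_check records, one per group currently being evaluated, so that mutual recursion is detected exactly. The sketch below shows the same technique on a toy call graph. The `node` and `visit_check` types and the `min_cost` function are hypothetical, and unlike the diff (which pushes a record at each call site) this version pushes its record on entry to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy "group": it has its own cost and may call two others. */
typedef struct node {
  int cost;
  const struct node *calls[2];
} node;

/* One link per node on the active evaluation path. Each link lives in the
stack frame of the call that created it, so no heap allocation is needed. */
typedef struct visit_check {
  const struct visit_check *prev;
  const node *group;
} visit_check;

/* Sum the cost of n and everything it calls. A call back into a node that is
already on the active path contributes 0, which is how the diff treats
mutual recursion (d = 0, had_recurse = TRUE). */
static int min_cost(const node *n, const visit_check *active)
{
  const visit_check *r;
  visit_check here;
  int total, i;

  if (n == NULL) return 0;

  /* Walk the chain of callers looking for this node. */
  for (r = active; r != NULL; r = r->prev)
    if (r->group == n) return 0;

  here.prev = active;
  here.group = n;
  total = n->cost;
  for (i = 0; i < 2; i++) total += min_cost(n->calls[i], &here);
  return total;
}
```

The outermost call passes NULL for the chain, just as pcre_study() now calls find_minlength(re, code, code, re->options, NULL).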

View file

@ -1582,12 +1582,15 @@ while (ptr < endptr)
int endlinelength;
int mrc = 0;
int startoffset = 0;
int prevoffsets[2];
unsigned int options = 0;
BOOL match;
char *matchptr = ptr;
char *t = ptr;
size_t length, linelength;
prevoffsets[0] = prevoffsets[1] = -1;
/* At this point, ptr is at the start of a line. We need to find the length
of the subject string to pass to pcre_exec(). In multiline mode, it is the
length of the remainder of the data in the buffer. Otherwise, it is the length of
@ -1729,55 +1732,86 @@ while (ptr < endptr)
{
if (!invert)
{
if (printname != NULL) fprintf(stdout, "%s:", printname);
if (number) fprintf(stdout, "%d:", linenumber);
int oldstartoffset = startoffset;
/* Handle --line-offsets */
/* It is possible, when a lookbehind assertion contains \K, for the
same string to be found again. The code below advances startoffset, but
until it is past the "bumpalong" offset that gave the match, the same
substring will be returned. The PCRE1 library does not return the
bumpalong offset, so all we can do is ignore repeated strings. (PCRE2
does this better.) */
if (line_offsets)
fprintf(stdout, "%d,%d\n", (int)(matchptr + offsets[0] - ptr),
offsets[1] - offsets[0]);
/* Handle --file-offsets */
else if (file_offsets)
fprintf(stdout, "%d,%d\n",
(int)(filepos + matchptr + offsets[0] - ptr),
offsets[1] - offsets[0]);
/* Handle --only-matching, which may occur many times */
else
if (prevoffsets[0] != offsets[0] || prevoffsets[1] != offsets[1])
{
BOOL printed = FALSE;
omstr *om;
prevoffsets[0] = offsets[0];
prevoffsets[1] = offsets[1];
for (om = only_matching; om != NULL; om = om->next)
if (printname != NULL) fprintf(stdout, "%s:", printname);
if (number) fprintf(stdout, "%d:", linenumber);
/* Handle --line-offsets */
if (line_offsets)
fprintf(stdout, "%d,%d\n", (int)(matchptr + offsets[0] - ptr),
offsets[1] - offsets[0]);
/* Handle --file-offsets */
else if (file_offsets)
fprintf(stdout, "%d,%d\n",
(int)(filepos + matchptr + offsets[0] - ptr),
offsets[1] - offsets[0]);
/* Handle --only-matching, which may occur many times */
else
{
int n = om->groupnum;
if (n < mrc)
BOOL printed = FALSE;
omstr *om;
for (om = only_matching; om != NULL; om = om->next)
{
int plen = offsets[2*n + 1] - offsets[2*n];
if (plen > 0)
int n = om->groupnum;
if (n < mrc)
{
if (printed) fprintf(stdout, "%s", om_separator);
if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string);
FWRITE(matchptr + offsets[n*2], 1, plen, stdout);
if (do_colour) fprintf(stdout, "%c[00m", 0x1b);
printed = TRUE;
int plen = offsets[2*n + 1] - offsets[2*n];
if (plen > 0)
{
if (printed) fprintf(stdout, "%s", om_separator);
if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string);
FWRITE(matchptr + offsets[n*2], 1, plen, stdout);
if (do_colour) fprintf(stdout, "%c[00m", 0x1b);
printed = TRUE;
}
}
}
}
if (printed || printname != NULL || number) fprintf(stdout, "\n");
if (printed || printname != NULL || number) fprintf(stdout, "\n");
}
}
/* Prepare to repeat to find the next match */
/* Prepare to repeat to find the next match. If the pattern contained
a lookbehind that included \K, it is possible that the end of the match
might be at or before the actual starting offset we have just used. We
need to start one character further on. Unfortunately, for unanchored
patterns, the actual start offset can be greater than the one that was
set as a result of "bumpalong". PCRE1 does not return the actual start
offset, so we have to check against the original start offset. This may
lead to duplicates, so we need the fudge above to avoid printing them.
(PCRE2 does this better.) */
match = FALSE;
if (line_buffered) fflush(stdout);
rc = 0; /* Had some success */
startoffset = offsets[1]; /* Restart after the match */
if (startoffset <= oldstartoffset)
{
if ((size_t)startoffset >= length)
goto END_ONE_MATCH; /* We were at the end */
startoffset = oldstartoffset + 1;
if (utf8)
while ((matchptr[startoffset] & 0xc0) == 0x80) startoffset++;
}
goto ONLY_MATCHING_RESTART;
}
}
@ -1974,6 +2008,7 @@ while (ptr < endptr)
/* Advance to after the newline and increment the line number. The file
offset to the current line is maintained in filepos. */
END_ONE_MATCH:
ptr += linelength + endlinelength;
filepos += (int)(linelength + endlinelength);
linenumber++;
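
The pcregrep hunk above suppresses repeated output by remembering the previous pair of match offsets in prevoffsets[] and printing only when the new pair differs. A minimal sketch of that check, using a hypothetical `span` type and `is_new_match` helper that do not appear in pcregrep:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the last match that was printed. Initialise both
fields to -1, as the diff does for prevoffsets[0] and prevoffsets[1]. */
typedef struct {
  int start;
  int end;
} span;

/* Return 1 and remember the match if it differs from the previous one,
or 0 if it is an exact repeat that should not be printed again. */
static int is_new_match(span *prev, int start, int end)
{
  assert(prev != NULL);
  if (prev->start == start && prev->end == end) return 0;
  prev->start = start;
  prev->end = end;
  return 1;
}
```

This only filters immediate repeats, which appears to be all the diff needs, because the duplicates it describes arise on consecutive restarts within the same line.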

View file

@ -2257,16 +2257,19 @@ if (callout_extra)
fprintf(f, "Callout %d: last capture = %d\n",
cb->callout_number, cb->capture_last);
for (i = 0; i < cb->capture_top * 2; i += 2)
if (cb->offset_vector != NULL)
{
if (cb->offset_vector[i] < 0)
fprintf(f, "%2d: <unset>\n", i/2);
else
for (i = 0; i < cb->capture_top * 2; i += 2)
{
fprintf(f, "%2d: ", i/2);
PCHARSV(cb->subject, cb->offset_vector[i],
cb->offset_vector[i+1] - cb->offset_vector[i], f);
fprintf(f, "\n");
if (cb->offset_vector[i] < 0)
fprintf(f, "%2d: <unset>\n", i/2);
else
{
fprintf(f, "%2d: ", i/2);
PCHARSV(cb->subject, cb->offset_vector[i],
cb->offset_vector[i+1] - cb->offset_vector[i], f);
fprintf(f, "\n");
}
}
}
}
@ -2519,7 +2522,7 @@ re->name_entry_size = swap_uint16(re->name_entry_size);
re->name_count = swap_uint16(re->name_count);
re->ref_count = swap_uint16(re->ref_count);
if (extra != NULL)
if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0)
{
pcre_study_data *rsd = (pcre_study_data *)(extra->study_data);
rsd->size = swap_uint32(rsd->size);
@ -2700,7 +2703,7 @@ re->name_entry_size = swap_uint16(re->name_entry_size);
re->name_count = swap_uint16(re->name_count);
re->ref_count = swap_uint16(re->ref_count);
if (extra != NULL)
if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0)
{
pcre_study_data *rsd = (pcre_study_data *)(extra->study_data);
rsd->size = swap_uint32(rsd->size);
@ -3453,7 +3456,7 @@ while (!done)
pcre_extra *extra = NULL;
#if !defined NOPOSIX /* There are still compilers that require no indent */
regex_t preg;
regex_t preg = { NULL, 0, 0} ;
int do_posix = 0;
#endif
@ -5603,6 +5606,12 @@ while (!done)
if (!do_g && !do_G) break;
if (use_offsets == NULL)
{
fprintf(outfile, "Cannot do global matching without an ovector\n");
break;
}
/* If we have matched an empty string, first check to see if we are at
the end of the subject. If so, the /g loop is over. Otherwise, mimic what
Perl's /g option does. This turns out to be rather cunning. First we set
@ -5618,9 +5627,33 @@ while (!done)
g_notempty = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
}
/* For /g, update the start offset, leaving the rest alone */
/* For /g, update the start offset, leaving the rest alone. There is a
tricky case when \K is used in a positive lookbehind assertion. This can
cause the end of the match to be less than or equal to the start offset.
In this case we restart at one past the start offset. This may return the
same match if the original start offset was bumped along during the
match, but eventually the new start offset will hit the actual start
offset. (In PCRE2 the true start offset is available, and this can be
done better. It is not worth doing more than making sure we do not loop
at this stage in the life of PCRE1.) */
if (do_g) start_offset = use_offsets[1];
if (do_g)
{
if (g_notempty == 0 && use_offsets[1] <= start_offset)
{
if (start_offset >= len) break; /* End of subject */
start_offset++;
if (use_utf)
{
while (start_offset < len)
{
if ((bptr[start_offset] & 0xc0) != 0x80) break;
start_offset++;
}
}
}
else start_offset = use_offsets[1];
}
/* For /G, update the pointer and length */
@ -5637,7 +5670,7 @@ while (!done)
CONTINUE:
#if !defined NOPOSIX
if (posix || do_posix) regfree(&preg);
if ((posix || do_posix) && preg.re_pcre != 0) regfree(&preg);
#endif
if (re != NULL) new_free(re);
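
The /g hunk above restarts one character past the previous start offset when the match ended at or before it, and in UTF-8 mode skips continuation bytes (those matching 10xxxxxx) so the new offset lands on a character boundary. A standalone sketch of that step is below. The function name and signature are assumptions, and the `g_notempty == 0` condition from the diff is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Compute where the next global-match attempt should start.
   subject, len   the subject bytes and their count
   start_offset   where the attempt that just succeeded began
   match_end      end offset of that match
   utf8           non-zero if the subject is UTF-8
   next           receives the new start offset
Returns 1 if another attempt should be made, 0 if the subject is exhausted. */
static int next_start_offset(const unsigned char *subject, size_t len,
  size_t start_offset, size_t match_end, int utf8, size_t *next)
{
  assert(subject != NULL && next != NULL);
  assert(start_offset <= len && match_end <= len);

  /* Normal case: the match moved forward, so restart after it. */
  if (match_end > start_offset)
  {
    *next = match_end;
    return 1;
  }

  /* The match ended at or before the start (possible with \K inside a
  lookbehind). Step forward by one character to avoid looping. */
  if (start_offset >= len) return 0;
  start_offset++;
  if (utf8)
    while (start_offset < len && (subject[start_offset] & 0xc0) == 0x80)
      start_offset++;
  *next = start_offset;
  return 1;
}
```

For a subject made of two-byte characters such as U+017F (bytes c5 bf), successive empty matches advance the offset 0, 2, 4 and then stop, which lines up with the new /(?<=\K\x{17f})/8g+ test added to testinput5.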

View file

@ -743,3 +743,11 @@ RC=0
---------------------------- Test 106 -----------------------------
a
RC=0
---------------------------- Test 107 -----------------------------
1:0,1
2:0,1
2:1,1
2:2,1
2:3,1
2:4,1
RC=0

View file

@ -5720,4 +5720,14 @@ AbcdCBefgBhiBqz
/[\Q]a\E]+/
aa]]
/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/
1234abcd
/(\2)(\1)/
"Z*(|d*){216}"
"(?1)(?#?'){8}(a)"
baaaaaaaaac
/-- End of testinput1 --/

View file

@ -134,4 +134,6 @@ is required for these tests. --/
/(((a\2)|(a*)\g<-1>))*a?/B
/((?+1)(\1))/B
/-- End of testinput11 --/

View file

@ -87,4 +87,12 @@ and a couple of things that are different with JIT. --/
/^12345678abcd/mS++
12345678abcd
/-- Test pattern compilation --/
/(?:a|b|c|d|e)(?R)/S++
/(?:a|b|c|d|e)(?R)(?R)/S++
/(a(?:a|b|c|d|e)b){8,16}/S++
/-- End of testinput12 --/


@ -1380,6 +1380,8 @@
1X
123456\P
//KF>/dev/null
/abc/IS>testsavedregex
<testsavedregex
abc
@ -4078,4 +4080,76 @@ backtracking verbs. --/
/\x{whatever}/
"((?=(?(?=(?(?=(?(?=()))))))))"
a
"(?(?=)==)(((((((((?=)))))))))"
a
/^(?:(a)|b)(?(1)A|B)/I
aA123\O3
aA123\O6
'^(?:(?<AA>a)|b)(?(<AA>)A|B)'
aA123\O3
aA123\O6
'^(?<AA>)(?:(?<AA>a)|b)(?(<AA>)A|B)'J
aA123\O3
aA123\O6
'^(?:(?<AA>X)|)(?:(?<AA>a)|b)\k{AA}'J
aa123\O3
aa123\O6
/(?<N111>(?J)(?<N111>1(111111)11|)1|1|)(?(<N111>)1)/
/(?(?=0)?)+/
/(?(?=0)(?=00)?00765)/
00765
/(?(?=0)(?=00)?00765|(?!3).56)/
00765
456
** Failers
356
'^(a)*+(\w)'
g
g\O3
'^(?:a)*+(\w)'
g
g\O3
//C
\O\C+
"((?2){0,1999}())?"
/((?+1)(\1))/BZ
/(?(?!)a|b)/
bbb
aaa
"((?2)+)((?1))"
"(?(?<E>.*!.*)?)"
"X((?2)()*+){2}+"BZ
"X((?2)()*+){2}"BZ
"(?<=((?2))((?1)))"
/(?<=\Ka)/g+
aaaaa
/(?<=\Ka)/G+
aaaaa
/((?2){73}(?2))((?1))/
/-- End of testinput2 --/


@ -722,4 +722,9 @@
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
"[\S\V\H]"8
/\C(\W?ſ)'?{{/8
\\C(\\W?ſ)'?{{
/-- End of testinput4 --/


@ -790,4 +790,12 @@
/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ
/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ
/(?<=\K\x{17f})/8g+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
/(?<=\K\x{17f})/8G+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
/-- End of testinput5 --/


@ -1496,4 +1496,10 @@
/^s?c/mi8
scat
/[A-`]/i8
abcdefghijklmno
/\C\X*QT/8
Ӆ\x0aT
/-- End of testinput6 --/


@ -4837,4 +4837,8 @@
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/(?(?!)a|b)/
bbb
aaa
/-- End of testinput8 --/


@ -9411,4 +9411,22 @@ No match
aa]]
0: aa]]
/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/
1234abcd
0:
1: <unset>
2: <unset>
3: <unset>
4: <unset>
5:
/(\2)(\1)/
"Z*(|d*){216}"
"(?1)(?#?'){8}(a)"
baaaaaaaaac
0: aaaaaaaaa
1: a
/-- End of testinput1 --/


@ -231,7 +231,7 @@ Memory allocation (code space): 73
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 57
Memory allocation (code space): 61
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -733,4 +733,19 @@ Memory allocation (code space): 14
41 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 20 Bra
2 16 Once
4 12 CBra 1
7 9 Recurse
9 5 CBra 2
12 \1
14 5 Ket
16 12 Ket
18 16 Ket
20 20 Ket
22 End
------------------------------------------------------------------
/-- End of testinput11 --/


@ -231,7 +231,7 @@ Memory allocation (code space): 155
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 117
Memory allocation (code space): 125
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -733,4 +733,19 @@ Memory allocation (code space): 28
41 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 20 Bra
2 16 Once
4 12 CBra 1
7 9 Recurse
9 5 CBra 2
12 \1
14 5 Ket
16 12 Ket
18 16 Ket
20 20 Ket
22 End
------------------------------------------------------------------
/-- End of testinput11 --/


@ -231,7 +231,7 @@ Memory allocation (code space): 45
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 34
Memory allocation (code space): 38
------------------------------------------------------------------
0 30 Bra
3 7 CBra 1
@ -733,4 +733,19 @@ Memory allocation (code space): 10
60 End
------------------------------------------------------------------
/((?+1)(\1))/B
------------------------------------------------------------------
0 31 Bra
3 25 Once
6 19 CBra 1
11 14 Recurse
14 8 CBra 2
19 \1
22 8 Ket
25 19 Ket
28 25 Ket
31 31 Ket
34 End
------------------------------------------------------------------
/-- End of testinput11 --/


@ -176,4 +176,12 @@ No match, mark = m (JIT)
12345678abcd
0: 12345678abcd (JIT)
/-- Test pattern compilation --/
/(?:a|b|c|d|e)(?R)/S++
/(?:a|b|c|d|e)(?R)(?R)/S++
/(a(?:a|b|c|d|e)b){8,16}/S++
/-- End of testinput12 --/


@ -561,7 +561,7 @@ Failed: assertion expected after (?( at offset 3
Failed: reference to non-existent subpattern at offset 7
/(?(?<ab))/
Failed: syntax error in subpattern name (missing terminator) at offset 7
Failed: assertion expected after (?( at offset 3
/((?s)blah)\s+\1/I
Capturing subpattern count = 1
@ -1566,30 +1566,35 @@ Need char = 'b'
/a(?(1)b)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)bag|big)*(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)+(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)b..|b..)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'b'
@ -3379,24 +3384,28 @@ Need char = 'a'
/(?(1)ab|ac)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
No need char
/(?(1)abz|acz)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
First char = 'a'
Need char = 'z'
/(?(1)abz)(.)/I
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
No need char
/(?(1)abz)(1)23/I
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
Need char = '3'
@ -5605,6 +5614,10 @@ No match
123456\P
No match
//KF>/dev/null
Compiled pattern written to /dev/null
Study data written to /dev/null
/abc/IS>testsavedregex
Capturing subpattern count = 0
No options
@ -6336,6 +6349,7 @@ No need char
/^(?P<A>a)?(?(A)a|b)/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
A 1
Options: anchored
@ -6353,6 +6367,7 @@ No match
/(?:(?(ZZ)a|b)(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
@ -6370,6 +6385,7 @@ Failed: reference to non-existent subpattern at offset 9
/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
@ -6381,6 +6397,7 @@ Need char = 'X'
/(?:(?(ZZ)a|\(b\))\\(?P<ZZ>X))+/I
Capturing subpattern count = 1
Max back reference = 1
Named capturing subpatterns:
ZZ 1
No options
@ -10226,6 +10243,7 @@ No starting char list
(?(1)|.) # check that there was an empty component
/xiIS
Capturing subpattern count = 1
Max back reference = 1
Options: anchored caseless extended
No first char
Need char = ':'
@ -10255,6 +10273,7 @@ Failed: different names for subpatterns of the same number are not allowed at of
b(?<quote> (?<apostrophe>')|(?<realquote>")) )
(?('quote')[a-z]+|[0-9]+)/JIx
Capturing subpattern count = 6
Max back reference = 1
Named capturing subpatterns:
apostrophe 2
apostrophe 5
@ -10317,6 +10336,7 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 4
Max back reference = 4
Named capturing subpatterns:
D 4
D 1
@ -10364,6 +10384,7 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 4
Max back reference = 1
Named capturing subpatterns:
A 1
A 4
@ -10486,6 +10507,7 @@ No starting char list
/()i(?(1)a)/SI
Capturing subpattern count = 1
Max back reference = 1
No options
No first char
Need char = 'i'
@ -14206,4 +14228,199 @@ Failed: digits missing in \x{} or \o{} at offset 3
/\x{whatever}/
Failed: non-hex character in \x{} (closing brace missing?) at offset 3
"((?=(?(?=(?(?=(?(?=()))))))))"
a
0:
1:
2:
"(?(?=)==)(((((((((?=)))))))))"
a
No match
/^(?:(a)|b)(?(1)A|B)/I
Capturing subpattern count = 1
Max back reference = 1
Options: anchored
No first char
No need char
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
0: aA
1: a
'^(?:(?<AA>a)|b)(?(<AA>)A|B)'
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
0: aA
1: a
'^(?<AA>)(?:(?<AA>a)|b)(?(<AA>)A|B)'J
aA123\O3
Matched, but too many substrings
0: aA
aA123\O6
Matched, but too many substrings
0: aA
1:
'^(?:(?<AA>X)|)(?:(?<AA>a)|b)\k{AA}'J
aa123\O3
Matched, but too many substrings
0: aa
aa123\O6
Matched, but too many substrings
0: aa
1: <unset>
/(?<N111>(?J)(?<N111>1(111111)11|)1|1|)(?(<N111>)1)/
/(?(?=0)?)+/
Failed: nothing to repeat at offset 7
/(?(?=0)(?=00)?00765)/
00765
0: 00765
/(?(?=0)(?=00)?00765|(?!3).56)/
00765
0: 00765
456
0: 456
** Failers
No match
356
No match
'^(a)*+(\w)'
g
0: g
1: <unset>
2: g
g\O3
Matched, but too many substrings
0: g
'^(?:a)*+(\w)'
g
0: g
1: g
g\O3
Matched, but too many substrings
0: g
//C
\O\C+
Callout 255: last capture = -1
--->
+0 ^
Matched, but too many substrings
"((?2){0,1999}())?"
/((?+1)(\1))/BZ
------------------------------------------------------------------
Bra
Once
CBra 1
Recurse
CBra 2
\1
Ket
Ket
Ket
Ket
End
------------------------------------------------------------------
/(?(?!)a|b)/
bbb
0: b
aaa
No match
"((?2)+)((?1))"
"(?(?<E>.*!.*)?)"
Failed: assertion expected after (?( at offset 3
"X((?2)()*+){2}+"BZ
------------------------------------------------------------------
Bra
X
Once
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
Ket
Ket
End
------------------------------------------------------------------
"X((?2)()*+){2}"BZ
------------------------------------------------------------------
Bra
X
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
CBra 1
Recurse
Braposzero
SCBraPos 2
KetRpos
Ket
Ket
End
------------------------------------------------------------------
"(?<=((?2))((?1)))"
Failed: lookbehind assertion is not fixed length at offset 17
/(?<=\Ka)/g+
aaaaa
0: a
0+ aaaa
0: a
0+ aaaa
0: a
0+ aaa
0: a
0+ aa
0: a
0+ a
0: a
0+
/(?<=\Ka)/G+
aaaaa
0: a
0+ aaaa
0: a
0+ aaa
0: a
0+ aa
0: a
0+ a
0: a
0+
/((?2){73}(?2))((?1))/
/-- End of testinput2 --/


@ -1271,4 +1271,10 @@ No match
#\x{10000}#\x{100}#\x{10ffff}#
0: #\x{10000}#\x{100}#\x{10ffff}#
"[\S\V\H]"8
/\C(\W?ſ)'?{{/8
\\C(\\W?ſ)'?{{
No match
/-- End of testinput4 --/


@ -1897,4 +1897,49 @@ Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 5
End
------------------------------------------------------------------
/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ
------------------------------------------------------------------
Bra
[^\x{ff}]*
PRUNE:\x{100}abc
CBra 1
xyz
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
No first char
Need char = 'z'
/(?<=\K\x{17f})/8g+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}
0: \x{17f}
0+
/(?<=\K\x{17f})/8G+
\x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}\x{17f}
0: \x{17f}
0+ \x{17f}
0: \x{17f}
0+
/-- End of testinput5 --/


@ -2461,4 +2461,12 @@ No match
scat
0: sc
/[A-`]/i8
abcdefghijklmno
0: a
/\C\X*QT/8
Ӆ\x0aT
No match
/-- End of testinput6 --/


@ -7785,4 +7785,10 @@ Matched, but offsets vector is too small to show all matches
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
/(?(?!)a|b)/
bbb
0: b
aaa
No match
/-- End of testinput8 --/