WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
56082
Look-ahead assertions with back references don’t work as expected
https://bugs.webkit.org/show_bug.cgi?id=56082
Summary
Look-ahead assertions with back references don’t work as expected
Markus Wulftange
Reported
2011-03-10 01:57:21 PST
There seems to be a bug with look-ahead assertions that use a back reference: "P11".match(/^P([1-6])(?=\1)([1-6])$/) This obviously should match as there is another 1 following the first 1 (matched string of the first group). In opposite to this, using a negated look-ahead assertion instead does match although there is another 1 following: "P11".match(/^P([1-6])(?!\1)([1-6])$/)
Attachments
Patch that tests assertion pre-checked character length independent from the remaining expression
(10.38 KB, patch)
2011-03-14 15:25 PDT
,
Michael Saboff
barraclough
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2011-03-10 10:55:56 PST
Is this a regression from Safari 5 that you're seeing in nightly builds?
Alexey Proskuryakov
Comment 2
2011-03-10 10:56:24 PST
Is this a regression from Safari 5 that you're seeing in nightly builds?
Gavin Barraclough
Comment 3
2011-03-10 11:20:47 PST
Reduction to /(x)(?=\1)x/.exec("xx") also fails.
Gavin Barraclough
Comment 4
2011-03-10 14:46:43 PST
The bug relates to the pre-checking of input. In the reduced case of /(x)(?=\1)x/.exec("xx") , we: (1) check 2 characters are available (increments input position by 2, to end of string) (2) check the initial capturing 'x' (at input offset -2) (3) enter the parenthetical assertion (4) try to check the back reference at offset -1. The first step of trying to mach the back-ref is to check that matchSize (1) bytes are available to check. However they are not, since they have been pre-checked for allow for the part of the pattern after the parenthetical assertion. (this can be demonstrated by testing with input "xxy" - the presence of the 'y' allows the matchSize check to pass, and then the back ref test works correctly). A simple fix would appear to be to make the back reference only check matchSize bytes less the input offset from the end, however this would not be correct. (E.g. suppose the back ref were quantified such as '\1{2}' - given the above example the first match of the back ref would need check 0 bytes but the second would need to check 1). The same problem can be demonstrated with the following test: /(?=((?:ab)*))a/.exec("ab") This should have results ["a","ab"], but we currently produce ["a",""]. (Again, an additional character of input fixes the issue, e.g. "abc" gets the right result). This demonstrates that the issue is not specific to back-references, but can effect anything in a parenthetical assertion. The correct fix here is most likely to prevent pre-checking of input from spanning over parenthetical assertions. I.e., given an expression /abc(?=$FOO)bar/, rather than pre-checking 6 at the start of the expression we should just pre-check 3, then run the parenthetical assertion, then pre-check the next three after the parenthetic assertion has matched. An alternative that might be slightly more performant would be to fully pre-check at at the head of the outer alternative, then unwind the input pointer around the alternative. (E.g. in the case of /abc(?=$FOO)bar/ we could pre-check 6, when we enter the assertion decrement input position by 3, on assertion match increment again by 3, with matching adjustments when backtracking through the assertion). If we made the change in this fashion we'd also need modify the way that the input offsets within the assertion are calculated, since they would no longer be correct. This is probably not an important case to optimize, and this sounds fragile, so I'd suggest just breaking the pre-checking from spanning over the assertion.
Peter Varga
Comment 5
2011-03-11 06:47:14 PST
I guess I have already reported a very similar problem at
https://bugs.webkit.org/show_bug.cgi?id=55589
The test case at that bug (pcre-test-1) isn't a fallback case therefore the YARR JIT passes on it. The YARR JIT handles the look-ahead assertions correctly thus the YARR Interpreter needs a fix.
Gavin Barraclough
Comment 6
2011-03-11 15:45:25 PST
Peter, ah yes- that looks like a related bug, cheers! - hopefully should be fixed by this. Michael, here are all the test regexes we were looking at yesterday, along with a list of test inputs we were using - should all be good for a regression test. cheers, G. /^P([1-6])(?!\1)([1-6])$/ "P11" /(x)(?=\1)x/ "x", "xx", "xxy" /(x)zzz(?=\1)x/ "xzzzx", "xzzzxy" /(a)\1(?=(b*c))bc/ /(a)a(?=(b*c))bc/ "aabc", "aabcx" /a(?=(b*c))bc/ /(?=((?:ab)*))a/ "ab", "abc" /(?=((?:xx)*))x/ /(?=((xx)*))x/ /(?=(xx))+x/ "x", "xx", "xxx" /(?=a+b)aab/ "aab"
Michael Saboff
Comment 7
2011-03-14 15:25:21 PDT
Created
attachment 85729
[details]
Patch that tests assertion pre-checked character length independent from the remaining expression
Gavin Barraclough
Comment 8
2011-03-14 16:50:05 PDT
Comment on
attachment 85729
[details]
Patch that tests assertion pre-checked character length independent from the remaining expression looks great Michael.
Michael Saboff
Comment 9
2011-03-14 17:36:41 PDT
Committed
r81085
: <
http://trac.webkit.org/changeset/81085
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug