There seems to be a bug with look-ahead assertions that use a back reference: "P11".match(/^P([1-6])(?=\1)([1-6])$/) This obviously should match as there is another 1 following the first 1 (matched string of the first group). In opposite to this, using a negated look-ahead assertion instead does match although there is another 1 following: "P11".match(/^P([1-6])(?!\1)([1-6])$/)
Is this a regression from Safari 5 that you're seeing in nightly builds?
Reduction to /(x)(?=\1)x/.exec("xx") also fails.
The bug relates to the pre-checking of input. In the reduced case of /(x)(?=\1)x/.exec("xx") , we: (1) check 2 characters are available (increments input position by 2, to end of string) (2) check the initial capturing 'x' (at input offset -2) (3) enter the parenthetical assertion (4) try to check the back reference at offset -1. The first step of trying to mach the back-ref is to check that matchSize (1) bytes are available to check. However they are not, since they have been pre-checked for allow for the part of the pattern after the parenthetical assertion. (this can be demonstrated by testing with input "xxy" - the presence of the 'y' allows the matchSize check to pass, and then the back ref test works correctly). A simple fix would appear to be to make the back reference only check matchSize bytes less the input offset from the end, however this would not be correct. (E.g. suppose the back ref were quantified such as '\1{2}' - given the above example the first match of the back ref would need check 0 bytes but the second would need to check 1). The same problem can be demonstrated with the following test: /(?=((?:ab)*))a/.exec("ab") This should have results ["a","ab"], but we currently produce ["a",""]. (Again, an additional character of input fixes the issue, e.g. "abc" gets the right result). This demonstrates that the issue is not specific to back-references, but can effect anything in a parenthetical assertion. The correct fix here is most likely to prevent pre-checking of input from spanning over parenthetical assertions. I.e., given an expression /abc(?=$FOO)bar/, rather than pre-checking 6 at the start of the expression we should just pre-check 3, then run the parenthetical assertion, then pre-check the next three after the parenthetic assertion has matched. An alternative that might be slightly more performant would be to fully pre-check at at the head of the outer alternative, then unwind the input pointer around the alternative. (E.g. in the case of /abc(?=$FOO)bar/ we could pre-check 6, when we enter the assertion decrement input position by 3, on assertion match increment again by 3, with matching adjustments when backtracking through the assertion). If we made the change in this fashion we'd also need modify the way that the input offsets within the assertion are calculated, since they would no longer be correct. This is probably not an important case to optimize, and this sounds fragile, so I'd suggest just breaking the pre-checking from spanning over the assertion.
I guess I have already reported a very similar problem at https://bugs.webkit.org/show_bug.cgi?id=55589 The test case at that bug (pcre-test-1) isn't a fallback case therefore the YARR JIT passes on it. The YARR JIT handles the look-ahead assertions correctly thus the YARR Interpreter needs a fix.
Peter, ah yes- that looks like a related bug, cheers! - hopefully should be fixed by this. Michael, here are all the test regexes we were looking at yesterday, along with a list of test inputs we were using - should all be good for a regression test. cheers, G. /^P([1-6])(?!\1)([1-6])$/ "P11" /(x)(?=\1)x/ "x", "xx", "xxy" /(x)zzz(?=\1)x/ "xzzzx", "xzzzxy" /(a)\1(?=(b*c))bc/ /(a)a(?=(b*c))bc/ "aabc", "aabcx" /a(?=(b*c))bc/ /(?=((?:ab)*))a/ "ab", "abc" /(?=((?:xx)*))x/ /(?=((xx)*))x/ /(?=(xx))+x/ "x", "xx", "xxx" /(?=a+b)aab/ "aab"
Created attachment 85729 [details] Patch that tests assertion pre-checked character length independent from the remaining expression
Comment on attachment 85729 [details] Patch that tests assertion pre-checked character length independent from the remaining expression looks great Michael.
Committed r81085: <http://trac.webkit.org/changeset/81085>