The time has come to change the YARR JIT's fallback from PCRE to YARR Interpreter. Now the regexp matching by Interpreter seems faster. Additionally PCRE causes some regressions: eg.: - in JavaScriptCore tests: ecma_3/RegExp/15.10.2-1.js ecma_3/RegExp/regress-209919.js - in fast/js tests: fast/js/regexp-look-ahead-empty.html fast/js/regexp-overflow.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html Here is an old bugreport related to this regression issue: https://bugs.webkit.org/show_bug.cgi?id=38117
Created attachment 69043 [details] proposed patch V8 performance result: ref mod v8-regexp: 1.025x as fast 310.6ms +/- 0.2% 303.0ms +/- 0.3%
Anyone can suggest something what should be done with the regexp-overflow fast/js test? The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking.
Attachment 69043 [details] did not build on win: Build output: http://queues.webkit.org/results/4132027
If the interpreter is fast enough now this is fantastic news! We should double check the perf on a couple more machines & get this landed. Michael, have you managed to get back to Peter on his other bugs yet?
(In reply to comment #2) > Anyone can suggest something what should be done with the regexp-overflow fast/js test? > The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking. I think we can just check in passing results for these. This is not defining spec-required behaviour, just testing limitations of the previous implementation, and checking that the regex engine does not crash on complex inputs. This is not behaviour that we need to continue to support.
Looking at the results, a couple of bugs are showing up in the YARR interpreter. We should probably fix these before landing this patch. One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Overall just switching over would probably be a net win, but since there are only two bugs in the way & they don't look too evil we should probably stick to our no-regressions policy, it shouldn't hold this patch up for too long.
(In reply to comment #6) > One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Thanks for pointing out these issues! Now I'm working on them.
Tested to confirm the performance, I measure a fractional progression on sunspider a slight regression on v8-rexeg but a small improvement on v8 overall (which makes no sense, since only regex uses regexes). While repeatable, I think this is all within the margin of noise, so from a perf persepective I'd say this is good to go, just need to fix the last functional issue.
Comment on attachment 69043 [details] proposed patch r- based on Gavin's comments about 2 remaining bugs.
Created attachment 70840 [details] proposed patch v2 patch updated
Comment on attachment 70840 [details] proposed patch v2 Fantastic work Peter.
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: cts/CommitQueue/LayoutTests Testing 21592 test cases. fast/regex/slow.html -> timed out Sampling process 74224 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 74224 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 15090 tests run. 272.11s total testing time 15089 test cases (99%) succeeded 1 test case (<1%) timed out 3 test cases (<1%) had stderr output Full output: http://queues.webkit.org/results/4439081
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: ects/CommitQueue/LayoutTests Testing 21290 test cases. fast/regex/slow.html -> timed out Sampling process 75626 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 75626 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 14889 tests run. 820.52s total testing time 14888 test cases (99%) succeeded 1 test case (<1%) timed out 1 test case (<1%) had stderr output Full output: http://queues.webkit.org/results/4465083
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 70840]" exit_code: 2 Last 500 characters of output: 0_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6-expected.txt Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/5932114
Comment on attachment 70840 [details] proposed patch v2 Landed manually: http://trac.webkit.org/changeset/72197
http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) The following tests are not passing: fast/regex/test1.html
The YARR Interpreter broke the fast/regex/test1.html on Snow Leopard: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r72203%20(20971)/fast/regex/test1-pretty-diff.html I think that layout test has wrong expected results. The result of match is different between YARR Interpreter and pcre in some cases. We already had a discussion about this problem: https://bugs.webkit.org/show_bug.cgi?id=38117 Conclusion was the pcre result is wrong. Is the changing the expected result of this test acceptable?
(In reply to comment #16) > http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) > The following tests are not passing: > fast/regex/test1.html Rolled out by http://trac.webkit.org/changeset/72207
I'm taking a look at these errors, looking at other browser behaviour & the spec.
The new results all look fine. In all cases they match Firefox, bar /(?>a*)*/ (this regex is invalid, it's vestigial of this test suite coming from PCRE, before be adapted to ECMA regex syntax), and /((\3|b)\2(a)){2,}/ (Firefox gets this wrong). I'll land this with new results.
* (Firefox gets this wrong) Hmmm, this is less clear cut true than I thought – I still think we're right here, but it's a slightly subtler case than I'd thought at first glance. The regex is question is: /((\3|b)\2(a)){2,}/.exec("bbaababbabaaaaabbaaaabba"); The problem can be demonstrated by this reduction: /(\2(a)){2}/.exec("aaa") Firefox results in: ["aaa","aa","a"], while yarr gets ["aa","a","a"]. In the first iteration on both engines \2 is clearly undefined, so matches the empty string, and (a) matches the first "a". But behaviour diverges on how the backreference matches in the second iteration of the loop. In Firefox, the backreference matches the nested parentheses from the first iteration of the loop ("a"), and then (a) also matches "a" so the outer parentheses math "aa", and the whole regex match is "aaa". In yarr we again match \2 to the empty string, so the outer parens again match "a", and the whole regex match is "aa". Correct behaviour here is spec'ed under section 15.10.2.5. Latter iterations of a quantified term are executed by closure 'd' of the RepeatMatcher, specified in step 2 of RepeatMatcher's description. All calls to repeat iterations - m(xr, d) - are passing state xr (from steps 7, 8c, and 9). xr is defined in step 6 to be a new state with captures provided by cap, where cap is a fresh copy of the captures array passed in, but with all captures within the expression (parenIndex ... parenIndex+parenCount) being set to undefined (in steps 3, 4). Based on this definition, in the second iteration of the above regex I think it is clear that the backreference should only be able to match the empty string (thus agreeing with yarr's behaviour), since in the state passed in the subpattern should be undefined. I'll file a bug with Firefox.
Fixed in r72489. Peter, I've landed your patch with updated results for fast/regex/test1. There was one small issue I fixed in your patch before landing. RegExp::match had a debug printf for error values other than -1. Since yarr's interpreter was returning its internal error code (including values other than -1) this was firing – and we should not have fprintf output. For the time being I've restricted error values returned from the interpreter to -1 (which I think matches PCRE's behavior), and replaced the fprintf with an ASSERT to guard this. I'm not sure whether this is the behaviour going forwards, or whether we'll want to start returning richer error codes from the interpreter, but it seemed the right decision to get this patch landed. cheers, G.
Gavin, Thank you for updates and landing. I have tried to solve a Qt related issue before updating the results: https://bugs.webkit.org/show_bug.cgi?id=49908