Bug 46719

Summary: YARR JIT should fallback to YARR Interpreter instead of PCRE
Product: WebKit Reporter: Peter Varga <pvarga>
Component: JavaScriptCoreAssignee: Peter Varga <pvarga>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, abecsi, barraclough, commit-queue, eric, ggaren, msaboff, ossy, webkit.review.bot, zherczeg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 46893, 46904, 47906, 49661    
Bug Blocks: 54978    
Attachments:
Description Flags
proposed patch
ggaren: review-
proposed patch v2 none

Peter Varga
Reported 2010-09-28 06:43:40 PDT
The time has come to change the YARR JIT's fallback from PCRE to YARR Interpreter. Now the regexp matching by Interpreter seems faster. Additionally PCRE causes some regressions: eg.: - in JavaScriptCore tests: ecma_3/RegExp/15.10.2-1.js ecma_3/RegExp/regress-209919.js - in fast/js tests: fast/js/regexp-look-ahead-empty.html fast/js/regexp-overflow.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html Here is an old bugreport related to this regression issue: https://bugs.webkit.org/show_bug.cgi?id=38117
Attachments
proposed patch (69.80 KB, patch)
2010-09-28 06:54 PDT, Peter Varga
ggaren: review-
proposed patch v2 (69.12 KB, patch)
2010-10-15 01:54 PDT, Peter Varga
no flags
Peter Varga
Comment 1 2010-09-28 06:54:42 PDT
Created attachment 69043 [details] proposed patch V8 performance result: ref mod v8-regexp: 1.025x as fast 310.6ms +/- 0.2% 303.0ms +/- 0.3%
Peter Varga
Comment 2 2010-09-28 08:38:47 PDT
Anyone can suggest something what should be done with the regexp-overflow fast/js test? The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking.
WebKit Review Bot
Comment 3 2010-09-28 09:57:14 PDT
Gavin Barraclough
Comment 4 2010-09-28 10:20:06 PDT
If the interpreter is fast enough now this is fantastic news! We should double check the perf on a couple more machines & get this landed. Michael, have you managed to get back to Peter on his other bugs yet?
Gavin Barraclough
Comment 5 2010-09-29 12:33:42 PDT
(In reply to comment #2) > Anyone can suggest something what should be done with the regexp-overflow fast/js test? > The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking. I think we can just check in passing results for these. This is not defining spec-required behaviour, just testing limitations of the previous implementation, and checking that the regex engine does not crash on complex inputs. This is not behaviour that we need to continue to support.
Gavin Barraclough
Comment 6 2010-09-29 13:04:09 PDT
Looking at the results, a couple of bugs are showing up in the YARR interpreter. We should probably fix these before landing this patch. One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Overall just switching over would probably be a net win, but since there are only two bugs in the way & they don't look too evil we should probably stick to our no-regressions policy, it shouldn't hold this patch up for too long.
Peter Varga
Comment 7 2010-09-30 01:15:10 PDT
(In reply to comment #6) > One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Thanks for pointing out these issues! Now I'm working on them.
Gavin Barraclough
Comment 8 2010-10-05 15:18:25 PDT
Tested to confirm the performance, I measure a fractional progression on sunspider a slight regression on v8-rexeg but a small improvement on v8 overall (which makes no sense, since only regex uses regexes). While repeatable, I think this is all within the margin of noise, so from a perf persepective I'd say this is good to go, just need to fix the last functional issue.
Geoffrey Garen
Comment 9 2010-10-14 15:16:52 PDT
Comment on attachment 69043 [details] proposed patch r- based on Gavin's comments about 2 remaining bugs.
Peter Varga
Comment 10 2010-10-15 01:54:41 PDT
Created attachment 70840 [details] proposed patch v2 patch updated
Gavin Barraclough
Comment 11 2010-10-18 16:17:45 PDT
Comment on attachment 70840 [details] proposed patch v2 Fantastic work Peter.
WebKit Commit Bot
Comment 12 2010-10-18 18:15:23 PDT
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: cts/CommitQueue/LayoutTests Testing 21592 test cases. fast/regex/slow.html -> timed out Sampling process 74224 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 74224 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 15090 tests run. 272.11s total testing time 15089 test cases (99%) succeeded 1 test case (<1%) timed out 3 test cases (<1%) had stderr output Full output: http://queues.webkit.org/results/4439081
WebKit Commit Bot
Comment 13 2010-10-18 18:26:24 PDT
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: ects/CommitQueue/LayoutTests Testing 21290 test cases. fast/regex/slow.html -> timed out Sampling process 75626 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 75626 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 14889 tests run. 820.52s total testing time 14888 test cases (99%) succeeded 1 test case (<1%) timed out 1 test case (<1%) had stderr output Full output: http://queues.webkit.org/results/4465083
WebKit Commit Bot
Comment 14 2010-11-16 12:47:58 PST
Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 70840]" exit_code: 2 Last 500 characters of output: 0_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6-expected.txt Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/5932114
Csaba Osztrogonác
Comment 15 2010-11-17 05:37:11 PST
Comment on attachment 70840 [details] proposed patch v2 Landed manually: http://trac.webkit.org/changeset/72197
WebKit Review Bot
Comment 16 2010-11-17 06:32:16 PST
http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) The following tests are not passing: fast/regex/test1.html
Peter Varga
Comment 17 2010-11-17 07:07:43 PST
The YARR Interpreter broke the fast/regex/test1.html on Snow Leopard: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r72203%20(20971)/fast/regex/test1-pretty-diff.html I think that layout test has wrong expected results. The result of match is different between YARR Interpreter and pcre in some cases. We already had a discussion about this problem: https://bugs.webkit.org/show_bug.cgi?id=38117 Conclusion was the pcre result is wrong. Is the changing the expected result of this test acceptable?
Csaba Osztrogonác
Comment 18 2010-11-17 07:54:54 PST
(In reply to comment #16) > http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) > The following tests are not passing: > fast/regex/test1.html Rolled out by http://trac.webkit.org/changeset/72207
Gavin Barraclough
Comment 19 2010-11-19 17:35:53 PST
I'm taking a look at these errors, looking at other browser behaviour & the spec.
Gavin Barraclough
Comment 20 2010-11-19 23:21:02 PST
The new results all look fine. In all cases they match Firefox, bar /(?>a*)*/ (this regex is invalid, it's vestigial of this test suite coming from PCRE, before be adapted to ECMA regex syntax), and /((\3|b)\2(a)){2,}/ (Firefox gets this wrong). I'll land this with new results.
Gavin Barraclough
Comment 21 2010-11-20 14:09:44 PST
* (Firefox gets this wrong) Hmmm, this is less clear cut true than I thought – I still think we're right here, but it's a slightly subtler case than I'd thought at first glance. The regex is question is: /((\3|b)\2(a)){2,}/.exec("bbaababbabaaaaabbaaaabba"); The problem can be demonstrated by this reduction: /(\2(a)){2}/.exec("aaa") Firefox results in: ["aaa","aa","a"], while yarr gets ["aa","a","a"]. In the first iteration on both engines \2 is clearly undefined, so matches the empty string, and (a) matches the first "a". But behaviour diverges on how the backreference matches in the second iteration of the loop. In Firefox, the backreference matches the nested parentheses from the first iteration of the loop ("a"), and then (a) also matches "a" so the outer parentheses math "aa", and the whole regex match is "aaa". In yarr we again match \2 to the empty string, so the outer parens again match "a", and the whole regex match is "aa". Correct behaviour here is spec'ed under section 15.10.2.5. Latter iterations of a quantified term are executed by closure 'd' of the RepeatMatcher, specified in step 2 of RepeatMatcher's description. All calls to repeat iterations - m(xr, d) - are passing state xr (from steps 7, 8c, and 9). xr is defined in step 6 to be a new state with captures provided by cap, where cap is a fresh copy of the captures array passed in, but with all captures within the expression (parenIndex ... parenIndex+parenCount) being set to undefined (in steps 3, 4). Based on this definition, in the second iteration of the above regex I think it is clear that the backreference should only be able to match the empty string (thus agreeing with yarr's behaviour), since in the state passed in the subpattern should be undefined. I'll file a bug with Firefox.
Gavin Barraclough
Comment 22 2010-11-20 19:28:51 PST
Fixed in r72489. Peter, I've landed your patch with updated results for fast/regex/test1. There was one small issue I fixed in your patch before landing. RegExp::match had a debug printf for error values other than -1. Since yarr's interpreter was returning its internal error code (including values other than -1) this was firing – and we should not have fprintf output. For the time being I've restricted error values returned from the interpreter to -1 (which I think matches PCRE's behavior), and replaced the fprintf with an ASSERT to guard this. I'm not sure whether this is the behaviour going forwards, or whether we'll want to start returning richer error codes from the interpreter, but it seemed the right decision to get this patch landed. cheers, G.
Peter Varga
Comment 23 2010-11-22 06:58:11 PST
Gavin, Thank you for updates and landing. I have tried to solve a Qt related issue before updating the results: https://bugs.webkit.org/show_bug.cgi?id=49908
Note You need to log in before you can comment on or make changes to this bug.