46719 – YARR JIT should fallback to YARR Interpreter instead of PCRE

RESOLVED FIXED 46719

YARR JIT should fallback to YARR Interpreter instead of PCRE

https://bugs.webkit.org/show_bug.cgi?id=46719

Summary YARR JIT should fallback to YARR Interpreter instead of PCRE

Peter Varga

Reported 2010-09-28 06:43:40 PDT

The time has come to change the YARR JIT's fallback from PCRE to YARR Interpreter. Now the regexp matching by Interpreter seems faster. Additionally PCRE causes some regressions: eg.: - in JavaScriptCore tests: ecma_3/RegExp/15.10.2-1.js ecma_3/RegExp/regress-209919.js - in fast/js tests: fast/js/regexp-look-ahead-empty.html fast/js/regexp-overflow.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1.html fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html Here is an old bugreport related to this regression issue: https://bugs.webkit.org/show_bug.cgi?id=38117

Attachments
proposed patch (69.80 KB, patch) 2010-09-28 06:54 PDT, Peter Varga	ggaren: review-	Details Formatted Diff Diff
proposed patch v2 (69.12 KB, patch) 2010-10-15 01:54 PDT, Peter Varga	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Peter Varga

Comment 1 2010-09-28 06:54:42 PDT

Created attachment 69043 [details] proposed patch V8 performance result: ref mod v8-regexp: 1.025x as fast 310.6ms +/- 0.2% 303.0ms +/- 0.3%

Peter Varga

Comment 2 2010-09-28 08:38:47 PDT

Anyone can suggest something what should be done with the regexp-overflow fast/js test? The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking.

WebKit Review Bot

Comment 3 2010-09-28 09:57:14 PDT

Attachment 69043 [details] did not build on win: Build output: http://queues.webkit.org/results/4132027

Gavin Barraclough

Comment 4 2010-09-28 10:20:06 PDT

If the interpreter is fast enough now this is fantastic news! We should double check the perf on a couple more machines & get this landed. Michael, have you managed to get back to Peter on his other bugs yet?

Gavin Barraclough

Comment 5 2010-09-29 12:33:42 PDT

(In reply to comment #2) > Anyone can suggest something what should be done with the regexp-overflow fast/js test? > The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking. I think we can just check in passing results for these. This is not defining spec-required behaviour, just testing limitations of the previous implementation, and checking that the regex engine does not crash on complex inputs. This is not behaviour that we need to continue to support.

Gavin Barraclough

Comment 6 2010-09-29 13:04:09 PDT

Looking at the results, a couple of bugs are showing up in the YARR interpreter. We should probably fix these before landing this patch. One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Overall just switching over would probably be a net win, but since there are only two bugs in the way & they don't look too evil we should probably stick to our no-regressions policy, it shouldn't hold this patch up for too long.

Peter Varga

Comment 7 2010-09-30 01:15:10 PDT

(In reply to comment #6) > One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests). I'd imagine this one is a trivial fix. The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down. Thanks for pointing out these issues! Now I'm working on them.

Gavin Barraclough

Comment 8 2010-10-05 15:18:25 PDT

Tested to confirm the performance, I measure a fractional progression on sunspider a slight regression on v8-rexeg but a small improvement on v8 overall (which makes no sense, since only regex uses regexes). While repeatable, I think this is all within the margin of noise, so from a perf persepective I'd say this is good to go, just need to fix the last functional issue.

Geoffrey Garen

Comment 9 2010-10-14 15:16:52 PDT

Comment on attachment 69043 [details] proposed patch r- based on Gavin's comments about 2 remaining bugs.

Peter Varga

Comment 10 2010-10-15 01:54:41 PDT

Created attachment 70840 [details] proposed patch v2 patch updated

Gavin Barraclough

Comment 11 2010-10-18 16:17:45 PDT

Comment on attachment 70840 [details] proposed patch v2 Fantastic work Peter.

WebKit Commit Bot

Comment 12 2010-10-18 18:15:23 PDT

Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: cts/CommitQueue/LayoutTests Testing 21592 test cases. fast/regex/slow.html -> timed out Sampling process 74224 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 74224 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 15090 tests run. 272.11s total testing time 15089 test cases (99%) succeeded 1 test case (<1%) timed out 3 test cases (<1%) had stderr output Full output: http://queues.webkit.org/results/4439081

WebKit Commit Bot

Comment 13 2010-10-18 18:26:24 PDT

Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2 Last 500 characters of output: ects/CommitQueue/LayoutTests Testing 21290 test cases. fast/regex/slow.html -> timed out Sampling process 75626 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 75626 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 14889 tests run. 820.52s total testing time 14888 test cases (99%) succeeded 1 test case (<1%) timed out 1 test case (<1%) had stderr output Full output: http://queues.webkit.org/results/4465083

WebKit Commit Bot

Comment 14 2010-11-16 12:47:58 PST

Comment on attachment 70840 [details] proposed patch v2 Rejecting patch 70840 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 70840]" exit_code: 2 Last 500 characters of output: 0_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1-expected.txt patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6-expected.txt Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/5932114

Csaba Osztrogonác

Comment 15 2010-11-17 05:37:11 PST

Comment on attachment 70840 [details] proposed patch v2 Landed manually: http://trac.webkit.org/changeset/72197

WebKit Review Bot

Comment 16 2010-11-17 06:32:16 PST

http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) The following tests are not passing: fast/regex/test1.html

Peter Varga

Comment 17 2010-11-17 07:07:43 PST

The YARR Interpreter broke the fast/regex/test1.html on Snow Leopard: http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r72203%20(20971)/fast/regex/test1-pretty-diff.html I think that layout test has wrong expected results. The result of match is different between YARR Interpreter and pcre in some cases. We already had a discussion about this problem: https://bugs.webkit.org/show_bug.cgi?id=38117 Conclusion was the pcre result is wrong. Is the changing the expected result of this test acceptable?

Csaba Osztrogonác

Comment 18 2010-11-17 07:54:54 PST

(In reply to comment #16) > http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests) > The following tests are not passing: > fast/regex/test1.html Rolled out by http://trac.webkit.org/changeset/72207

Gavin Barraclough

Comment 19 2010-11-19 17:35:53 PST

I'm taking a look at these errors, looking at other browser behaviour & the spec.

Gavin Barraclough

Comment 20 2010-11-19 23:21:02 PST

The new results all look fine. In all cases they match Firefox, bar /(?>a*)*/ (this regex is invalid, it's vestigial of this test suite coming from PCRE, before be adapted to ECMA regex syntax), and /((\3|b)\2(a)){2,}/ (Firefox gets this wrong). I'll land this with new results.

Gavin Barraclough

Comment 21 2010-11-20 14:09:44 PST

* (Firefox gets this wrong) Hmmm, this is less clear cut true than I thought – I still think we're right here, but it's a slightly subtler case than I'd thought at first glance. The regex is question is: /((\3|b)\2(a)){2,}/.exec("bbaababbabaaaaabbaaaabba"); The problem can be demonstrated by this reduction: /(\2(a)){2}/.exec("aaa") Firefox results in: ["aaa","aa","a"], while yarr gets ["aa","a","a"]. In the first iteration on both engines \2 is clearly undefined, so matches the empty string, and (a) matches the first "a". But behaviour diverges on how the backreference matches in the second iteration of the loop. In Firefox, the backreference matches the nested parentheses from the first iteration of the loop ("a"), and then (a) also matches "a" so the outer parentheses math "aa", and the whole regex match is "aaa". In yarr we again match \2 to the empty string, so the outer parens again match "a", and the whole regex match is "aa". Correct behaviour here is spec'ed under section 15.10.2.5. Latter iterations of a quantified term are executed by closure 'd' of the RepeatMatcher, specified in step 2 of RepeatMatcher's description. All calls to repeat iterations - m(xr, d) - are passing state xr (from steps 7, 8c, and 9). xr is defined in step 6 to be a new state with captures provided by cap, where cap is a fresh copy of the captures array passed in, but with all captures within the expression (parenIndex ... parenIndex+parenCount) being set to undefined (in steps 3, 4). Based on this definition, in the second iteration of the above regex I think it is clear that the backreference should only be able to match the empty string (thus agreeing with yarr's behaviour), since in the state passed in the subpattern should be undefined. I'll file a bug with Firefox.

Gavin Barraclough

Comment 22 2010-11-20 19:28:51 PST

Fixed in r72489. Peter, I've landed your patch with updated results for fast/regex/test1. There was one small issue I fixed in your patch before landing. RegExp::match had a debug printf for error values other than -1. Since yarr's interpreter was returning its internal error code (including values other than -1) this was firing – and we should not have fprintf output. For the time being I've restricted error values returned from the interpreter to -1 (which I think matches PCRE's behavior), and replaced the fprintf with an ASSERT to guard this. I'm not sure whether this is the behaviour going forwards, or whether we'll want to start returning richer error codes from the interpreter, but it seemed the right decision to get this patch landed. cheers, G.

Peter Varga

Comment 23 2010-11-22 06:58:11 PST

Gavin, Thank you for updates and landing. I have tried to solve a Qt related issue before updating the results: https://bugs.webkit.org/show_bug.cgi?id=49908

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee

Peter Varga

Reported

2010-09-28 06:43 PDT

Modified

2011-02-22 18:05 PST History

CC List

10 users Show

URL

Keywords

Depends on

46893 46904 47906 49661

Blocks

54978

Dependencies

tree graph