Bug 46719 - YARR JIT should fallback to YARR Interpreter instead of PCRE
Summary: YARR JIT should fallback to YARR Interpreter instead of PCRE
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Peter Varga
URL:
Keywords:
Depends on: 46893 46904 47906 49661
Blocks: 54978
  Show dependency treegraph
 
Reported: 2010-09-28 06:43 PDT by Peter Varga
Modified: 2011-02-22 18:05 PST (History)
10 users (show)

See Also:


Attachments
proposed patch (69.80 KB, patch)
2010-09-28 06:54 PDT, Peter Varga
ggaren: review-
Details | Formatted Diff | Diff
proposed patch v2 (69.12 KB, patch)
2010-10-15 01:54 PDT, Peter Varga
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Varga 2010-09-28 06:43:40 PDT
The time has come to change the YARR JIT's fallback from PCRE to YARR Interpreter.
Now the regexp matching by Interpreter seems faster.

Additionally PCRE causes some regressions:

eg.: 
  - in JavaScriptCore tests:
       ecma_3/RegExp/15.10.2-1.js
       ecma_3/RegExp/regress-209919.js
  - in fast/js tests:
       fast/js/regexp-look-ahead-empty.html
       fast/js/regexp-overflow.html
       fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html
       fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1.html
       fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html

Here is an old bugreport related to this regression issue:
https://bugs.webkit.org/show_bug.cgi?id=38117
Comment 1 Peter Varga 2010-09-28 06:54:42 PDT
Created attachment 69043 [details]
proposed patch

V8 performance result:
                                    ref                mod
v8-regexp:  1.025x as fast  310.6ms +/- 0.2%   303.0ms +/- 0.3%
Comment 2 Peter Varga 2010-09-28 08:38:47 PDT
Anyone can suggest something what should be done with the regexp-overflow fast/js test?
The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking.
Comment 3 WebKit Review Bot 2010-09-28 09:57:14 PDT
Attachment 69043 [details] did not build on win:
Build output: http://queues.webkit.org/results/4132027
Comment 4 Gavin Barraclough 2010-09-28 10:20:06 PDT
If the interpreter is fast enough now this is fantastic news! We should double check the perf on a couple more machines & get this landed.

Michael, have you managed to get back to Peter on his other bugs yet?
Comment 5 Gavin Barraclough 2010-09-29 12:33:42 PDT
(In reply to comment #2)
> Anyone can suggest something what should be done with the regexp-overflow fast/js test?
> The maximum number of iteration isn't checked in YARR. I think we should either remove the test or implement the overflow checking.

I think we can just check in passing results for these.  This is not defining spec-required behaviour, just testing limitations of the previous implementation, and checking that the regex engine does not crash on complex inputs.  This is not behaviour that we need to continue to support.
Comment 6 Gavin Barraclough 2010-09-29 13:04:09 PDT
Looking at the results, a couple of bugs are showing up in the YARR interpreter.  We should probably fix these before landing this patch.

One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests).  I'd imagine this one is a trivial fix.  The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down.

Overall just switching over would probably be a net win, but since there are only two bugs in the way & they don't look too evil we should probably stick to our no-regressions policy, it shouldn't hold this patch up for too long.
Comment 7 Peter Varga 2010-09-30 01:15:10 PDT
(In reply to comment #6)
> One bug looks like case-insensitivity for backref checking isn't working (tests 271-272 in the JSC regex tests).  I'd imagine this one is a trivial fix.  The other, 416, looks a little more complex, but ultimately (a\1?) is matching "", which ain't right, hopefully not too hard to track down.

Thanks for pointing out these issues! Now I'm working on them.
Comment 8 Gavin Barraclough 2010-10-05 15:18:25 PDT
Tested to confirm the performance, I measure a fractional  progression on sunspider a slight regression on v8-rexeg but a small improvement on v8 overall (which makes no sense, since only regex uses regexes).

While repeatable, I think this is all within the margin of noise, so from a perf persepective I'd say this is good to go, just need to fix the last functional issue.
Comment 9 Geoffrey Garen 2010-10-14 15:16:52 PDT
Comment on attachment 69043 [details]
proposed patch

r- based on Gavin's comments about 2 remaining bugs.
Comment 10 Peter Varga 2010-10-15 01:54:41 PDT
Created attachment 70840 [details]
proposed patch v2

patch updated
Comment 11 Gavin Barraclough 2010-10-18 16:17:45 PDT
Comment on attachment 70840 [details]
proposed patch v2

Fantastic work Peter.
Comment 12 WebKit Commit Bot 2010-10-18 18:15:23 PDT
Comment on attachment 70840 [details]
proposed patch v2

Rejecting patch 70840 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2
Last 500 characters of output:
cts/CommitQueue/LayoutTests
Testing 21592 test cases.
fast/regex/slow.html -> timed out
Sampling process 74224 for 10 seconds with 10 milliseconds of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 74224 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt

Exiting early after 1 failures. 15090 tests run.
272.11s total testing time

15089 test cases (99%) succeeded
1 test case (<1%) timed out
3 test cases (<1%) had stderr output

Full output: http://queues.webkit.org/results/4439081
Comment 13 WebKit Commit Bot 2010-10-18 18:26:24 PDT
Comment on attachment 70840 [details]
proposed patch v2

Rejecting patch 70840 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'build-and-test', '--no-clean', '--no-update', '--test', '--quiet', '--non-interactive']" exit_code: 2
Last 500 characters of output:
ects/CommitQueue/LayoutTests
Testing 21290 test cases.
fast/regex/slow.html -> timed out
Sampling process 75626 for 10 seconds with 10 milliseconds of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 75626 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt

Exiting early after 1 failures. 14889 tests run.
820.52s total testing time

14888 test cases (99%) succeeded
1 test case (<1%) timed out
1 test case (<1%) had stderr output

Full output: http://queues.webkit.org/results/4465083
Comment 14 WebKit Commit Bot 2010-11-16 12:47:58 PST
Comment on attachment 70840 [details]
proposed patch v2

Rejecting patch 70840 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 70840]" exit_code: 2
Last 500 characters of output:
0_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4-expected.txt
patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.8_Atom/S15.10.2.8_A2_T1-expected.txt
patching file LayoutTests/fast/js/sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6-expected.txt

Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1

Full output: http://queues.webkit.org/results/5932114
Comment 15 Csaba Osztrogonác 2010-11-17 05:37:11 PST
Comment on attachment 70840 [details]
proposed patch v2

Landed manually: http://trac.webkit.org/changeset/72197
Comment 16 WebKit Review Bot 2010-11-17 06:32:16 PST
http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests)
The following tests are not passing:
fast/regex/test1.html
Comment 17 Peter Varga 2010-11-17 07:07:43 PST
The YARR Interpreter broke the fast/regex/test1.html on Snow Leopard:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r72203%20(20971)/fast/regex/test1-pretty-diff.html

I think that layout test has wrong expected results. The result of match is different between YARR Interpreter and
pcre in some cases. We already had a discussion about this problem:
https://bugs.webkit.org/show_bug.cgi?id=38117

Conclusion was the pcre result is wrong.
Is the changing the expected result of this test acceptable?
Comment 18 Csaba Osztrogonác 2010-11-17 07:54:54 PST
(In reply to comment #16)
> http://trac.webkit.org/changeset/72197 might have broken SnowLeopard Intel Release (Tests)
> The following tests are not passing:
> fast/regex/test1.html

Rolled out by http://trac.webkit.org/changeset/72207
Comment 19 Gavin Barraclough 2010-11-19 17:35:53 PST
I'm taking a look at these errors, looking at other browser behaviour & the spec.
Comment 20 Gavin Barraclough 2010-11-19 23:21:02 PST
The new results all look fine.  In all cases they match Firefox, bar /(?>a*)*/ (this regex is invalid, it's vestigial of this test suite coming from PCRE, before be adapted to ECMA regex syntax), and /((\3|b)\2(a)){2,}/ (Firefox gets this wrong).

I'll land this with new results.
Comment 21 Gavin Barraclough 2010-11-20 14:09:44 PST
* (Firefox gets this wrong)

Hmmm, this is less clear cut true than I thought – I still think we're right here, but it's a slightly subtler case than I'd thought at first glance.

The regex is question is:

/((\3|b)\2(a)){2,}/.exec("bbaababbabaaaaabbaaaabba");

The problem can be demonstrated by this reduction:

/(\2(a)){2}/.exec("aaa")

Firefox results in: ["aaa","aa","a"], while yarr gets ["aa","a","a"].

In the first iteration on both engines \2 is clearly undefined, so matches the empty string, and (a) matches the first "a".  But behaviour diverges on how the backreference matches in the second iteration of the loop.

In Firefox, the backreference matches the nested parentheses from the first iteration of the loop ("a"), and then (a) also matches "a" so the outer parentheses math "aa", and the whole regex match is "aaa".

In yarr we again match \2 to the empty string, so the outer parens again match "a", and the whole regex  match is "aa".

Correct behaviour here is spec'ed under section 15.10.2.5.  Latter iterations of a quantified term are executed by closure 'd' of the RepeatMatcher, specified in step 2 of RepeatMatcher's description.  All calls to repeat iterations - m(xr, d) - are passing state xr (from steps 7, 8c, and 9).  xr is defined in step 6 to be a new state with captures provided by cap, where cap is a fresh copy of the captures array passed in, but with all captures within the expression (parenIndex ... parenIndex+parenCount) being set to undefined (in steps 3, 4).

Based on this definition, in the second iteration of the above regex I think it is clear that the backreference should only be able to match the empty string (thus agreeing with yarr's behaviour), since in the state passed in the subpattern should be undefined.

I'll file a bug with Firefox.
Comment 22 Gavin Barraclough 2010-11-20 19:28:51 PST
Fixed in r72489.

Peter,

I've landed your patch with updated results for fast/regex/test1.

There was one small issue I fixed in your patch before landing.  RegExp::match had a debug printf for error values other than -1.  Since yarr's interpreter was returning its internal error code (including values other than -1) this was firing – and we should not have fprintf output.  For the time being I've restricted error values returned from the interpreter to -1 (which I think matches PCRE's behavior), and replaced the fprintf with an ASSERT to guard this.  I'm not sure whether this is the behaviour going forwards, or whether we'll want to start returning richer error codes from the interpreter, but it seemed the right decision to get this patch landed.

cheers,
G.
Comment 23 Peter Varga 2010-11-22 06:58:11 PST
Gavin,

Thank you for updates and landing.
I have tried to solve a Qt related issue before updating the results: https://bugs.webkit.org/show_bug.cgi?id=49908