Bug 45751 - Extend YARR Interpreter with beginning character look-up optimization
Summary: Extend YARR Interpreter with beginning character look-up optimization
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 45748
Blocks:
  Show dependency treegraph
 
Reported: 2010-09-14 07:44 PDT by Peter Varga
Modified: 2010-11-17 03:06 PST (History)
11 users (show)

See Also:


Attachments
proposed patch (6.39 KB, patch)
2010-09-14 07:47 PDT, Peter Varga
no flags Details | Formatted Diff | Diff
proposed patch v2 (6.22 KB, patch)
2010-10-21 07:56 PDT, Peter Varga
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Varga 2010-09-14 07:44:23 PDT
Search the start index of the first potential match before the pattern
matching process. Use the vector of possible beginning characters which are collected
by YARR's parser.
Comment 1 Peter Varga 2010-09-14 07:47:51 PDT
Created attachment 67550 [details]
proposed patch

Performance results:

                                      ref               mod
regexp-dna:   2.15x as fast   273.3ms +/- 0.1%  127.0ms +/- 0.0% 
v8-regexp:    1.23x as fast  1794.8ms +/- 0.1% 1460.6ms +/- 0.6%
Comment 2 Early Warning System Bot 2010-09-14 08:05:03 PDT
Attachment 67550 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4007008
Comment 3 Csaba Osztrogonác 2010-09-14 08:06:17 PDT
(In reply to comment #2)
> Attachment 67550 [details] did not build on qt:
> Build output: http://queues.webkit.org/results/4007008

It will build on Qt after https://bugs.webkit.org/show_bug.cgi?id=45748 fixed.
Comment 4 WebKit Review Bot 2010-09-14 08:09:09 PDT
Attachment 67550 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/4004008
Comment 5 WebKit Review Bot 2010-09-14 09:11:35 PDT
Attachment 67550 [details] did not build on win:
Build output: http://queues.webkit.org/results/4061010
Comment 6 Peter Varga 2010-09-20 09:21:14 PDT
The new performance results without the collecting of character classes:

                                       ref                mod
regexp-dna:  1.74x as fast     297.1ms +/- 0.3%   170.8ms +/- 0.2%
v8-regexp:   1.065x as fast   2628.8ms +/- 0.6%  2468.0ms +/- 0.3%
Comment 7 Michael Saboff 2010-09-30 17:46:36 PDT
I did some analysis of this patch and why v8-regexp is slower with it.

Using shark, I examined the performance of the JIT code generated with and without the patch.  I looked at the top four regular expressions by time spent in the corresponding JIT'ed code. This patch doesn't change the JIT'ed code for the #1 regular expression, as it begins with (^|[^\\]).  Expressions 2, 3 and 4 are fixed strings of /NQ_VQ/, /d1/ and /d2/ respectively which are impacted by this code.

I think that this patch slows down these expressions since the code looks for the beginning characters in a loop, advancing index until the characters are found and then falls through to the original code which looks again at the first characters.  For each of these patterns, the patch isn't providing much benefit as they degenerate to looking for the beginning characters as well.  It may make sense to not emit the look for beginning characters when the whole pattern is a fixed string.

Although much more work, you may want to code up eliminating the normal character match for prefixes handled by the beginning check code.  This certainly would get complicated for captured characters.
Comment 8 Peter Varga 2010-10-01 05:33:28 PDT
(In reply to comment #7)

Thanks, Michael for the advices.

Avoiding this optimization in case of fixed strings was a really good idea
because it significantly reduces the number of cases where the optimization is
searching for beginning characters in case of the v8-regexp test.
The new version of patch has been uploaded at 
https://bugs.webkit.org/show_bug.cgi?id=45748 but I haven't noticed a significant
performance improvement compared to the previous results.

Your other idea to try to avoid the normal matching of prefixes seems to be too 
complex and probably it doesn't yield significant performance improvements on the
v8-regexp test, because the beginning character look-up optimization
isn't able to make this test faster. The cause for this is that there are a lot
of special patterns (eg. which contain assertions like BOL) in v8-regexp.
Comment 9 Michael Saboff 2010-10-01 14:08:27 PDT
I ran some tests on patch #4 of 45748 and this patch on a MacPro.

The performance changes I saw are:
                        
   SunSpider  base: 274.0  with mods: 273.8 (+.1%)
   regexp-dna base: 16.8 with mods: 16.0 (+5%)
  V8 Base base: 1778.7 with mods: 1778.6 (no change)
  v8-regexp base:174.1 with mods: 172.5 (+.9%)

The concern now is why the total v8 time is unchanged with a 1.6ms improvement on v8-regexp.  I didn't analyze what other tests did with the patches.  Note that since we don't see any regression, I believe we can move forward with this change.
Comment 10 Geoffrey Garen 2010-10-15 15:04:27 PDT
Comment on attachment 67550 [details]
proposed patch

OK, this patch is ready to go, but we can't land it until we land bug 45748.
Comment 11 Peter Varga 2010-10-21 07:56:46 PDT
Created attachment 71436 [details]
proposed patch v2

update patch.
Comment 12 Eric Seidel (no email) 2010-10-21 08:45:00 PDT
Comment on attachment 67550 [details]
proposed patch

Cleared Geoffrey Garen's review+ from obsolete attachment 67550 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 13 Gavin Barraclough 2010-11-16 12:35:51 PST
Comment on attachment 71436 [details]
proposed patch v2

Looks good to go!
Comment 14 Early Warning System Bot 2010-11-16 13:15:47 PST
Attachment 71436 [details] did not build on qt:
Build output: http://queues.webkit.org/results/6024098
Comment 15 WebKit Commit Bot 2010-11-16 16:40:12 PST
Comment on attachment 71436 [details]
proposed patch v2

Rejecting patch 71436 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 71436]" exit_code: 2
Last 500 characters of output:
5 succeeded at 1212 with fuzz 1 (offset 42 lines).
Hunk #6 FAILED at 1323.
3 out of 6 hunks FAILED -- saving rejects to file JavaScriptCore/yarr/RegexInterpreter.cpp.rej
patching file JavaScriptCore/yarr/RegexInterpreter.h
Hunk #1 succeeded at 324 (offset 15 lines).
Hunk #2 succeeded at 336 (offset 15 lines).
Hunk #3 succeeded at 349 (offset 15 lines).

Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1

Full output: http://queues.webkit.org/results/6138004
Comment 16 Csaba Osztrogonác 2010-11-17 03:06:02 PST
Comment on attachment 71436 [details]
proposed patch v2

Peter resolved the conflict and then I landed it: http://trac.webkit.org/changeset/72186