RESOLVED FIXED 45751
Extend YARR Interpreter with beginning character look-up optimization
https://bugs.webkit.org/show_bug.cgi?id=45751
Summary Extend YARR Interpreter with beginning character look-up optimization
Peter Varga
Reported 2010-09-14 07:44:23 PDT
Search the start index of the first potential match before the pattern matching process. Use the vector of possible beginning characters which are collected by YARR's parser.
Attachments
proposed patch (6.39 KB, patch)
2010-09-14 07:47 PDT, Peter Varga
no flags
proposed patch v2 (6.22 KB, patch)
2010-10-21 07:56 PDT, Peter Varga
no flags
Peter Varga
Comment 1 2010-09-14 07:47:51 PDT
Created attachment 67550 [details] proposed patch Performance results: ref mod regexp-dna: 2.15x as fast 273.3ms +/- 0.1% 127.0ms +/- 0.0% v8-regexp: 1.23x as fast 1794.8ms +/- 0.1% 1460.6ms +/- 0.6%
Early Warning System Bot
Comment 2 2010-09-14 08:05:03 PDT
Csaba Osztrogonác
Comment 3 2010-09-14 08:06:17 PDT
(In reply to comment #2) > Attachment 67550 [details] did not build on qt: > Build output: http://queues.webkit.org/results/4007008 It will build on Qt after https://bugs.webkit.org/show_bug.cgi?id=45748 fixed.
WebKit Review Bot
Comment 4 2010-09-14 08:09:09 PDT
WebKit Review Bot
Comment 5 2010-09-14 09:11:35 PDT
Peter Varga
Comment 6 2010-09-20 09:21:14 PDT
The new performance results without the collecting of character classes: ref mod regexp-dna: 1.74x as fast 297.1ms +/- 0.3% 170.8ms +/- 0.2% v8-regexp: 1.065x as fast 2628.8ms +/- 0.6% 2468.0ms +/- 0.3%
Michael Saboff
Comment 7 2010-09-30 17:46:36 PDT
I did some analysis of this patch and why v8-regexp is slower with it. Using shark, I examined the performance of the JIT code generated with and without the patch. I looked at the top four regular expressions by time spent in the corresponding JIT'ed code. This patch doesn't change the JIT'ed code for the #1 regular expression, as it begins with (^|[^\\]). Expressions 2, 3 and 4 are fixed strings of /NQ_VQ/, /d1/ and /d2/ respectively which are impacted by this code. I think that this patch slows down these expressions since the code looks for the beginning characters in a loop, advancing index until the characters are found and then falls through to the original code which looks again at the first characters. For each of these patterns, the patch isn't providing much benefit as they degenerate to looking for the beginning characters as well. It may make sense to not emit the look for beginning characters when the whole pattern is a fixed string. Although much more work, you may want to code up eliminating the normal character match for prefixes handled by the beginning check code. This certainly would get complicated for captured characters.
Peter Varga
Comment 8 2010-10-01 05:33:28 PDT
(In reply to comment #7) Thanks, Michael for the advices. Avoiding this optimization in case of fixed strings was a really good idea because it significantly reduces the number of cases where the optimization is searching for beginning characters in case of the v8-regexp test. The new version of patch has been uploaded at https://bugs.webkit.org/show_bug.cgi?id=45748 but I haven't noticed a significant performance improvement compared to the previous results. Your other idea to try to avoid the normal matching of prefixes seems to be too complex and probably it doesn't yield significant performance improvements on the v8-regexp test, because the beginning character look-up optimization isn't able to make this test faster. The cause for this is that there are a lot of special patterns (eg. which contain assertions like BOL) in v8-regexp.
Michael Saboff
Comment 9 2010-10-01 14:08:27 PDT
I ran some tests on patch #4 of 45748 and this patch on a MacPro. The performance changes I saw are: SunSpider base: 274.0 with mods: 273.8 (+.1%) regexp-dna base: 16.8 with mods: 16.0 (+5%) V8 Base base: 1778.7 with mods: 1778.6 (no change) v8-regexp base:174.1 with mods: 172.5 (+.9%) The concern now is why the total v8 time is unchanged with a 1.6ms improvement on v8-regexp. I didn't analyze what other tests did with the patches. Note that since we don't see any regression, I believe we can move forward with this change.
Geoffrey Garen
Comment 10 2010-10-15 15:04:27 PDT
Comment on attachment 67550 [details] proposed patch OK, this patch is ready to go, but we can't land it until we land bug 45748.
Peter Varga
Comment 11 2010-10-21 07:56:46 PDT
Created attachment 71436 [details] proposed patch v2 update patch.
Eric Seidel (no email)
Comment 12 2010-10-21 08:45:00 PDT
Comment on attachment 67550 [details] proposed patch Cleared Geoffrey Garen's review+ from obsolete attachment 67550 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Gavin Barraclough
Comment 13 2010-11-16 12:35:51 PST
Comment on attachment 71436 [details] proposed patch v2 Looks good to go!
Early Warning System Bot
Comment 14 2010-11-16 13:15:47 PST
WebKit Commit Bot
Comment 15 2010-11-16 16:40:12 PST
Comment on attachment 71436 [details] proposed patch v2 Rejecting patch 71436 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 71436]" exit_code: 2 Last 500 characters of output: 5 succeeded at 1212 with fuzz 1 (offset 42 lines). Hunk #6 FAILED at 1323. 3 out of 6 hunks FAILED -- saving rejects to file JavaScriptCore/yarr/RegexInterpreter.cpp.rej patching file JavaScriptCore/yarr/RegexInterpreter.h Hunk #1 succeeded at 324 (offset 15 lines). Hunk #2 succeeded at 336 (offset 15 lines). Hunk #3 succeeded at 349 (offset 15 lines). Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/6138004
Csaba Osztrogonác
Comment 16 2010-11-17 03:06:02 PST
Comment on attachment 71436 [details] proposed patch v2 Peter resolved the conflict and then I landed it: http://trac.webkit.org/changeset/72186
Note You need to log in before you can comment on or make changes to this bug.