Search the start index of the first potential match before the pattern matching process. Use the vector of possible beginning characters which are collected by YARR's parser.
Created attachment 67550 [details] proposed patch Performance results: ref mod regexp-dna: 2.15x as fast 273.3ms +/- 0.1% 127.0ms +/- 0.0% v8-regexp: 1.23x as fast 1794.8ms +/- 0.1% 1460.6ms +/- 0.6%
Attachment 67550 [details] did not build on qt: Build output: http://queues.webkit.org/results/4007008
(In reply to comment #2) > Attachment 67550 [details] did not build on qt: > Build output: http://queues.webkit.org/results/4007008 It will build on Qt after https://bugs.webkit.org/show_bug.cgi?id=45748 fixed.
Attachment 67550 [details] did not build on gtk: Build output: http://queues.webkit.org/results/4004008
Attachment 67550 [details] did not build on win: Build output: http://queues.webkit.org/results/4061010
The new performance results without the collecting of character classes: ref mod regexp-dna: 1.74x as fast 297.1ms +/- 0.3% 170.8ms +/- 0.2% v8-regexp: 1.065x as fast 2628.8ms +/- 0.6% 2468.0ms +/- 0.3%
I did some analysis of this patch and why v8-regexp is slower with it. Using shark, I examined the performance of the JIT code generated with and without the patch. I looked at the top four regular expressions by time spent in the corresponding JIT'ed code. This patch doesn't change the JIT'ed code for the #1 regular expression, as it begins with (^|[^\\]). Expressions 2, 3 and 4 are fixed strings of /NQ_VQ/, /d1/ and /d2/ respectively which are impacted by this code. I think that this patch slows down these expressions since the code looks for the beginning characters in a loop, advancing index until the characters are found and then falls through to the original code which looks again at the first characters. For each of these patterns, the patch isn't providing much benefit as they degenerate to looking for the beginning characters as well. It may make sense to not emit the look for beginning characters when the whole pattern is a fixed string. Although much more work, you may want to code up eliminating the normal character match for prefixes handled by the beginning check code. This certainly would get complicated for captured characters.
(In reply to comment #7) Thanks, Michael for the advices. Avoiding this optimization in case of fixed strings was a really good idea because it significantly reduces the number of cases where the optimization is searching for beginning characters in case of the v8-regexp test. The new version of patch has been uploaded at https://bugs.webkit.org/show_bug.cgi?id=45748 but I haven't noticed a significant performance improvement compared to the previous results. Your other idea to try to avoid the normal matching of prefixes seems to be too complex and probably it doesn't yield significant performance improvements on the v8-regexp test, because the beginning character look-up optimization isn't able to make this test faster. The cause for this is that there are a lot of special patterns (eg. which contain assertions like BOL) in v8-regexp.
I ran some tests on patch #4 of 45748 and this patch on a MacPro. The performance changes I saw are: SunSpider base: 274.0 with mods: 273.8 (+.1%) regexp-dna base: 16.8 with mods: 16.0 (+5%) V8 Base base: 1778.7 with mods: 1778.6 (no change) v8-regexp base:174.1 with mods: 172.5 (+.9%) The concern now is why the total v8 time is unchanged with a 1.6ms improvement on v8-regexp. I didn't analyze what other tests did with the patches. Note that since we don't see any regression, I believe we can move forward with this change.
Comment on attachment 67550 [details] proposed patch OK, this patch is ready to go, but we can't land it until we land bug 45748.
Created attachment 71436 [details] proposed patch v2 update patch.
Comment on attachment 67550 [details] proposed patch Cleared Geoffrey Garen's review+ from obsolete attachment 67550 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment on attachment 71436 [details] proposed patch v2 Looks good to go!
Attachment 71436 [details] did not build on qt: Build output: http://queues.webkit.org/results/6024098
Comment on attachment 71436 [details] proposed patch v2 Rejecting patch 71436 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 71436]" exit_code: 2 Last 500 characters of output: 5 succeeded at 1212 with fuzz 1 (offset 42 lines). Hunk #6 FAILED at 1323. 3 out of 6 hunks FAILED -- saving rejects to file JavaScriptCore/yarr/RegexInterpreter.cpp.rej patching file JavaScriptCore/yarr/RegexInterpreter.h Hunk #1 succeeded at 324 (offset 15 lines). Hunk #2 succeeded at 336 (offset 15 lines). Hunk #3 succeeded at 349 (offset 15 lines). Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/6138004
Comment on attachment 71436 [details] proposed patch v2 Peter resolved the conflict and then I landed it: http://trac.webkit.org/changeset/72186