RESOLVED INVALID Bug 12638
Some regular expressions could potentially be matched much faster
https://bugs.webkit.org/show_bug.cgi?id=12638
Summary Some regular expressions could potentially be matched much faster
David Smith
Reported 2007-02-06 13:12:46 PST
The regular expression at the URL (32 repetitions of a? followed by 32 repetitions of a, and attempting to match on a string of 32 repetitions of a) takes a long time to run in Safari, but runs extremely quickly in the attached C program. The C program was obtained from http://swtch.com/~rsc/regexp/regexp1.html, which also has an explanation of why it's faster. The downside is that backreferences and other similar features are not supported, but it might be possible to decide which method to use based on checking whether the regex uses features requiring the slower method.
Attachments
regex matching program (7.67 KB, application/octet-stream)
2007-02-06 13:14 PST, David Smith
no flags
Patch to fix size limitation. (1.85 KB, patch)
2008-09-11 02:10 PDT, Erik Corry
no flags
David Smith
Comment 1 2007-02-06 13:14:17 PST
Created attachment 12982 [details] regex matching program
David Kilzer (:ddkilzer)
Comment 2 2007-02-06 14:49:30 PST
JavaScriptCore uses a modified version of the PCRE library to do its regular expression matching. The embedded version is at version 6.4. Versions 6.5, 6.6, 6.7 and 7.0 have been released since 6.4. If you have time, please test this expression with the newer versions of pcre to find out if upgrading will benefit WebKit. (It's on my mental to-do list to upgrade the embedded version, but I've been distracted by other shiny objects like webarchives recently.) Apple's UTF-16 changes (and Mitz's recent non-null-terminated API changes) to PCRE really need to be contributed back to the PCRE project so that (some day) PCRE could be linked to WebKit instead of being embedded in it.
David Smith
Comment 3 2007-02-06 15:12:55 PST
It's much harder to cause a hang after r19434, so that's a definite improvement. I don't know if it's avoiding the exponential behavior mentioned in the linked article, or just fast enough to run even that fairly quickly. I will try to investigate later.
Cameron Zwarich (cpst)
Comment 4 2007-07-10 22:04:45 PDT
I tried it with the latest version of PCRE (7.2 at the time of this post) with the included program pcredemo, and as in the Russ Cox article, it stops after 22 repetitions because it reaches its default backtracking limit. Prior to that, it is still slow.
Erik Corry
Comment 5 2008-09-11 02:10:25 PDT
Created attachment 23342 [details] Patch to fix size limitation. See comment above.
David Kilzer (:ddkilzer)
Comment 6 2008-09-11 05:53:45 PDT
(In reply to comment #5) > Created an attachment (id=23342) [edit] > Patch to fix size limitation. > > See comment above. I think this should have been attached to Bug 14873 instead.
David Kilzer (:ddkilzer)
Comment 7 2008-09-11 05:56:57 PDT
Comment on attachment 23342 [details] Patch to fix size limitation. Please reattach to Bug 14873. How does this patch affect SunSpider?
David Smith
Comment 8 2009-01-02 23:22:30 PST
Retesting in r39572, this hangs. Presumably due to avoiding the 22 repetition limit that PCRE imposed.
Gavin Barraclough
Comment 9 2012-09-06 17:18:42 PDT
The patch attached to this bug is no longer valid - we no longer use PCRE. I don't think we're particularly interested in optimizing for the artificial stress tests. An automaton based matcher may have a benefit for real web pages, but we should find benchmarks representative of those cases. I don't think there is actionable work here.
Note You need to log in before you can comment on or make changes to this bug.