Bug 64202 - Enh: Improve handling of RegExp in the form of /.*blah.*/
Summary: Enh: Improve handling of RegExp in the form of /.*blah.*/
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-08 14:12 PDT by Michael Saboff
Modified: 2011-07-13 16:05 PDT (History)
0 users

See Also:


Attachments
Proposed patch (36.51 KB, patch)
2011-07-11 09:29 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch to fix win compilation error. (36.53 KB, patch)
2011-07-11 15:52 PDT, Michael Saboff
barraclough: review-
Details | Formatted Diff | Diff
Patch with suggested updates (37.16 KB, patch)
2011-07-13 11:20 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Previous patch with regex JIT turned back on. (36.38 KB, patch)
2011-07-13 11:30 PDT, Michael Saboff
barraclough: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2011-07-08 14:12:41 PDT
The current regular expression code handles REs in the form of /.*blah.*/ by greedily matching all characters with the first ".*" and then back tracking to find the "blah". Without capturing parens anywhere in the RE, we can look for the "blah" and then process the leading and trailing ".*".  This should provide a big win for these types of REs.
Comment 1 Michael Saboff 2011-07-11 09:29:21 PDT
Created attachment 100311 [details]
Proposed patch

Proposed enhancement to both the Yarr interpreter and JIT to handle expressions in the form of /[^].*[?]<sub-expr>.*[$]/[m].  The terms in between the leading and trailing .*'s cannot capture and also this enhancement is limited to single alternative expressions.  Process the inner terms and then look for the beginning of the string and end of the string.
Comment 2 Michael Saboff 2011-07-11 15:52:16 PDT
Created attachment 100374 [details]
Updated patch to fix win compilation error.
Comment 3 Gavin Barraclough 2011-07-12 17:35:26 PDT
Comment on attachment 100374 [details]
Updated patch to fix win compilation error.

View in context: https://bugs.webkit.org/attachment.cgi?id=100374&action=review

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1074
> +        if (pattern->m_multiline) {

These two if statements could be merged & save a repeating the match assign & return true.

if (!pattern->m_multiline && ((term.anchors.m_bol && matchBegin) || (term.anchors.m_eol && matchEnd != input.end())))
    return false;

> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1379
> +            BACKTRACK();

I don't think we should ever be able to backtrack into here; I think this should be ASSERT_NOT_REACHED()?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1011
> +        saveStartIndex.append(branch32(Equal, matchPos, Imm32(m_checked)));

Please modify per our discussion, /.*\n\d+.*/ "abc\n123".

> Source/JavaScriptCore/yarr/YarrPattern.cpp:716
> +        for (size_t alt = 0; alt < alternatives.size(); ++alt) {

We shouldn't need a for-loop here, given the above guard!   possibly should switch the check to "if (alternatives.size() != 1) return;" (I don't think we ever have 0 alternatives, but just to be safe!), then drop the loop & just work over alternatives[0].
Comment 4 Michael Saboff 2011-07-13 11:20:16 PDT
Created attachment 100690 [details]
Patch with suggested updates
Comment 5 Michael Saboff 2011-07-13 11:30:23 PDT
Created attachment 100694 [details]
Previous patch with regex JIT turned back on.
Comment 6 Michael Saboff 2011-07-13 16:05:45 PDT
Committed r90962: <http://trac.webkit.org/changeset/90962>