WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
64202
Enh: Improve handling of RegExp in the form of /.*blah.*/
https://bugs.webkit.org/show_bug.cgi?id=64202
Summary
Enh: Improve handling of RegExp in the form of /.*blah.*/
Michael Saboff
Reported
2011-07-08 14:12:41 PDT
The current regular expression code handles REs in the form of /.*blah.*/ by greedily matching all characters with the first ".*" and then back tracking to find the "blah". Without capturing parens anywhere in the RE, we can look for the "blah" and then process the leading and trailing ".*". This should provide a big win for these types of REs.
Attachments
Proposed patch
(36.51 KB, patch)
2011-07-11 09:29 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Updated patch to fix win compilation error.
(36.53 KB, patch)
2011-07-11 15:52 PDT
,
Michael Saboff
barraclough
: review-
Details
Formatted Diff
Diff
Patch with suggested updates
(37.16 KB, patch)
2011-07-13 11:20 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Previous patch with regex JIT turned back on.
(36.38 KB, patch)
2011-07-13 11:30 PDT
,
Michael Saboff
barraclough
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Michael Saboff
Comment 1
2011-07-11 09:29:21 PDT
Created
attachment 100311
[details]
Proposed patch Proposed enhancement to both the Yarr interpreter and JIT to handle expressions in the form of /[^].*[?]<sub-expr>.*[$]/[m]. The terms in between the leading and trailing .*'s cannot capture and also this enhancement is limited to single alternative expressions. Process the inner terms and then look for the beginning of the string and end of the string.
Michael Saboff
Comment 2
2011-07-11 15:52:16 PDT
Created
attachment 100374
[details]
Updated patch to fix win compilation error.
Gavin Barraclough
Comment 3
2011-07-12 17:35:26 PDT
Comment on
attachment 100374
[details]
Updated patch to fix win compilation error. View in context:
https://bugs.webkit.org/attachment.cgi?id=100374&action=review
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1074 > + if (pattern->m_multiline) {
These two if statements could be merged & save a repeating the match assign & return true. if (!pattern->m_multiline && ((term.anchors.m_bol && matchBegin) || (term.anchors.m_eol && matchEnd != input.end()))) return false;
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1379 > + BACKTRACK();
I don't think we should ever be able to backtrack into here; I think this should be ASSERT_NOT_REACHED()?
> Source/JavaScriptCore/yarr/YarrJIT.cpp:1011 > + saveStartIndex.append(branch32(Equal, matchPos, Imm32(m_checked)));
Please modify per our discussion, /.*\n\d+.*/ "abc\n123".
> Source/JavaScriptCore/yarr/YarrPattern.cpp:716 > + for (size_t alt = 0; alt < alternatives.size(); ++alt) {
We shouldn't need a for-loop here, given the above guard! possibly should switch the check to "if (alternatives.size() != 1) return;" (I don't think we ever have 0 alternatives, but just to be safe!), then drop the loop & just work over alternatives[0].
Michael Saboff
Comment 4
2011-07-13 11:20:16 PDT
Created
attachment 100690
[details]
Patch with suggested updates
Michael Saboff
Comment 5
2011-07-13 11:30:23 PDT
Created
attachment 100694
[details]
Previous patch with regex JIT turned back on.
Michael Saboff
Comment 6
2011-07-13 16:05:45 PDT
Committed
r90962
: <
http://trac.webkit.org/changeset/90962
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug