RESOLVED FIXED 64202
Enh: Improve handling of RegExp in the form of /.*blah.*/
https://bugs.webkit.org/show_bug.cgi?id=64202
Summary Enh: Improve handling of RegExp in the form of /.*blah.*/
Michael Saboff
Reported 2011-07-08 14:12:41 PDT
The current regular expression code handles REs in the form of /.*blah.*/ by greedily matching all characters with the first ".*" and then back tracking to find the "blah". Without capturing parens anywhere in the RE, we can look for the "blah" and then process the leading and trailing ".*". This should provide a big win for these types of REs.
Attachments
Proposed patch (36.51 KB, patch)
2011-07-11 09:29 PDT, Michael Saboff
no flags
Updated patch to fix win compilation error. (36.53 KB, patch)
2011-07-11 15:52 PDT, Michael Saboff
barraclough: review-
Patch with suggested updates (37.16 KB, patch)
2011-07-13 11:20 PDT, Michael Saboff
no flags
Previous patch with regex JIT turned back on. (36.38 KB, patch)
2011-07-13 11:30 PDT, Michael Saboff
barraclough: review+
Michael Saboff
Comment 1 2011-07-11 09:29:21 PDT
Created attachment 100311 [details] Proposed patch Proposed enhancement to both the Yarr interpreter and JIT to handle expressions in the form of /[^].*[?]<sub-expr>.*[$]/[m]. The terms in between the leading and trailing .*'s cannot capture and also this enhancement is limited to single alternative expressions. Process the inner terms and then look for the beginning of the string and end of the string.
Michael Saboff
Comment 2 2011-07-11 15:52:16 PDT
Created attachment 100374 [details] Updated patch to fix win compilation error.
Gavin Barraclough
Comment 3 2011-07-12 17:35:26 PDT
Comment on attachment 100374 [details] Updated patch to fix win compilation error. View in context: https://bugs.webkit.org/attachment.cgi?id=100374&action=review > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1074 > + if (pattern->m_multiline) { These two if statements could be merged & save a repeating the match assign & return true. if (!pattern->m_multiline && ((term.anchors.m_bol && matchBegin) || (term.anchors.m_eol && matchEnd != input.end()))) return false; > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:1379 > + BACKTRACK(); I don't think we should ever be able to backtrack into here; I think this should be ASSERT_NOT_REACHED()? > Source/JavaScriptCore/yarr/YarrJIT.cpp:1011 > + saveStartIndex.append(branch32(Equal, matchPos, Imm32(m_checked))); Please modify per our discussion, /.*\n\d+.*/ "abc\n123". > Source/JavaScriptCore/yarr/YarrPattern.cpp:716 > + for (size_t alt = 0; alt < alternatives.size(); ++alt) { We shouldn't need a for-loop here, given the above guard! possibly should switch the check to "if (alternatives.size() != 1) return;" (I don't think we ever have 0 alternatives, but just to be safe!), then drop the loop & just work over alternatives[0].
Michael Saboff
Comment 4 2011-07-13 11:20:16 PDT
Created attachment 100690 [details] Patch with suggested updates
Michael Saboff
Comment 5 2011-07-13 11:30:23 PDT
Created attachment 100694 [details] Previous patch with regex JIT turned back on.
Michael Saboff
Comment 6 2011-07-13 16:05:45 PDT
Note You need to log in before you can comment on or make changes to this bug.