Bug 124319 - Regex runtime depends on input size when it shouldn't
Summary: Regex runtime depends on input size when it shouldn't
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P3 Normal
Assignee: Nobody
Depends on:
Reported: 2013-11-13 15:52 PST by Peter Isza
Modified: 2013-11-30 13:19 PST (History)
3 users (show)

See Also:

Regexp runtime tester script (626 bytes, text/html)
2013-11-13 15:52 PST, Peter Isza
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Isza 2013-11-13 15:52:10 PST
Created attachment 216876 [details]
Regexp runtime tester script

I have run the following regular expressions on strings that contain only letter 'a':
  1. /^b+/
  2. /^(b)+/
  3. /^(?:b+)/
  4. /^(?:(?:b)+)/

Obviously, there should be no matches. The regex engine should quit after reading the first character, so the running time shouldn't depend on what's after the first letter in the string. But in case #2 and #4 it does depend on it (linearly).

The bug appears in Safari and Firefox (and doesn't appear in Chrome, IE and Opera).

It turns out that in cases #2 and #4 YARR can't compile the expression, so it is using the interpreter. Fair enough.

Then it turns out that the interpreter uses some magic backtracking algorithm instead of a finite state automaton. Alright.

This magic algorithm seems to look for the matches first, and then decides whether the match is in the beginning or not:
if (((matchBegin && term.anchors.m_bol)
             || ((matchEnd != input.end()) && term.anchors.m_eol))
            && !pattern->m_multiline)
            return false;

Could you please fix this?
Comment 1 Peter Isza 2013-11-13 15:58:38 PST
The code I pasted here isn't related, but the problem remains the same.