UNCONFIRMED 122891
Yarr does not compile Peacekeeper email validation regex
https://bugs.webkit.org/show_bug.cgi?id=122891
Jan de Mooij
Reported 2013-10-16 05:27:03 PDT
Created attachment 214358
Shell testcase

I'm attaching a simple shell version of Peacekeeper's stringValidateForm test. The good news is that Yarr is able to compile 4 of the 5 regular expressions. The bad news is that the remaining regex is interpreted, and this slows us down.

If I run the test in Safari it takes 1200 ms; with the email regex commented out it takes 460 ms, so the slow regex is where we spend most of our time on this test.

The email validation part is this:

input = "jaakko.alajoki@futuremark.com";
result = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/.test(input);
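For anyone who wants to reproduce the measurement without the attachment, a minimal timing sketch along these lines should do. The function name `timeEmailRegex` and the iteration count are assumptions made for illustration; they are not taken from the attached testcase or from Peacekeeper itself.

```javascript
// Hypothetical micro-benchmark: only the input string and the regex
// come from the report; everything else is an illustrative assumption.
const input = "jaakko.alajoki@futuremark.com";
const emailRe = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

function timeEmailRegex(iterations) {
  const start = Date.now();
  let matches = 0;
  for (let i = 0; i < iterations; i++) {
    // test() only needs a yes/no answer, no capture groups.
    if (emailRe.test(input)) matches++;
  }
  return { elapsedMs: Date.now() - start, matches: matches };
}

const timing = timeEmailRegex(100000);
console.log("email regex: " + timing.elapsedMs + " ms for " + timing.matches + " matches");
```

Absolute numbers will vary by engine and machine; the useful comparison is the same loop with and without the email regex.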
Attachments
Shell testcase (664 bytes, application/x-javascript)
2013-10-16 05:27 PDT, Jan de Mooij
no flags
Jan de Mooij
Comment 1 2013-10-16 05:29:47 PDT
Yarr JIT compilation is aborting here, I think:

// We can currently only compile quantity 1 subpatterns that are
// not copies. We generate a copy in the case of a range quantifier,
// e.g. /(?:x){3,9}/, or /(?:x)+/ (These are effectively expanded to
// /(?:x){3,3}(?:x){0,6}/ and /(?:x)(?:x)*/ respectively). The problem
// comes where the subpattern is capturing, in which case we would
// need to restore the capture from the first subpattern upon a
// failure in the second.
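The expansion described in that comment, and the capture problem it mentions, can be seen from script. This is a sketch at the JavaScript level only; it says nothing about how Yarr represents the copies internally. The helper `sameVerdicts` and the pattern `/^(a|b)+bc$/` are made up for illustration.

```javascript
// A range quantifier and its hand-expanded form accept the same strings.
const ranged = /^(?:x){3,9}$/;
const rangedExpanded = /^(?:x){3,3}(?:x){0,6}$/;
const plus = /^(?:x)+$/;
const plusExpanded = /^(?:x)(?:x)*$/;

// Hypothetical helper: compare two regexes on "", "x", "xx", ...
function sameVerdicts(a, b, maxLen) {
  for (let n = 0; n <= maxLen; n++) {
    const s = "x".repeat(n);
    if (a.test(s) !== b.test(s)) return false;
  }
  return true;
}

const rangeAgrees = sameVerdicts(ranged, rangedExpanded, 12);
const plusAgrees = sameVerdicts(plus, plusExpanded, 12);

// With a capturing group the expansion is no longer free. Here the
// second iteration of (a|b) first consumes "b", then has to be undone
// so that "bc" can match. The capture must go back to the value from
// the first iteration, "a".
const restored = /^(a|b)+bc$/.exec("abc");
console.log(rangeAgrees, plusAgrees, restored[1]);
```

In the last case the engine has to remember the earlier capture so it can put it back, which is the bookkeeping the quoted comment says the JIT does not do.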
Gavin Barraclough
Comment 2 2013-10-16 10:52:09 PDT
Yarr JIT now compiles regular expressions twice, in one of two modes – 'match-only' and 'include-subpatterns'. Match-only is used when the regexp is first run, to scan for the match, at which point we return either a boolean result (in the case of 'test' matches) or a lazily populated regexp matches array object. Include-subpatterns is used when a subpattern match is explicitly required, for example if an entry in the matches array is accessed.

The restriction referenced in the comment is that the JIT won't backtrack subpattern matches, so we can't compile quantified captures unless we can guarantee they won't backtrack. But this restriction doesn't apply to match-only compilations, which aren't recording the matched subpatterns anyway. We can probably pretty much say, if (compileMode == MatchOnly) then we can compile any parens. We'd have to look a little more closely to check whether any of the restrictions still need to apply.
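Seen from script, the two cases look roughly like this. It is a sketch of the API-level distinction only: which compile mode an engine actually picks for each call is internal and is not observable from this code.

```javascript
const input = "jaakko.alajoki@futuremark.com";
const emailRe = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Only a boolean is needed here, so no subpattern has to be recorded.
// This is the case the Peacekeeper test exercises.
const matched = emailRe.test(input);

// Reading entries of the matches array is where subpattern results
// are actually required.
const groups = emailRe.exec(input);
const localTail = groups[1];  // last iteration of the first ([\.-]?\w+)*
const tld = groups[3];        // last iteration of (\.\w{2,3})+

console.log(matched, localTail, tld);
```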
Gavin Barraclough
Comment 3 2013-10-16 17:13:45 PDT
Actually, no, I'm way oversimplifying here. :-( Each iteration would also need separate backtracking state internally, so this does effectively demand stack allocation.