Bug 200190 - JavaScriptCore's Regex can't match the content.
Summary: JavaScriptCore's Regex can't match the content.
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: PC Linux
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-07-26 19:29 PDT by Yang Tian
Modified: 2019-07-30 19:05 PDT (History)
3 users (show)

See Also:


Attachments
testcase (360 bytes, text/javascript)
2019-07-26 19:29 PDT, Yang Tian
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yang Tian 2019-07-26 19:29:49 PDT
Created attachment 375010 [details]
testcase

#Testcase:
1 var func = function() {
2     var reg = /(.*d.*){5,}/;
3     var str1 = 'adadshare.com/files/61674290/a.a.Third_GG_SG_.partlasPlayground.zip | C:\Documents and Settings';
4     // var str2 = 'adadshare.com/files/61674290/a.a.d_GG_SG_.partlasPlayground.zip | C:\Documents and Settings';
5     print(str1.search(reg));
6     // print(str2.search(reg));
7 };
8 
9 func();

#Command:
./webkit/WebKitBuild/Release/bin/jsc testcase.js

#Output:
-1

#Expected output:
0

#Description:
When executing the above testcase, the expected output should be "0" according to ECMAScript-262 standard. However, JavaScriptCore outputs "-1" while other JS engines such as V8, SpiderMonkey, QuickJS output "0".

And if I remove a few characters from str1(such as replace str1(line3) with str2(line 4)), the error disappears and JavaScriptCore outputs "0" normally. This is so weird that I suspect it's a bug of JavaScriptCore.
Comment 1 Radar WebKit Bug Importer 2019-07-26 21:45:44 PDT
<rdar://problem/53613823>
Comment 2 Michael Saboff 2019-07-29 11:14:04 PDT
This is technically a bug in the JSC Regex engine.  The JSC regex engine is a backtracking matcher that can execute in polynomial time for certain expressions.  The engine has a loop limit that will fail if a match isn't found after 1,000,000 iterations.

The test regular expression when applied to str1 is hitting that 1,000,000 loop iteration limit due to the 2 ".*" components of the regular expression.  I increased the limit and the test case requires 1,034,633 loop iterations to match.  This high loop count is needed because the ".*" will match all non-newline characters including the 'd' you are looking for and go past them to the end of the string.  Then the matcher backtracks one character and tries matching the rest of the pattern.  It keeps on doing this until it determines there is a match or not.  The addition of the second ".*" makes this regular expression quadratic in the JSC matcher.  Matching str2, which is slightly shorter, matches in 921,193 loop iterations.

If you are looking for 5 or more 'd' characters with 0 or more non-'d' characters in between, I suggest that you change the pattern to something like /([^d]*d[^d]*){5,}/.  This pattern matches in 7 loop iterations of the matcher, thus performing 150,000x faster than the original regular expression.  This pattern will be faster on all JavaScript engines.  You could also eliminate the second ".*", /(.*d){5,}/ or move it to the end of the expression, /(.*d){5,}.*/, both of these cases will take 59 iterations through the matcher loop.

If we double the loop limit to 2,000,000 iterations, the test regex will only be able to match a string ~10 characters longer before hitting the limit.
Comment 3 Yang Tian 2019-07-29 19:07:45 PDT
Can I modify this loop limit by myself? if so, please tell me what should i do, thanks!
Comment 4 Yang Tian 2019-07-29 19:19:31 PDT
(In reply to Michael Saboff from comment #2)
> This is technically a bug in the JSC Regex engine.  The JSC regex engine is
> a backtracking matcher that can execute in polynomial time for certain
> expressions.  The engine has a loop limit that will fail if a match isn't
> found after 1,000,000 iterations.
> 
> The test regular expression when applied to str1 is hitting that 1,000,000
> loop iteration limit due to the 2 ".*" components of the regular expression.
> I increased the limit and the test case requires 1,034,633 loop iterations
> to match.  This high loop count is needed because the ".*" will match all
> non-newline characters including the 'd' you are looking for and go past
> them to the end of the string.  Then the matcher backtracks one character
> and tries matching the rest of the pattern.  It keeps on doing this until it
> determines there is a match or not.  The addition of the second ".*" makes
> this regular expression quadratic in the JSC matcher.  Matching str2, which
> is slightly shorter, matches in 921,193 loop iterations.
> 
> If you are looking for 5 or more 'd' characters with 0 or more non-'d'
> characters in between, I suggest that you change the pattern to something
> like /([^d]*d[^d]*){5,}/.  This pattern matches in 7 loop iterations of the
> matcher, thus performing 150,000x faster than the original regular
> expression.  This pattern will be faster on all JavaScript engines.  You
> could also eliminate the second ".*", /(.*d){5,}/ or move it to the end of
> the expression, /(.*d){5,}.*/, both of these cases will take 59 iterations
> through the matcher loop.
> 
> If we double the loop limit to 2,000,000 iterations, the test regex will
> only be able to match a string ~10 characters longer before hitting the
> limit.

Can I solve this problem by modifing the loop limit myself? If so, please tell me what should i do. If not, Can i treat this as a bug of JSC Regex engine?
Comment 5 Michael Saboff 2019-07-30 09:34:49 PDT
The loop limit is compiled in and can't be changed via JavaScript.

We consider this a bug and are working on a resolution.  In the mean time, is it possible to change your regex?  For example, do you need the .* at the beginning and end of the capture group?  If you changed your regex to /.*(d.*){5,}/ you will get the same answer.  If you want an even more efficient regex, you could use /([^d]*d[^d]*){5,}/.
Comment 6 Yang Tian 2019-07-30 19:05:26 PDT
(In reply to Michael Saboff from comment #5)
> The loop limit is compiled in and can't be changed via JavaScript.
> 
> We consider this a bug and are working on a resolution.  In the mean time,
> is it possible to change your regex?  For example, do you need the .* at the
> beginning and end of the capture group?  If you changed your regex to
> /.*(d.*){5,}/ you will get the same answer.  If you want an even more
> efficient regex, you could use /([^d]*d[^d]*){5,}/.

I got it, thanks for your reply :)