UNCONFIRMED 25071
Certain regular expressions fail with large strings
https://bugs.webkit.org/show_bug.cgi?id=25071
Paul Nicholls
Reported 2009-04-06 22:16:54 PDT
When running the "test" function on certain regular expressions against large strings, there is a tipping point at which it will fail silently (return false when it should return true). The tipping point may vary from system to system - not 100% sure - but the code below assumes a tipping point of 25000 (25000 returns true, 25001 returns false). If both return true for you, increase the upper limit in the second for loop to add more "a" characters to the middle of the string.

Tested in Safari 4 public beta (528.16) on 32-bit Windows Vista Business, running on a Core2 Duo E8300 (2.83GHz) with 3GB of RAM.

Sample code to reproduce (regular expression tailored to test data and reduced as much as possible while still being affected):

    r=/("(z|a)*?"|.)+/;
    s='{"';
    for(i=0;i<24996;i++) s+='a';
    s+='"}';
    l='{"';
    for(i=0;i<24997;i++) l+='a';
    l+='"}';
    document.getElementById('cfsavecontent').value=l;
    alert(r.test(s));
    alert(r.test(l));
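Since the tipping point may differ between builds, it can help to search for it rather than hard-code 25000. The following is a minimal sketch, not part of the original report: `makeSubject` and `findTippingPoint` are hypothetical helper names, and the binary search assumes that once a length fails, every longer length fails too.

```javascript
// Build the same shape of subject string as the report: {"aaa...a"}
// with `count` letters between the quotes (total length is count + 4).
function makeSubject(count) {
  return '{"' + 'a'.repeat(count) + '"}';
}

// Find the smallest count in [lo, hi] for which matcher.test() returns false.
// `matcher` is anything with a test(string) method, e.g. a RegExp without
// the g flag. Returns null if even the largest subject still matches.
// Assumption: failures are monotonic in the subject length.
function findTippingPoint(matcher, lo, hi) {
  if (!matcher.test(makeSubject(lo))) return lo;
  if (matcher.test(makeSubject(hi))) return null;
  // Invariant: the subject passes at lo and fails at hi.
  while (hi - lo > 1) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (matcher.test(makeSubject(mid))) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return hi;
}
```

Calling `findTippingPoint(/("(z|a)*?"|.)+/, 1, 1000000)` in the affected browser should report the first letter count at which the match silently fails, or null if no failure occurs in that range. The result depends on the engine and build.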
Attachments
Illustrates the problem with a different regular expression that attempts to extract html tag attributes (1.42 MB, text/html)
2020-02-25 12:02 PST, Yan Shapochnik
no flags
Mark Rowe (bdash)
Comment 1 2009-04-07 01:10:39 PDT
Can you please test with the latest WebKit nightly build? I believe that we've had some changes in this area recently.
Mark Rowe (bdash)
Comment 2 2009-04-07 01:30:36 PDT
In my test on Mac OS X with a build of WebKit from this afternoon, I see the match pass with a length of 250,000 and fail with a length of 250,001. This seems consistent with a 10x increase in the limit that you were previously hitting. Is that sufficient for your needs?
Gavin Barraclough
Comment 3 2012-03-12 18:39:51 PDT
We currently limit the number of match attempts to 1000000 (see matchLimit in Yarr.h). There are two ways in which our current behavior seems undesirable here:
(1) There is no need to fail to match this regular expression; it's not *that* slow. Rather than relying on the current 1000000 limit, Yarr should cooperate better with JSC's timeout checking mechanism to stop slow regular expressions.
(2) When the VM terminates regular expression processing early due to resource restrictions, we shouldn't just return false, indicating no match, since this string does match. It would be better to throw an error indicating a problem.
I'll file two new specific bugs for these problems.
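The difference between "no match" and "gave up early" in point (2) can be illustrated with a toy matcher. This is only a sketch and not Yarr's implementation: `limitedTest` and `MatchLimitError` are hypothetical names, the matcher supports only literals, `.`, and a postfix `*`, and the step budget merely stands in for a limit such as matchLimit.

```javascript
// Hypothetical error type signalling that matching was abandoned early.
class MatchLimitError extends Error {
  constructor(limit) {
    super('regular expression exceeded match limit of ' + limit + ' steps');
    this.name = 'MatchLimitError';
  }
}

// Toy backtracking matcher: literals, '.', and postfix '*' only.
// Returns true or false for a definite answer, and throws MatchLimitError
// when the step budget runs out instead of reporting a false "no match".
function limitedTest(pattern, text, limit) {
  let steps = 0;

  function matchHere(p, t) {
    // Count every match attempt against the budget.
    if (++steps > limit) throw new MatchLimitError(limit);
    if (p === pattern.length) return true;
    if (pattern[p + 1] === '*') return matchStar(pattern[p], p + 2, t);
    if (t < text.length && (pattern[p] === '.' || pattern[p] === text[t])) {
      return matchHere(p + 1, t + 1);
    }
    return false;
  }

  function matchStar(c, p, t) {
    // Try the rest of the pattern after zero, one, two, ... repetitions of c.
    while (true) {
      if (matchHere(p, t)) return true;
      if (t >= text.length || (c !== '.' && c !== text[t])) return false;
      t++;
    }
  }

  // Unanchored search: try every start position.
  for (let start = 0; start <= text.length; start++) {
    if (matchHere(0, start)) return true;
  }
  return false;
}
```

With a generous budget, `limitedTest('"a*"', '{"aaa"}', 1000)` reports a match. With a budget of 3 the same call throws MatchLimitError, so the caller can tell an aborted search apart from a genuine non-match.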
Yan Shapochnik
Comment 4 2020-02-25 12:02:59 PST
Created attachment 391674: Illustrates the problem with a different regular expression that attempts to extract html tag attributes
Yan Shapochnik
Comment 5 2020-02-25 12:06:40 PST
Encountered this issue with TinyMCE, which attempts to extract <img> tag attributes for evaluation. The failure occurs when the <img> src contains a large base64-encoded image.