Bug 14873

Summary: REGRESSION (Safari 4): regular expression pattern size limit lower than Safari 3.2, other browsers, breaks SAP
Product: WebKit Reporter: Derk-Jan Hartman <hartman.wiki>
Component: JavaScriptCoreAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, ddkilzer, erikcorry, feng, ggaren, mrowe, vicki
Priority: P2 Keywords: InRadar, NeedsReduction
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
URL: http://en.wikipedia.org/wiki/User:Lupin/badwords
Attachments:
Description Flags
Test maximum regex length
none
patch
none
patch sam: review+

Derk-Jan Hartman
Reported 2007-08-03 12:58:50 PDT
This link is a set of badwords that get fed to Regexp, in a Javascript antivandal tool. The list is soo enormous that Safari hits its technical limit for the length of regular expressions. Not that strange, but bdash told me to report it regardless. (PS. FF 2.* does handle this RegExp (PS ERR20 of the regexp lib that is being used.)
Attachments
Test maximum regex length (1.05 KB, text/html)
2007-08-12 14:20 PDT, David Kilzer (:ddkilzer)
no flags
patch (7.40 KB, patch)
2009-03-19 13:15 PDT, Geoffrey Garen
no flags
patch (8.02 KB, patch)
2009-03-19 13:23 PDT, Geoffrey Garen
sam: review+
Alexey Proskuryakov
Comment 1 2007-08-06 06:11:16 PDT
How does one reproduce the problem? Just opening the bug URL doesn't cause any errors.
Derk-Jan Hartman
Comment 2 2007-08-09 15:25:37 PDT
To reproduce: Register in Wikipedia. Open en.wikipedia.org/wiki/User:yourusername/monobook.js Add to this page the following line: importScript("User:Lupin/recent2.js"); Now go to either of the following links: http://en.wikipedia.org/wiki/User:Lupin/Filter_recent_changes http://en.wikipedia.org/wiki/User:Lupin/All_recent_changes http://en.wikipedia.org/wiki/User:Lupin/Recent_IP_edits http://en.wikipedia.org/wiki/User:Lupin/Monitor_my_watchlist http://en.wikipedia.org/wiki/User:Lupin/Live_spellcheck These are tools that check the recent changes of specific sets of pages in wikipedia for words and other signs of vandalism (blanking). The bad words list is used by at least "filter recent changes".
David Kilzer (:ddkilzer)
Comment 3 2007-08-10 06:58:52 PDT
Confirmed with a local debug build of WebKit r24922 with Safari 3 Public Beta v. 3.0.3 (522.12.1) on Mac OS X 10.4.10 (8R218). Console output: (event handler):Invalid regular expression: regular expression too large JavaScript Console: Invalid regular expression: regular expression too large http://en.wikipedia.org/w/index.php?title=User%3ALupin%2Frecent2.js&action=raw&ctype=text/javascript Line: 120
David Kilzer (:ddkilzer)
Comment 4 2007-08-10 07:02:17 PDT
NOTE: After following the steps in Comment #2, you may have to reload the page twice. Safari 2.0.4 (419.3) with original WebKit on Mac OS X 10.4.10 (8R218) apparently silently fails. It adds the orange "update" rectangles to the screen output, but there is no content with them, and no error message. This is not a regression.
Derk-Jan Hartman
Comment 5 2007-08-12 12:43:29 PDT
I did some looking around myself after this and found the following: This error translates to ERR20 of the regexp lib and is triggered by a size check (in pcre/pcre_compile.c) against the define MAX_PATTERN_SIZE MAX_PATTERN_SIZE itself is defined in pcre/pcre_internal.h This define is dependant on the define LINK_SIZE in the same file. Normally the max size is 64K for a regexp, but if you set LINK_SIZE higher (3 or 4 instead of 2), then the pcre lib can actually handle higher sizes than that. LINK_SIZE can be defined on the compile commandline according to the header file. I hope to have saved someone some time and good luck :D
David Kilzer (:ddkilzer)
Comment 6 2007-08-12 12:50:53 PDT
It would be interesting to know what the maximum size of a regular expression in Firefox and MSIE 6/7 are. I imagine you could simple create a gigantic regex, then try to use it until you get an exception.
David Kilzer (:ddkilzer)
Comment 7 2007-08-12 14:20:38 PDT
Created attachment 15940 [details] Test maximum regex length This file tests the maximum length that a regular expression may contain in a browser. My results: Safari 3 Public Beta v. 3.0.3 (522.12.1) with WebKit r25023 debug build: 32767 Opera 9.22: 2097151 Could someone test MSIE 6 and 7?
David Kilzer (:ddkilzer)
Comment 8 2007-08-12 14:26:48 PDT
(In reply to comment #7) > Created an attachment (id=15940) [edit] > Test maximum regex length DO NOT try this on Safari 2! It doesn't throw an exception when the regex is too big, so there's no way for the script to know when it's hit a limit. (Maybe checking the value of r would work, though?)
Derk-Jan Hartman
Comment 9 2007-08-16 08:04:57 PDT
(In reply to comment #7) > Could someone test MSIE 6 and 7? MSIE 6 on WinXP on my iMac Intel parallels install: limit: 67108863
Claus Conrad
Comment 10 2008-01-29 16:01:25 PST
(In reply to comment #9) > MSIE 6 on WinXP on my iMac Intel parallels install: > limit: 67108863 Same limit on my MSIE 7 on WinXP on Macbook Intel Parallels.
Darin Adler
Comment 11 2008-01-30 05:17:39 PST
It's easy to make maximum regular expression length much longer. We can change the links to be 32 bits rather than 16 bits. This makes compiled regular expressions larger and also will make matching slower. This is equivalent to setting LINK_SIZE to 4 in the original PCRE source. As part of improving JavaScript performance, we were looking at some optimizations that would make compiled regular expressions larger than they are today. One of these ideas was to make the compiled regular expressions use a pointer for each opcode instead of a byte code. The idea was that this was affordable because regular expressions are usually very small. I suppose this optimization is a bad idea if there are some huge regular expressions to be processed. But on the other hand, this would cause us to change the link size to be the same size as a pointer, which would eliminate arbitrary size limit on the regular expression entirely. So it would blow up the size of compiled regular expressions, but eliminate the 64K limit.
Erik Corry
Comment 12 2008-09-11 02:09:15 PDT
I have a patch for this. It's against V8's copy of JSCRE so it may not go in cleanly to the head. I can't see any performance regressions from it, strangely enough. It runs the tests I have thrown at it so far including the Mozilla tests that used to fail due to large regexps. However I don't think the Mozilla tests test very thoroughly since they passed with the larger size limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to actually work.
David Kilzer (:ddkilzer)
Comment 13 2008-09-11 06:11:19 PDT
(In reply to comment #12) > I have a patch for this. It's against V8's copy of JSCRE so it may not go in > cleanly to the head. I can't see any performance regressions from it, > strangely enough. It runs the tests I have thrown at it so far including the > Mozilla tests that used to fail due to large regexps. However I don't think > the Mozilla tests test very thoroughly since they passed with the larger size > limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to > actually work. The patch was attached to Bug 12638 as Attachment #23342 [details]. Please report the results of running SunSpider with and without this patch.
Erik Corry
Comment 14 2008-10-28 01:55:45 PDT
$ ./sunspider-compare-results --shell=/auto/JavaScriptV8/Archive/611/bin/v8 v8-611.txt v8-611-large-regexps.txt TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 1513.6ms +/- 2.9% 1500.7ms +/- 0.8% ============================================================================= 3d: - 161.0ms +/- 10.8% 152.9ms +/- 1.4% cube: ?? 39.9ms +/- 3.4% 40.3ms +/- 3.1% not conclusive: might be *1.010x as slow* morph: - 68.2ms +/- 23.6% 60.0ms +/- 1.1% raytrace: - 52.9ms +/- 0.8% 52.6ms +/- 1.1% access: ?? 98.7ms +/- 2.5% 101.1ms +/- 3.1% not conclusive: might be *1.024x as slow* binary-trees: ?? 6.9ms +/- 13.3% 7.2ms +/- 18.0% not conclusive: might be *1.043x as slow* fannkuch: - 34.1ms +/- 3.5% 33.9ms +/- 2.1% nbody: *1.065x as slow* 33.8ms +/- 1.7% 36.0ms +/- 9.0% significant nsieve: ?? 23.9ms +/- 5.5% 24.0ms +/- 2.0% not conclusive: might be *1.004x as slow* bitops: ?? 73.9ms +/- 1.8% 75.2ms +/- 2.3% not conclusive: might be *1.018x as slow* 3bit-bits-in-byte: ?? 6.3ms +/- 5.5% 6.4ms +/- 7.8% not conclusive: might be *1.016x as slow* bits-in-byte: ?? 13.8ms +/- 3.3% 13.9ms +/- 2.9% not conclusive: might be *1.007x as slow* bitwise-and: ?? 19.2ms +/- 3.8% 20.1ms +/- 5.4% not conclusive: might be *1.047x as slow* nsieve-bits: ?? 34.6ms +/- 1.4% 34.8ms +/- 1.6% not conclusive: might be *1.006x as slow* controlflow: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow* recursive: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow* crypto: - 87.7ms +/- 17.3% 84.7ms +/- 3.6% aes: *1.031x as slow* 29.0ms +/- 1.6% 29.9ms +/- 4.0% significant md5: *1.073x as slow* 27.4ms +/- 2.2% 29.4ms +/- 9.5% significant sha1: - 31.3ms +/- 47.1% 25.4ms +/- 2.4% date: - 268.8ms +/- 2.0% 269.0ms +/- 1.4% format-tofte: - 111.2ms +/- 4.9% 110.7ms +/- 1.6% format-xparb: ?? 157.6ms +/- 0.4% 158.3ms +/- 2.2% not conclusive: might be *1.004x as slow* math: - 147.7ms +/- 4.1% 145.6ms +/- 1.4% cordic: ?? 69.9ms +/- 0.9% 70.9ms +/- 1.9% not conclusive: might be *1.014x as slow* partial-sums: - 57.4ms +/- 10.8% 54.7ms +/- 3.5% spectral-norm: - 20.4ms +/- 5.8% 20.0ms +/- 1.7% regexp: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant dna: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant string: 1.016x as fast 444.1ms +/- 0.8% 437.1ms +/- 1.4% significant base64: - 55.3ms +/- 1.7% 54.4ms +/- 4.5% fasta: ?? 43.1ms +/- 1.6% 43.4ms +/- 3.7% not conclusive: might be *1.007x as slow* tagcloud: 1.017x as fast 120.4ms +/- 0.8% 118.4ms +/- 0.9% significant unpack-code: 1.035x as fast 163.6ms +/- 1.8% 158.0ms +/- 2.7% significant validate-input: ?? 61.7ms +/- 1.0% 62.9ms +/- 5.3% not conclusive: might be *1.019x as slow*
David Kilzer (:ddkilzer)
Comment 15 2008-10-28 04:20:57 PDT
(In reply to comment #13) > Please report the results of running SunSpider with and without this patch. Thanks! > The patch was attached to Bug 12638 as Attachment #23342 [details] [edit]. Please repost the patch to this bug and set the "review?" flag to have it reviewed. See this page for more details: http://webkit.org/coding/contributing.html
Derk-Jan Hartman
Comment 16 2008-10-30 14:09:27 PDT
it seems this ticket is stale, even though there appears to be a working patch, can someone please update it, take the necessary steps ? (/me has not idea what those are)
Erik Corry
Comment 17 2009-01-22 02:55:31 PST
New versions of JSC are not using the PCRE-based library for this regexp any more so my patch no longer works. I guess WREC has a similar limitation that prevents this huge regexp from working. It works fine in tracemonkey and tip-of-tree V8. I have no plans to debug WREC to find the problem there so the current state of affairs is that there is no patch to fix this problem.
Vicki Murley
Comment 18 2009-02-19 10:50:56 PST
Geoffrey Garen
Comment 19 2009-03-19 13:15:09 PDT
Geoffrey Garen
Comment 20 2009-03-19 13:23:59 PDT
Created attachment 28762 [details] patch Fixed a layout test that I missed the first time around.
Geoffrey Garen
Comment 21 2009-03-19 13:33:30 PDT
Committed revision 41842.
Note You need to log in before you can comment on or make changes to this bug.