WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 14873
REGRESSION (Safari 4): regular expression pattern size limit lower than Safari 3.2, other browsers, breaks SAP
https://bugs.webkit.org/show_bug.cgi?id=14873
Summary
REGRESSION (Safari 4): regular expression pattern size limit lower than Safar...
Derk-Jan Hartman
Reported
2007-08-03 12:58:50 PDT
This link is a set of badwords that get fed to Regexp, in a Javascript antivandal tool. The list is soo enormous that Safari hits its technical limit for the length of regular expressions. Not that strange, but bdash told me to report it regardless. (PS. FF 2.* does handle this RegExp (PS ERR20 of the regexp lib that is being used.)
Attachments
Test maximum regex length
(1.05 KB, text/html)
2007-08-12 14:20 PDT
,
David Kilzer (:ddkilzer)
no flags
Details
patch
(7.40 KB, patch)
2009-03-19 13:15 PDT
,
Geoffrey Garen
no flags
Details
Formatted Diff
Diff
patch
(8.02 KB, patch)
2009-03-19 13:23 PDT
,
Geoffrey Garen
sam
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2007-08-06 06:11:16 PDT
How does one reproduce the problem? Just opening the bug URL doesn't cause any errors.
Derk-Jan Hartman
Comment 2
2007-08-09 15:25:37 PDT
To reproduce: Register in Wikipedia. Open en.wikipedia.org/wiki/User:yourusername/monobook.js Add to this page the following line: importScript("User:Lupin/recent2.js"); Now go to either of the following links:
http://en.wikipedia.org/wiki/User:Lupin/Filter_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/All_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/Recent_IP_edits
http://en.wikipedia.org/wiki/User:Lupin/Monitor_my_watchlist
http://en.wikipedia.org/wiki/User:Lupin/Live_spellcheck
These are tools that check the recent changes of specific sets of pages in wikipedia for words and other signs of vandalism (blanking). The bad words list is used by at least "filter recent changes".
David Kilzer (:ddkilzer)
Comment 3
2007-08-10 06:58:52 PDT
Confirmed with a local debug build of WebKit
r24922
with Safari 3 Public Beta v. 3.0.3 (522.12.1) on Mac OS X 10.4.10 (8R218). Console output: (event handler):Invalid regular expression: regular expression too large JavaScript Console: Invalid regular expression: regular expression too large
http://en.wikipedia.org/w/index.php?title=User%3ALupin%2Frecent2.js&action=raw&ctype=text/javascript
Line: 120
David Kilzer (:ddkilzer)
Comment 4
2007-08-10 07:02:17 PDT
NOTE: After following the steps in
Comment #2
, you may have to reload the page twice. Safari 2.0.4 (419.3) with original WebKit on Mac OS X 10.4.10 (8R218) apparently silently fails. It adds the orange "update" rectangles to the screen output, but there is no content with them, and no error message. This is not a regression.
Derk-Jan Hartman
Comment 5
2007-08-12 12:43:29 PDT
I did some looking around myself after this and found the following: This error translates to ERR20 of the regexp lib and is triggered by a size check (in pcre/pcre_compile.c) against the define MAX_PATTERN_SIZE MAX_PATTERN_SIZE itself is defined in pcre/pcre_internal.h This define is dependant on the define LINK_SIZE in the same file. Normally the max size is 64K for a regexp, but if you set LINK_SIZE higher (3 or 4 instead of 2), then the pcre lib can actually handle higher sizes than that. LINK_SIZE can be defined on the compile commandline according to the header file. I hope to have saved someone some time and good luck :D
David Kilzer (:ddkilzer)
Comment 6
2007-08-12 12:50:53 PDT
It would be interesting to know what the maximum size of a regular expression in Firefox and MSIE 6/7 are. I imagine you could simple create a gigantic regex, then try to use it until you get an exception.
David Kilzer (:ddkilzer)
Comment 7
2007-08-12 14:20:38 PDT
Created
attachment 15940
[details]
Test maximum regex length This file tests the maximum length that a regular expression may contain in a browser. My results: Safari 3 Public Beta v. 3.0.3 (522.12.1) with WebKit
r25023
debug build: 32767 Opera 9.22: 2097151 Could someone test MSIE 6 and 7?
David Kilzer (:ddkilzer)
Comment 8
2007-08-12 14:26:48 PDT
(In reply to
comment #7
)
> Created an attachment (id=15940) [edit] > Test maximum regex length
DO NOT try this on Safari 2! It doesn't throw an exception when the regex is too big, so there's no way for the script to know when it's hit a limit. (Maybe checking the value of r would work, though?)
Derk-Jan Hartman
Comment 9
2007-08-16 08:04:57 PDT
(In reply to
comment #7
)
> Could someone test MSIE 6 and 7?
MSIE 6 on WinXP on my iMac Intel parallels install: limit: 67108863
Claus Conrad
Comment 10
2008-01-29 16:01:25 PST
(In reply to
comment #9
)
> MSIE 6 on WinXP on my iMac Intel parallels install: > limit: 67108863
Same limit on my MSIE 7 on WinXP on Macbook Intel Parallels.
Darin Adler
Comment 11
2008-01-30 05:17:39 PST
It's easy to make maximum regular expression length much longer. We can change the links to be 32 bits rather than 16 bits. This makes compiled regular expressions larger and also will make matching slower. This is equivalent to setting LINK_SIZE to 4 in the original PCRE source. As part of improving JavaScript performance, we were looking at some optimizations that would make compiled regular expressions larger than they are today. One of these ideas was to make the compiled regular expressions use a pointer for each opcode instead of a byte code. The idea was that this was affordable because regular expressions are usually very small. I suppose this optimization is a bad idea if there are some huge regular expressions to be processed. But on the other hand, this would cause us to change the link size to be the same size as a pointer, which would eliminate arbitrary size limit on the regular expression entirely. So it would blow up the size of compiled regular expressions, but eliminate the 64K limit.
Erik Corry
Comment 12
2008-09-11 02:09:15 PDT
I have a patch for this. It's against V8's copy of JSCRE so it may not go in cleanly to the head. I can't see any performance regressions from it, strangely enough. It runs the tests I have thrown at it so far including the Mozilla tests that used to fail due to large regexps. However I don't think the Mozilla tests test very thoroughly since they passed with the larger size limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to actually work.
David Kilzer (:ddkilzer)
Comment 13
2008-09-11 06:11:19 PDT
(In reply to
comment #12
)
> I have a patch for this. It's against V8's copy of JSCRE so it may not go in > cleanly to the head. I can't see any performance regressions from it, > strangely enough. It runs the tests I have thrown at it so far including the > Mozilla tests that used to fail due to large regexps. However I don't think > the Mozilla tests test very thoroughly since they passed with the larger size > limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to > actually work.
The patch was attached to
Bug 12638
as
Attachment #23342
[details]
. Please report the results of running SunSpider with and without this patch.
Erik Corry
Comment 14
2008-10-28 01:55:45 PDT
$ ./sunspider-compare-results --shell=/auto/JavaScriptV8/Archive/611/bin/v8 v8-611.txt v8-611-large-regexps.txt TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 1513.6ms +/- 2.9% 1500.7ms +/- 0.8% ============================================================================= 3d: - 161.0ms +/- 10.8% 152.9ms +/- 1.4% cube: ?? 39.9ms +/- 3.4% 40.3ms +/- 3.1% not conclusive: might be *1.010x as slow* morph: - 68.2ms +/- 23.6% 60.0ms +/- 1.1% raytrace: - 52.9ms +/- 0.8% 52.6ms +/- 1.1% access: ?? 98.7ms +/- 2.5% 101.1ms +/- 3.1% not conclusive: might be *1.024x as slow* binary-trees: ?? 6.9ms +/- 13.3% 7.2ms +/- 18.0% not conclusive: might be *1.043x as slow* fannkuch: - 34.1ms +/- 3.5% 33.9ms +/- 2.1% nbody: *1.065x as slow* 33.8ms +/- 1.7% 36.0ms +/- 9.0% significant nsieve: ?? 23.9ms +/- 5.5% 24.0ms +/- 2.0% not conclusive: might be *1.004x as slow* bitops: ?? 73.9ms +/- 1.8% 75.2ms +/- 2.3% not conclusive: might be *1.018x as slow* 3bit-bits-in-byte: ?? 6.3ms +/- 5.5% 6.4ms +/- 7.8% not conclusive: might be *1.016x as slow* bits-in-byte: ?? 13.8ms +/- 3.3% 13.9ms +/- 2.9% not conclusive: might be *1.007x as slow* bitwise-and: ?? 19.2ms +/- 3.8% 20.1ms +/- 5.4% not conclusive: might be *1.047x as slow* nsieve-bits: ?? 34.6ms +/- 1.4% 34.8ms +/- 1.6% not conclusive: might be *1.006x as slow* controlflow: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow* recursive: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow* crypto: - 87.7ms +/- 17.3% 84.7ms +/- 3.6% aes: *1.031x as slow* 29.0ms +/- 1.6% 29.9ms +/- 4.0% significant md5: *1.073x as slow* 27.4ms +/- 2.2% 29.4ms +/- 9.5% significant sha1: - 31.3ms +/- 47.1% 25.4ms +/- 2.4% date: - 268.8ms +/- 2.0% 269.0ms +/- 1.4% format-tofte: - 111.2ms +/- 4.9% 110.7ms +/- 1.6% format-xparb: ?? 157.6ms +/- 0.4% 158.3ms +/- 2.2% not conclusive: might be *1.004x as slow* math: - 147.7ms +/- 4.1% 145.6ms +/- 1.4% cordic: ?? 69.9ms +/- 0.9% 70.9ms +/- 1.9% not conclusive: might be *1.014x as slow* partial-sums: - 57.4ms +/- 10.8% 54.7ms +/- 3.5% spectral-norm: - 20.4ms +/- 5.8% 20.0ms +/- 1.7% regexp: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant dna: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant string: 1.016x as fast 444.1ms +/- 0.8% 437.1ms +/- 1.4% significant base64: - 55.3ms +/- 1.7% 54.4ms +/- 4.5% fasta: ?? 43.1ms +/- 1.6% 43.4ms +/- 3.7% not conclusive: might be *1.007x as slow* tagcloud: 1.017x as fast 120.4ms +/- 0.8% 118.4ms +/- 0.9% significant unpack-code: 1.035x as fast 163.6ms +/- 1.8% 158.0ms +/- 2.7% significant validate-input: ?? 61.7ms +/- 1.0% 62.9ms +/- 5.3% not conclusive: might be *1.019x as slow*
David Kilzer (:ddkilzer)
Comment 15
2008-10-28 04:20:57 PDT
(In reply to
comment #13
)
> Please report the results of running SunSpider with and without this patch.
Thanks!
> The patch was attached to
Bug 12638
as
Attachment #23342
[details]
[edit].
Please repost the patch to this bug and set the "review?" flag to have it reviewed. See this page for more details:
http://webkit.org/coding/contributing.html
Derk-Jan Hartman
Comment 16
2008-10-30 14:09:27 PDT
it seems this ticket is stale, even though there appears to be a working patch, can someone please update it, take the necessary steps ? (/me has not idea what those are)
Erik Corry
Comment 17
2009-01-22 02:55:31 PST
New versions of JSC are not using the PCRE-based library for this regexp any more so my patch no longer works. I guess WREC has a similar limitation that prevents this huge regexp from working. It works fine in tracemonkey and tip-of-tree V8. I have no plans to debug WREC to find the problem there so the current state of affairs is that there is no patch to fix this problem.
Vicki Murley
Comment 18
2009-02-19 10:50:56 PST
In Radar <
rdar://problem/6603562
>
Geoffrey Garen
Comment 19
2009-03-19 13:15:09 PDT
Created
attachment 28761
[details]
patch
Geoffrey Garen
Comment 20
2009-03-19 13:23:59 PDT
Created
attachment 28762
[details]
patch Fixed a layout test that I missed the first time around.
Geoffrey Garen
Comment 21
2009-03-19 13:33:30 PDT
Committed revision 41842.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug