(PS ERR20 of the regexp lib that is being used.)
How does one reproduce the problem? Just opening the bug URL doesn't cause any errors.
Register in Wikipedia.
Add to this page the following line:
Now go to either of the following links:
These are tools that check the recent changes of specific sets of pages in wikipedia for words and other signs of vandalism (blanking). The bad words list is used by at least "filter recent changes".
Confirmed with a local debug build of WebKit r24922 with Safari 3 Public Beta v. 3.0.3 (522.12.1) on Mac OS X 10.4.10 (8R218).
(event handler):Invalid regular expression: regular expression too large
Invalid regular expression: regular expression too large
NOTE: After following the steps in Comment #2, you may have to reload the page twice.
Safari 2.0.4 (419.3) with original WebKit on Mac OS X 10.4.10 (8R218) apparently silently fails. It adds the orange "update" rectangles to the screen output, but there is no content with them, and no error message. This is not a regression.
I did some looking around myself after this and found the following:
This error translates to ERR20 of the regexp lib and is triggered by a size check (in pcre/pcre_compile.c) against the define MAX_PATTERN_SIZE
MAX_PATTERN_SIZE itself is defined in pcre/pcre_internal.h
This define is dependant on the define LINK_SIZE in the same file.
Normally the max size is 64K for a regexp, but if you set LINK_SIZE higher (3 or 4 instead of 2), then the pcre lib can actually handle higher sizes than that. LINK_SIZE can be defined on the compile commandline according to the header file.
I hope to have saved someone some time and good luck :D
It would be interesting to know what the maximum size of a regular expression in Firefox and MSIE 6/7 are. I imagine you could simple create a gigantic regex, then try to use it until you get an exception.
Created attachment 15940 [details]
Test maximum regex length
This file tests the maximum length that a regular expression may contain in a browser. My results:
Safari 3 Public Beta v. 3.0.3 (522.12.1) with WebKit r25023 debug build: 32767
Opera 9.22: 2097151
Could someone test MSIE 6 and 7?
(In reply to comment #7)
> Created an attachment (id=15940) 
> Test maximum regex length
DO NOT try this on Safari 2! It doesn't throw an exception when the regex is too big, so there's no way for the script to know when it's hit a limit. (Maybe checking the value of r would work, though?)
(In reply to comment #7)
> Could someone test MSIE 6 and 7?
MSIE 6 on WinXP on my iMac Intel parallels install:
(In reply to comment #9)
> MSIE 6 on WinXP on my iMac Intel parallels install:
> limit: 67108863
Same limit on my MSIE 7 on WinXP on Macbook Intel Parallels.
It's easy to make maximum regular expression length much longer. We can change the links to be 32 bits rather than 16 bits. This makes compiled regular expressions larger and also will make matching slower. This is equivalent to setting LINK_SIZE to 4 in the original PCRE source.
One of these ideas was to make the compiled regular expressions use a pointer for each opcode instead of a byte code. The idea was that this was affordable because regular expressions are usually very small. I suppose this optimization is a bad idea if there are some huge regular expressions to be processed. But on the other hand, this would cause us to change the link size to be the same size as a pointer, which would eliminate arbitrary size limit on the regular expression entirely. So it would blow up the size of compiled regular expressions, but eliminate the 64K limit.
I have a patch for this. It's against V8's copy of JSCRE so it may not go in cleanly to the head. I can't see any performance regressions from it, strangely enough. It runs the tests I have thrown at it so far including the Mozilla tests that used to fail due to large regexps. However I don't think the Mozilla tests test very thoroughly since they passed with the larger size limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to actually work.
(In reply to comment #12)
> I have a patch for this. It's against V8's copy of JSCRE so it may not go in
> cleanly to the head. I can't see any performance regressions from it,
> strangely enough. It runs the tests I have thrown at it so far including the
> Mozilla tests that used to fail due to large regexps. However I don't think
> the Mozilla tests test very thoroughly since they passed with the larger size
> limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to
> actually work.
The patch was attached to Bug 12638 as Attachment #23342 [details].
Please report the results of running SunSpider with and without this patch.
TEST COMPARISON FROM TO DETAILS
** TOTAL **: - 1513.6ms +/- 2.9% 1500.7ms +/- 0.8%
3d: - 161.0ms +/- 10.8% 152.9ms +/- 1.4%
cube: ?? 39.9ms +/- 3.4% 40.3ms +/- 3.1% not conclusive: might be *1.010x as slow*
morph: - 68.2ms +/- 23.6% 60.0ms +/- 1.1%
raytrace: - 52.9ms +/- 0.8% 52.6ms +/- 1.1%
access: ?? 98.7ms +/- 2.5% 101.1ms +/- 3.1% not conclusive: might be *1.024x as slow*
binary-trees: ?? 6.9ms +/- 13.3% 7.2ms +/- 18.0% not conclusive: might be *1.043x as slow*
fannkuch: - 34.1ms +/- 3.5% 33.9ms +/- 2.1%
nbody: *1.065x as slow* 33.8ms +/- 1.7% 36.0ms +/- 9.0% significant
nsieve: ?? 23.9ms +/- 5.5% 24.0ms +/- 2.0% not conclusive: might be *1.004x as slow*
bitops: ?? 73.9ms +/- 1.8% 75.2ms +/- 2.3% not conclusive: might be *1.018x as slow*
3bit-bits-in-byte: ?? 6.3ms +/- 5.5% 6.4ms +/- 7.8% not conclusive: might be *1.016x as slow*
bits-in-byte: ?? 13.8ms +/- 3.3% 13.9ms +/- 2.9% not conclusive: might be *1.007x as slow*
bitwise-and: ?? 19.2ms +/- 3.8% 20.1ms +/- 5.4% not conclusive: might be *1.047x as slow*
nsieve-bits: ?? 34.6ms +/- 1.4% 34.8ms +/- 1.6% not conclusive: might be *1.006x as slow*
controlflow: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow*
recursive: ?? 5.1ms +/- 15.4% 5.9ms +/- 20.1% not conclusive: might be *1.157x as slow*
crypto: - 87.7ms +/- 17.3% 84.7ms +/- 3.6%
aes: *1.031x as slow* 29.0ms +/- 1.6% 29.9ms +/- 4.0% significant
md5: *1.073x as slow* 27.4ms +/- 2.2% 29.4ms +/- 9.5% significant
sha1: - 31.3ms +/- 47.1% 25.4ms +/- 2.4%
date: - 268.8ms +/- 2.0% 269.0ms +/- 1.4%
format-tofte: - 111.2ms +/- 4.9% 110.7ms +/- 1.6%
format-xparb: ?? 157.6ms +/- 0.4% 158.3ms +/- 2.2% not conclusive: might be *1.004x as slow*
math: - 147.7ms +/- 4.1% 145.6ms +/- 1.4%
cordic: ?? 69.9ms +/- 0.9% 70.9ms +/- 1.9% not conclusive: might be *1.014x as slow*
partial-sums: - 57.4ms +/- 10.8% 54.7ms +/- 3.5%
spectral-norm: - 20.4ms +/- 5.8% 20.0ms +/- 1.7%
regexp: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant
dna: *1.011x as slow* 226.6ms +/- 0.3% 229.2ms +/- 0.4% significant
string: 1.016x as fast 444.1ms +/- 0.8% 437.1ms +/- 1.4% significant
base64: - 55.3ms +/- 1.7% 54.4ms +/- 4.5%
fasta: ?? 43.1ms +/- 1.6% 43.4ms +/- 3.7% not conclusive: might be *1.007x as slow*
tagcloud: 1.017x as fast 120.4ms +/- 0.8% 118.4ms +/- 0.9% significant
unpack-code: 1.035x as fast 163.6ms +/- 1.8% 158.0ms +/- 2.7% significant
validate-input: ?? 61.7ms +/- 1.0% 62.9ms +/- 5.3% not conclusive: might be *1.019x as slow*
(In reply to comment #13)
> Please report the results of running SunSpider with and without this patch.
> The patch was attached to Bug 12638 as Attachment #23342 [details] .
Please repost the patch to this bug and set the "review?" flag to have it reviewed. See this page for more details:
it seems this ticket is stale, even though there appears to be a working patch, can someone please update it, take the necessary steps ? (/me has not idea what those are)
New versions of JSC are not using the PCRE-based library for this regexp any more so my patch no longer works. I guess WREC has a similar limitation that prevents this huge regexp from working. It works fine in tracemonkey and tip-of-tree V8. I have no plans to debug WREC to find the problem there so the current state of affairs is that there is no patch to fix this problem.
In Radar <rdar://problem/6603562>
Created attachment 28761 [details]
Created attachment 28762 [details]
Fixed a layout test that I missed the first time around.
Committed revision 41842.