Bug 14873 - REGRESSION (Safari 4): regular expression pattern size limit lower than Safari 3.2, other browsers, breaks SAP
: REGRESSION (Safari 4): regular expression pattern size limit lower than Safar...
Status: RESOLVED FIXED
: WebKit
JavaScriptCore
: 523.x (Safari 3)
: Macintosh Mac OS X 10.4
: P2 Normal
Assigned To:
: http://en.wikipedia.org/wiki/User:Lup...
: InRadar, NeedsReduction
:
:
  Show dependency treegraph
 
Reported: 2007-08-03 12:58 PST by
Modified: 2009-03-19 13:33 PST (History)


Attachments
Test maximum regex length (1.05 KB, text/html)
2007-08-12 14:20 PST, David Kilzer (:ddkilzer)
no flags Details
patch (7.40 KB, patch)
2009-03-19 13:15 PST, Geoffrey Garen
no flags Review Patch | Details | Formatted Diff | Diff
patch (8.02 KB, patch)
2009-03-19 13:23 PST, Geoffrey Garen
sam: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2007-08-03 12:58:50 PST
This link is a set of badwords that get fed to Regexp, in a Javascript antivandal tool. The list is soo enormous that Safari hits its technical limit for the length of regular expressions. Not that strange, but bdash told me to report it regardless. (PS. FF 2.* does handle this RegExp

(PS ERR20 of the regexp lib that is being used.)
------- Comment #1 From 2007-08-06 06:11:16 PST -------
How does one reproduce the problem? Just opening the bug URL doesn't cause any errors.
------- Comment #2 From 2007-08-09 15:25:37 PST -------
To reproduce:
Register in Wikipedia.
Open en.wikipedia.org/wiki/User:yourusername/monobook.js

Add to this page the following line:
importScript("User:Lupin/recent2.js");

Now go to either of the following links:
http://en.wikipedia.org/wiki/User:Lupin/Filter_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/All_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/Recent_IP_edits
http://en.wikipedia.org/wiki/User:Lupin/Monitor_my_watchlist
http://en.wikipedia.org/wiki/User:Lupin/Live_spellcheck

These are tools that check the recent changes of specific sets of pages in wikipedia for words and other signs of vandalism (blanking). The bad words list is used by at least "filter recent changes".
------- Comment #3 From 2007-08-10 06:58:52 PST -------
Confirmed with a local debug build of WebKit r24922 with Safari 3 Public Beta v. 3.0.3 (522.12.1) on Mac OS X 10.4.10 (8R218).

Console output:
(event handler):Invalid regular expression: regular expression too large

JavaScript Console:
Invalid regular expression: regular expression too large
http://en.wikipedia.org/w/index.php?title=User%3ALupin%2Frecent2.js&action=raw&ctype=text/javascript
Line: 120
------- Comment #4 From 2007-08-10 07:02:17 PST -------
NOTE: After following the steps in Comment #2, you may have to reload the page twice.

Safari 2.0.4 (419.3) with original WebKit on Mac OS X 10.4.10 (8R218) apparently silently fails.  It adds the orange "update" rectangles to the screen output, but there is no content with them, and no error message.  This is not a regression.
------- Comment #5 From 2007-08-12 12:43:29 PST -------
I did some looking around myself after this and found the following:

This error translates to ERR20 of the regexp lib and is triggered by a size check (in pcre/pcre_compile.c) against the define MAX_PATTERN_SIZE

MAX_PATTERN_SIZE itself is defined in pcre/pcre_internal.h
This define is dependant on the define LINK_SIZE in the same file.

Normally the max size is 64K for a regexp, but if you set LINK_SIZE higher (3 or 4 instead of 2), then the pcre lib can actually handle higher sizes than that. LINK_SIZE can be defined on the compile commandline according to the header file.

I hope to have saved someone some time and good luck :D
------- Comment #6 From 2007-08-12 12:50:53 PST -------
It would be interesting to know what the maximum size of a regular expression in Firefox and MSIE 6/7 are.  I imagine you could simple create a gigantic regex, then try to use it until you get an exception.
------- Comment #7 From 2007-08-12 14:20:38 PST -------
Created an attachment (id=15940) [details]
Test maximum regex length

This file tests the maximum length that a regular expression may contain in a browser.  My results:

Safari 3 Public Beta v. 3.0.3 (522.12.1) with WebKit r25023 debug build:  32767
Opera 9.22:  2097151

Could someone test MSIE 6 and 7?
------- Comment #8 From 2007-08-12 14:26:48 PST -------
(In reply to comment #7)
> Created an attachment (id=15940) [edit] [details]
> Test maximum regex length

DO NOT try this on Safari 2!  It doesn't throw an exception when the regex is too big, so there's no way for the script to know when it's hit a limit.  (Maybe checking the value of r would work, though?)
------- Comment #9 From 2007-08-16 08:04:57 PST -------
(In reply to comment #7)
> Could someone test MSIE 6 and 7?

MSIE 6 on WinXP on my iMac Intel parallels install:
limit: 67108863
------- Comment #10 From 2008-01-29 16:01:25 PST -------
(In reply to comment #9)
> MSIE 6 on WinXP on my iMac Intel parallels install:
> limit: 67108863

Same limit on my MSIE 7 on WinXP on Macbook Intel Parallels.
------- Comment #11 From 2008-01-30 05:17:39 PST -------
It's easy to make maximum regular expression length much longer. We can change the links to be 32 bits rather than 16 bits. This makes compiled regular expressions larger and also will make matching slower. This is equivalent to setting LINK_SIZE to 4 in the original PCRE source.

As part of improving JavaScript performance, we were looking at some optimizations that would make compiled regular expressions larger than they are today.

One of these ideas was to make the compiled regular expressions use a pointer for each opcode instead of a byte code. The idea was that this was affordable because regular expressions are usually very small. I suppose this optimization is a bad idea if there are some huge regular expressions to be processed. But on the other hand, this would cause us to change the link size to be the same size as a pointer, which would eliminate arbitrary size limit on the regular expression entirely. So it would blow up the size of compiled regular expressions, but eliminate the 64K limit.
------- Comment #12 From 2008-09-11 02:09:15 PST -------
I have a patch for this.  It's against V8's copy of JSCRE so it may not go in cleanly to the head.  I can't see any performance regressions from it, strangely enough.  It runs the tests I have thrown at it so far including the Mozilla tests that used to fail due to large regexps.  However I don't think the Mozilla tests test very thoroughly since they passed with the larger size limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to actually work.
------- Comment #13 From 2008-09-11 06:11:19 PST -------
(In reply to comment #12)
> I have a patch for this.  It's against V8's copy of JSCRE so it may not go in
> cleanly to the head.  I can't see any performance regressions from it,
> strangely enough.  It runs the tests I have thrown at it so far including the
> Mozilla tests that used to fail due to large regexps.  However I don't think
> the Mozilla tests test very thoroughly since they passed with the larger size
> limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to
> actually work.

The patch was attached to Bug 12638 as Attachment #23342 [details].

Please report the results of running SunSpider with and without this patch.
------- Comment #14 From 2008-10-28 01:55:45 PST -------
$ ./sunspider-compare-results --shell=/auto/JavaScriptV8/Archive/611/bin/v8 v8-611.txt v8-611-large-regexps.txt                                              


TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           -                 1513.6ms +/- 2.9%   1500.7ms +/- 0.8% 

=============================================================================

  3d:                  -                  161.0ms +/- 10.8%    152.9ms +/- 1.4% 
    cube:              ??                  39.9ms +/- 3.4%     40.3ms +/- 3.1%     not conclusive: might be *1.010x as slow*
    morph:             -                   68.2ms +/- 23.6%     60.0ms +/- 1.1% 
    raytrace:          -                   52.9ms +/- 0.8%     52.6ms +/- 1.1% 

  access:              ??                  98.7ms +/- 2.5%    101.1ms +/- 3.1%     not conclusive: might be *1.024x as slow*
    binary-trees:      ??                   6.9ms +/- 13.3%      7.2ms +/- 18.0%     not conclusive: might be *1.043x as slow*
    fannkuch:          -                   34.1ms +/- 3.5%     33.9ms +/- 2.1% 
    nbody:             *1.065x as slow*    33.8ms +/- 1.7%     36.0ms +/- 9.0%     significant
    nsieve:            ??                  23.9ms +/- 5.5%     24.0ms +/- 2.0%     not conclusive: might be *1.004x as slow*

  bitops:              ??                  73.9ms +/- 1.8%     75.2ms +/- 2.3%     not conclusive: might be *1.018x as slow*
    3bit-bits-in-byte: ??                   6.3ms +/- 5.5%      6.4ms +/- 7.8%     not conclusive: might be *1.016x as slow*
    bits-in-byte:      ??                  13.8ms +/- 3.3%     13.9ms +/- 2.9%     not conclusive: might be *1.007x as slow*
    bitwise-and:       ??                  19.2ms +/- 3.8%     20.1ms +/- 5.4%     not conclusive: might be *1.047x as slow*
    nsieve-bits:       ??                  34.6ms +/- 1.4%     34.8ms +/- 1.6%     not conclusive: might be *1.006x as slow*

  controlflow:         ??                   5.1ms +/- 15.4%      5.9ms +/- 20.1%     not conclusive: might be *1.157x as slow*
    recursive:         ??                   5.1ms +/- 15.4%      5.9ms +/- 20.1%     not conclusive: might be *1.157x as slow*

  crypto:              -                   87.7ms +/- 17.3%     84.7ms +/- 3.6% 
    aes:               *1.031x as slow*    29.0ms +/- 1.6%     29.9ms +/- 4.0%     significant
    md5:               *1.073x as slow*    27.4ms +/- 2.2%     29.4ms +/- 9.5%     significant
    sha1:              -                   31.3ms +/- 47.1%     25.4ms +/- 2.4% 

  date:                -                  268.8ms +/- 2.0%    269.0ms +/- 1.4% 
    format-tofte:      -                  111.2ms +/- 4.9%    110.7ms +/- 1.6% 
    format-xparb:      ??                 157.6ms +/- 0.4%    158.3ms +/- 2.2%     not conclusive: might be *1.004x as slow*

  math:                -                  147.7ms +/- 4.1%    145.6ms +/- 1.4% 
    cordic:            ??                  69.9ms +/- 0.9%     70.9ms +/- 1.9%     not conclusive: might be *1.014x as slow*
    partial-sums:      -                   57.4ms +/- 10.8%     54.7ms +/- 3.5% 
    spectral-norm:     -                   20.4ms +/- 5.8%     20.0ms +/- 1.7% 

  regexp:              *1.011x as slow*   226.6ms +/- 0.3%    229.2ms +/- 0.4%     significant
    dna:               *1.011x as slow*   226.6ms +/- 0.3%    229.2ms +/- 0.4%     significant

  string:              1.016x as fast     444.1ms +/- 0.8%    437.1ms +/- 1.4%     significant
    base64:            -                   55.3ms +/- 1.7%     54.4ms +/- 4.5% 
    fasta:             ??                  43.1ms +/- 1.6%     43.4ms +/- 3.7%     not conclusive: might be *1.007x as slow*
    tagcloud:          1.017x as fast     120.4ms +/- 0.8%    118.4ms +/- 0.9%     significant
    unpack-code:       1.035x as fast     163.6ms +/- 1.8%    158.0ms +/- 2.7%     significant
    validate-input:    ??                  61.7ms +/- 1.0%     62.9ms +/- 5.3%     not conclusive: might be *1.019x as slow*
------- Comment #15 From 2008-10-28 04:20:57 PST -------
(In reply to comment #13)
> Please report the results of running SunSpider with and without this patch.

Thanks!

> The patch was attached to Bug 12638 as Attachment #23342 [details] [edit].

Please repost the patch to this bug and set the "review?" flag to have it reviewed.  See this page for more details:

http://webkit.org/coding/contributing.html
------- Comment #16 From 2008-10-30 14:09:27 PST -------
it seems this ticket is stale, even though there appears to be a working patch, can someone please update it, take the necessary steps ? (/me has not idea what those are)
------- Comment #17 From 2009-01-22 02:55:31 PST -------
New versions of JSC are not using the PCRE-based library for this regexp any more so my patch no longer works.  I guess WREC has a similar limitation that prevents this huge regexp from working.  It works fine in tracemonkey and tip-of-tree V8.  I have no plans to debug WREC to find the problem there so the current state of affairs is that there is no patch to fix this problem.
------- Comment #18 From 2009-02-19 10:50:56 PST -------
In Radar <rdar://problem/6603562> 
------- Comment #19 From 2009-03-19 13:15:09 PST -------
Created an attachment (id=28761) [details]
patch
------- Comment #20 From 2009-03-19 13:23:59 PST -------
Created an attachment (id=28762) [details]
patch

Fixed a layout test that I missed the first time around.
------- Comment #21 From 2009-03-19 13:33:30 PST -------
Committed revision 41842.