Bug 14873 - REGRESSION (Safari 4): regular expression pattern size limit lower than Safari 3.2, other browsers, breaks SAP
Summary: REGRESSION (Safari 4): regular expression pattern size limit lower than Safar...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Geoffrey Garen
URL: http://en.wikipedia.org/wiki/User:Lup...
Keywords: InRadar, NeedsReduction
Depends on:
Blocks:
 
Reported: 2007-08-03 12:58 PDT by Derk-Jan Hartman
Modified: 2009-03-19 13:33 PDT (History)
7 users (show)

See Also:


Attachments
Test maximum regex length (1.05 KB, text/html)
2007-08-12 14:20 PDT, David Kilzer (:ddkilzer)
no flags Details
patch (7.40 KB, patch)
2009-03-19 13:15 PDT, Geoffrey Garen
no flags Details | Formatted Diff | Diff
patch (8.02 KB, patch)
2009-03-19 13:23 PDT, Geoffrey Garen
sam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Derk-Jan Hartman 2007-08-03 12:58:50 PDT
This link is a set of badwords that get fed to Regexp, in a Javascript antivandal tool. The list is soo enormous that Safari hits its technical limit for the length of regular expressions. Not that strange, but bdash told me to report it regardless. (PS. FF 2.* does handle this RegExp

(PS ERR20 of the regexp lib that is being used.)
Comment 1 Alexey Proskuryakov 2007-08-06 06:11:16 PDT
How does one reproduce the problem? Just opening the bug URL doesn't cause any errors.
Comment 2 Derk-Jan Hartman 2007-08-09 15:25:37 PDT
To reproduce:
Register in Wikipedia.
Open en.wikipedia.org/wiki/User:yourusername/monobook.js

Add to this page the following line:
importScript("User:Lupin/recent2.js");

Now go to either of the following links:
http://en.wikipedia.org/wiki/User:Lupin/Filter_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/All_recent_changes
http://en.wikipedia.org/wiki/User:Lupin/Recent_IP_edits
http://en.wikipedia.org/wiki/User:Lupin/Monitor_my_watchlist
http://en.wikipedia.org/wiki/User:Lupin/Live_spellcheck

These are tools that check the recent changes of specific sets of pages in wikipedia for words and other signs of vandalism (blanking). The bad words list is used by at least "filter recent changes".
Comment 3 David Kilzer (:ddkilzer) 2007-08-10 06:58:52 PDT
Confirmed with a local debug build of WebKit r24922 with Safari 3 Public Beta v. 3.0.3 (522.12.1) on Mac OS X 10.4.10 (8R218).

Console output:
(event handler):Invalid regular expression: regular expression too large

JavaScript Console:
Invalid regular expression: regular expression too large
http://en.wikipedia.org/w/index.php?title=User%3ALupin%2Frecent2.js&action=raw&ctype=text/javascript
Line: 120

Comment 4 David Kilzer (:ddkilzer) 2007-08-10 07:02:17 PDT
NOTE: After following the steps in Comment #2, you may have to reload the page twice.

Safari 2.0.4 (419.3) with original WebKit on Mac OS X 10.4.10 (8R218) apparently silently fails.  It adds the orange "update" rectangles to the screen output, but there is no content with them, and no error message.  This is not a regression.

Comment 5 Derk-Jan Hartman 2007-08-12 12:43:29 PDT
I did some looking around myself after this and found the following:

This error translates to ERR20 of the regexp lib and is triggered by a size check (in pcre/pcre_compile.c) against the define MAX_PATTERN_SIZE

MAX_PATTERN_SIZE itself is defined in pcre/pcre_internal.h
This define is dependant on the define LINK_SIZE in the same file.

Normally the max size is 64K for a regexp, but if you set LINK_SIZE higher (3 or 4 instead of 2), then the pcre lib can actually handle higher sizes than that. LINK_SIZE can be defined on the compile commandline according to the header file.

I hope to have saved someone some time and good luck :D
Comment 6 David Kilzer (:ddkilzer) 2007-08-12 12:50:53 PDT
It would be interesting to know what the maximum size of a regular expression in Firefox and MSIE 6/7 are.  I imagine you could simple create a gigantic regex, then try to use it until you get an exception.

Comment 7 David Kilzer (:ddkilzer) 2007-08-12 14:20:38 PDT
Created attachment 15940 [details]
Test maximum regex length

This file tests the maximum length that a regular expression may contain in a browser.  My results:

Safari 3 Public Beta v. 3.0.3 (522.12.1) with WebKit r25023 debug build:  32767
Opera 9.22:  2097151

Could someone test MSIE 6 and 7?
Comment 8 David Kilzer (:ddkilzer) 2007-08-12 14:26:48 PDT
(In reply to comment #7)
> Created an attachment (id=15940) [edit]
> Test maximum regex length

DO NOT try this on Safari 2!  It doesn't throw an exception when the regex is too big, so there's no way for the script to know when it's hit a limit.  (Maybe checking the value of r would work, though?)

Comment 9 Derk-Jan Hartman 2007-08-16 08:04:57 PDT
(In reply to comment #7)
> Could someone test MSIE 6 and 7?

MSIE 6 on WinXP on my iMac Intel parallels install:
limit: 67108863
Comment 10 Claus Conrad 2008-01-29 16:01:25 PST
(In reply to comment #9)
> MSIE 6 on WinXP on my iMac Intel parallels install:
> limit: 67108863

Same limit on my MSIE 7 on WinXP on Macbook Intel Parallels.
Comment 11 Darin Adler 2008-01-30 05:17:39 PST
It's easy to make maximum regular expression length much longer. We can change the links to be 32 bits rather than 16 bits. This makes compiled regular expressions larger and also will make matching slower. This is equivalent to setting LINK_SIZE to 4 in the original PCRE source.

As part of improving JavaScript performance, we were looking at some optimizations that would make compiled regular expressions larger than they are today.

One of these ideas was to make the compiled regular expressions use a pointer for each opcode instead of a byte code. The idea was that this was affordable because regular expressions are usually very small. I suppose this optimization is a bad idea if there are some huge regular expressions to be processed. But on the other hand, this would cause us to change the link size to be the same size as a pointer, which would eliminate arbitrary size limit on the regular expression entirely. So it would blow up the size of compiled regular expressions, but eliminate the 64K limit.
Comment 12 Erik Corry 2008-09-11 02:09:15 PDT
I have a patch for this.  It's against V8's copy of JSCRE so it may not go in cleanly to the head.  I can't see any performance regressions from it, strangely enough.  It runs the tests I have thrown at it so far including the Mozilla tests that used to fail due to large regexps.  However I don't think the Mozilla tests test very thoroughly since they passed with the larger size limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to actually work.
Comment 13 David Kilzer (:ddkilzer) 2008-09-11 06:11:19 PDT
(In reply to comment #12)
> I have a patch for this.  It's against V8's copy of JSCRE so it may not go in
> cleanly to the head.  I can't see any performance regressions from it,
> strangely enough.  It runs the tests I have thrown at it so far including the
> Mozilla tests that used to fail due to large regexps.  However I don't think
> the Mozilla tests test very thoroughly since they passed with the larger size
> limit even before I fixed putLinkValueAllowZero and GetLinkValueAllowZero to
> actually work.

The patch was attached to Bug 12638 as Attachment #23342 [details].

Please report the results of running SunSpider with and without this patch.
Comment 14 Erik Corry 2008-10-28 01:55:45 PDT
$ ./sunspider-compare-results --shell=/auto/JavaScriptV8/Archive/611/bin/v8 v8-611.txt v8-611-large-regexps.txt                                              


TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           -                 1513.6ms +/- 2.9%   1500.7ms +/- 0.8% 

=============================================================================

  3d:                  -                  161.0ms +/- 10.8%    152.9ms +/- 1.4% 
    cube:              ??                  39.9ms +/- 3.4%     40.3ms +/- 3.1%     not conclusive: might be *1.010x as slow*
    morph:             -                   68.2ms +/- 23.6%     60.0ms +/- 1.1% 
    raytrace:          -                   52.9ms +/- 0.8%     52.6ms +/- 1.1% 

  access:              ??                  98.7ms +/- 2.5%    101.1ms +/- 3.1%     not conclusive: might be *1.024x as slow*
    binary-trees:      ??                   6.9ms +/- 13.3%      7.2ms +/- 18.0%     not conclusive: might be *1.043x as slow*
    fannkuch:          -                   34.1ms +/- 3.5%     33.9ms +/- 2.1% 
    nbody:             *1.065x as slow*    33.8ms +/- 1.7%     36.0ms +/- 9.0%     significant
    nsieve:            ??                  23.9ms +/- 5.5%     24.0ms +/- 2.0%     not conclusive: might be *1.004x as slow*

  bitops:              ??                  73.9ms +/- 1.8%     75.2ms +/- 2.3%     not conclusive: might be *1.018x as slow*
    3bit-bits-in-byte: ??                   6.3ms +/- 5.5%      6.4ms +/- 7.8%     not conclusive: might be *1.016x as slow*
    bits-in-byte:      ??                  13.8ms +/- 3.3%     13.9ms +/- 2.9%     not conclusive: might be *1.007x as slow*
    bitwise-and:       ??                  19.2ms +/- 3.8%     20.1ms +/- 5.4%     not conclusive: might be *1.047x as slow*
    nsieve-bits:       ??                  34.6ms +/- 1.4%     34.8ms +/- 1.6%     not conclusive: might be *1.006x as slow*

  controlflow:         ??                   5.1ms +/- 15.4%      5.9ms +/- 20.1%     not conclusive: might be *1.157x as slow*
    recursive:         ??                   5.1ms +/- 15.4%      5.9ms +/- 20.1%     not conclusive: might be *1.157x as slow*

  crypto:              -                   87.7ms +/- 17.3%     84.7ms +/- 3.6% 
    aes:               *1.031x as slow*    29.0ms +/- 1.6%     29.9ms +/- 4.0%     significant
    md5:               *1.073x as slow*    27.4ms +/- 2.2%     29.4ms +/- 9.5%     significant
    sha1:              -                   31.3ms +/- 47.1%     25.4ms +/- 2.4% 

  date:                -                  268.8ms +/- 2.0%    269.0ms +/- 1.4% 
    format-tofte:      -                  111.2ms +/- 4.9%    110.7ms +/- 1.6% 
    format-xparb:      ??                 157.6ms +/- 0.4%    158.3ms +/- 2.2%     not conclusive: might be *1.004x as slow*

  math:                -                  147.7ms +/- 4.1%    145.6ms +/- 1.4% 
    cordic:            ??                  69.9ms +/- 0.9%     70.9ms +/- 1.9%     not conclusive: might be *1.014x as slow*
    partial-sums:      -                   57.4ms +/- 10.8%     54.7ms +/- 3.5% 
    spectral-norm:     -                   20.4ms +/- 5.8%     20.0ms +/- 1.7% 

  regexp:              *1.011x as slow*   226.6ms +/- 0.3%    229.2ms +/- 0.4%     significant
    dna:               *1.011x as slow*   226.6ms +/- 0.3%    229.2ms +/- 0.4%     significant

  string:              1.016x as fast     444.1ms +/- 0.8%    437.1ms +/- 1.4%     significant
    base64:            -                   55.3ms +/- 1.7%     54.4ms +/- 4.5% 
    fasta:             ??                  43.1ms +/- 1.6%     43.4ms +/- 3.7%     not conclusive: might be *1.007x as slow*
    tagcloud:          1.017x as fast     120.4ms +/- 0.8%    118.4ms +/- 0.9%     significant
    unpack-code:       1.035x as fast     163.6ms +/- 1.8%    158.0ms +/- 2.7%     significant
    validate-input:    ??                  61.7ms +/- 1.0%     62.9ms +/- 5.3%     not conclusive: might be *1.019x as slow*
Comment 15 David Kilzer (:ddkilzer) 2008-10-28 04:20:57 PDT
(In reply to comment #13)
> Please report the results of running SunSpider with and without this patch.

Thanks!

> The patch was attached to Bug 12638 as Attachment #23342 [details] [edit].

Please repost the patch to this bug and set the "review?" flag to have it reviewed.  See this page for more details:

http://webkit.org/coding/contributing.html
Comment 16 Derk-Jan Hartman 2008-10-30 14:09:27 PDT
it seems this ticket is stale, even though there appears to be a working patch, can someone please update it, take the necessary steps ? (/me has not idea what those are)
Comment 17 Erik Corry 2009-01-22 02:55:31 PST
New versions of JSC are not using the PCRE-based library for this regexp any more so my patch no longer works.  I guess WREC has a similar limitation that prevents this huge regexp from working.  It works fine in tracemonkey and tip-of-tree V8.  I have no plans to debug WREC to find the problem there so the current state of affairs is that there is no patch to fix this problem.
Comment 18 Vicki Murley 2009-02-19 10:50:56 PST
In Radar <rdar://problem/6603562> 
Comment 19 Geoffrey Garen 2009-03-19 13:15:09 PDT
Created attachment 28761 [details]
patch
Comment 20 Geoffrey Garen 2009-03-19 13:23:59 PDT
Created attachment 28762 [details]
patch

Fixed a layout test that I missed the first time around.
Comment 21 Geoffrey Garen 2009-03-19 13:33:30 PDT
Committed revision 41842.