Bug 15861 - 5.8% of string-validate-input.js is spent creating RegExpImps
Summary: 5.8% of string-validate-input.js is spent creating RegExpImps
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on:
Blocks: 15900 15902
  Show dependency treegraph
 
Reported: 2007-11-06 13:09 PST by Eric Seidel (no email)
Modified: 2007-11-08 15:50 PST (History)
2 users (show)

See Also:


Attachments
Patch1 (25.66 KB, patch)
2007-11-06 23:41 PST, Geoffrey Garen
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-11-06 13:09:21 PST
JavaScriptCore needs to pull constants out of loops

SpiderMonkey beats us on the "validate-input" test dramatically.
    validate-input:    58.2% *slower*   115.2ms +/- 1.2%    182.2ms +/- 0.7%     significant

I think much of that is due to them having some magical way to recognize constants and pull them out of for loops.

in doTest()

There is a "pattern" constant regexp, which we re-compile every time through the loop. If I hack the test to pull that out of the loop, we get a 14% speed increase in that one test:

    validate-input:    14.6% faster     162.8ms +/- 0.8%    139.0ms +/- 1.1%     significant
Comment 1 Eric Seidel (no email) 2007-11-06 13:12:21 PST
Another way to handle this specific case would be to just unique RegExp objects based on Identifier, then once any one of them is compiled, they all are.
Comment 2 Geoffrey Garen 2007-11-06 16:18:03 PST
I'm working on this. I think we can just store the compiled regexp in the node tree.
Comment 3 Maciej Stachowiak 2007-11-06 21:05:59 PST
Keep in mind that regexp literals are not truly constants, they create a new object every time and this difference can be observed. It should be safe to cache the compiled guts of the regexp, but not the JavaScript wrapper object.
Comment 4 Eric Seidel (no email) 2007-11-06 21:30:46 PST
It would be interesting to test this observation, and see what mozilla caches. :)
Comment 5 Maciej Stachowiak 2007-11-06 23:33:50 PST
This test program seems to show Mozilla caching the whole JS-level regexp object (we do not). I wonder what the standard says?

<script>
var a = "";

for (var i = 0; i < 5; i++) {
    var b = /a/;
    b.foo = "foo";
    alert(a.foo);
    a = b;
}

</script>
Comment 6 Maciej Stachowiak 2007-11-06 23:37:30 PST
Wowsers. This test program seems to show even more thoroughly that Opera and Mozilla reuse the same regexp object for a regexp created repeatedly from the same source line:

<script>
var a = "";

for (var i = 0; i < 5; i++) {
    var b = /a/;
    b.foo = "bar";
    alert(a.foo);
    alert(a ===	b);
    b.foo = "";
    a = b;
}

</script>

But not if you create a second instance of the /a/ regexp!
Comment 7 Geoffrey Garen 2007-11-06 23:41:38 PST
Created attachment 17099 [details]
Patch1

Buy this one, get the next one free!
Comment 8 Geoffrey Garen 2007-11-06 23:47:01 PST
I don't think that re-using the JS wrapper is OK. The JS wrapper holds the lastIndex property, which influences the behavior of matching functions. If two regular expressions got their lastIndex properties tangled up, you'd be in crazy-ville. (Population 2, apparently.)
Comment 9 Geoffrey Garen 2007-11-06 23:54:05 PST
ECMA 7.8.5:

A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object.

OK, population 3 :).
Comment 10 Maciej Stachowiak 2007-11-07 00:55:27 PST
Do you still want your patch reviewed or would you like to do the more aggressive optimization that ECMA apparently mandates? :-)
Comment 11 Darin Adler 2007-11-07 08:24:23 PST
Comment on attachment 17099 [details]
Patch1

An even better way to eliminate duplication of flags would be to have a way to ask a JSRegExp what its flags are.

I think subpatterns is a coined word, so the "P" should not be capitalized.

r=me
Comment 12 Geoffrey Garen 2007-11-07 09:14:55 PST
(In reply to comment #10)
> Do you still want your patch reviewed or would you like to do the more
> aggressive optimization that ECMA apparently mandates? :-)

I'm still going to check this one in -- constructing fewer RegExpImps will be great, but making RegExpImp construction faster is good, too.
Comment 13 Geoffrey Garen 2007-11-07 09:22:59 PST
Committed revision 27571.

Now I'll get to work on creating only one RegExpImp. Kind of tricky, since just putting a ProtectedPtr in the node tree will cause a retain cycle if you set the containing function as a property of the RegExpImp.
Comment 14 Eric Seidel (no email) 2007-11-07 10:14:37 PST
Wow, what a fantastic change!
ggaren++ !

This was landed in r27571, so closing.  I'm assuming ggaren will file a second bug about "creating only one RegExpImp"
Comment 15 Eric Seidel (no email) 2007-11-07 10:15:59 PST
Maybe we can convince mjs to sent out updated numbers comparing against SpiderMonkey (or I just need to download the darn thing and run it myself).  I doubt we beat them on validate-input yet, but we're probably limited on ActivationImp or something else instead now.
Comment 16 Geoffrey Garen 2007-11-08 15:50:49 PST
Re-titling to reflect the patch that landed with this bug.