Bug 15861

Summary: 5.8% of string-validate-input.js is spent creating RegExpImps
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: JavaScriptCoreAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, mjs
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 15900, 15902    
Attachments:
Description Flags
Patch1 darin: review+

Description Eric Seidel (no email) 2007-11-06 13:09:21 PST
JavaScriptCore needs to pull constants out of loops

SpiderMonkey beats us on the "validate-input" test dramatically.
    validate-input:    58.2% *slower*   115.2ms +/- 1.2%    182.2ms +/- 0.7%     significant

I think much of that is due to them having some magical way to recognize constants and pull them out of for loops.

in doTest()

There is a "pattern" constant regexp, which we re-compile every time through the loop. If I hack the test to pull that out of the loop, we get a 14% speed increase in that one test:

    validate-input:    14.6% faster     162.8ms +/- 0.8%    139.0ms +/- 1.1%     significant
Comment 1 Eric Seidel (no email) 2007-11-06 13:12:21 PST
Another way to handle this specific case would be to just unique RegExp objects based on Identifier, then once any one of them is compiled, they all are.
Comment 2 Geoffrey Garen 2007-11-06 16:18:03 PST
I'm working on this. I think we can just store the compiled regexp in the node tree.
Comment 3 Maciej Stachowiak 2007-11-06 21:05:59 PST
Keep in mind that regexp literals are not truly constants, they create a new object every time and this difference can be observed. It should be safe to cache the compiled guts of the regexp, but not the JavaScript wrapper object.
Comment 4 Eric Seidel (no email) 2007-11-06 21:30:46 PST
It would be interesting to test this observation, and see what mozilla caches. :)
Comment 5 Maciej Stachowiak 2007-11-06 23:33:50 PST
This test program seems to show Mozilla caching the whole JS-level regexp object (we do not). I wonder what the standard says?

<script>
var a = "";

for (var i = 0; i < 5; i++) {
    var b = /a/;
    b.foo = "foo";
    alert(a.foo);
    a = b;
}

</script>
Comment 6 Maciej Stachowiak 2007-11-06 23:37:30 PST
Wowsers. This test program seems to show even more thoroughly that Opera and Mozilla reuse the same regexp object for a regexp created repeatedly from the same source line:

<script>
var a = "";

for (var i = 0; i < 5; i++) {
    var b = /a/;
    b.foo = "bar";
    alert(a.foo);
    alert(a ===	b);
    b.foo = "";
    a = b;
}

</script>

But not if you create a second instance of the /a/ regexp!
Comment 7 Geoffrey Garen 2007-11-06 23:41:38 PST
Created attachment 17099 [details]
Patch1

Buy this one, get the next one free!
Comment 8 Geoffrey Garen 2007-11-06 23:47:01 PST
I don't think that re-using the JS wrapper is OK. The JS wrapper holds the lastIndex property, which influences the behavior of matching functions. If two regular expressions got their lastIndex properties tangled up, you'd be in crazy-ville. (Population 2, apparently.)
Comment 9 Geoffrey Garen 2007-11-06 23:54:05 PST
ECMA 7.8.5:

A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object.

OK, population 3 :).
Comment 10 Maciej Stachowiak 2007-11-07 00:55:27 PST
Do you still want your patch reviewed or would you like to do the more aggressive optimization that ECMA apparently mandates? :-)
Comment 11 Darin Adler 2007-11-07 08:24:23 PST
Comment on attachment 17099 [details]
Patch1

An even better way to eliminate duplication of flags would be to have a way to ask a JSRegExp what its flags are.

I think subpatterns is a coined word, so the "P" should not be capitalized.

r=me
Comment 12 Geoffrey Garen 2007-11-07 09:14:55 PST
(In reply to comment #10)
> Do you still want your patch reviewed or would you like to do the more
> aggressive optimization that ECMA apparently mandates? :-)

I'm still going to check this one in -- constructing fewer RegExpImps will be great, but making RegExpImp construction faster is good, too.
Comment 13 Geoffrey Garen 2007-11-07 09:22:59 PST
Committed revision 27571.

Now I'll get to work on creating only one RegExpImp. Kind of tricky, since just putting a ProtectedPtr in the node tree will cause a retain cycle if you set the containing function as a property of the RegExpImp.
Comment 14 Eric Seidel (no email) 2007-11-07 10:14:37 PST
Wow, what a fantastic change!
ggaren++ !

This was landed in r27571, so closing.  I'm assuming ggaren will file a second bug about "creating only one RegExpImp"
Comment 15 Eric Seidel (no email) 2007-11-07 10:15:59 PST
Maybe we can convince mjs to sent out updated numbers comparing against SpiderMonkey (or I just need to download the darn thing and run it myself).  I doubt we beat them on validate-input yet, but we're probably limited on ActivationImp or something else instead now.
Comment 16 Geoffrey Garen 2007-11-08 15:50:49 PST
Re-titling to reflect the patch that landed with this bug.