Bug 15861

Summary: 5.8% of string-validate-input.js is spent creating RegExpImps
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: JavaScriptCoreAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, mjs
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 15900, 15902    
Attachments:
Description Flags
Patch1 darin: review+

Eric Seidel (no email)
Reported 2007-11-06 13:09:21 PST
JavaScriptCore needs to pull constants out of loops SpiderMonkey beats us on the "validate-input" test dramatically. validate-input: 58.2% *slower* 115.2ms +/- 1.2% 182.2ms +/- 0.7% significant I think much of that is due to them having some magical way to recognize constants and pull them out of for loops. in doTest() There is a "pattern" constant regexp, which we re-compile every time through the loop. If I hack the test to pull that out of the loop, we get a 14% speed increase in that one test: validate-input: 14.6% faster 162.8ms +/- 0.8% 139.0ms +/- 1.1% significant
Attachments
Patch1 (25.66 KB, patch)
2007-11-06 23:41 PST, Geoffrey Garen
darin: review+
Eric Seidel (no email)
Comment 1 2007-11-06 13:12:21 PST
Another way to handle this specific case would be to just unique RegExp objects based on Identifier, then once any one of them is compiled, they all are.
Geoffrey Garen
Comment 2 2007-11-06 16:18:03 PST
I'm working on this. I think we can just store the compiled regexp in the node tree.
Maciej Stachowiak
Comment 3 2007-11-06 21:05:59 PST
Keep in mind that regexp literals are not truly constants, they create a new object every time and this difference can be observed. It should be safe to cache the compiled guts of the regexp, but not the JavaScript wrapper object.
Eric Seidel (no email)
Comment 4 2007-11-06 21:30:46 PST
It would be interesting to test this observation, and see what mozilla caches. :)
Maciej Stachowiak
Comment 5 2007-11-06 23:33:50 PST
This test program seems to show Mozilla caching the whole JS-level regexp object (we do not). I wonder what the standard says? <script> var a = ""; for (var i = 0; i < 5; i++) { var b = /a/; b.foo = "foo"; alert(a.foo); a = b; } </script>
Maciej Stachowiak
Comment 6 2007-11-06 23:37:30 PST
Wowsers. This test program seems to show even more thoroughly that Opera and Mozilla reuse the same regexp object for a regexp created repeatedly from the same source line: <script> var a = ""; for (var i = 0; i < 5; i++) { var b = /a/; b.foo = "bar"; alert(a.foo); alert(a === b); b.foo = ""; a = b; } </script> But not if you create a second instance of the /a/ regexp!
Geoffrey Garen
Comment 7 2007-11-06 23:41:38 PST
Created attachment 17099 [details] Patch1 Buy this one, get the next one free!
Geoffrey Garen
Comment 8 2007-11-06 23:47:01 PST
I don't think that re-using the JS wrapper is OK. The JS wrapper holds the lastIndex property, which influences the behavior of matching functions. If two regular expressions got their lastIndex properties tangled up, you'd be in crazy-ville. (Population 2, apparently.)
Geoffrey Garen
Comment 9 2007-11-06 23:54:05 PST
ECMA 7.8.5: A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object. OK, population 3 :).
Maciej Stachowiak
Comment 10 2007-11-07 00:55:27 PST
Do you still want your patch reviewed or would you like to do the more aggressive optimization that ECMA apparently mandates? :-)
Darin Adler
Comment 11 2007-11-07 08:24:23 PST
Comment on attachment 17099 [details] Patch1 An even better way to eliminate duplication of flags would be to have a way to ask a JSRegExp what its flags are. I think subpatterns is a coined word, so the "P" should not be capitalized. r=me
Geoffrey Garen
Comment 12 2007-11-07 09:14:55 PST
(In reply to comment #10) > Do you still want your patch reviewed or would you like to do the more > aggressive optimization that ECMA apparently mandates? :-) I'm still going to check this one in -- constructing fewer RegExpImps will be great, but making RegExpImp construction faster is good, too.
Geoffrey Garen
Comment 13 2007-11-07 09:22:59 PST
Committed revision 27571. Now I'll get to work on creating only one RegExpImp. Kind of tricky, since just putting a ProtectedPtr in the node tree will cause a retain cycle if you set the containing function as a property of the RegExpImp.
Eric Seidel (no email)
Comment 14 2007-11-07 10:14:37 PST
Wow, what a fantastic change! ggaren++ ! This was landed in r27571, so closing. I'm assuming ggaren will file a second bug about "creating only one RegExpImp"
Eric Seidel (no email)
Comment 15 2007-11-07 10:15:59 PST
Maybe we can convince mjs to sent out updated numbers comparing against SpiderMonkey (or I just need to download the darn thing and run it myself). I doubt we beat them on validate-input yet, but we're probably limited on ActivationImp or something else instead now.
Geoffrey Garen
Comment 16 2007-11-08 15:50:49 PST
Re-titling to reflect the patch that landed with this bug.
Note You need to log in before you can comment on or make changes to this bug.