| Summary: | GC should have fast-path for destroying strings. | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Andreas Kling <kling> |
| Component: | JavaScriptCore | Assignee: | Andreas Kling <kling> |
| Status: | RESOLVED INVALID | | |
| Severity: | Normal | CC: | barraclough, ggaren, kling, mhahnenberg |
| Priority: | P2 | | |
| Version: | 528+ (Nightly build) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
Description
Andreas Kling
2014-05-23 00:56:12 PDT
Created attachment 231948 [details]
Patch idea
Benchmark report for Octane on CabMook (MacBookPro10,1).
VMs tested:
"ToT" at /Volumes/Data/Source/Safari/Reference-OpenSource/WebKitBuild/Release/jsc
"Hacks" at /Volumes/Data/Source/Safari/OpenSource/WebKitBuild/Release/jsc
Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.
| Benchmark | ToT | | Hacks | |
|---|---|---|---|---|
| encrypt | 1.08497+-0.01829 | | 1.07993+-0.02012 | |
| decrypt | 19.30935+-0.24879 | | 19.29161+-0.02956 | |
| deltablue x2 | 1.21815+-0.00833 | | 1.21247+-0.01500 | |
| earley | 2.19743+-0.05550 | | 2.18668+-0.00854 | |
| boyer | 23.90228+-0.06863 | ? | 23.96938+-0.14821 | ? |
| navier-stokes x2 | 23.86052+-0.01006 | ? | 23.86080+-0.00542 | ? |
| raytrace x2 | 8.10303+-0.12664 | ? | 8.15091+-0.17171 | ? |
| richards x2 | 0.60843+-0.04798 | | 0.60282+-0.01505 | |
| splay x2 | 1.55989+-0.08370 | | 1.49182+-0.01882 | might be 1.0456x faster |
| regexp x2 | 199.19518+-0.92605 | | 197.05836+-2.10837 | might be 1.0108x faster |
| pdfjs x2 | 266.75545+-0.98525 | ^ | 262.35065+-0.61502 | ^ definitely 1.0168x faster |
| mandreel x2 | 299.90476+-2.45375 | ? | 301.06825+-3.98014 | ? |
| gbemu x2 | 272.13303+-3.90926 | ? | 276.64499+-7.16828 | ? might be 1.0166x slower |
| closure | 2.37485+-0.01098 | | 2.37377+-0.01897 | |
| jquery | 31.55491+-0.20196 | | 31.54801+-0.22624 | |
| box2d x2 | 120.80826+-6.90088 | | 120.15285+-5.52892 | |
| zlib x2 | 1798.62225+-5.93989 | | 1780.48275+-88.35760 | might be 1.0102x faster |
| typescript x2 | 3297.27832+-59.34699 | ? | 3307.64124+-34.31568 | ? |
| <arithmetic> | 422.01728+-3.34998 | | 421.39617+-6.21773 | might be 1.0015x faster |
| <geometric> * | 35.74942+-0.37086 | | 35.56974+-0.26158 | might be 1.0051x faster |
| <harmonic> | 3.52166+-0.12538 | | 3.48135+-0.04920 | might be 1.0116x faster |
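For readers checking the numbers, here is a minimal sketch of how the "Nx faster" figures and the "might be" versus "definitely" wording could be reproduced from the reported means and interval half-widths. The `Sample` type and both helpers are hypothetical names made up for this illustration. The rule that a result is "definite" only when the two 95% confidence intervals do not overlap is an assumption that happens to fit the rows above, not a statement about how the benchmark harness is implemented.

```cpp
#include <cassert>

// Hypothetical helper type for this sketch; not part of any benchmark tool.
struct Sample {
    double mean;      // mean execution time in ms
    double halfWidth; // half-width of the 95% confidence interval in ms
};

// Ratio of baseline time to candidate time.
// A value above 1 means the candidate ("Hacks") is faster than the baseline ("ToT").
double speedup(const Sample& baseline, const Sample& candidate)
{
    assert(candidate.mean > 0);
    return baseline.mean / candidate.mean;
}

// ASSUMPTION: a difference is treated as definite only when the two
// confidence intervals are disjoint; overlapping intervals give "might be".
bool intervalsOverlap(const Sample& a, const Sample& b)
{
    return a.mean - a.halfWidth <= b.mean + b.halfWidth
        && b.mean - b.halfWidth <= a.mean + a.halfWidth;
}
```

Applied to the pdfjs row, the intervals are disjoint and the ratio rounds to 1.0168. Applied to the splay row, the intervals overlap and the ratio rounds to 1.0456.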
Comment on attachment 231948 [details]
Patch idea
I'm not a huge fan of this. The brittleness of the tight coupling between GC and string, and the fact that non-string has to do an extra branch, feels more significant than the speedup.
Maybe this would look better if there were an explicit String dtorType, and strings all got segregated into their own special MarkedBlock.
(In reply to comment #3)
> (From update of attachment 231948 [details])
> I'm not a huge fan of this. The brittleness of the tight coupling between GC and string, and the fact that non-string has to do an extra branch, feels more significant than the speedup.

I paid for the cell type branch by killing the branch on dtorType, no? ;)

> Maybe this would look better if there were an explicit String dtorType, and strings all got segregated into their own special MarkedBlock.

Yeah maybe. That seems like it could have other effects though. I can try it and see.

How about this:
(1) A tiny patch to specialize the dtor call as a template parameter, so there's no branch on it. Is great good.
(2) A bigger patch to put all strings in their own MarkedAllocator. Specialize the sweeping of that fellow, too.
Now, we've removed a branch from sweeping strings, and the system is still pretty flexible.
Eventually, we will probably build on this to allocating StringImpls inline in JSString, and remove malloc / free / destruction from most strings.

GC actually does have a fast path for string destruction now that they get their own space.
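The two-step proposal above can be sketched roughly as follows. This is an illustration only: `DestructorKind`, `Cell`, `Block`, `destroyString`, `sweepCells`, and `sweep` are hypothetical stand-ins invented for this sketch, not JavaScriptCore's actual classes or the code that landed. It assumes each block holds cells of a single destructor kind, so the kind is checked once per block and the per-cell loop is instantiated separately for each kind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not JavaScriptCore's real types.
enum class DestructorKind { None, String, Generic };

struct Cell {
    bool marked = false;
    bool destroyed = false;
    void (*destroy)(Cell&) = nullptr; // consulted only on the Generic path
};

// Hypothetical string teardown, called directly with no per-cell dispatch.
inline void destroyString(Cell& cell) { cell.destroyed = true; }

// ASSUMPTION: strings live in their own blocks, so one kind covers every cell.
struct Block {
    DestructorKind kind;
    std::vector<Cell> cells;
};

// The kind is a template parameter, so the comparisons below are compile-time
// constants and each instantiation's loop carries no runtime branch on the kind.
template<DestructorKind kind>
std::size_t sweepCells(std::vector<Cell>& cells)
{
    std::size_t swept = 0;
    for (Cell& cell : cells) {
        if (cell.marked) {
            cell.marked = false; // survivor: clear the mark for the next cycle
            continue;
        }
        if (kind == DestructorKind::String)
            destroyString(cell);
        else if (kind == DestructorKind::Generic) {
            assert(cell.destroy);
            cell.destroy(cell);
        }
        ++swept; // counts every unmarked cell visited
    }
    return swept;
}

// The only runtime branch on the kind happens once per block, not once per cell.
std::size_t sweep(Block& block)
{
    switch (block.kind) {
    case DestructorKind::None:
        return sweepCells<DestructorKind::None>(block.cells);
    case DestructorKind::String:
        return sweepCells<DestructorKind::String>(block.cells);
    case DestructorKind::Generic:
        return sweepCells<DestructorKind::Generic>(block.cells);
    }
    return 0;
}
```

The design point is the one made in the thread: non-string cells pay no extra branch for the string fast path, and the coupling between the collector and strings is confined to one instantiation.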