I have this little idea for making GC destroy JSStrings more efficiently.
Created attachment 231948 [details] Patch idea
Benchmark report for Octane on CabMook (MacBookPro10,1).

VMs tested:
"ToT" at /Volumes/Data/Source/Safari/Reference-OpenSource/WebKitBuild/Release/jsc
"Hacks" at /Volumes/Data/Source/Safari/OpenSource/WebKitBuild/Release/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                  ToT                      Hacks

encrypt           1.08497+-0.01829         1.07993+-0.02012
decrypt           19.30935+-0.24879        19.29161+-0.02956
deltablue x2      1.21815+-0.00833         1.21247+-0.01500
earley            2.19743+-0.05550         2.18668+-0.00854
boyer             23.90228+-0.06863     ?  23.96938+-0.14821     ?
navier-stokes x2  23.86052+-0.01006     ?  23.86080+-0.00542     ?
raytrace x2       8.10303+-0.12664      ?  8.15091+-0.17171      ?
richards x2       0.60843+-0.04798         0.60282+-0.01505
splay x2          1.55989+-0.08370         1.49182+-0.01882        might be 1.0456x faster
regexp x2         199.19518+-0.92605       197.05836+-2.10837      might be 1.0108x faster
pdfjs x2          266.75545+-0.98525    ^  262.35065+-0.61502    ^ definitely 1.0168x faster
mandreel x2       299.90476+-2.45375    ?  301.06825+-3.98014    ?
gbemu x2          272.13303+-3.90926    ?  276.64499+-7.16828    ? might be 1.0166x slower
closure           2.37485+-0.01098         2.37377+-0.01897
jquery            31.55491+-0.20196        31.54801+-0.22624
box2d x2          120.80826+-6.90088       120.15285+-5.52892
zlib x2           1798.62225+-5.93989      1780.48275+-88.35760    might be 1.0102x faster
typescript x2     3297.27832+-59.34699  ?  3307.64124+-34.31568  ?

<arithmetic>      422.01728+-3.34998       421.39617+-6.21773      might be 1.0015x faster
<geometric> *     35.74942+-0.37086        35.56974+-0.26158       might be 1.0051x faster
<harmonic>        3.52166+-0.12538         3.48135+-0.04920        might be 1.0116x faster
Comment on attachment 231948 [details]
Patch idea

I'm not a huge fan of this. The brittleness of the tight coupling between GC and string, and the fact that non-string has to do an extra branch, feels more significant than the speedup.

Maybe this would look better if there were an explicit String dtorType, and strings all got segregated into their own special MarkedBlock.
(In reply to comment #3)
> (From update of attachment 231948 [details])
> I'm not a huge fan of this. The brittleness of the tight coupling between GC and string, and the fact that non-string has to do an extra branch, feels more significant than the speedup.

I paid for the cell type branch by killing the branch on dtorType, no? ;)

> Maybe this would look better if there were an explicit String dtorType, and strings all got segregated into their own special MarkedBlock.

Yeah maybe. That seems like it could have other effects though. I can try it and see.
How about this:

(1) A tiny patch to specialize the dtor call as a template parameter, so there's no branch on it. Is great good.

(2) A bigger patch to put all strings in their own MarkedAllocator. Specialize the sweeping of that fellow, too.

Now, we've removed a branch from sweeping strings, and the system is still pretty flexible.

Eventually, we will probably build on this to allocate StringImpls inline in JSString, and remove malloc / free / destruction from most strings.
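For anyone skimming the thread, here is a rough sketch of what (1) and (2) could look like together. This is not the attached patch and not JavaScriptCore's real API: Cell, Block, Heap, sweep() and the DestructorType values are all made-up names, and a std::string stands in for the StringImpl. The only point is that once strings live in their own block, the destructor type can be a template parameter of the sweep, so neither strings nor non-strings pay a per-cell branch on it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical; this is not JavaScriptCore's real API.
enum class DestructorType { None, Normal, String };

struct Cell {
    bool allocated;
    bool marked;
    std::string* impl;       // stands in for the StringImpl owned by a JSString
    void (*destroy)(Cell&);  // generic destructor hook used by non-string cells
    Cell() : allocated(false), marked(false), impl(nullptr), destroy(nullptr) {}
};

// Every cell in a block shares one destructor type, so that type can be a
// template parameter of the sweep instead of a per-cell runtime check.
typedef std::vector<Cell> Block;

template <DestructorType dtorType>
std::size_t sweep(Block& block)
{
    std::size_t freed = 0;
    for (Cell& cell : block) {
        if (!cell.allocated)
            continue;
        if (cell.marked) {
            cell.marked = false;  // survivor: reset the mark for the next cycle
            continue;
        }
        // dtorType is a compile-time constant, so each instantiation of
        // sweep() only needs one of these arms.
        if (dtorType == DestructorType::String) {
            delete cell.impl;     // string fast path: no indirect call
            cell.impl = nullptr;
        } else if (dtorType == DestructorType::Normal) {
            if (cell.destroy)
                cell.destroy(cell);
        }
        cell.allocated = false;
        ++freed;
    }
    return freed;
}

struct Heap {
    Block strings;  // strings segregated into their own space
    Block objects;  // every other cell that needs destruction

    // Returns the index of the new cell within `strings`.
    std::size_t allocateString(const std::string& value)
    {
        Cell cell;
        cell.allocated = true;
        cell.impl = new std::string(value);
        strings.push_back(cell);
        return strings.size() - 1;
    }

    // Returns the index of the new cell within `objects`.
    std::size_t allocateObject(void (*destroy)(Cell&))
    {
        Cell cell;
        cell.allocated = true;
        cell.destroy = destroy;
        objects.push_back(cell);
        return objects.size() - 1;
    }

    // Sweeps both spaces and returns how many cells were freed.
    std::size_t collect()
    {
        return sweep<DestructorType::String>(strings)
            + sweep<DestructorType::Normal>(objects);
    }
};
```

In this toy version marking is done by hand (set `marked` on the cells that should survive, then call `collect()`). A real heap would also have to handle block layout, free lists and the other dtorTypes, all of which are left out here.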
GC actually does have a fast path for string destruction now that strings get their own space.