Bug 164823

Summary: Slight Octane regression from concurrent GC's eager object zero-fill
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED WONTFIX    
Severity: Normal CC: cdumez, commit-queue, keith_miller, mark.lam, msaboff, saam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on: 164885    
Bug Blocks: 149432    
Attachments:
Description Flags
one possible approach
none
the patch ggaren: review+

Description Filip Pizlo 2016-11-16 09:41:37 PST
Patch forthcoming.
Comment 1 Filip Pizlo 2016-11-16 09:49:58 PST
Created attachment 294946 [details]
one possible approach

This fixes the eager zero-fill performance problems by doing the zero-fill conditionally.

But I'm also going to attempt a fix where the GC manages the zero-filling.
Comment 2 Filip Pizlo 2016-11-16 13:02:10 PST
(In reply to comment #1)
> Created attachment 294946 [details]
> one possible approach
> 
> This fixes the eager zero-fill performance problems by doing the zero-fill
> conditionally.
> 
> But I'm also going to attempt a fix where the GC manages the zero-filling.

Thinking about it more, doing the zero-filling in GC doesn't work. It fails to protect us from the case where the collector races with a transition on an object that was allocated without eager zero-fill.

Also, the GC zero-fill patch was not really faster.
Comment 3 Filip Pizlo 2016-11-16 13:02:53 PST
Created attachment 294967 [details]
the patch
Comment 4 Geoffrey Garen 2016-11-16 13:05:59 PST
Comment on attachment 294967 [details]
the patch

r=me
Comment 5 Filip Pizlo 2016-11-16 14:25:13 PST
Landed in https://trac.webkit.org/changeset/208811
Comment 6 Filip Pizlo 2016-11-16 15:50:38 PST
I'm beginning to think that no version of this optimization is sound: it's possible that an object allocated before GC started, and so without zero-fill, could be part of a memory ordering race while the GC is running.

I think that a better approach would be to remove object initializations and then put fences on PutStructure. To make that efficient on ARM, we would have to do redundant PutStructure elimination in DFG IR.
Comment 7 Filip Pizlo 2016-11-16 16:02:20 PST
(In reply to comment #6)
> I'm beginning to think that no version of this optimization is sound: it's
> possible that an object allocated before GC started, and so without
> zero-fill, could be part of a memory ordering race while the GC is running.
> 
> I think that a better approach would be to remove object initializations and
> then put fences on PutStructure. To make that efficient on ARM, we would
> have to do redundant PutStructure elimination in DFG IR.

Rolled out in r208822.
Comment 8 Chris Dumez 2016-11-16 21:02:10 PST
(In reply to comment #5)
> Landed in https://trac.webkit.org/changeset/208811

FYI, one of our Mac bots has caught up with r208811 and there was no progression on Octane.
Comment 9 Chris Dumez 2016-11-18 09:30:20 PST
(In reply to comment #8)
> (In reply to comment #5)
> > Landed in https://trac.webkit.org/changeset/208811
> 
> FYI, one of our Mac bots has caught up with r208811 and there was no
> progression on Octane.

FYI, A/B testing shows there was no Speedometer progression from r208811 on ARM either.
Comment 10 Filip Pizlo 2016-12-13 10:13:52 PST
I don't think that we know that there was actually any regression due to this specifically, based on subsequent testing.