Bug 165963

Summary: The GC should be able to reflect upon its pipeline of work
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: JavaScriptCoreAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: commit-queue, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 165910    
Attachments:
Description Flags
a little something like this
none
a bit more
none
so fun
none
more
none
the patch
none
more
none
it compiles!
none
seems to work sorta
none
even more
none
giving up on the new scheduler
none
reverted to the old scheduling algorithm none

Description Filip Pizlo 2016-12-16 11:14:33 PST
Patch forthcoming.
Comment 1 Filip Pizlo 2016-12-17 10:37:35 PST
Created attachment 297409 [details]
a little something like this

Work in progress.
Comment 2 Filip Pizlo 2016-12-17 10:38:26 PST
Comment on attachment 297409 [details]
a little something like this

View in context: https://bugs.webkit.org/attachment.cgi?id=297409&action=review

> Source/JavaScriptCore/heap/SpaceTimeScheduler.cpp:22
> - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * (INCLUDING NEGfLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE

Oops.  Reverted.
Comment 3 Filip Pizlo 2016-12-18 11:14:33 PST
Created attachment 297442 [details]
a bit more
Comment 4 Filip Pizlo 2016-12-22 15:05:23 PST
Created attachment 297678 [details]
so fun
Comment 5 Filip Pizlo 2016-12-30 17:24:28 PST
Created attachment 297865 [details]
more
Comment 6 Filip Pizlo 2017-01-02 13:30:09 PST
Created attachment 297909 [details]
the patch
Comment 7 WebKit Commit Bot 2017-01-02 13:32:14 PST
Attachment 297909 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/heap/MarkingConstraint.cpp:38:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/heap/MarkingConstraint.cpp:38:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/heap/MarkingConstraint.cpp:39:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/heap/MarkingConstraint.cpp:40:  Wrong number of spaces before statement. (expected: 8)  [whitespace/indent] [4]
Total errors found: 4 in 25 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Filip Pizlo 2017-01-02 13:36:17 PST
Comment on attachment 297909 [details]
the patch

Ooops, did not mean to set r?
Comment 9 Filip Pizlo 2017-01-05 21:51:18 PST
Created attachment 298173 [details]
more
Comment 10 Filip Pizlo 2017-01-06 08:29:40 PST
Created attachment 298202 [details]
it compiles!
Comment 11 Filip Pizlo 2017-01-06 11:17:12 PST
Created attachment 298216 [details]
seems to work sorta
Comment 12 Filip Pizlo 2017-01-06 15:18:28 PST
Created attachment 298232 [details]
even more

This is getting fun.
Comment 13 Radar WebKit Bug Importer 2017-01-06 15:29:00 PST
<rdar://problem/29909896>
Comment 14 Filip Pizlo 2017-01-06 15:38:38 PST
The initial implementation boosted splay-latency but hurt on retreating wavefront worst-case pathologies like hash-map, because the only scheduler was a tad bit quicker to choose a longer pause.  It looks like the old scheduler was getting luckier on hash-map, but was still having a bad time - lots of wasted iterations before finishing a GC cycle.

I'm now adding some exponential backoff to the space-time scheduler.  This appears to preserve the patch's advantage on splay-latency while also progressing hash-map.

Ultimately, the goal of this patch is to create a framework that can be more easily extended, since it doesn't hide the cost of constraint evaluation from the scheduler.
Comment 15 Filip Pizlo 2017-01-06 19:44:59 PST
Created attachment 298250 [details]
giving up on the new scheduler

Going to go back to the old scheduler.  The new one is empirically worse.
Comment 16 Filip Pizlo 2017-01-06 19:45:32 PST
(In reply to comment #15)
> Created attachment 298250 [details]
> giving up on the new scheduler
> 
> Going to go back to the old scheduler.  The new one is empirically worse.

I should clarify: the same configuration that led to better splay-latency performance also killed Speedometer performance.  We don't want that.
Comment 17 Filip Pizlo 2017-01-07 11:05:59 PST
Created attachment 298280 [details]
reverted to the old scheduling algorithm

I think that what made the old scheduling algorithm so effective is that its decisions were so time-driven. It stayed in phase except when we "snapped" phase after a round of constraint solving. This meant that if we encountered jank while in some mode (resumed or suspended) and overran our budget then the scheduler would degrade to something like a coin flip. It turns out that having this kind of non-determinism is _good_ for the GC, since it protects us from many run-away scenarios. For example if a deterministic scheduler encounters a huge object, then we'd yield to mutator unless the scheduler had some mechanism to detect this case.

Altogether it seems that the good things about the old scheduler are:
- Eventually stop-the-world, which we arrive at somewhat elastically.
- Periodic stop-the-world increments that have a chance of getting lucky and declaring termination.
- Prioritize collector progress. In a retreating wavefront collector, the mutator has the speed advantage - so we need to give the collector some unfair advantage.
- Allow constraint solving to take as long as it needs to take to generate work.
- Degrade to coin flip when we encounter jank.

I can't easily think of another scheduling algorithm that achieves these things while staying so simple. Every time I've tried to dramatically change this scheduler, I've gotten regressions all over the place. I think I'll stick to making only incremental improvements. Probably if I do come back to improving the scheduler, then I'll try to make the scheduler's period adapt to the time it took to run constraints. We should use a longer period when we took longer to run constraints so that the mutator has more time to itself.
Comment 18 Filip Pizlo 2017-01-07 11:06:27 PST
(In reply to comment #17)
> Created attachment 298280 [details]
> reverted to the old scheduling algorithm
> 
> I think that what made the old scheduling algorithm so effective is that its
> decisions were so time-driven. It stayed in phase except when we "snapped"
> phase after a round of constraint solving. This meant that if we encountered
> jank while in some mode (resumed or suspended) and overran our budget then
> the scheduler would degrade to something like a coin flip. It turns out that
> having this kind of non-determinism is _good_ for the GC, since it protects
> us from many run-away scenarios. For example if a deterministic scheduler
> encounters a huge object, then we'd yield to mutator unless the scheduler
> had some mechanism to detect this case.
> 
> Altogether it seems that the good things about the old scheduler are:
> - Eventually stop-the-world, which we arrive at somewhat elastically.
> - Periodic stop-the-world increments that have a chance of getting lucky and
> declaring termination.
> - Prioritize collector progress. In a retreating wavefront collector, the
> mutator has the speed advantage - so we need to give the collector some
> unfair advantage.
> - Allow constraint solving to take as long as it needs to take to generate
> work.
> - Degrade to coin flip when we encounter jank.
> 
> I can't easily think of another scheduling algorithm that achieves these
> things while staying so simple. Every time I've tried to dramatically change
> this scheduler, I've gotten regressions all over the place. I think I'll
> stick to making only incremental improvements. Probably if I do come back to
> improving the scheduler, then I'll try to make the scheduler's period adapt
> to the time it took to run constraints. We should use a longer period when
> we took longer to run constraints so that the mutator has more time to
> itself.

Still testing this new scheduler... so far I just confirmed that it does the right things on splay and hash-map.
Comment 19 Filip Pizlo 2017-01-08 11:34:10 PST
The patch is actually fixing this and its parent bug.

*** This bug has been marked as a duplicate of bug 165910 ***