Bug 146378 - [JetStream] Raise the percentile of mandreel-latency and splay-latency
Summary: [JetStream] Raise the percentile of mandreel-latency and splay-latency
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-26 21:27 PDT by Filip Pizlo
Modified: 2015-06-27 20:48 PDT

See Also:


Attachments
the patch (4.43 KB, patch)
2015-06-26 21:30 PDT, Filip Pizlo
mark.lam: review+

Description Filip Pizlo 2015-06-26 21:27:59 PDT
The current percentile is 95%.  When I looked at the sample lists in our GC, it was clear that the worst 5% of samples completely amortize our GC pauses, and our GC pauses can be quite bad.  Clearly, splay-latency is meant to test whether we have an incremental GC that avoids bad worst-case pauses.  But 95% is too low, because averaging over the worst 5% of samples doesn't really capture those pauses.  Raising the percentile to above 99% appears to do the trick; 99.5% or more seems like a good bet.  The trade-off is that if we set it too high, we won't have enough samples for stable statistics.  Doing this very clearly rewards GCs that are incremental and punishes GCs that aren't (like ours).  That's what we want, since in the future we want to use this test to guide improvements to the worst-case performance of our GC.
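
To illustrate the amortization effect described above, here is a minimal Python sketch. It is not JetStream's actual scoring code: the function name tail_mean and the sample data are made up for illustration, and it assumes the reported value is the plain average of the samples above the chosen percentile.

```python
def tail_mean(samples, percentile):
    """Mean of the worst samples above the given percentile (0-100)."""
    ordered = sorted(samples, reverse=True)  # worst (largest) latencies first
    # Number of samples in the tail; always keep at least one.
    count = max(1, round(len(ordered) * (100 - percentile) / 100))
    worst = ordered[:count]
    return sum(worst) / len(worst)


# Made-up data: 995 fast samples of 1 ms and 5 long pauses of 100 ms.
samples = [1] * 995 + [100] * 5

mean_95 = tail_mean(samples, 95)     # averages the 5 pauses with 45 fast samples
mean_995 = tail_mean(samples, 99.5)  # averages only the 5 pauses
print(mean_95, mean_995)
```

With this synthetic data, the 95% tail mixes the 5 pauses with 45 fast samples and reports about 10.9 ms, while the 99.5% tail contains only the pauses and reports 100 ms.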

The way that the percentile is selected will also affect mandreel-latency.  That's a good thing, because 95% is probably too low for that test as well.  That test ends up with >10k samples.  The goal of using 95% in the first place was to get enough samples to have a stable average.  But if we have >10k samples, we can push that percentile up much higher and still get good statistics while achieving the effect we want - i.e. getting the worst case.

I don't think that we need to do the same thing for cdjs.  That test only takes 200 samples, so 95% means we report the average of the worst 10 samples.  That's probably good enough.
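
The sample-count arithmetic behind this reasoning can be sketched in Python. The helper name tail_sample_count is hypothetical, and 10,000 is only a stand-in for the ">10k samples" figure mentioned for mandreel-latency.

```python
def tail_sample_count(total_samples, percentile):
    """How many of the worst samples lie above the given percentile (0-100)."""
    return max(1, round(total_samples * (100 - percentile) / 100))


# (total samples, percentile) pairs; the 10,000 figure is illustrative only.
cases = [(200, 95), (10_000, 95), (10_000, 99.5)]
for total, pct in cases:
    print(total, pct, tail_sample_count(total, pct))
```

Under these assumptions, 200 samples at 95% leave 10 samples to average, while 10,000 samples still leave 50 at 99.5%, which is why the larger tests can tolerate a much higher percentile.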
Comment 1 Filip Pizlo 2015-06-26 21:30:34 PDT
Created attachment 255691
the patch
Comment 2 Mark Lam 2015-06-26 22:25:34 PDT
Comment on attachment 255691
the patch

rs=me
Comment 3 Filip Pizlo 2015-06-27 20:48:11 PDT
Landed in http://trac.webkit.org/changeset/186041