Bug 145762 - JetStream should have a more rational story for jitter-oriented latency tests
Summary: JetStream should have a more rational story for jitter-oriented latency tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on: 145764
Blocks:
 
Reported: 2015-06-08 11:05 PDT by Filip Pizlo
Modified: 2015-06-10 11:47 PDT
CC List: 15 users

See Also:


Attachments
the patch (12.78 KB, patch)
2015-06-08 11:10 PDT, Filip Pizlo
ggaren: review+

Description Filip Pizlo 2015-06-08 11:05:27 PDT
Currently we have some latency tests that are meant to measure jitter.  They do this by computing the RMS.  But the RMS is a pretty bad metric: the thing it rewards isn't really the thing you'd want your browser to do.  These RMS-based tests take the geomean of the RMS of some samples and the sample average.  The lower the geomean, the better (in the JetStream harness we then invert the scores so that higher is better, but let's ignore that for this discussion and assume that lower is better).

Here's an example of how this can go bad.  A browser that always completes a task in some horribly long time (say, 1000ms) but never varies that time will perform better than a browser that usually completes the task super quickly (say, 10ms) and sometimes just a little bit less quickly (say, 15ms).  The former browser will have an RMS of 0 and an average of 1000.  The latter will have an RMS somewhere around 3.5 and an average of 12.5 (assuming equal probability of 10ms and 15ms).  The geomean of (0, 1000) is 0.  The geomean of (3.5, 12.5) is 6.6.  Lower is better, so the former browser gets the better score - even though it's obviously never better for a browser to always complete a task in 1000ms when a different browser can do it in 15ms in the worst case.
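For concreteness, here is a minimal sketch (not part of the patch) that reproduces the arithmetic above, assuming the RMS in question is the sample standard deviation of the latency samples and the score is the geomean of that RMS and the sample average:

function sampleStddev(samples) {
    // Sample standard deviation (Bessel-corrected), used here as the "RMS" term.
    var mean = samples.reduce(function(a, b) { return a + b; }, 0) / samples.length;
    var sumSq = samples.reduce(function(a, b) { return a + (b - mean) * (b - mean); }, 0);
    return Math.sqrt(sumSq / (samples.length - 1));
}

function jitterScore(samples) {
    // Geomean of (RMS, average); lower is better in this discussion.
    var mean = samples.reduce(function(a, b) { return a + b; }, 0) / samples.length;
    return Math.sqrt(sampleStddev(samples) * mean);
}

jitterScore([1000, 1000]); // 0    -- the always-slow browser gets the "best" score
jitterScore([10, 15]);     // ~6.6 -- the fast-but-slightly-jittery browser loses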

JetStream should not have this pathology.  The right way of avoiding it is to replace RMS with some other metric of how bad things get.  A good metric is the average of the worst percentile.  The worst 1% or the worst 5% would be good things to average.  This will catch cases where the VM jittered due to JIT or GC, but it never has the pathology that we end up giving the better score to a VM whose best case is worse than another VM's worst case.
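One possible shape for such a metric - a sketch only; the name mirrors the AverageAbovePercentile helper used in the patch, but this implementation is illustrative rather than the landed code:

function averageAbovePercentile(samples, percentile) {
    // Average the samples at or above the given percentile, e.g. percentile = 95
    // averages (roughly) the worst 5% of samples.
    var sorted = samples.slice().sort(function(a, b) { return a - b; });
    var cutoff = Math.min(sorted.length - 1, Math.floor(sorted.length * percentile / 100));
    var worst = sorted.slice(cutoff);
    var sum = worst.reduce(function(a, b) { return a + b; }, 0);
    return sum / worst.length;
}

With this metric, the 1000ms browser from the example above scores 1000 while the 10-15ms browser scores 15, so the consistently fast browser wins, as it should.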
Comment 1 Filip Pizlo 2015-06-08 11:10:14 PDT
Created attachment 254500 [details]
the patch
Comment 2 Filip Pizlo 2015-06-08 11:11:31 PDT
Comment on attachment 254500 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=254500&action=review

> PerformanceTests/JetStream/JetStreamDriver.js:1
> -// Copyright (C) 2014 Apple Inc. All rights reserved.
> +// Copyright (C) 2014, 2015 Apple Inc. All rights reserved.

I'll revert this.
Comment 3 Geoffrey Garen 2015-06-08 12:36:37 PDT
Comment on attachment 254500 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=254500&action=review

r=me

> PerformanceTests/JetStream/Octane2/base.js:196
> +  console.log("Called with numbers.length = " + numbers.length);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:202
> +  console.log("Done sorting.");

Revert?

> PerformanceTests/JetStream/Octane2/base.js:217
> +  console.log("numbers we want: " + numbersWeWant);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:224
> +  console.log("result = " + result);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:267
> +  console.log("Pushing result: " + result.time + " " + result.latency);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:342
> +    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, /* percentile = */ 95) * 1000;

Can we just put 95 into a local named percentile instead?
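Something like this (hypothetical, not necessarily the code that landed) is presumably what's being asked for:

    var percentile = 95;
    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, percentile) * 1000;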
Comment 4 Filip Pizlo 2015-06-08 12:38:59 PDT
(In reply to comment #3)
> Comment on attachment 254500 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=254500&action=review
> 
> r=me
> 
> > PerformanceTests/JetStream/Octane2/base.js:196
> > +  console.log("Called with numbers.length = " + numbers.length);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:202
> > +  console.log("Done sorting.");
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:217
> > +  console.log("numbers we want: " + numbersWeWant);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:224
> > +  console.log("result = " + result);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:267
> > +  console.log("Pushing result: " + result.time + " " + result.latency);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:342
> > +    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, /* percentile = */ 95) * 1000;
> 
> Can we just put 95 into a local named percentile instead?

Yup, I fixed all of those.
Comment 5 Ryosuke Niwa 2015-06-08 12:46:04 PDT
Oops, it looks like run-benchmark currently hard-codes 1.0.1 as the benchmark directory and this will break the script :(

Could you hold off on landing this patch until Dewei or I fix it?
Comment 6 Filip Pizlo 2015-06-10 11:47:22 PDT
Landed in http://trac.webkit.org/changeset/185425