Bug 145762 - JetStream should have a more rational story for jitter-oriented latency tests
Summary: JetStream should have a more rational story for jitter-oriented latency tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on: 145764
Blocks:
 
Reported: 2015-06-08 11:05 PDT by Filip Pizlo
Modified: 2015-06-10 11:47 PDT
CC List: 15 users

See Also:


Attachments
the patch (12.78 KB, patch)
2015-06-08 11:10 PDT, Filip Pizlo
ggaren: review+

Description Filip Pizlo 2015-06-08 11:05:27 PDT
Currently we have some latency tests that are meant to measure jitter.  They do this by computing the RMS.  But the RMS is a pretty bad metric: the thing it rewards isn't really the thing you'd want your browser to do.  These RMS-based tests take the geomean of the RMS of some samples and the sample average.  The lower the geomean, the better (in the JetStream harness we then invert the scores so that higher is better, but let's ignore that for this discussion and assume that lower is better).

Here's an example of how this can go bad.  A browser that always completes a task in some horribly long time (say, 1000ms) but never varies that time will perform better than a browser that usually completes the task super quickly (say, 10ms) and sometimes just a little bit less quickly (say, 15ms).  The former browser will have an RMS of 0 and an average of 1000.  The latter will have an RMS somewhere around 3.5 and an average of 12.5 (assuming equal probability of 10ms and 15ms).  The geomean of (0, 1000) is 0.  The geomean of (3.5, 12.5) is 6.6.  Lower is better, so the former browser gets the better score - even though it's obviously never better for a browser to always complete a task in 1000ms when a different browser can do it in 15ms in the worst case.
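For concreteness, here is a minimal sketch (not part of the patch) that reproduces the arithmetic above, assuming the RMS in question is the sample standard deviation of the latency samples and the score is the geomean of that RMS and the sample average:

function sampleStddev(samples) {
    // Sample standard deviation (Bessel-corrected), used here as the "RMS" term.
    var mean = samples.reduce(function(a, b) { return a + b; }, 0) / samples.length;
    var sumSq = samples.reduce(function(a, b) { return a + (b - mean) * (b - mean); }, 0);
    return Math.sqrt(sumSq / (samples.length - 1));
}

function jitterScore(samples) {
    // Geomean of (RMS, average); lower is better in this discussion.
    var mean = samples.reduce(function(a, b) { return a + b; }, 0) / samples.length;
    return Math.sqrt(sampleStddev(samples) * mean);
}

jitterScore([1000, 1000]); // 0    -- the always-slow browser gets the "best" score
jitterScore([10, 15]);     // ~6.6 -- the fast-but-slightly-jittery browser loses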

JetStream should not have this pathology.  The right way of avoiding it is to replace RMS with some other metric of how bad things get.  A good metric is the average of the worst percentile.  The worst 1% or the worst 5% would be good things to average.  This will catch cases where the VM jittered due to JIT or GC, but it never has the pathology that we end up giving the better score to a VM whose best case is worse than another VM's worst case.
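One possible shape for such a metric - a sketch only; the name mirrors the AverageAbovePercentile helper used in the patch, but this implementation is illustrative rather than the landed code:

function averageAbovePercentile(samples, percentile) {
    // Average the samples at or above the given percentile, e.g. percentile = 95
    // averages (roughly) the worst 5% of samples.
    var sorted = samples.slice().sort(function(a, b) { return a - b; });
    var cutoff = Math.min(sorted.length - 1, Math.floor(sorted.length * percentile / 100));
    var worst = sorted.slice(cutoff);
    var sum = worst.reduce(function(a, b) { return a + b; }, 0);
    return sum / worst.length;
}

With this metric, the 1000ms browser from the example above scores 1000 while the 10-15ms browser scores 15, so the consistently fast browser wins, as it should.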
Comment 1 Filip Pizlo 2015-06-08 11:10:14 PDT
Created attachment 254500 [details]
the patch
Comment 2 Filip Pizlo 2015-06-08 11:11:31 PDT
Comment on attachment 254500 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=254500&action=review

> PerformanceTests/JetStream/JetStreamDriver.js:1
> -// Copyright (C) 2014 Apple Inc. All rights reserved.
> +// Copyright (C) 2014, 2015 Apple Inc. All rights reserved.

I'll revert this.
Comment 3 Geoffrey Garen 2015-06-08 12:36:37 PDT
Comment on attachment 254500 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=254500&action=review

r=me

> PerformanceTests/JetStream/Octane2/base.js:196
> +  console.log("Called with numbers.length = " + numbers.length);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:202
> +  console.log("Done sorting.");

Revert?

> PerformanceTests/JetStream/Octane2/base.js:217
> +  console.log("numbers we want: " + numbersWeWant);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:224
> +  console.log("result = " + result);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:267
> +  console.log("Pushing result: " + result.time + " " + result.latency);

Revert?

> PerformanceTests/JetStream/Octane2/base.js:342
> +    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, /* percentile = */ 95) * 1000;

Can we just put 95 into a local named percentile instead?
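Something like this (hypothetical, not necessarily the code that landed) is presumably what's being asked for:

    var percentile = 95;
    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, percentile) * 1000;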
Comment 4 Filip Pizlo 2015-06-08 12:38:59 PDT
(In reply to comment #3)
> Comment on attachment 254500 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=254500&action=review
> 
> r=me
> 
> > PerformanceTests/JetStream/Octane2/base.js:196
> > +  console.log("Called with numbers.length = " + numbers.length);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:202
> > +  console.log("Done sorting.");
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:217
> > +  console.log("numbers we want: " + numbersWeWant);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:224
> > +  console.log("result = " + result);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:267
> > +  console.log("Pushing result: " + result.time + " " + result.latency);
> 
> Revert?
> 
> > PerformanceTests/JetStream/Octane2/base.js:342
> > +    var latency = BenchmarkSuite.AverageAbovePercentile(latencySamples, /* percentile = */ 95) * 1000;
> 
> Can we just put 95 into a local named percentile instead?

Yup, I fixed all of those.
Comment 5 Ryosuke Niwa 2015-06-08 12:46:04 PDT
Oops, it looks like run-benchmark currently hard-codes 1.0.1 as the benchmark directory and this will break the script :(

Could you hold off on landing this patch until Dewei or I fix it?
Comment 6 Filip Pizlo 2015-06-10 11:47:22 PDT
Landed in http://trac.webkit.org/changeset/185425