Bug 98115 - PerfTests have incorrect standard deviation calculation
Summary: PerfTests have incorrect standard deviation calculation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 420+
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Philip Rogers
URL:
Keywords:
: 77401 (view as bug list)
Depends on:
Blocks: 77037
  Show dependency treegraph
 
Reported: 2012-10-01 22:36 PDT by Philip Rogers
Modified: 2012-10-02 14:23 PDT (History)
7 users (show)

See Also:


Attachments
Fix PerfTest standard deviation calculation. (4.00 KB, patch)
2012-10-01 22:48 PDT, Philip Rogers
no flags Details | Formatted Diff | Diff
Update per reviewer comments (13.69 KB, patch)
2012-10-01 23:48 PDT, Philip Rogers
no flags Details | Formatted Diff | Diff
Rebaselined (13.77 KB, patch)
2012-10-02 00:04 PDT, Philip Rogers
rniwa: review+
Details | Formatted Diff | Diff
Added link to algorithm in the ChangeLog (14.13 KB, patch)
2012-10-02 00:17 PDT, Philip Rogers
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Philip Rogers 2012-10-01 22:36:27 PDT
Our performance test numbers calculate standard deviation incorrectly:

(wrong)
RESULT PageLoad: svg: files: worldcup.svg= 2197.27248895 ms
median= 2157.99593925 ms, stdev= 693.92660072 ms, min= 2127.62403488 ms, max= 2865.42391777 ms

(right)
RESULT PageLoad: svg: files: worldcup.svg= 2177.03914642 ms
median= 2146.1250782 ms, stdev= 162.849482453 ms, min= 2105.28707504 ms, max= 2840.33107758 ms
Comment 1 Philip Rogers 2012-10-01 22:48:52 PDT
Created attachment 166608 [details]
Fix PerfTest standard deviation calculation.
Comment 2 Ryosuke Niwa 2012-10-01 22:52:45 PDT
Comment on attachment 166608 [details]
Fix PerfTest standard deviation calculation.

View in context: https://bugs.webkit.org/attachment.cgi?id=166608&action=review

Could you also change the algorithm in runner.js? I've always wanted to fix that algorithm.

> Tools/Scripts/webkitpy/performance_tests/perftest.py:236
> +            'stdev': math.sqrt(squareSum / (len(sorted_test_times) - 1)),

So we're now computing the sample standard deviation as opposed to the sample deviation of the sample:
http://en.wikipedia.org/wiki/Standard_deviation#Estimation

You should explain that in the change log.
Comment 3 Ryosuke Niwa 2012-10-01 22:54:28 PDT
Comment on attachment 166608 [details]
Fix PerfTest standard deviation calculation.

View in context: https://bugs.webkit.org/attachment.cgi?id=166608&action=review

> Tools/Scripts/webkitpy/performance_tests/perftest.py:227
>              delta = time - mean
>              sweep = i + 1.0
>              mean += delta / sweep

We can rewrite this as
mean += (time - mean) / sweep
and eliminate delta now.
Comment 4 Philip Rogers 2012-10-01 23:48:46 PDT
Created attachment 166621 [details]
Update per reviewer comments
Comment 5 Philip Rogers 2012-10-01 23:50:01 PDT
(In reply to comment #2)
> (From update of attachment 166608 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166608&action=review
> 
> Could you also change the algorithm in runner.js? I've always wanted to fix that algorithm.

Done :)

> 
> > Tools/Scripts/webkitpy/performance_tests/perftest.py:236
> > +            'stdev': math.sqrt(squareSum / (len(sorted_test_times) - 1)),
> 
> So we're now computing the sample standard deviation as opposed to the sample deviation of the sample:
> http://en.wikipedia.org/wiki/Standard_deviation#Estimation
> 
> You should explain that in the change log.

Done
Comment 6 Philip Rogers 2012-10-01 23:51:52 PDT
(In reply to comment #3)
> (From update of attachment 166608 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166608&action=review
> 
> > Tools/Scripts/webkitpy/performance_tests/perftest.py:227
> >              delta = time - mean
> >              sweep = i + 1.0
> >              mean += delta / sweep
> 
> We can rewrite this as
> mean += (time - mean) / sweep
> and eliminate delta now.

I'm not sure it's quite that simple (we need to store the mean before and after the update). This isn't performance-sensitive code and I think the readability (copied directly from Knuth) is useful here for future statistics hackers.
Comment 7 Philip Rogers 2012-10-01 23:53:56 PDT
Comment on attachment 166621 [details]
Update per reviewer comments

Clearing flags since patch doesn't apply. Let me rebaseline after rniwa's change today and reupload.
Comment 8 Philip Rogers 2012-10-02 00:04:58 PDT
Created attachment 166623 [details]
Rebaselined
Comment 9 Ryosuke Niwa 2012-10-02 00:11:09 PDT
Comment on attachment 166623 [details]
Rebaselined

Please include the URL to the algorithm you cited.
Comment 10 Ryosuke Niwa 2012-10-02 00:12:08 PDT
*** Bug 77401 has been marked as a duplicate of this bug. ***
Comment 11 Philip Rogers 2012-10-02 00:17:26 PDT
Created attachment 166626 [details]
Added link to algorithm in the ChangeLog
Comment 12 WebKit Review Bot 2012-10-02 01:10:40 PDT
Comment on attachment 166626 [details]
Added link to algorithm in the ChangeLog

Clearing flags on attachment: 166626

Committed r130135: <http://trac.webkit.org/changeset/130135>
Comment 13 WebKit Review Bot 2012-10-02 01:10:44 PDT
All reviewed patches have been landed.  Closing bug.
Comment 14 Csaba Osztrogonác 2012-10-02 04:00:21 PDT
(In reply to comment #12)
> (From update of attachment 166626 [details])
> Clearing flags on attachment: 166626
> 
> Committed r130135: <http://trac.webkit.org/changeset/130135>

FYI: It broke the perf tests again. See the bots for details.
Comment 15 Csaba Osztrogonác 2012-10-02 14:13:07 PDT
Fix landed in http://trac.webkit.org/changeset/130181
Comment 16 Philip Rogers 2012-10-02 14:15:43 PDT
(In reply to comment #15)
> Fix landed in http://trac.webkit.org/changeset/130181

Should we also make this change in the python standard deviation method?
Comment 17 Ryosuke Niwa 2012-10-02 14:23:47 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > Fix landed in http://trac.webkit.org/changeset/130181
> 
> Should we also make this change in the python standard deviation method?

That's a good point. We should do that although PageLoadTest always gets 20 values.