Bug 90062

Summary: Output numbers of run-perf-tests should be rounded in accordance with sample standard deviations
Product: WebKit Reporter: Kentaro Hara <haraken>
Component: Tools / TestsAssignee: Kentaro Hara <haraken>
Status: REOPENED ---    
Severity: Normal CC: abarth, dpranke, eric, pdr, rniwa, webkit.review.bot, zimmermann, zoltan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 77037    
Attachments:
Description Flags
Patch
none
Fixes the bug none

Description Kentaro Hara 2012-06-27 04:32:21 PDT
Currently the output of performance tests looks like this:

  DESCRIPTION: This benchmark covers DOM attributes that return small integers.
  RESULT Bindings: scroll-top= 191.379586433 runs/s
  median= 191.731644624 runs/s, stdev= 2.56134259787 runs/s, min= 180.412371134 runs/s, max= 192.771084337 runs/s

"191.731644624 runs/s" is too verbose. It should be "191.73 runs/s". Whichever units we use (runs/s or ms), rounding off the output to the second decimal place would be enough and preferable.
Comment 1 Kentaro Hara 2012-06-27 04:35:27 PDT
Created attachment 149726 [details]
Patch
Comment 2 Ryosuke Niwa 2012-06-27 09:31:02 PDT
Comment on attachment 149726 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=149726&action=review

> Tools/ChangeLog:16
> +        "191.731644624 runs/s" is too verbose. It should be "191.73 runs/s".
> +        Whichever unit we use (runs/s or ms), rounding off the output to the second
> +        decimal place would be enough and preferable.

No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures.
e.g. there are tests that only have 0.7 runs/s for example. For those tests, printing 2 decimals points would mean that we'd get only 2 significant digits. That's not acceptable.
In order to figure out the right significant figures, we need to first figure out the measurement errors.
From measurement errors, we can find out the sufficient significant figures for standard deviations, from which we can find out the right significant figures for the mean, median, etc...

Just picking 2 decimal point arbitrarily isn't an acceptable solution here.
Comment 3 Ryosuke Niwa 2012-06-27 09:33:22 PDT
+WildFox since he would surely know all about significant figures for he is a physicist :)
Comment 4 Kentaro Hara 2012-06-27 16:51:12 PDT
(In reply to comment #2)
> No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures.

Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals.
Comment 5 Ryosuke Niwa 2012-06-27 16:55:02 PDT
(In reply to comment #4)
> (In reply to comment #2)
> > No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures.
> 
> Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals.

Yes, I (humans) compute average between different runs all the time.
Comment 6 Kentaro Hara 2012-06-27 16:59:47 PDT
OK, marking as WONTFIX.
Comment 7 Ryosuke Niwa 2013-03-01 14:55:45 PST
Let's do this properly.
Comment 8 Ryosuke Niwa 2013-03-01 15:12:58 PST
Created attachment 191049 [details]
Fixes the bug
Comment 9 Eric Seidel (no email) 2013-03-01 15:16:22 PST
Comment on attachment 191049 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review

> Tools/ChangeLog:12
> +        We show one more digit than the most significant digit in the sample standard deviation since it's
> +        helpful when doing back-of-the-envelop calculations using these numbers.

I have no stats background... but it seems like digits are either significant or not, right?  If you're including non-significant digits in these calculations, isn't that just noise?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:159
> +            return round(x, -highest_decimal_place(x) + sig_fig - 1)

Confused by the -1?

> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91
> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms

Why the change to percent?
Comment 10 Ryosuke Niwa 2013-03-01 15:32:13 PST
Comment on attachment 191049 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review

>> Tools/ChangeLog:12
>> +        helpful when doing back-of-the-envelop calculations using these numbers.
> 
> I have no stats background... but it seems like digits are either significant or not, right?  If you're including non-significant digits in these calculations, isn't that just noise?

"most significant digit" as in http://planetmath.org/MostSignificantDigit.html
i.e. left most digit of a number when it's shown in the positional notation: http://en.wikipedia.org/wiki/Positional_notation

Maybe there's a better word for it?

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:159
>> +            return round(x, -highest_decimal_place(x) + sig_fig - 1)
> 
> Confused by the -1?

Say we have 1.2345, then to round it at 3rd sig. fig. we do: round(1.2345, 2), not round(1.2345, 3)
Comment 11 Ryosuke Niwa 2013-03-01 15:33:21 PST
Comment on attachment 191049 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review

>> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91
>> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms
> 
> Why the change to percent?

I find that to be more useful maybe maybe not?
Comment 12 Eric Seidel (no email) 2013-03-02 00:50:44 PST
The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :)
Comment 13 Zoltan Horvath 2013-03-02 13:35:39 PST
(In reply to comment #11)
> (From update of attachment 191049 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review
> 
> >> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91
> >> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms
> > 
> > Why the change to percent?
> 
> I find that to be more useful maybe maybe not?

(In reply to comment #12)
> The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :)


I prefer using the percentage, it's more straightforward.
Comment 14 Ryosuke Niwa 2015-06-05 14:49:08 PDT
Comment on attachment 191049 [details]
Fixes the bug

This patch is too old.