REOPENED 90062
Output numbers of run-perf-tests should be rounded in accordance with sample standard deviations
https://bugs.webkit.org/show_bug.cgi?id=90062
Summary Output numbers of run-perf-tests should be rounded in accordance with sample ...
Kentaro Hara
Reported 2012-06-27 04:32:21 PDT
Currently the output of performance tests looks like this: DESCRIPTION: This benchmark covers DOM attributes that return small integers. RESULT Bindings: scroll-top= 191.379586433 runs/s median= 191.731644624 runs/s, stdev= 2.56134259787 runs/s, min= 180.412371134 runs/s, max= 192.771084337 runs/s "191.731644624 runs/s" is too verbose. It should be "191.73 runs/s". Whichever units we use (runs/s or ms), rounding off the output to the second decimal place would be enough and preferable.
Attachments
Patch (2.12 KB, patch)
2012-06-27 04:35 PDT, Kentaro Hara
no flags
Fixes the bug (5.94 KB, patch)
2013-03-01 15:12 PST, Ryosuke Niwa
no flags
Kentaro Hara
Comment 1 2012-06-27 04:35:27 PDT
Ryosuke Niwa
Comment 2 2012-06-27 09:31:02 PDT
Comment on attachment 149726 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=149726&action=review > Tools/ChangeLog:16 > + "191.731644624 runs/s" is too verbose. It should be "191.73 runs/s". > + Whichever unit we use (runs/s or ms), rounding off the output to the second > + decimal place would be enough and preferable. No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. e.g. there are tests that only have 0.7 runs/s for example. For those tests, printing 2 decimals points would mean that we'd get only 2 significant digits. That's not acceptable. In order to figure out the right significant figures, we need to first figure out the measurement errors. From measurement errors, we can find out the sufficient significant figures for standard deviations, from which we can find out the right significant figures for the mean, median, etc... Just picking 2 decimal point arbitrarily isn't an acceptable solution here.
Ryosuke Niwa
Comment 3 2012-06-27 09:33:22 PDT
+WildFox since he would surely know all about significant figures for he is a physicist :)
Kentaro Hara
Comment 4 2012-06-27 16:51:12 PDT
(In reply to comment #2) > No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals.
Ryosuke Niwa
Comment 5 2012-06-27 16:55:02 PDT
(In reply to comment #4) > (In reply to comment #2) > > No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. > > Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals. Yes, I (humans) compute average between different runs all the time.
Kentaro Hara
Comment 6 2012-06-27 16:59:47 PDT
OK, marking as WONTFIX.
Ryosuke Niwa
Comment 7 2013-03-01 14:55:45 PST
Let's do this properly.
Ryosuke Niwa
Comment 8 2013-03-01 15:12:58 PST
Created attachment 191049 [details] Fixes the bug
Eric Seidel (no email)
Comment 9 2013-03-01 15:16:22 PST
Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review > Tools/ChangeLog:12 > + We show one more digit than the most significant digit in the sample standard deviation since it's > + helpful when doing back-of-the-envelop calculations using these numbers. I have no stats background... but it seems like digits are either significant or not, right? If you're including non-significant digits in these calculations, isn't that just noise? > Tools/Scripts/webkitpy/performance_tests/perftest.py:159 > + return round(x, -highest_decimal_place(x) + sig_fig - 1) Confused by the -1? > Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 > +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms Why the change to percent?
Ryosuke Niwa
Comment 10 2013-03-01 15:32:13 PST
Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review >> Tools/ChangeLog:12 >> + helpful when doing back-of-the-envelop calculations using these numbers. > > I have no stats background... but it seems like digits are either significant or not, right? If you're including non-significant digits in these calculations, isn't that just noise? "most significant digit" as in http://planetmath.org/MostSignificantDigit.html i.e. left most digit of a number when it's shown in the positional notation: http://en.wikipedia.org/wiki/Positional_notation Maybe there's a better word for it? >> Tools/Scripts/webkitpy/performance_tests/perftest.py:159 >> + return round(x, -highest_decimal_place(x) + sig_fig - 1) > > Confused by the -1? Say we have 1.2345, then to round it at 3rd sig. fig. we do: round(1.2345, 2), not round(1.2345, 3)
Ryosuke Niwa
Comment 11 2013-03-01 15:33:21 PST
Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review >> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 >> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms > > Why the change to percent? I find that to be more useful maybe maybe not?
Eric Seidel (no email)
Comment 12 2013-03-02 00:50:44 PST
The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :)
Zoltan Horvath
Comment 13 2013-03-02 13:35:39 PST
(In reply to comment #11) > (From update of attachment 191049 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review > > >> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 > >> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms > > > > Why the change to percent? > > I find that to be more useful maybe maybe not? (In reply to comment #12) > The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :) I prefer using the percentage, it's more straightforward.
Ryosuke Niwa
Comment 14 2015-06-05 14:49:08 PDT
Comment on attachment 191049 [details] Fixes the bug This patch is too old.
Note You need to log in before you can comment on or make changes to this bug.