Bug 90062

Summary:

Output numbers of run-perf-tests should be rounded in accordance with sample standard deviations

Product:

WebKit

Reporter:

Kentaro Hara <haraken>

Component:

Tools / Tests

Assignee:

Kentaro Hara <haraken>

Status:

REOPENED

Severity:

Normal

CC:

abarth, dpranke, eric, pdr, rniwa, webkit.review.bot, zimmermann, zoltan

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

Bug Blocks:

77037

Attachments:

Description	Flags
Patch	none
Fixes the bug	none

Kentaro Hara

Reported 2012-06-27 04:32:21 PDT

Currently the output of performance tests looks like this: DESCRIPTION: This benchmark covers DOM attributes that return small integers. RESULT Bindings: scroll-top= 191.379586433 runs/s median= 191.731644624 runs/s, stdev= 2.56134259787 runs/s, min= 180.412371134 runs/s, max= 192.771084337 runs/s "191.731644624 runs/s" is too verbose. It should be "191.73 runs/s". Whichever units we use (runs/s or ms), rounding off the output to the second decimal place would be enough and preferable.

Attachments
Patch (2.12 KB, patch) 2012-06-27 04:35 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Fixes the bug (5.94 KB, patch) 2013-03-01 15:12 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Kentaro Hara

Comment 1 2012-06-27 04:35:27 PDT

Created attachment 149726 [details] Patch

Ryosuke Niwa

Comment 2 2012-06-27 09:31:02 PDT

Comment on attachment 149726 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=149726&action=review > Tools/ChangeLog:16 > + "191.731644624 runs/s" is too verbose. It should be "191.73 runs/s". > + Whichever unit we use (runs/s or ms), rounding off the output to the second > + decimal place would be enough and preferable. No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. e.g. there are tests that only have 0.7 runs/s for example. For those tests, printing 2 decimals points would mean that we'd get only 2 significant digits. That's not acceptable. In order to figure out the right significant figures, we need to first figure out the measurement errors. From measurement errors, we can find out the sufficient significant figures for standard deviations, from which we can find out the right significant figures for the mean, median, etc... Just picking 2 decimal point arbitrarily isn't an acceptable solution here.

Ryosuke Niwa

Comment 3 2012-06-27 09:33:22 PDT

+WildFox since he would surely know all about significant figures for he is a physicist :)

Kentaro Hara

Comment 4 2012-06-27 16:51:12 PDT

(In reply to comment #2) > No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals.

Ryosuke Niwa

Comment 5 2012-06-27 16:55:02 PDT

(In reply to comment #4) > (In reply to comment #2) > > No, the whole point of printing out all decimals is so that you can apply arithmetic operations on those numbers and still preserve enough significant figures. > > Are the numbers used for arithmetic calculations? I've thought that _log.info() lines are just for human readability, and the arithmetic calculations (e.g. drawing perf-bot graphs etc) are done by using 'results', which contains all decimals. Yes, I (humans) compute average between different runs all the time.

Kentaro Hara

Comment 6 2012-06-27 16:59:47 PDT

OK, marking as WONTFIX.

Ryosuke Niwa

Comment 7 2013-03-01 14:55:45 PST

Let's do this properly.

Ryosuke Niwa

Comment 8 2013-03-01 15:12:58 PST

Created attachment 191049 [details] Fixes the bug

Eric Seidel (no email)

Comment 9 2013-03-01 15:16:22 PST

Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review > Tools/ChangeLog:12 > + We show one more digit than the most significant digit in the sample standard deviation since it's > + helpful when doing back-of-the-envelop calculations using these numbers. I have no stats background... but it seems like digits are either significant or not, right? If you're including non-significant digits in these calculations, isn't that just noise? > Tools/Scripts/webkitpy/performance_tests/perftest.py:159 > + return round(x, -highest_decimal_place(x) + sig_fig - 1) Confused by the -1? > Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 > +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms Why the change to percent?

Ryosuke Niwa

Comment 10 2013-03-01 15:32:13 PST

Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review >> Tools/ChangeLog:12 >> + helpful when doing back-of-the-envelop calculations using these numbers. > > I have no stats background... but it seems like digits are either significant or not, right? If you're including non-significant digits in these calculations, isn't that just noise? "most significant digit" as in http://planetmath.org/MostSignificantDigit.html i.e. left most digit of a number when it's shown in the positional notation: http://en.wikipedia.org/wiki/Positional_notation Maybe there's a better word for it? >> Tools/Scripts/webkitpy/performance_tests/perftest.py:159 >> + return round(x, -highest_decimal_place(x) + sig_fig - 1) > > Confused by the -1? Say we have 1.2345, then to round it at 3rd sig. fig. we do: round(1.2345, 2), not round(1.2345, 3)

Ryosuke Niwa

Comment 11 2013-03-01 15:33:21 PST

Comment on attachment 191049 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review >> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 >> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms > > Why the change to percent? I find that to be more useful maybe maybe not?

Eric Seidel (no email)

Comment 12 2013-03-02 00:50:44 PST

The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :)

Zoltan Horvath

Comment 13 2013-03-02 13:35:39 PST

(In reply to comment #11) > (From update of attachment 191049 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=191049&action=review > > >> Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py:91 > >> +median= 1488 ms, stdev= 1.02 %, min= 1471 ms, max= 1510 ms > > > > Why the change to percent? > > I find that to be more useful maybe maybe not? (In reply to comment #12) > The patch looks reasonable, but you should really ask someone who's taken a stats class to review. :) I prefer using the percentage, it's more straightforward.

Ryosuke Niwa

Comment 14 2015-06-05 14:49:08 PDT

Comment on attachment 191049 [details] Fixes the bug This patch is too old.

Note You need to log in before you can comment on or make changes to this bug.