WebKit Bugzilla
RESOLVED FIXED
Bug 97510
Some perf. tests have variances that differ greatly between runs
https://bugs.webkit.org/show_bug.cgi?id=97510
Summary
Some perf. tests have variances that differ greatly between runs
Ryosuke Niwa
Reported
2012-09-24 18:32:03 PDT
Created
attachment 165486
[details]
Two consecutive runs of perf. tests at
r129387
floats_20_100 has very small in-run variance but a large between-run variance:
http://webkit-perf.appspot.com/graph.html#tests=[[477032,2001,3001]]&sel=1347931595661,1348536395661,408.9015180414373,486.55303319295246&displayrange=7&datatype=running
Bindings/scroll-top has the same problem:
http://webkit-perf.appspot.com/graph.html#tests=[[2932950,2001,3001],[2932950,2001,963028],[2932950,2001,32196]]&sel=1347932196853.843,1348536395661,24.128788890260637,166.97916767813942&displayrange=7&datatype=running
So does Dromaeo/jslib-attr-prototype:
http://webkit-perf.appspot.com/graph.html#tests=[[45011,2001,3001],[45011,2001,963028],[45011,2001,32196]]&sel=1347932196853.843,1348536395661&displayrange=7&datatype=running
All these tests produce false positives on the results page (see attachment).
Attachments
Two consecutive runs of perf. tests at r129387
(197.04 KB, text/html)
2012-09-24 18:32 PDT
,
Ryosuke Niwa
no flags
Details
layout_20_100 (original)
(19.14 KB, text/html)
2012-09-24 18:36 PDT
,
Ryosuke Niwa
no flags
Details
layout_20_100 (approach 1)
(21.33 KB, text/html)
2012-09-24 18:36 PDT
,
Ryosuke Niwa
no flags
Details
layout_20_100 (approach 2)
(19.01 KB, text/html)
2012-09-24 18:36 PDT
,
Ryosuke Niwa
no flags
Details
scroll-top (original)
(21.35 KB, text/html)
2012-09-24 18:37 PDT
,
Ryosuke Niwa
no flags
Details
scroll-top (approach 1)
(34.99 KB, text/html)
2012-09-24 18:38 PDT
,
Ryosuke Niwa
no flags
Details
scroll-top (approach 2)
(20.40 KB, text/html)
2012-09-24 18:38 PDT
,
Ryosuke Niwa
no flags
Details
js-attr-prototype (original)
(17.89 KB, text/html)
2012-09-24 18:39 PDT
,
Ryosuke Niwa
no flags
Details
js-attr-prototype (approach 1)
(17.84 KB, text/html)
2012-09-24 18:39 PDT
,
Ryosuke Niwa
no flags
Details
js-attr-prototype (approach 2)
(18.40 KB, text/html)
2012-09-24 18:40 PDT
,
Ryosuke Niwa
no flags
Details
Work in progress
(10.95 KB, patch)
2013-02-19 01:54 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Work in progress 2
(28.82 KB, patch)
2013-02-28 01:49 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Patch
(30.26 KB, patch)
2013-02-28 15:58 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Fixed a harness test
(31.64 KB, patch)
2013-02-28 16:01 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Updated per comment and introduced iteration groups
(35.83 KB, patch)
2013-03-01 20:35 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Ryosuke Niwa
Comment 1
2012-09-24 18:35:26 PDT
I've considered the following two approaches to solve this problem:
1. Increase the number of samples we take in each test (JS code change).
2. Reduce the number of samples taken in each test by a factor of roughly 4, and run the test in 4 different instances of DumpRenderTree.
I'm going to post a whole bunch of results pages now, but the results appear to indicate that we should take approach 2.
Ryosuke Niwa
Comment 2
2012-09-24 18:36:03 PDT
Created
attachment 165487
[details]
layout_20_100 (original)
Ryosuke Niwa
Comment 3
2012-09-24 18:36:35 PDT
Created
attachment 165488
[details]
layout_20_100 (approach 1)
Ryosuke Niwa
Comment 4
2012-09-24 18:36:58 PDT
Created
attachment 165489
[details]
layout_20_100 (approach 2)
Ryosuke Niwa
Comment 5
2012-09-24 18:37:42 PDT
Created
attachment 165490
[details]
scroll-top (original)
Ryosuke Niwa
Comment 6
2012-09-24 18:38:01 PDT
Created
attachment 165491
[details]
scroll-top (approach 1)
Ryosuke Niwa
Comment 7
2012-09-24 18:38:30 PDT
Created
attachment 165492
[details]
scroll-top (approach 2)
Ryosuke Niwa
Comment 8
2012-09-24 18:39:15 PDT
Created
attachment 165493
[details]
js-attr-prototype (original)
Ryosuke Niwa
Comment 9
2012-09-24 18:39:48 PDT
Created
attachment 165494
[details]
js-attr-prototype (approach 1)
Ryosuke Niwa
Comment 10
2012-09-24 18:40:24 PDT
Created
attachment 165495
[details]
js-attr-prototype (approach 2)
Ryosuke Niwa
Comment 11
2012-09-24 18:42:24 PDT
To elaborate more on the two approaches, let me give you an example. Suppose we have a sample test that has 20 iterations. Approach 1 increases the number of iterations to, say, 100. Approach 2 reduces the number of iterations to 5, but then runs it 4 times, each using a different instance of DumpRenderTree.
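The split described above can be sketched as follows. This is a hedged illustration, not WebKit's harness code: `run_in_one_process` is a hypothetical stand-in for launching a fresh DumpRenderTree and collecting one timing sample per iteration, with per-process startup state simulated as a random offset.

```python
import random

def run_in_one_process(test_name, iterations):
    # Stand-in for a fresh DumpRenderTree instance: each process gets its
    # own random startup state, which shifts all of its samples together.
    process_offset = random.gauss(0, 5)
    return [100 + process_offset + random.gauss(0, 1) for _ in range(iterations)]

def approach_1(test_name, iterations=20):
    # One process, all iterations: every sample shares one startup offset,
    # so the whole run can land high or low between invocations.
    return run_in_one_process(test_name, iterations)

def approach_2(test_name, iterations=20, processes=4):
    # Same total sampling budget split across fresh processes, so the
    # per-process startup offset is re-sampled and tends to average out.
    per_process = iterations // processes
    samples = []
    for _ in range(processes):
        samples.extend(run_in_one_process(test_name, per_process))
    return samples

print(len(approach_1("floats_20_100")), len(approach_2("floats_20_100")))
```

Both approaches collect the same number of samples; the difference is how much per-process startup state they share.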
Kentaro Hara
Comment 12
2012-09-24 18:51:24 PDT
General comment: In my experience, an average is strongly affected by a couple of outliers. How about calculating the average after discarding one or two outliers (i.e. the one or two largest values)? Or how about using a median instead of an average? What we are interested in is not the distribution of execution times but the execution time in the "common case". In that sense, observing a median, or an average that ignores outliers, might let us observe what we actually want better than a pure average of all values.
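A quick illustration of this suggestion using Python's `statistics` module; the sample times are invented for the sake of the example:

```python
from statistics import mean, median

samples = [102, 101, 103, 100, 102, 180]  # one large outlier

def trimmed_mean(values, discard=1):
    # Drop the `discard` largest values before averaging, as suggested.
    return mean(sorted(values)[:len(values) - discard])

print(mean(samples))          # pulled well above the common-case time
print(median(samples))        # 102
print(trimmed_mean(samples))  # 101.6
```

The median and the trimmed mean both stay near the common-case time, while the plain mean is dragged up by the single outlier. (As noted in the follow-up comments, this does not help when the distribution is bi-modal rather than merely outlier-contaminated.)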
Ryosuke Niwa
Comment 13
2012-09-24 19:53:23 PDT
(In reply to
comment #12
)
> General comment: In my experience, an average value is strongly affected by a couple of outliers. How about calculating the average value after discarding one or two outliers (i.e. discarding one or two largest values)? Or how about using a median instead of an average?
You can take a look at each graph (click on the test to show the graph, and then click on the graph to adjust the y-axis), but I don't think discarding one or two extrema or using the median would help here, because some of these tests have bi-modal distributions.
Ryosuke Niwa
Comment 14
2012-09-24 19:57:27 PDT
Also, take a look at the graph on
https://bug-97510-attachments.webkit.org/attachment.cgi?id=165488
(layout_20_100 with 100 iterations). There, values are not only multi-modal but both means and medians are centered at different values in different runs.
Stephanie Lewis
Comment 15
2012-10-01 15:44:04 PDT
IMO you want to run enough iterations so that issues in repeating code are found (e.g. if the JavaScript heap size increases each time, slowing things down), but running multiple instances does improve variance, and it reduces the risk of an entire run being an outlier.
Ryosuke Niwa
Comment 16
2012-10-01 16:18:21 PDT
(In reply to
comment #15
)
> IMO you want to run enough iterations so that issues in repeating code are found (i.e. if the javascript heap size increases each time slowing stuff down), but running multiple instances does improve variance and it reduces the risk of an entire run being an outlier.
In V8 at least, there are some global variables that are initialized at startup, which are then used to compute hashes, etc. So just increasing the number of iterations doesn't help. See the results labeled "(approach 1)".
Ryosuke Niwa
Comment 17
2013-02-19 01:54:14 PST
Created
attachment 189027
[details]
Work in progress
Ryosuke Niwa
Comment 18
2013-02-28 01:49:56 PST
Created
attachment 190681
[details]
Work in progress 2
Ryosuke Niwa
Comment 19
2013-02-28 15:58:23 PST
Created
attachment 190829
[details]
Patch
Ryosuke Niwa
Comment 20
2013-02-28 16:01:32 PST
Created
attachment 190831
[details]
Fixed a harness test
Benjamin Poulain
Comment 21
2013-03-01 16:49:45 PST
Comment on
attachment 190831
[details]
Fixed a harness test
View in context:
https://bugs.webkit.org/attachment.cgi?id=190831&action=review
I can't wait to see the result on the bots.
> PerformanceTests/Dromaeo/resources/dromaeorunner.js:9
>     setup: function(testName) {
> -        PerfTestRunner.prepareToMeasureValuesAsync({iterationCount: 5, doNotMeasureMemoryUsage: true, doNotIgnoreInitialRun: true, unit: 'runs/s'});
> +        PerfTestRunner.prepareToMeasureValuesAsync({dromaeoIterationCount: 5, doNotMeasureMemoryUsage: true, doNotIgnoreInitialRun: true, unit: 'runs/s'});
>
>         var iframe = document.createElement("iframe");
> -        var url = DRT.baseURL + "?" + testName + '&numTests=' + PerfTestRunner.iterationCount();
> +        var url = DRT.baseURL + "?" + testName + '&numTests=' + 5;
var dromaeoIterationCount;
PerfTestRunner.prepareToMeasureValuesAsync({dromaeoIterationCount: dromaeoIterationCount, Foobar)
[...]
var url = DRT.baseURL + "?" + testName + '&numTests=' + dromaeoIterationCount;
> PerformanceTests/resources/runner.js:158
> +        iterationCount = test.dromaeoIterationCount || (window.testRunner ? 5 : 20);
Damn, JavaScript is ugly :-D
> Tools/Scripts/webkitpy/performance_tests/perftest.py:110
> +    def __init__(self, port, test_name, test_path, process_count=4):
process_count -> process_run_count or something alike?
> Tools/Scripts/webkitpy/performance_tests/perftest.py:134
> +        for _ in range(0, self._process_count):
xrange? Gosh I hate python sometimes :)
> Tools/Scripts/webkitpy/performance_tests/perftest.py:138
> +            if not self._run_with_driver(driver, time_out_ms):
> +                return None
You may have 3 runs with results and one that failed?
Ryosuke Niwa
Comment 22
2013-03-01 20:35:59 PST
Created
attachment 191091
[details]
Updated per comment and introduced iteration groups
Eric Seidel (no email)
Comment 23
2013-03-03 01:20:19 PST
Comment on
attachment 191091
[details]
Updated per comment and introduced iteration groups
View in context:
https://bugs.webkit.org/attachment.cgi?id=191091&action=review
> Tools/Scripts/webkitpy/performance_tests/perftest.py:383
> -        for i in range(0, 20):
> +        for i in range(0, 6):
What does this do?
Eric Seidel (no email)
Comment 24
2013-03-03 01:21:00 PST
Comment on
attachment 191091
[details]
Updated per comment and introduced iteration groups
I would have split the iteration-change half out into a separate patch. That would have reduced the size by half, and made the two independent perf-results changes separate.
Ryosuke Niwa
Comment 25
2013-03-03 01:21:44 PST
Comment on
attachment 191091
[details]
Updated per comment and introduced iteration groups
View in context:
https://bugs.webkit.org/attachment.cgi?id=191091&action=review
>> Tools/Scripts/webkitpy/performance_tests/perftest.py:383
>> +        for i in range(0, 6):
>
> What does this do?
It'll do 6 runs instead of 20 for a given driver. 20 should have been 21 since we ignore the first run. It was a bug :(
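For reference, the arithmetic behind this, assuming the default of 4 driver processes from the patch under review: 6 runs per driver with the first ignored as warm-up leaves 5 measured values per driver, recovering the intended 20 measured iterations in total.

```python
# Sketch of the landed sampling scheme as described in the comments.
processes = 4         # default process_count in the patch under review
runs_per_process = 6  # the new range(0, 6) loop per driver
warmup_per_process = 1  # the first run of each driver is ignored

measured = processes * (runs_per_process - warmup_per_process)
print(measured)  # 20 measured iterations in total
```

Under the old single-driver scheme, range(0, 20) with the first run ignored yielded only 19 measured values, which is the off-by-one bug mentioned above.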
Ryosuke Niwa
Comment 26
2013-03-03 15:21:21 PST
Comment on
attachment 191091
[details]
Updated per comment and introduced iteration groups
Clearing flags on attachment: 191091
Committed r144583: <http://trac.webkit.org/changeset/144583>
Ryosuke Niwa
Comment 27
2013-03-03 15:21:26 PST
All reviewed patches have been landed. Closing bug.