Bug 106078 - Statistics used in perftest_unittest.py and perftest_integrationtest.py are bogus
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on:
Blocks: 97510
Reported: 2013-01-04 01:03 PST by Ryosuke Niwa
Modified: 2013-01-07 11:06 PST

Attachments
Patch (26.94 KB, patch)
2013-01-04 01:18 PST, Ryosuke Niwa
no flags
Patch (26.32 KB, patch)
2013-01-04 01:30 PST, Ryosuke Niwa
tony: review+

Description Ryosuke Niwa 2013-01-04 01:03:08 PST
Right now, run-perf-tests simply reads statistics off the test output text for non-page-loading tests. While this is desirable in the long term, when all statistics functions can be implemented only in JavaScript, it blocks our work in bug 97510 to use multiple instances of DRT/WTR to smooth out between-run variance.

Now that Dromaeo and the other performance tests all report results from each iteration, we can reliably compute statistics in Python code instead.
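
For illustration only, here is a minimal sketch of computing the summary statistics from per-iteration values in Python. It is not the code from the attached patch: the function name compute_statistics is hypothetical, and the sample (n - 1) form of the standard deviation is an assumption, chosen because it reproduces the stdev of about 14.5086 that appears with these sample values in the test failure quoted in comment 6.

```python
import math


def compute_statistics(values):
    """Return summary statistics for a non-empty list of per-iteration values.

    Hypothetical helper for illustration; not taken from perftest.py.
    """
    sorted_values = sorted(values)
    count = len(sorted_values)
    mean = sum(sorted_values) / float(count)

    middle = count // 2
    if count % 2:
        median = sorted_values[middle]
    else:
        median = (sorted_values[middle - 1] + sorted_values[middle]) / 2.0

    # Assumption: sample standard deviation (divide by n - 1), 0 for one value.
    squared_deviations = sum((value - mean) ** 2 for value in sorted_values)
    stdev = math.sqrt(squared_deviations / (count - 1)) if count > 1 else 0.0

    return {
        'avg': mean,
        'median': median,
        'min': sorted_values[0],
        'max': sorted_values[-1],
        'stdev': stdev,
    }


# Sample values as they appear in the failing assertion in comment 6.
stats = compute_statistics([1080, 1120, 1095, 1101, 1104])
```

Computing the figures from the raw iteration values, rather than parsing them back out of formatted text, means the result does not depend on how many digits the test page happened to print.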
Comment 1 Radar WebKit Bug Importer 2013-01-04 01:04:05 PST
<rdar://problem/12955987>
Comment 2 Ryosuke Niwa 2013-01-04 01:18:44 PST
Created attachment 181282
Patch
Comment 3 Ryosuke Niwa 2013-01-04 01:25:30 PST
Let me change the scope of this bug. Instead of refactoring the unit & integration tests and modifying perftest.py, we can concentrate on refactoring unit & integration tests.
Comment 4 Ryosuke Niwa 2013-01-04 01:30:46 PST
Created attachment 181283
Patch
Comment 5 Ryosuke Niwa 2013-01-04 10:21:10 PST
Committed r138810: <http://trac.webkit.org/changeset/138810>
Comment 6 Csaba Osztrogonác 2013-01-07 07:32:41 PST
(In reply to comment #5)
> Committed r138810: <http://trac.webkit.org/changeset/138810>

It broke a webkitpy test:

   File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py", line 306, in test_run_memory_test
      self.assertEqual(results['Parser/memory-test'], MemoryTestData.results)
  AssertionError: {u'min': 1080.0, u'max': 1120.0, u'median': 1101.0, u'values': [1080.0, 1120.0, 1095.0, 1101.0, 1104.0], u'stdev': 14.508599999999999, u'avg': 1100.0, u'unit': u'ms'} != {'min': 1080, 'max': 1120, 'median': 1101, 'values': [1080, 1120, 1095, 1101, 1104], 'stdev': 14.508609999999999, 'avg': 1100, 'unit': 'ms'}

Could you fix it, please?
Comment 7 Ryosuke Niwa 2013-01-07 10:33:44 PST
(In reply to comment #6)

Huh, do you have a very old version of Python? It appears to me that there's some significant rounding error there.
Comment 8 Csaba Osztrogonác 2013-01-07 10:40:07 PST
I have Python 2.6.6 (Debian Squeeze), but it fails on Qt, GTK and Chromium 
bots too. But it passes for me with Python 2.7.3 (Ubuntu 12.04).

I don't think the proper fix is updating Python on several bots...
Comment 9 Ryosuke Niwa 2013-01-07 10:55:19 PST
(In reply to comment #8)
> I have Python 2.6.6 (Debian Squeeze), but it fails on Qt, GTK and Chromium 
> bots too. But it passes for me with Python 2.7.3 (Ubuntu 12.04).

I mean... it's really bad that standard deviation computation has such a large computation error. It's correct for only 6 decimal points...
Comment 10 Ryosuke Niwa 2013-01-07 10:55:34 PST
(In reply to comment #9)

Ugh... I mean 6 significant figures.
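
For illustration, the two dictionaries in the failing assertion differ only in 'stdev' (14.5086 versus 14.50861); the int/float differences such as 1080.0 versus 1080 compare equal in Python. A minimal sketch of a comparison that tolerates error at roughly the sixth significant figure follows. The helper name stats_almost_equal and the 1e-6 relative tolerance are assumptions, math.isclose requires Python 3.5 or later, and this is not necessarily what the eventual fix does.

```python
import math


def stats_almost_equal(actual, expected, rel_tol=1e-6):
    """Compare two result dicts, allowing small relative error in numbers.

    Hypothetical helper for illustration; not part of webkitpy.
    """
    if set(actual) != set(expected):
        return False
    for key, expected_value in expected.items():
        actual_value = actual[key]
        both_numeric = (isinstance(actual_value, (int, float))
                        and isinstance(expected_value, (int, float)))
        if both_numeric:
            if not math.isclose(actual_value, expected_value, rel_tol=rel_tol):
                return False
        elif actual_value != expected_value:
            # Lists and strings are compared exactly.
            return False
    return True


# The two sides of the failing assertion from comment 6.
actual = {'min': 1080.0, 'max': 1120.0, 'median': 1101.0,
          'values': [1080.0, 1120.0, 1095.0, 1101.0, 1104.0],
          'stdev': 14.5086, 'avg': 1100.0, 'unit': 'ms'}
expected = {'min': 1080, 'max': 1120, 'median': 1101,
            'values': [1080, 1120, 1095, 1101, 1104],
            'stdev': 14.50861, 'avg': 1100, 'unit': 'ms'}

exact_match = actual == expected
tolerant_match = stats_almost_equal(actual, expected)
```

Here exact_match is False while tolerant_match is True, which is the distinction between the strict assertEqual in the traceback and a tolerance-based check.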
Comment 11 Ryosuke Niwa 2013-01-07 11:06:30 PST
It's mind-blowing that people think Python is great for scientific computation when its numerical accuracy is much worse than that of JavaScript.
Comment 12 Ryosuke Niwa 2013-01-07 11:06:48 PST
Attempted a fix in http://trac.webkit.org/changeset/138965.