106078 – Statistics used in perftest_unittest.py and perftest_integrationtest.py are bogus

RESOLVED FIXED Bug 106078

https://bugs.webkit.org/show_bug.cgi?id=106078

Ryosuke Niwa

Reported 2013-01-04 01:03:08 PST

Right now, run-perf-tests simply reads statistics from the test output text for non-page-loading tests. While this is desirable in the long term, when all statistics functions can be implemented only in JavaScript, it blocks our work in bug 97510 to use multiple instances of DRT/WTR to smooth out between-run variance. Now that Dromaeo and the other performance tests all report results from each iteration, we can reliably compute the statistics in Python code instead.
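
For illustration only, computing the statistics in Python from per-iteration values could look roughly like the sketch below. `compute_statistics` is a hypothetical helper, not the code in perftest.py; the use of the sample standard deviation (dividing by n - 1) is an assumption, and the input values are the ones quoted in comment 6.

```python
import math


def compute_statistics(values):
    """Summarize a non-empty list of per-iteration values (hypothetical helper)."""
    sorted_values = sorted(values)
    count = len(sorted_values)
    mean = sum(sorted_values) / float(count)

    # Median: middle element, or the mean of the two middle elements.
    middle = count // 2
    if count % 2:
        median = sorted_values[middle]
    else:
        median = (sorted_values[middle - 1] + sorted_values[middle]) / 2.0

    # Sample standard deviation (n - 1 divisor); defined as 0.0 for one value.
    squared_deviations = sum((value - mean) ** 2 for value in sorted_values)
    stdev = math.sqrt(squared_deviations / (count - 1)) if count > 1 else 0.0

    return {
        'avg': mean,
        'median': median,
        'min': sorted_values[0],
        'max': sorted_values[-1],
        'stdev': stdev,
    }


stats = compute_statistics([1080, 1120, 1095, 1101, 1104])
print(stats)  # avg 1100.0, median 1101, stdev about 14.5086
```

Computing from the raw values means the result does not depend on how many digits the test page happened to print.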

Attachments
Patch (26.94 KB, patch) 2013-01-04 01:18 PST, Ryosuke Niwa, no flags
Patch (26.32 KB, patch) 2013-01-04 01:30 PST, Ryosuke Niwa, tony: review+

Radar WebKit Bug Importer

Comment 1 2013-01-04 01:04:05 PST

<rdar://problem/12955987>

Ryosuke Niwa

Comment 2 2013-01-04 01:18:44 PST

Created attachment 181282 [details] Patch

Ryosuke Niwa

Comment 3 2013-01-04 01:25:30 PST

Let me change the scope of this bug. Instead of refactoring the unit & integration tests and modifying perftest.py, we can concentrate on refactoring unit & integration tests.

Ryosuke Niwa

Comment 4 2013-01-04 01:30:46 PST

Created attachment 181283 [details] Patch

Ryosuke Niwa

Comment 5 2013-01-04 10:21:10 PST

Committed r138810: <http://trac.webkit.org/changeset/138810>

Csaba Osztrogonác

Comment 6 2013-01-07 07:32:41 PST

(In reply to comment #5)
> Committed r138810: <http://trac.webkit.org/changeset/138810>

It broke a webkitpy test:

  File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py", line 306, in test_run_memory_test
    self.assertEqual(results['Parser/memory-test'], MemoryTestData.results)
AssertionError: {u'min': 1080.0, u'max': 1120.0, u'median': 1101.0, u'values': [1080.0, 1120.0, 1095.0, 1101.0, 1104.0], u'stdev': 14.508599999999999, u'avg': 1100.0, u'unit': u'ms'} != {'min': 1080, 'max': 1120, 'median': 1101, 'values': [1080, 1120, 1095, 1101, 1104], 'stdev': 14.508609999999999, 'avg': 1100, 'unit': 'ms'}

Could you fix it, please?
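
The two dictionaries in the assertion above differ only in `stdev`, and only from the seventh significant figure onward. One common way to make such a comparison robust is to compare floating-point fields with a relative tolerance and everything else exactly. The sketch below assumes that approach; `stats_match` and its `rel_tol` default are hypothetical, and this is not necessarily what any later WebKit change did.

```python
def stats_match(actual, expected, rel_tol=1e-5):
    """Compare two result dicts, allowing a relative tolerance on 'stdev' only."""
    if set(actual) != set(expected):
        return False
    for key, expected_value in expected.items():
        actual_value = actual[key]
        if key == 'stdev':
            # Relative comparison: scale the tolerance by the larger magnitude.
            scale = max(abs(actual_value), abs(expected_value))
            if abs(actual_value - expected_value) > rel_tol * scale:
                return False
        elif actual_value != expected_value:
            # Ints and equal-valued floats compare equal (1080 == 1080.0).
            return False
    return True


# Values copied from the assertion failure quoted above.
actual = {'min': 1080.0, 'max': 1120.0, 'median': 1101.0,
          'values': [1080.0, 1120.0, 1095.0, 1101.0, 1104.0],
          'stdev': 14.508599999999999, 'avg': 1100.0, 'unit': 'ms'}
expected = {'min': 1080, 'max': 1120, 'median': 1101,
            'values': [1080, 1120, 1095, 1101, 1104],
            'stdev': 14.508609999999999, 'avg': 1100, 'unit': 'ms'}

print(stats_match(actual, expected))                 # True
print(stats_match(actual, expected, rel_tol=1e-7))   # False
```

Inside a `unittest.TestCase`, `assertAlmostEqual` with a `places` argument serves a similar purpose for a single field.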

Ryosuke Niwa

Comment 7 2013-01-07 10:33:44 PST

(In reply to comment #6)
> (In reply to comment #5)
> > Committed r138810: <http://trac.webkit.org/changeset/138810>
>
> It broke a webkitpy test:
>
>   File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/performance_tests/perftestsrunner_integrationtest.py", line 306, in test_run_memory_test
>     self.assertEqual(results['Parser/memory-test'], MemoryTestData.results)
> AssertionError: {u'min': 1080.0, u'max': 1120.0, u'median': 1101.0, u'values': [1080.0, 1120.0, 1095.0, 1101.0, 1104.0], u'stdev': 14.508599999999999, u'avg': 1100.0, u'unit': u'ms'} != {'min': 1080, 'max': 1120, 'median': 1101, 'values': [1080, 1120, 1095, 1101, 1104], 'stdev': 14.508609999999999, 'avg': 1100, 'unit': 'ms'}
>
> Could you fix it, please?

Huh, do you have a very old version of Python? It appears to me that there's some significant rounding error there.

Csaba Osztrogonác

Comment 8 2013-01-07 10:40:07 PST

I have Python 2.6.6 (Debian Squeeze), but it fails on Qt, GTK and Chromium bots too. But it passes for me with Python 2.7.3 (Ubuntu 12.04). I don't think the proper fix is updating Python on several bots...

Ryosuke Niwa

Comment 9 2013-01-07 10:55:19 PST

(In reply to comment #8)
> I have Python 2.6.6 (Debian Squeeze), but it fails on Qt, GTK and Chromium
> bots too. But it passes for me with Python 2.7.3 (Ubuntu 12.04).

I mean... it's really bad that standard deviation computation has such a large computation error. It's correct for only 6 decimal points...

Ryosuke Niwa

Comment 10 2013-01-07 10:55:34 PST

(In reply to comment #9)
> (In reply to comment #8)
> > I have Python 2.6.6 (Debian Squeeze), but it fails on Qt, GTK and Chromium
> > bots too. But it passes for me with Python 2.7.3 (Ubuntu 12.04).
>
> I mean... it's really bad that standard deviation computation has such a large computation error. It's correct for only 6 decimal points...

Ugh... I mean 6 significant figures.

Ryosuke Niwa

Comment 11 2013-01-07 11:06:30 PST

It's mind blowing that people think python is great for scientific computation when its numerical accuracy is much worse than that of JavaScript.

Ryosuke Niwa

Comment 12 2013-01-07 11:06:48 PST

Attempted a fix in http://trac.webkit.org/changeset/138965.

Status RESOLVED
Resolution FIXED
Priority P2
Severity Normal
Classification Unclassified
Version 528+ (Nightly build)
Hardware Unspecified
OS Unspecified
Product WebKit
Component Tools / Tests
Assignee Ryosuke Niwa
Reported 2013-01-04 01:03 PST
Modified 2013-01-07 11:06 PST
CC List 10 users
Keywords InRadar
Blocks 97510