Bug 71010 (RESOLVED WONTFIX): introduce a way to grab performance counters from Inspector UI Performance tests.
https://bugs.webkit.org/show_bug.cgi?id=71010
Ilya Tikhonovsky
Reported 2011-10-27 05:51:50 PDT
%subj%
Attachments

- Patch (6.87 KB, patch), 2011-10-27 06:26 PDT, Ilya Tikhonovsky, no flags
- Patch (8.79 KB, patch), 2011-10-28 01:45 PDT, Ilya Tikhonovsky, no flags
- Patch (11.35 KB, patch), 2011-11-11 08:52 PST, Ilya Tikhonovsky, no flags
- Patch (11.36 KB, patch), 2011-11-11 08:58 PST, Ilya Tikhonovsky, no flags
- Patch (17.46 KB, patch), 2011-11-14 01:03 PST, Ilya Tikhonovsky, no flags
- Patch (17.47 KB, patch), 2011-11-14 13:16 PST, Ilya Tikhonovsky, no flags
- simple perf test driver as a proof-of-concept (3.71 KB, text/x-python-script), 2011-11-15 18:40 PST, Dirk Pranke, no flags
- Patch with a separate run-perf-tests that just talks to the Driver classes. (8.65 KB, patch), 2011-11-15 18:55 PST, Dirk Pranke, eric: review-
Ilya Tikhonovsky
Comment 1 2011-10-27 06:26:55 PDT
Pavel Feldman
Comment 2 2011-10-27 06:31:34 PDT
Comment on attachment 112673 [details]: Patch

Go python?
Ilya Tikhonovsky
Comment 3 2011-10-28 01:45:25 PDT
Yury Semikhatsky
Comment 4 2011-10-31 01:53:58 PDT
Comment on attachment 112833 [details]: Patch

Clearing r? while Ilya is working on changes for other ports.
Ilya Tikhonovsky
Comment 5 2011-11-11 08:52:19 PST
Ilya Tikhonovsky
Comment 6 2011-11-11 08:58:12 PST
Adam Barth
Comment 7 2011-11-13 22:17:13 PST
Comment on attachment 114710 [details]: Patch

This patch does not appear to have any tests.
Ilya Tikhonovsky
Comment 8 2011-11-14 01:03:04 PST
Adam Barth
Comment 9 2011-11-14 08:28:08 PST
Comment on attachment 114897 [details]: Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=114897&action=review

Other than that, this looks fine.

> Tools/ChangeLog:11
> + Sample command line:
> + $ ./webkit/tools/layout_tests/run_webkit_tests.sh --release --no-retry-failures inspector/performance --time-out-ms=30000 --no-pixel-tests --print=performance -f --no-show-results

Presumably this works in a pure WebKit checkout too, right?

> Tools/Scripts/webkitpy/layout_tests/models/test_results.py:41
> -    def __init__(self, test_name, failures=None, test_run_time=None, has_stderr=False):
> +    def __init__(self, test_name, failures=None, test_run_time=None, has_stderr=False, performance_counters=[]):

Non-primitive values don't work well as defaults because all the callers of this function will get the same static array to work with. Instead, we prefer to use None as the default and then do what we do below for self.failures.

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:43
> -                 test_time=0, timeout=False, error='', crashed_process_name=None):
> +                 test_time=0, timeout=False, error='', crashed_process_name=None, performance_counters=[]):

Ditto.

> Tools/Scripts/webkitpy/layout_tests/views/printing_unittest.py:128
> -    def get_result(self, test_name, result_type=test_expectations.PASS, run_time=0):
> +    def get_result(self, test_name, result_type=test_expectations.PASS, run_time=0, performance_counters=[]):

Ditto.
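The mutable-default pitfall Adam points out is a classic Python gotcha: a default value is evaluated once, when the function is defined, so every call that omits the argument receives the same list object. A minimal sketch of the bug and of the None-default idiom the review asks for (class and field names are simplified for illustration, not the patch's actual code):

    # Buggy: the default list is created once, at function definition
    # time, so every call that omits the argument shares the same list.
    class TestResult(object):
        def __init__(self, test_name, performance_counters=[]):
            self.test_name = test_name
            self.performance_counters = performance_counters

    a = TestResult("a.html")
    b = TestResult("b.html")
    a.performance_counters.append("#PERF: heap 123")
    print(b.performance_counters)  # ['#PERF: heap 123'] -- leaked into b!

    # Preferred: default to None and create a fresh list per call,
    # mirroring what test_results.py already does for self.failures.
    class FixedTestResult(object):
        def __init__(self, test_name, performance_counters=None):
            self.test_name = test_name
            self.performance_counters = performance_counters or []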
Ilya Tikhonovsky
Comment 10 2011-11-14 13:16:41 PST
Dirk Pranke
Comment 11 2011-11-14 13:23:57 PST
Comment on attachment 115017 [details]: Patch

I'm sorry, but I'd like to back up a step or two and understand what the purpose of this flag and these numbers is before we land this - you are using NRWT in a way that looks fairly different from the way we normally run it, and I'd like to understand it better to ensure that there isn't a better way to accomplish what you're trying to accomplish.

All of the other tests generate output to stdout and then that output is compared to references (as I'm sure you know). When you use these performance counters, is the only point of the test to get these numbers, or are they supposed to be used for correctness as well? Is this data only ever going to be used by a developer running these tests interactively, or does this data get generated (and stored) on the bots? Are the performance counters generated for every test run, or only in tests that are triggered by some call in the test script (via LayoutTestController or something else)?

Also, this data is being written into the logs, but it seems like you might actually want it to be considered part of the test output, and maybe it should be stored in test_results so that it shows up in the HTML results (and in the JSON) as well?
Ilya Tikhonovsky
Comment 12 2011-11-14 14:28:30 PST
(In reply to comment #11)
> (From update of attachment 115017 [details])
> I'm sorry, but I'd like to back up a step or two and understand what the purpose of this flag and these numbers is before we land this - you are using NRWT in a way that looks fairly different from the way we normally run it, and I'd like to understand it better to ensure that there isn't a better way to accomplish what you're trying to accomplish.
>
> All of the other tests generate output to stdout and then that output is compared to references (as I'm sure you know). When you use these performance counters, is the only point of the test to get these numbers, or are they supposed to be used for correctness as well? Is this data only ever going to be used by a developer running these tests interactively, or does this data get generated (and stored) on the bots? Are the performance counters generated for every test run, or only in tests that are triggered by some call in the test script (via LayoutTestController or something else)?
>
> Also, this data is being written into the logs, but it seems like you might actually want it to be considered part of the test output, and maybe it should be stored in test_results so that it shows up in the HTML results (and in the JSON) as well?

We want to have performance counters for common WebInspector actions because WebInspector has the same problems with performance and memory as other web apps. Almost all of WebInspector's code is in WebCore, and we test it via layout tests. The main idea was to reuse the perf and layout-test infrastructure and produce perf-compatible output from such layout tests for all the WebKit platforms.

I can print the times and heap sizes or heap-size deltas from any standard layout test. The only problem is that the numbers change from run to run. That is fine for perf test output but bad for layout test output. With this patch I'm able to get stable test output because the perf rows are filtered out (as you know, we already filter some strings out of DRT output). The perf results are dumped as NRWT's own output; you saw that output in the ChangeLog.

Also, I've slightly changed the buildbot scripts, but that patch is not ready yet. With these changes the bot grabs the perf data and sends it to the perf script.

The bug with the tests is https://bugs.webkit.org/show_bug.cgi?id=72260; the links to the buildbot instance and the perf data were sent separately.
Eric Seidel (no email)
Comment 13 2011-11-14 15:09:56 PST
Comment on attachment 115017 [details]: Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115017&action=review

I don't think hacking perf modes into NRWT is a good idea. I think these should be handled by separate performance tests with a separate harness.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:560
> +            elif line.startswith("#PERF:"):
> +                performance_counters.append(line.rstrip()[6:])

So you're only adding support for Chromium? What about ServerProcess/the normal WebKit ports?
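For context, the quoted hunk lives in the loop that post-processes DumpRenderTree's output line by line. A self-contained reconstruction of what that separation does (an illustration built from the quoted lines, not the actual chromium.py code):

    def split_drt_output(lines):
        """Separate #PERF: counter lines from the text that is
        compared against the test's expected output."""
        text_lines = []
        performance_counters = []
        for line in lines:
            if line.startswith("#PERF:"):
                # len("#PERF:") == 6, hence the [6:] in the patch.
                performance_counters.append(line.rstrip()[6:])
            else:
                text_lines.append(line)
        return text_lines, performance_counters

    # The perf rows are filtered out, so the remaining output stays
    # stable from run to run and can be diffed against expectations.
    text, counters = split_drt_output([
        "PASS inspector panel opened\n",
        "#PERF: ShowPanelElements 143ms\n",
    ])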
Dirk Pranke
Comment 14 2011-11-15 18:40:05 PST
Created attachment 115293 [details]
simple perf test driver as a proof-of-concept

Here's a sample driver, which is all of 20 lines of code once you get past the boilerplate and command-line parsing. This patch would still need the changes to ChromiumDriver and DriverOutput, but that seems fine to me. Depending on how far you go with this, you might need other things, like the logic to start and stop servers, but that seems manageable as well. What do you all think?
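The attachment itself isn't reproduced in this thread, but the shape of such a driver is easy to sketch. Here is a hypothetical, self-contained approximation that runs a DumpRenderTree binary one test at a time and collects #PERF: lines; the invocation details and output conventions are assumptions for illustration, not code from Dirk's attachment:

    #!/usr/bin/env python
    # Hypothetical minimal perf-test driver: runs each test through a
    # DumpRenderTree binary and reports any "#PERF:" counter lines.
    # The binary path and test list come from the command line; the
    # "#PERF:" convention matches the patches discussed in this bug.
    import subprocess
    import sys

    def run_perf_test(drt_path, test_path):
        # One-shot mode: DRT runs the single test and exits.
        output = subprocess.check_output([drt_path, test_path])
        return [line[len("#PERF:"):].strip()
                for line in output.decode("utf-8").splitlines()
                if line.startswith("#PERF:")]

    def main(argv):
        drt_path, tests = argv[0], argv[1:]
        for test in tests:
            for counter in run_perf_test(drt_path, test):
                print("%s: %s" % (test, counter))

    if __name__ == "__main__":
        main(sys.argv[1:])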
Dirk Pranke
Comment 15 2011-11-15 18:55:32 PST
Created attachment 115295 [details]
Patch with a separate run-perf-tests that just talks to the Driver classes.
Dirk Pranke
Comment 16 2011-11-15 18:58:01 PST
Revised so that this is an actual patch ... note that if I were to land something like this, I'd be tempted to make the performance_counters field into some sort of more generic dict that gets handed back, e.g. counters={ "PERF": [list of strings] }. But maybe that's not worth the bother at this point.
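The more generic shape Dirk is describing would let future counter categories be added without widening the Driver API each time; roughly (values are illustrative only):

    # Illustrative only: one generic dict on DriverOutput instead of a
    # dedicated performance_counters field.
    counters = {
        "PERF": ["ShowPanelElements 143ms", "HeapDelta 2.1MB"],
        # A future category would be another key rather than a new
        # constructor parameter, e.g.:
        # "MEMORY": ["snapshot-size 18.4MB"],
    }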
Eric Seidel (no email)
Comment 17 2011-11-15 21:14:48 PST
Comment on attachment 115295 [details]: Patch with a separate run-perf-tests that just talks to the Driver classes.

View in context: https://bugs.webkit.org/attachment.cgi?id=115295&action=review

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:560
> +            elif line.startswith("#PERF:"):
> +                performance_counters.append(line.rstrip()[6:])

I still don't understand why the test harness needs to be involved in collecting #PERF lines. Why can't these be part of stdout or post-processed as part of stderr by the perf-testing infrastructure itself?
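The alternative Eric is proposing, post-processing outside the harness, could be as small as a regex pass over a captured log. A hypothetical sketch (the log filename is made up for illustration):

    import re

    # Hypothetical post-processing pass: scan a captured stderr/stdout
    # log for #PERF: lines after the run, leaving NRWT itself untouched.
    PERF_LINE = re.compile(r"^#PERF:\s*(.*)$", re.MULTILINE)

    def extract_counters(log_text):
        return PERF_LINE.findall(log_text)

    with open("run-webkit-tests-stderr.log") as f:  # assumed log path
        for counter in extract_counters(f.read()):
            print(counter)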
Dirk Pranke
Comment 18 2011-11-15 21:40:59 PST
(In reply to comment #17)
> I still don't understand why the test harness needs to be involved in collecting #PERF lines. Why can't these be part of stdout or post-processed as part of stderr by the perf-testing infrastructure itself?

Ilya wrote that they want to be able to share the actual tests between the layout_tests (correctness) and perf. That would mean that the tests (DRTs) can't write to stdout.

I also noted that, in the context of the layout_tests, we generally think that tests should not print to stderr (since we report the stderr output as part of the results.html), and so I wouldn't necessarily want to encourage that mechanism for this purpose. It also makes sense that one would want to separate structured data being reported from unstructured output like debug logging from DRT.

Given all that, I'm certainly not married to the approach we're taking here. How would you suggest we meet these goals (or change them)?
Eric Seidel (no email)
Comment 19 2011-11-16 00:17:48 PST
(In reply to comment #18)
> (In reply to comment #17)
> > I still don't understand why the test harness needs to be involved in collecting #PERF lines. Why can't these be part of stdout or post-processed as part of stderr by the perf-testing infrastructure itself?
>
> Ilya wrote that they want to be able to share the actual tests between the layout_tests (correctness) and perf. That would mean that the tests (DRTs) can't write to stdout.

This is the statement I disagree with. I don't believe tests can be shared as both perf and regression tests. You can have regression tests which check that performance has not regressed past some large threshold (like the order-of-magnitude perf tests for editing), you can have regression tests which test for behavioral regressions, and you can have perf tests which produce machine- and environment-dependent results with no expected value. Those are useful for local comparisons (like making something faster, or comparing against some reference build) but useless for global comparisons (unlike regression tests).
Eric Seidel (no email)
Comment 20 2011-11-16 01:14:36 PST
I will not be accepting patches with this design into WebKit. Given our private threads on the matter, your understanding of performance testing does not match mine (which I believe agrees with WebKit's). If you wish to continue down this line, I would encourage you to develop this in a private repository and get it up and running on your own performance bots before you come back to the project with patches.

It's not possible to do meaningful performance testing globally (on all machines, the way regression tests are run). You can do order-of-magnitude performance-regression testing like the editing tests do, but that requires no modifications to the regression-testing harness.

I recommend you develop a separate performance-testing system, akin to how v8, SunSpider, or the PerformanceTests/ microbenchmarks in WebKit work. I expect you may wish to use DumpRenderTree to run your test harness (just because it's a nice way to use WebKit from the command line). But you will not use NRWT.