Bug 78303 - [PerformanceTests] run-perf-tests should be able to handle Hz-based score.
Summary: [PerformanceTests] run-perf-tests should be able to handle Hz-based score.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Hajime Morrita
URL:
Keywords:
Depends on:
Blocks: 76156
  Show dependency treegraph
 
Reported: 2012-02-09 17:54 PST by Hajime Morrita
Modified: 2012-03-15 20:25 PDT (History)
5 users (show)

See Also:


Attachments
Patch (6.53 KB, patch)
2012-03-14 18:56 PDT, Kentaro Hara
no flags Details | Formatted Diff | Diff
patch for landing (6.47 KB, patch)
2012-03-15 00:16 PDT, Kentaro Hara
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hajime Morrita 2012-02-09 17:54:34 PST
Dromaeo reports score (run/s) by default. I should report time taken.
Comment 1 Hajime Morrita 2012-02-09 21:21:16 PST
I'd like to close this as WONTFIX beucause:
- We already have data. Changing the dimension will kill these data.
- What we area doing is how the official Dromaeo counts.
   We can understand our score as "Dromaeo score" even though it differs from how other tests is doing.
- Since the tests run so fast, we need to give some scale. But determining such scale is a tricky job.
   Currnet "run/s" is further simpler.

Ryosuke, what do you think?
Comment 2 Ryosuke Niwa 2012-02-09 22:59:02 PST
I think we need to update run-perf-tests so that it doesn't it doesn't erroneously add "ms" at the end. It's extremely confusing. The correct unit for these values is "Hz".
Comment 3 Ryosuke Niwa 2012-02-09 23:02:44 PST
By the way, having historical data is okay though. We've already had a bunch of bogus data due to the bot issues so switching the unit type at this point shouldn't be that big of a deal. Also, I'm thinking that we can add collect historical data for tests once tests have stabilized so I don't think losing historical data now should affect our decision here.
Comment 4 Hajime Morrita 2012-02-09 23:11:26 PST
(In reply to comment #2)
> I think we need to update run-perf-tests so that it doesn't it doesn't erroneously add "ms" at the end. It's extremely confusing. The correct unit for these values is "Hz".
Good point. I changed the bug title.
BTW, I personally prefer hz over ms. I guess Ojan has some opinion here.
(IIRC I read he talking about that somewhere.)
Comment 5 Adam Barth 2012-02-09 23:19:36 PST
Either way is fine as long as it's clearly labeled on the graphs and the text output whether bigger-is-better or smaller-is-better.
Comment 6 Ryosuke Niwa 2012-02-09 23:23:17 PST
(In reply to comment #5)
> Either way is fine as long as it's clearly labeled on the graphs and the text output whether bigger-is-better or smaller-is-better.

For labeling, we need upstream changes to perf-o-matic since we currently don't have any notion of units :(
Comment 7 Kentaro Hara 2012-03-14 16:59:40 PDT
Reporting "runs/s" as "ms" is extremely confusing (Actually I've so far given up a few patches because I thought they "regressed" performance). We should fix it somehow. As far as I see the discussion, the possible fixes are as follows:

[1] Just change the label from "ms" to "Hz". (Nit: I would prefer "runs/s" to "Hz" since the official Dromaeo is using "runs/s")
[2] Convert "runs/s" measured by Dromaeo to "ms", and report "ms". (patch: https://bugs.webkit.org/show_bug.cgi?id=80974)
[3] Run Dromaeo with the "runStyle=ms" mode, by which Dromaeo measures "ms" instead of "runs/s". Then report "ms".

IMHO,

- I do not like [1]. Mixing "larger-is-better" graphs and "smaller-is-better" graphs is really confusing. If we do not care historical data, I guess we can avoid [1].
- Both [2] and [3] sound good to me, but [2] would be better. In case of [3], we need to slightly modify webrunner.js to run the "runStyle=ms" mode with the same scale as the "runStyle=runs/s" mode (patch: https://bugs.webkit.org/attachment.cgi?id=131576&action=review). On the other hand, the disadvantage of [2] is that we cannot output the median and stdev, since we cannot calculate the median and stdev of "ms" from those of "runs/s".

WDYT?
Comment 8 Ryosuke Niwa 2012-03-14 17:06:49 PDT
(In reply to comment #7)
> Reporting "runs/s" as "ms" is extremely confusing (Actually I've so far given up a few patches because I thought they "regressed" performance). We should fix it somehow. As far as I see the discussion, the possible fixes are as follows:
> 
> [1] Just change the label from "ms" to "Hz". (Nit: I would prefer "runs/s" to "Hz" since the official Dromaeo is using "runs/s")
> [2] Convert "runs/s" measured by Dromaeo to "ms", and report "ms". (patch: https://bugs.webkit.org/show_bug.cgi?id=80974)
> [3] Run Dromaeo with the "runStyle=ms" mode, by which Dromaeo measures "ms" instead of "runs/s". Then report "ms".
>
> - I do not like [1]. Mixing "larger-is-better" graphs and "smaller-is-better" graphs is really confusing. If we do not care historical data, I guess we can avoid [1].
> - Both [2] and [3] sound good to me, but [2] would be better. In case of [3], we need to slightly modify webrunner.js to run the "runStyle=ms" mode with the same scale as the "runStyle=runs/s" mode (patch: https://bugs.webkit.org/attachment.cgi?id=131576&action=review). On the other hand, the disadvantage of [2] is that we cannot output the median and stdev, since we cannot calculate the median and stdev of "ms" from those of "runs/s".

Let's not do [2] or [3]. It's extremely confusing to invent our own metrics for Dromaeo. Also see https://bugs.webkit.org/show_bug.cgi?id=80776.
Comment 9 Kentaro Hara 2012-03-14 17:10:42 PDT
(In reply to comment #8)
> Let's not do [2] or [3]. It's extremely confusing to invent our own metrics for Dromaeo. Also see https://bugs.webkit.org/show_bug.cgi?id=80776.

Is it because we want to compare the perf-test results with the results when we run Dromaeo manually on the browser?
Comment 10 Ryosuke Niwa 2012-03-14 17:13:01 PDT
(In reply to comment #9)
> Is it because we want to compare the perf-test results with the results when we run Dromaeo manually on the browser?

Right.
Comment 11 Kentaro Hara 2012-03-14 17:15:30 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > Is it because we want to compare the perf-test results with the results when we run Dromaeo manually on the browser?
> 
> Right.

OK, then I'll take the approach [1]. (although I still think that [2] or [3] would be less confusing in total, IMO)
Comment 12 Ryosuke Niwa 2012-03-14 17:16:40 PDT
(In reply to comment #11)
> OK, then I'll take the approach [1]. (although I still think that [2] or [3] would be less confusing in total, IMO)

I think this is just an UI issue on perf-o-matic. We just need to put some information about whether larger is better or smaller is better for each graph (maybe just use different colors?).
Comment 13 Kentaro Hara 2012-03-14 18:56:44 PDT
Created attachment 131973 [details]
Patch
Comment 14 Ryosuke Niwa 2012-03-14 23:56:02 PDT
Comment on attachment 131973 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=131973&action=review

> Tools/Scripts/webkitpy/performance_tests/perftestsrunner.py:294
>                  results[score.group(1)] = float(score.group(2))
> +                if score.group(3):
> +                    unit = score.group(3)

Can we used the named groups instead? e.g. "(?Pblah...)" then match.group('blah')

> PerformanceTests/ChangeLog:16
> +        Running Dromaeo/dom-attr.html (1 of 1)

This

> PerformanceTests/ChangeLog:19
> +        Finished: 30.616529 s

And this line are actually output into stderr so you should probably exclude them (you can confirm this by piping stdout).
Comment 15 Ryosuke Niwa 2012-03-14 23:56:47 PDT
I think we need to include the units in JSON file as well so that perf-o-matic can display them. Could you either file a new bug or address that in your patch as well?
Comment 16 Kentaro Hara 2012-03-15 00:16:27 PDT
Created attachment 131996 [details]
patch for landing
Comment 17 WebKit Review Bot 2012-03-15 20:13:44 PDT
Comment on attachment 131996 [details]
patch for landing

Clearing flags on attachment: 131996

Committed r110933: <http://trac.webkit.org/changeset/110933>