WebKit Bugzilla
REOPENED
202993
Add the timeout-factor parameter in browserperfdash-benchmark
https://bugs.webkit.org/show_bug.cgi?id=202993
Summary
Add the timeout-factor parameter in browserperfdash-benchmark
Pablo Saavedra
Reported
2019-10-15 10:08:42 PDT
The timeout-factor is useful for running the tests on slower devices, where the default timeout value defined in the .plan files is not high enough to let the tests finish.
Attachments
patch (7.51 KB, patch), 2019-10-15 10:12 PDT, Pablo Saavedra, no flags
patch (8.11 KB, patch), 2019-10-16 00:06 PDT, Pablo Saavedra, no flags
patch (8.09 KB, patch), 2019-10-18 06:32 PDT, Pablo Saavedra, rniwa: review-
Pablo Saavedra
Comment 1
2019-10-15 10:12:18 PDT
Created attachment 380997 [details]: patch
Pablo Saavedra
Comment 2
2019-10-16 00:06:26 PDT
Created attachment 381058 [details]: patch
Carlos Alberto Lopez Perez
Comment 3
2019-10-18 06:24:56 PDT
Comment on attachment 381058 [details]: patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381058&action=review
> Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:58
> +        parser.add_argument('--timeout-factor', dest='timeoutFactorOverride', type=int, help='Timeout factor useful to run test in slower devices. e.g. 10')
As a user reading the help, I would find the explanation of what timeout-factor is a bit confusing. I would instead write something like: "Multiply the test timeout by the given factor, useful to run tests on slower devices, e.g.: 10"
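For reference, the option under review can be sketched with a standalone argparse parser. The option name and `dest` come from the quoted diff; everything else here (the parser setup, the default) is illustrative and not the actual browserperfdash_runner.py code.

```python
import argparse

# Minimal sketch of the reviewed option, using the help text suggested above.
parser = argparse.ArgumentParser(description='browserperfdash-benchmark (sketch)')
parser.add_argument('--timeout-factor', dest='timeoutFactorOverride', type=int,
                    default=1,
                    help='Multiply the test timeout by the given factor, '
                         'useful to run tests on slower devices, e.g.: 10')

args = parser.parse_args(['--timeout-factor', '10'])
print(args.timeoutFactorOverride)  # prints 10, parsed as an int
```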
Pablo Saavedra
Comment 4
2019-10-18 06:32:46 PDT
Created attachment 381298 [details]: patch
Carlos Alberto Lopez Perez
Comment 5
2019-10-18 07:14:03 PDT
I wonder if it would make sense to also add this parameter to Tools/Scripts/run-benchmark? I guess that if you are running the tests manually, one by one, with run-benchmark on a slow device, it may be useful as well.
Ryosuke Niwa
Comment 6
2019-10-18 15:42:44 PDT
Comment on attachment 381298 [details]: patch

The right fix is to create a new plan file for those slower devices.
Carlos Alberto Lopez Perez
Comment 7
2019-10-19 08:11:58 PDT
(In reply to Ryosuke Niwa from comment #6)
> Comment on attachment 381298 [details]: patch
>
> The right fix is to create a new plan file for those slower devices.

I don't think that is the right fix. On our bots we run all the plans, by passing the argument "--allplans" to the tool. Duplicating each one of the plans to add different timeout parameters is totally incompatible with using "--allplans", since you would be running the same tests twice, just with different timeout parameters.
Ryosuke Niwa
Comment 8
2019-10-19 14:11:01 PDT
(In reply to Carlos Alberto Lopez Perez from comment #7)
> (In reply to Ryosuke Niwa from comment #6)
> > The right fix is to create a new plan file for those slower devices.
>
> I don't think that is the right fix.
>
> On our bots we run all the plans, by passing the argument "--allplans" to the tool.

The design of run-benchmark is to use --plan and specify a plan file for each platform, device, etc. configuration with customized parameters.

> Duplicating each one of the plans to add different timeouts parameters is totally incompatible with using "--allplans" since you will be running twice the same tests just with different timeout parameters.

It looks like https://trac.webkit.org/changeset/197505 added the --allplans option, but that was probably a mistake in the first place. This is not the way run-benchmark is supposed to be used, so it's natural that a fix would require even more code change that's incompatible with the design of the script.
Ryosuke Niwa
Comment 9
2019-10-19 14:12:14 PDT
The only acceptable alternative solution here (other than creating and specifying individual plan files), in my view, is to add a --timeout option which overrides the default timeout of each plan.
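The semantics proposed here, a single value that replaces each plan's default timeout, can be sketched as follows. The plan names and timeout values mirror the examples discussed later in this thread; `effective_timeout` is a hypothetical helper, not an actual webkitpy function.

```python
# Sketch of the --timeout override semantics proposed in comment 9.
PLAN_TIMEOUTS = {'kraken': 300, 'jetstream2': 1200}

def effective_timeout(plan_timeout, timeout_override=None):
    """Return the override when given, otherwise the plan's own default."""
    return timeout_override if timeout_override is not None else plan_timeout

print(effective_timeout(PLAN_TIMEOUTS['kraken']))        # 300 (plan default kept)
print(effective_timeout(PLAN_TIMEOUTS['kraken'], 4800))  # 4800 (overridden)
```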
Pablo Saavedra
Comment 10
2019-10-21 10:37:11 PDT
(In reply to Ryosuke Niwa from comment #9)
> The only acceptable alternative solution here (other than creating & specifying individual plan files), in my view, is to add --timeout option which overrides the default timeout of each plan.

I'd still like to keep the timeout-factor, but if there is no other choice and you are OK with --timeout, then I could modify the bug to do it your suggested way.
Carlos Alberto Lopez Perez
Comment 11
2019-10-21 11:14:37 PDT
(In reply to Ryosuke Niwa from comment #9)
> The only acceptable alternative solution here (other than creating & specifying individual plan files), in my view, is to add --timeout option which overrides the default timeout of each plan.
Would it be acceptable to keep --timeout-factor only for the browserperfdash-benchmark runner like the current patch does? It doesn't expose --timeout-factor to the run-benchmark script.
Ryosuke Niwa
Comment 12
2019-10-21 16:01:16 PDT
It would not be acceptable to introduce this concept into the webkitpy.benchmark_runner classes. If you can figure out a way to do that, I don't really care, because this is mostly a maintenance cost issue.
Carlos Alberto Lopez Perez
Comment 13
2019-10-21 18:01:44 PDT
(In reply to Ryosuke Niwa from comment #12)
> It would be not acceptable to introduce this concept into webkitpy.benchmark_runner classes. If you can figure out a way to do that, I don't really care because this is mostly a maintenance cost issue.

Ok. Then I guess one possibility is that, in the browserperfdash-runner script, we create a new class (for example BrowserperfdashRunner) that inherits from the BenchmarkRunner class and uses this timeout-factor option there.
Ryosuke Niwa
Comment 14
2019-10-22 11:49:45 PDT
(In reply to Carlos Alberto Lopez Perez from comment #13)
> (In reply to Ryosuke Niwa from comment #12)
> > It would be not acceptable to introduce this concept into webkitpy.benchmark_runner classes. If you can figure out a way to do that, I don't really care because this is mostly a maintenance cost issue.
>
> Ok. Then I guess that a possibility is that in the script browserperfdash-runner we create a new class (for example: BrowserperfdashRunner) that inherits from the BenchmarkRunner class and use there this timeout-factor option.

I don't think that satisfies the goal of not increasing the maintenance cost of the webkitpy.benchmark_runner class. If you subclass it, then future refactoring will be constrained by the existence of this flag.
Pablo Saavedra
Comment 15
2019-10-23 00:11:35 PDT
(In reply to Ryosuke Niwa from comment #14)
> (In reply to Carlos Alberto Lopez Perez from comment #13)
> > Ok. Then I guess that a possibility is that in the script browserperfdash-runner we create a new class (for example: BrowserperfdashRunner) that inherits from the BenchmarkRunner class and use there this timeout-factor option.
>
> I don't think that satisfies the goal of not increasing the maintenance cost of webkitpy.benchmark_runner class. If you did subclass, then the future refactoring will have to be constrained by the existence of this flag.

Just to be on the same page: what you suggest is changing the proposed `--timeout-factor` in favour of a `--timeout`, right?

What I don't see is how this better satisfies the goal of not increasing the maintenance cost, because it basically implies the same changes in the code (the same changes in the constructor methods, a similar addition in the ArgParser, ...). Did I miss something?
Ryosuke Niwa
Comment 16
2019-10-23 01:43:33 PDT
(In reply to Pablo Saavedra from comment #15)
> (In reply to Ryosuke Niwa from comment #14)
> > I don't think that satisfies the goal of not increasing the maintenance cost of webkitpy.benchmark_runner class. If you did subclass, then the future refactoring will have to be constrained by the existence of this flag.
>
> Just to be in the same page. What you suggest is change the proposed `--timeout-factor` in favour of an `--timeout`, right?
>
> What I don't see is the point of how this satisfies much better the goal of not increasing the maintenance cost ... because basically implies the same changes in the code (the same changes in the constructor methods, similar addition in the ArgParser ...). Did I miss something?
Because --timeout is something I can see ourselves using in our port, and it's an option supported by run-webkit-tests and run-perf-tests.
Carlos Alberto Lopez Perez
Comment 17
2019-10-23 04:38:45 PDT
(In reply to Ryosuke Niwa from comment #16)
> (In reply to Pablo Saavedra from comment #15)
> > Just to be in the same page. What you suggest is change the proposed `--timeout-factor` in favour of an `--timeout`, right?
> >
> > What I don't see is the point of how this satisfies much better the goal of not increasing the maintenance cost ... because basically implies the same changes in the code (the same changes in the constructor methods, similar addition in the ArgParser ...). Did I miss something?
>
> Because --timeout is something I can see ourselves using in our port, and it's an option supported by run-webkit-tests and run-perf-tests.
Currently the run-benchmark script already has options that are not supported by run-webkit-tests and run-perf-tests (and that are also useful to only one port), like:

  --device-id DEVICE_ID  Undocumented option for mobile device testing.

It's even an "undocumented option", and that is fine for us. So I hope this is not about what you find useful on your port and what not, because that argument doesn't look like a great way of collaborating on a project like WebKit that is all about ports.

Speaking about the two options:

--timeout: I can see how this is useful in the run-benchmark script for engineers running tests manually and wanting a bigger timeout than the plan's, and I agree it can be a good idea to add it to the script.

--timeout-factor: for the browserperfdash-benchmark script it is definitively more useful than "--timeout". Because this script is not intended to be run manually, but as part of a CI, where it runs all the plans, it allows the device to have a multiplier for the plan timeout instead of hardcoding a value. With a timeout-factor of 4, the kraken plan would have 300*4=1200 and jetstream2 would have 1200*4=4800. However, if we have to hardcode a 4800 value as the timeout, that means kraken would also have 4800, which means that jetstream2 would run with a multiplier of 4 and kraken with one of 16. :\

We can understand that the timeout-factor may not be all that useful in run-benchmark for your use case (for us it definitively would be), so we have offered to add it only to the browserperfdash-benchmark script, even by creating a new class to avoid touching the run-benchmark code. The proposed code to add to run-benchmark for timeout-factor (basically the patch in this bug) is just 2 added lines and an option, so I don't see how that can be a real concern regarding maintenance.
In any case, if you still think that it will be a maintenance cost that deserves this discussion, then the only way forward I see is that we subclass and carry the maintenance burden ourselves. If you want to refactor, you can do that ignoring browserperfdash-runner; we will try to follow up with any fix on our side after that.
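The arithmetic in the comment above, a per-plan multiplier versus a single hardcoded timeout, can be checked directly. The plan timeouts (kraken 300s, jetstream2 1200s) are the values quoted in the comment; the code itself is just an illustration of the argument.

```python
# With --timeout-factor every plan keeps its own proportions:
plans = {'kraken': 300, 'jetstream2': 1200}
factor = 4

scaled = {name: timeout * factor for name, timeout in plans.items()}
print(scaled)  # {'kraken': 1200, 'jetstream2': 4800}

# A fixed --timeout large enough for jetstream2 gives kraken an
# effective multiplier of 16 instead of 4:
fixed_timeout = 4800
print(fixed_timeout // plans['kraken'])      # 16
print(fixed_timeout // plans['jetstream2'])  # 4
```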
Ryosuke Niwa
Comment 18
2019-10-23 14:21:37 PDT
(In reply to Carlos Alberto Lopez Perez from comment #17)
> So I hope this is not about what you find useful on your port and what not, because that argument doesn't look like a great way of collaborating on a project like WebKit that is all about ports.
I find it extremely frustrating that you always seem to insist on implementing the very first solution you come up with instead of listening to the main maintainer of the relevant codebase. This is highly problematic and contrary to the way we normally operate. If the main maintainer of the code r-es a patch and suggests an alternative, then the usual path forward is to work with said reviewer to come up with an acceptable solution.
> --timeout-factor: for the browserperfdash-benchmark script it is definitively more useful than the "--timeout" factor. Because this script is not intended to be ran manually, but as part of a CI, where it runs all the plans. So it allows the device to have a multiplier for the plan timeout instead of hardcoding a value. With a timeout-factor of 4, the kraken plan would have 300*4=1200 and the jetstream2 would have 1200*4=4800. However, if we have to hardcode a 4800 value as timeout, that means kraken would have also 4800. Which means that jetstream2 would run with a multiplier of 4 and kraken with one of 16.
I worked full time on the perf bots and maintained them for four years, and my experience is that no device will be faster or slower at a constant rate across a multitude of benchmarks. For example, MotionMark's performance depends radically on GPU performance, whereas Speedometer doesn't depend on GPU perf at all. The performance characteristics of Speedometer and JetStream are also radically different with regard to the number of CPU cores: when there are more cores, JetStream tends to scale and its runtime tends to shrink, while Speedometer is primarily a single-core benchmark and would not exhibit such a speedup or reduction in its runtime. Furthermore, benchmarks like MotionMark have a fixed runtime regardless of how slow a device is.
> We can understand that the timeout-factor can be not all that useful for run-benchmark for your use-case (for us it definitively would be), so we have offered to add it only to the browserperfdash-benchmark script. Even by creating a new class to avoid touching the run-benchmark code.
I fundamentally object to your premise that this is a useful option even in a bot environment.
> In any case, if you still think that it will be a maintenance cost that deserves this discussion, then the only way forward I see is that we subclass, and we carry on with the maintenance burden. If you want to re-factor, you can do that ignoring browserperfdash-runner, we will try to follow up with any fix on our side after that.
This is not an acceptable path forward in my view.
Ryosuke Niwa
Comment 19
2019-10-23 14:26:55 PDT
Let me also clarify: run-benchmark is one of the most mission-critical scripts for Apple's WebKit team. Any code change to it must be vetted very thoroughly, because any breakage or careless amendment can totally ruin our effort to defend the performance of WebKit and of the rest of the operating systems Apple is working on.
Carlos Alberto Lopez Perez
Comment 20
2019-10-25 06:02:25 PDT
(In reply to Ryosuke Niwa from comment #18)
> (In reply to Carlos Alberto Lopez Perez from comment #17)
> > So I hope this is not about what you find useful on your port and what not, because that argument doesn't look like a great way of collaborating on a project like WebKit that is all about ports.
>
> I find it extremely frustrating that you seem to always insist on implementing the very first solution you come up with instead of listening to the main maintainer of the relevant codebase. This is highly problematic and contrary to the way we normally operate.
I'm also kind of frustrated with this, and I don't know why you seem to think that I'm not listening to what you propose. I have listened carefully to your arguments and to the alternatives proposed, and I still think the best technical solution for the use case we want to solve is --timeout-factor and not the other alternatives proposed.
> > --timeout-factor: for the browserperfdash-benchmark script it is definitively more useful than the "--timeout" factor. Because this script is not intended to be ran manually, but as part of a CI, where it runs all the plans. So it allows the device to have a multiplier for the plan timeout instead of hardcoding a value. With a timeout-factor of 4, the kraken plan would have 300*4=1200 and the jetstream2 would have 1200*4=4800. However, if we have to hardcode a 4800 value as timeout, that means kraken would have also 4800. Which means that jetstream2 would run with a multiplier of 4 and kraken with one of 16.
>
> I worked full time on the perf bots and maintained them for four years, and my experience is that no device will be faster or slower at a constant rate across a multitude of benchmarks. For example, MotionMark's performance would radically depend on GPU performance whereas Speedometer doesn't depend on GPU perf at all.
>
> The performance characteristics of Speedometer and JetStream are also radically different with regards to the number of CPU cores. When there are more cores, JetStream tends to scale and its runtime tends to shrink. Speedometer is primarily a single-core benchmark and it would not exhibit such a speedup or reduction in its runtime.
I agree with the idea that comparing the runtime (or performance values) of different benchmarks on different machines is not as simple as multiplying by a constant. However, we are talking about timeouts, and the timeout value of a test is not some scientific constant that needs to be carefully obtained. It is an approximation of the maximum time that seems reasonable to wait for a test to finish, and that time is usually highly dependent on the computing power of the machine running the test. So multiplying that timeout value by a constant when the machine is slower seems like another reasonable approximation to me.
> Furthermore, benchmarks like MotionMark has a fixed runtime regardless of how slow a device is.
You have a good point here: --timeout-factor is a bad idea if the test has a fixed runtime. Maybe the plan file could include a new "timeout_type" key indicating whether the timeout is fixed or depends on computing power? That way --timeout-factor would only apply when the runtime is not fixed.
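The "timeout_type" idea could be sketched roughly as below. The key name and the plan snippets are hypothetical (real .plan files carry more fields, and no such key exists in them); this only illustrates applying the factor conditionally.

```python
import json

# Hypothetical plan snippets with a "timeout_type" key: "scalable"
# timeouts grow with --timeout-factor, "fixed" ones ignore it.
plans = {
    'kraken':     json.loads('{"timeout": 300, "timeout_type": "scalable"}'),
    'motionmark': json.loads('{"timeout": 600, "timeout_type": "fixed"}'),
}

def plan_timeout(plan, timeout_factor=1):
    if plan.get('timeout_type') == 'fixed':
        return plan['timeout']  # fixed-runtime benchmarks keep the plan value
    return plan['timeout'] * timeout_factor

print(plan_timeout(plans['kraken'], 4))      # 1200
print(plan_timeout(plans['motionmark'], 4))  # 600
```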
> > We can understand that the timeout-factor can be not all that useful for run-benchmark for your use-case (for us it definitively would be), so we have offered to add it only to the browserperfdash-benchmark script. Even by creating a new class to avoid touching the run-benchmark code.
>
> I fundamentally object to your premise that this is a useful option even in a bot environment.
??
> > In any case, if you still think that it will be a maintenance cost that deserves this discussion, then the only way forward I see is that we subclass, and we carry on with the maintenance burden. If you want to re-factor, you can do that ignoring browserperfdash-runner, we will try to follow up with any fix on our side after that.
>
> This is not an acceptable path forward in my view.
Ok, I'm giving up on trying to convince you that --timeout-factor is a good idea. Perhaps going with --timeout may also work for our use case.
Pablo Saavedra
Comment 21
2019-12-23 08:12:15 PST
Set as resolved by mistake
Radar WebKit Bug Importer
Comment 22
2019-12-23 08:12:19 PST
<rdar://problem/58161884>