The timeout-factor is useful for running the tests on slower devices where the default timeout value defined in the .plan files is not high enough to let the tests finish.
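As a rough sketch of the intended semantics (the helper below is illustrative, not the patch itself), the option simply scales whatever timeout each plan declares:

    # Illustrative only: the function name and structure are hypothetical.
    def effective_timeout(plan_timeout_seconds, timeout_factor=1):
        """Scale a plan's default timeout for slower hardware."""
        return plan_timeout_seconds * timeout_factor

    effective_timeout(300, 10)  # a 300s plan timeout becomes 3000s on a slow device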
Created attachment 380997 [details] patch
Created attachment 381058 [details] patch
Comment on attachment 381058 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381058&action=review

> Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:58
> + parser.add_argument('--timeout-factor', dest='timeoutFactorOverride', type=int, help='Timeout factor useful to run test in slower devices. e.g. 10')

As a user reading the help, I would find the explanation of what timeout-factor is a bit confusing. I would write instead something like: "Multiply the test timeout by the given factor, useful to run tests in slower devices, e.g.: 10"
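For reference, the quoted argparse line with the suggested help string applied would read roughly like this (a sketch; the dest name is kept from the patch under review):

    parser.add_argument('--timeout-factor', dest='timeoutFactorOverride', type=int,
                        help='Multiply the test timeout by the given factor, useful to run tests in slower devices, e.g.: 10')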
Created attachment 381298 [details] patch
I wonder if it would make sense to also add this parameter to Tools/Scripts/run-benchmark? I guess that if you are running the tests manually, one by one, with run-benchmark on a slow device, it may be useful as well.
Comment on attachment 381298 [details]
patch

The right fix is to create a new plan file for those slower devices.
(In reply to Ryosuke Niwa from comment #6)
> Comment on attachment 381298 [details]
> patch
>
> The right fix is to create a new plan file for those slower devices.

I don't think that is the right fix.

On our bots we run all the plans, by passing the argument "--allplans" to the tool. Duplicating each of the plans just to add different timeout parameters is totally incompatible with using "--allplans", since you would be running the same tests twice, just with different timeout parameters.
(In reply to Carlos Alberto Lopez Perez from comment #7)
> (In reply to Ryosuke Niwa from comment #6)
> > Comment on attachment 381298 [details]
> > patch
> >
> > The right fix is to create a new plan file for those slower devices.
>
> I don't think that is the right fix.
>
> On our bots we run all the plans, by passing the argument "--allplans" to
> the tool.

The design of run-benchmark is to use --plan and specify a plan file for each platform, device, etc. configuration, with customized parameters.

> Duplicating each of the plans just to add different timeout parameters is
> totally incompatible with using "--allplans", since you would be running the
> same tests twice, just with different timeout parameters.

It looks like https://trac.webkit.org/changeset/197505 added the --allplans option, but that was probably a mistake in the first place. This is not the way run-benchmark is supposed to be used, so it's natural that a fix would require even more code changes that are incompatible with the design of the script.
The only acceptable alternative solution here (other than creating & specifying individual plan files), in my view, is to add a --timeout option that overrides the default timeout of each plan.
(In reply to Ryosuke Niwa from comment #9)
> The only acceptable alternative solution here (other than creating &
> specifying individual plan files), in my view, is to add a --timeout option
> that overrides the default timeout of each plan.

I'd still like to keep the timeout-factor, but if there is no other choice and you are OK with --timeout, then I could modify the patch to do it in your suggested way.
(In reply to Ryosuke Niwa from comment #9)
> The only acceptable alternative solution here (other than creating &
> specifying individual plan files), in my view, is to add a --timeout option
> that overrides the default timeout of each plan.

Would it be acceptable to keep --timeout-factor only for the browserperfdash-benchmark runner, like the current patch does? It doesn't expose --timeout-factor to the run-benchmark script.
It would not be acceptable to introduce this concept into the webkitpy.benchmark_runner classes. If you can figure out a way to do it without touching those classes, I don't really care, because this is mostly a maintenance cost issue.
(In reply to Ryosuke Niwa from comment #12)
> It would not be acceptable to introduce this concept into the
> webkitpy.benchmark_runner classes. If you can figure out a way to do it
> without touching those classes, I don't really care, because this is mostly
> a maintenance cost issue.

Ok. Then I guess a possibility is that in the browserperfdash-runner script we create a new class (for example, BrowserperfdashRunner) that inherits from the BenchmarkRunner class and uses this timeout-factor option there.
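A minimal sketch of what that subclass could look like, assuming BenchmarkRunner stores the parsed plan in a _plan dict with a 'timeout' key (the constructor handling and attribute names are assumptions about the internals, not the real code):

    from webkitpy.benchmark_runner.benchmark_runner import BenchmarkRunner

    class BrowserperfdashRunner(BenchmarkRunner):
        def __init__(self, *args, **kwargs):
            # Hypothetical: pop our extra option so the base class never sees it.
            timeout_factor = kwargs.pop('timeout_factor', 1)
            super(BrowserperfdashRunner, self).__init__(*args, **kwargs)
            # Scale the plan timeout; assumes the base class exposes _plan.
            self._plan['timeout'] = self._plan['timeout'] * timeout_factor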
(In reply to Carlos Alberto Lopez Perez from comment #13)
> (In reply to Ryosuke Niwa from comment #12)
> > It would not be acceptable to introduce this concept into the
> > webkitpy.benchmark_runner classes. If you can figure out a way to do it
> > without touching those classes, I don't really care, because this is
> > mostly a maintenance cost issue.
>
> Ok. Then I guess a possibility is that in the browserperfdash-runner script
> we create a new class (for example, BrowserperfdashRunner) that inherits
> from the BenchmarkRunner class and uses this timeout-factor option there.

I don't think that satisfies the goal of not increasing the maintenance cost of the webkitpy.benchmark_runner class. If you did subclass, then future refactoring would have to be constrained by the existence of this flag.
(In reply to Ryosuke Niwa from comment #14)
> (In reply to Carlos Alberto Lopez Perez from comment #13)
> > (In reply to Ryosuke Niwa from comment #12)
> > > It would not be acceptable to introduce this concept into the
> > > webkitpy.benchmark_runner classes. If you can figure out a way to do it
> > > without touching those classes, I don't really care, because this is
> > > mostly a maintenance cost issue.
> >
> > Ok. Then I guess a possibility is that in the browserperfdash-runner
> > script we create a new class (for example, BrowserperfdashRunner) that
> > inherits from the BenchmarkRunner class and uses this timeout-factor
> > option there.
>
> I don't think that satisfies the goal of not increasing the maintenance cost
> of the webkitpy.benchmark_runner class. If you did subclass, then future
> refactoring would have to be constrained by the existence of this flag.

Just to be on the same page: what you suggest is to change the proposed `--timeout-factor` in favour of a `--timeout`, right?

What I don't see is how this satisfies the goal of not increasing the maintenance cost much better, because it basically implies the same changes in the code (the same changes in the constructor methods, a similar addition in the ArgParser, ...). Did I miss something?
(In reply to Pablo Saavedra from comment #15)
> (In reply to Ryosuke Niwa from comment #14)
> > (In reply to Carlos Alberto Lopez Perez from comment #13)
> > > (In reply to Ryosuke Niwa from comment #12)
> > >
> > > Ok. Then I guess a possibility is that in the browserperfdash-runner
> > > script we create a new class (for example, BrowserperfdashRunner) that
> > > inherits from the BenchmarkRunner class and uses this timeout-factor
> > > option there.
> >
> > I don't think that satisfies the goal of not increasing the maintenance
> > cost of the webkitpy.benchmark_runner class. If you did subclass, then
> > future refactoring would have to be constrained by the existence of this
> > flag.
>
> Just to be on the same page: what you suggest is to change the proposed
> `--timeout-factor` in favour of a `--timeout`, right?
>
> What I don't see is how this satisfies the goal of not increasing the
> maintenance cost much better, because it basically implies the same changes
> in the code (the same changes in the constructor methods, a similar
> addition in the ArgParser, ...). Did I miss something?

Because --timeout is something I can see ourselves using in our port, and it's an option supported by run-webkit-tests and run-perf-tests.
(In reply to Ryosuke Niwa from comment #16)
> (In reply to Pablo Saavedra from comment #15)
> > (In reply to Ryosuke Niwa from comment #14)
> > > (In reply to Carlos Alberto Lopez Perez from comment #13)
> > > > (In reply to Ryosuke Niwa from comment #12)
> > > >
> > > > Ok. Then I guess a possibility is that in the browserperfdash-runner
> > > > script we create a new class (for example, BrowserperfdashRunner) that
> > > > inherits from the BenchmarkRunner class and uses this timeout-factor
> > > > option there.
> > >
> > > I don't think that satisfies the goal of not increasing the maintenance
> > > cost of the webkitpy.benchmark_runner class. If you did subclass, then
> > > future refactoring would have to be constrained by the existence of this
> > > flag.
> >
> > Just to be on the same page: what you suggest is to change the proposed
> > `--timeout-factor` in favour of a `--timeout`, right?
> >
> > What I don't see is how this satisfies the goal of not increasing the
> > maintenance cost much better, because it basically implies the same
> > changes in the code (the same changes in the constructor methods, a
> > similar addition in the ArgParser, ...). Did I miss something?
>
> Because --timeout is something I can see ourselves using in our port, and
> it's an option supported by run-webkit-tests and run-perf-tests.

Currently the run-benchmark script already has options that are not supported by run-webkit-tests and run-perf-tests (and that are also useful to only one port), like:

  --device-id DEVICE_ID
                        Undocumented option for mobile device testing.

It's even an "Undocumented option", and that is fine for us. So I hope this is not about what you find useful on your port and what you don't, because that argument doesn't look like a great way of collaborating on a project like WebKit, which is all about ports.

Speaking about the two options:

--timeout: I can see how this is useful for the run-benchmark script for engineers running tests manually and wanting to run them with a bigger timeout than the plan's, and I agree it can be a good idea to add it to the script.

--timeout-factor: for the browserperfdash-benchmark script it is definitely more useful than "--timeout", because this script is not intended to be run manually but as part of a CI, where it runs all the plans. It allows the device to have a multiplier for each plan's timeout instead of hardcoding one value. With a timeout-factor of 4, the kraken plan would get 300*4=1200 and jetstream2 would get 1200*4=4800. However, if we have to hardcode 4800 as the timeout, that means kraken would also get 4800, so jetstream2 would run with a multiplier of 4 and kraken with one of 16. :\

We can understand that the timeout-factor may not be all that useful for run-benchmark for your use case (for us it definitely would be), so we have offered to add it only to the browserperfdash-benchmark script, even by creating a new class to avoid touching the run-benchmark code. The proposed addition to run-benchmark for timeout-factor (basically the patch in this bug) is just two added lines and an option, so I don't see how that can be a real concern regarding maintenance.

In any case, if you still think that it will be a maintenance cost that deserves this discussion, then the only way forward I see is that we subclass and we carry on with the maintenance burden. If you want to refactor, you can do that ignoring browserperfdash-runner; we will try to follow up with any fix on our side after that.
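The arithmetic in the comment above, spelled out (plan timeout values taken from the comment itself; everything else is illustrative):

    # Plan timeouts from the comment above (seconds).
    plans = {'kraken': 300, 'jetstream2': 1200}
    # With --timeout-factor 4 every plan keeps the same 4x margin:
    for name, timeout in plans.items():
        print('%s: %ds' % (name, timeout * 4))      # kraken: 1200s, jetstream2: 4800s
    # With a hardcoded --timeout 4800 the effective multiplier diverges:
    for name, timeout in plans.items():
        print('%s: %dx' % (name, 4800 // timeout))  # kraken: 16x, jetstream2: 4x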
(In reply to Carlos Alberto Lopez Perez from comment #17)
> So I hope this is not about what you find useful on your port and what you
> don't, because that argument doesn't look like a great way of collaborating
> on a project like WebKit, which is all about ports.

I find it extremely frustrating that you always seem to insist on implementing the very first solution you come up with instead of listening to the main maintainer of the relevant codebase. This is highly problematic and contrary to the way we normally operate. If the main maintainer of the code r-'s a patch and suggests an alternative, then the usual path forward is to work with the said reviewer to come up with an acceptable solution.

> --timeout-factor: for the browserperfdash-benchmark script it is definitely
> more useful than "--timeout", because this script is not intended to be run
> manually but as part of a CI, where it runs all the plans. It allows the
> device to have a multiplier for each plan's timeout instead of hardcoding
> one value. With a timeout-factor of 4, the kraken plan would get 300*4=1200
> and jetstream2 would get 1200*4=4800. However, if we have to hardcode 4800
> as the timeout, that means kraken would also get 4800, so jetstream2 would
> run with a multiplier of 4 and kraken with one of 16.

I worked full time on the perf bots & their maintenance for four years, and my experience is that no device will be faster or slower at a constant rate across a multitude of benchmarks. For example, MotionMark's performance depends radically on GPU performance, whereas Speedometer doesn't depend on GPU perf at all.

The performance characteristics of Speedometer and JetStream are also radically different with regard to the number of CPU cores. When there are more cores, JetStream tends to scale & its runtime tends to shrink. Speedometer is primarily a single-core benchmark and would not exhibit such a speedup or reduction in its runtime. Furthermore, benchmarks like MotionMark have a fixed runtime regardless of how slow a device is.

> We can understand that the timeout-factor may not be all that useful for
> run-benchmark for your use case (for us it definitely would be), so we have
> offered to add it only to the browserperfdash-benchmark script, even by
> creating a new class to avoid touching the run-benchmark code.

I fundamentally object to your premise that this is a useful option even in a bot environment.

> In any case, if you still think that it will be a maintenance cost that
> deserves this discussion, then the only way forward I see is that we
> subclass and we carry on with the maintenance burden. If you want to
> refactor, you can do that ignoring browserperfdash-runner; we will try to
> follow up with any fix on our side after that.

This is not an acceptable path forward in my view.
Let me also clarify: run-benchmark is one of the most mission-critical scripts for Apple's WebKit team. Any code change to it must be vetted very thoroughly, because any breakage or bad change can totally ruin our effort to defend the performance of WebKit and of the rest of the operating systems Apple is working on.
(In reply to Ryosuke Niwa from comment #18)
> (In reply to Carlos Alberto Lopez Perez from comment #17)
> > So I hope this is not about what you find useful on your port and what you
> > don't, because that argument doesn't look like a great way of
> > collaborating on a project like WebKit, which is all about ports.
>
> I find it extremely frustrating that you always seem to insist on
> implementing the very first solution you come up with instead of listening
> to the main maintainer of the relevant codebase. This is highly problematic
> and contrary to the way we normally operate.

I'm also kind of frustrated with this, and I don't know why you seem to think that I'm not listening to what you propose. I have listened carefully to your arguments and to the alternatives proposed, and I still think the best technical solution for the use case we want to solve is the --timeout-factor and not the other alternatives proposed.

> > --timeout-factor: for the browserperfdash-benchmark script it is
> > definitely more useful than "--timeout", because this script is not
> > intended to be run manually but as part of a CI, where it runs all the
> > plans. It allows the device to have a multiplier for each plan's timeout
> > instead of hardcoding one value. With a timeout-factor of 4, the kraken
> > plan would get 300*4=1200 and jetstream2 would get 1200*4=4800. However,
> > if we have to hardcode 4800 as the timeout, that means kraken would also
> > get 4800, so jetstream2 would run with a multiplier of 4 and kraken with
> > one of 16.
>
> I worked full time on the perf bots & their maintenance for four years, and
> my experience is that no device will be faster or slower at a constant rate
> across a multitude of benchmarks. For example, MotionMark's performance
> depends radically on GPU performance, whereas Speedometer doesn't depend on
> GPU perf at all.
>
> The performance characteristics of Speedometer and JetStream are also
> radically different with regard to the number of CPU cores. When there are
> more cores, JetStream tends to scale & its runtime tends to shrink.
> Speedometer is primarily a single-core benchmark and would not exhibit such
> a speedup or reduction in its runtime.

I agree with the idea that comparing the runtime (or performance values) of different benchmarks on different machines is not as simple as multiplying by a constant. However, we are talking about timeouts, and the timeout value of a test is not some scientific constant that needs to be carefully obtained. It is an approximation of the maximum time that seems reasonable to wait for a test to finish, and that time is usually highly dependent on the computing power of the machine doing the test. So multiplying that timeout value by a constant when the machine is slower seems like another reasonable approximation to me.

> Furthermore, benchmarks like MotionMark have a fixed runtime regardless of
> how slow a device is.

You have a good point here: --timeout-factor is a bad idea if the test has a fixed runtime. Maybe the plan file could include a new "timeout_type" key indicating whether the timeout is fixed or depends on the computing power? That way --timeout-factor would only apply when the runtime is not fixed.

> > We can understand that the timeout-factor may not be all that useful for
> > run-benchmark for your use case (for us it definitely would be), so we
> > have offered to add it only to the browserperfdash-benchmark script, even
> > by creating a new class to avoid touching the run-benchmark code.
>
> I fundamentally object to your premise that this is a useful option even in
> a bot environment.

??

> > In any case, if you still think that it will be a maintenance cost that
> > deserves this discussion, then the only way forward I see is that we
> > subclass and we carry on with the maintenance burden. If you want to
> > refactor, you can do that ignoring browserperfdash-runner; we will try to
> > follow up with any fix on our side after that.
>
> This is not an acceptable path forward in my view.

Ok, I'm giving up on trying to convince you that --timeout-factor is a good idea. Perhaps going with --timeout may also work for our use case.
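A sketch of how a runner could honor the "timeout_type" idea floated above (the key is hypothetical; neither it nor this code exists in today's plan files):

    # Hypothetical plan key: "timeout_type": "fixed" | "scalable".
    def plan_timeout(plan, timeout_factor):
        timeout = plan['timeout']
        if plan.get('timeout_type', 'scalable') != 'fixed':
            # Only scale plans whose runtime depends on machine speed.
            timeout *= timeout_factor
        return timeout

    plan_timeout({'timeout': 300}, 4)                           # -> 1200
    plan_timeout({'timeout': 300, 'timeout_type': 'fixed'}, 4)  # -> 300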
Set as resolved by mistake
<rdar://problem/58161884>