Bug 205546 - WKTR/DRT always trigger the Discrete GPU on dual GPU systems
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Dean Jackson
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-12-21 19:10 PST by Dean Jackson
Modified: 2020-01-10 10:02 PST
CC: 7 users

See Also:


Attachments
Patch (5.96 KB, patch)
2019-12-23 12:52 PST, Dean Jackson
no flags
Patch (6.69 KB, patch)
2019-12-24 11:28 PST, Dean Jackson
ap: review+

Description Dean Jackson 2019-12-21 19:10:54 PST
Launching either DumpRenderTree or WebKitTestRunner triggers a swap to the discrete GPU, on any test. This means we're not testing the most common situation our users run in (for all CA rendering and WebGL).
Comment 1 Radar WebKit Bug Importer 2019-12-21 19:11:10 PST
<rdar://problem/58139610>
Comment 2 Daniel Bates 2019-12-21 20:07:17 PST
This was intentional. LayoutTestHelper does it.
Comment 3 Alexey Proskuryakov 2019-12-21 20:55:04 PST
We are testing the integrated GPU (IG) on Mac minis, because there is no discrete GPU there, so most of the infrastructure already covers it. The special WebGL queue is IG-only too.

And yes, we intentionally force discrete where it’s available. Originally, the reason was that switching it on and off frequently would cause panics. But it’s still a good idea for predictability, as there are unsuspecting tests running in parallel.
Comment 4 Dean Jackson 2019-12-22 13:10:04 PST
We should default to it being integrated always. That is the default for WebGL. And it is our most common configuration.

Having said that:

- Alexey is right that the testing infrastructure might not have dual GPUs, but the issue here is that I'm trying to replicate a crash on Intel in DRT, and I can't if that app forces the discrete GPU

- We should also be testing on the discrete GPU since our users will have that enabled if they are doing something else that would trigger it (e.g. Photoshop, Pixelmator, Chess,... a game)
Comment 5 Dean Jackson 2019-12-22 13:12:18 PST
I will revert the change in LayoutTestHelper, but add a command line argument to turn it on.

Note that we had an important update in a .2 release to force integrated in nearly all circumstances, so I think it is important that that remains the default configuration.
Comment 6 Alexey Proskuryakov 2019-12-22 14:59:27 PST
> I will revert the change in LayoutTestHelper 

I disagree. It would be unacceptable to run the tests with IG or DG randomly, depending on which tests happen to run in parallel.

What we are doing in this regard is intentional, all the way from software configuration to which hardware we choose to run on.

If the issue is that you need to reproduce an issue with IG, feel free to edit LayoutTestHelper code locally, or add a non-default switch that bypasses the normal behavior. But we can't have that when running the whole test suite.
Comment 7 Dean Jackson 2019-12-23 10:40:23 PST
(In reply to Alexey Proskuryakov from comment #6)
> > I will revert the change in LayoutTestHelper 
> 
> I disagree. It would be unacceptable to run the tests with IG or DG
> randomly, depending on which tests happen to run in parallel.

I think it should be the other way around. The vast majority of the
tests do not require the DG. We'd be better off running everything
but them on the IG, and then ensure the DG ones are run isolated.

What tests actually use the DG? None of the WebGL ones should
trigger it.
Comment 8 Alexey Proskuryakov 2019-12-23 11:52:45 PST
That's a good question. Let's find out before attempting to change things.

This strikes me as very non-urgent though, because everything is already the way it should be.
Comment 9 Dean Jackson 2019-12-23 12:52:38 PST
Created attachment 386349 [details]
Patch
Comment 10 Dean Jackson 2019-12-23 12:55:10 PST
(In reply to Alexey Proskuryakov from comment #8)
> That's a good question. Let's find out before attempting to change things.

I've made a patch so I can optionally turn off the GPU locking.

> This strikes me as very non-urgent though, because everything is already the
> way it should be.

I disagree that things are the way they should be :) We're not testing the configuration that the vast majority of our customers use. This current situation
is a hack to help the testing infrastructure work.

Anyway, I need the option now, so here's a patch. It won't change anything
by default.
Comment 11 Alexey Proskuryakov 2019-12-23 13:16:27 PST
Comment on attachment 386349 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=386349&action=review

> Tools/DumpRenderTree/mac/LayoutTestHelper.m:265
> +        { "use-integrated-gpu", no_argument, &useIntegratedGPU, true },

The name is misleading. I'm not sure if I fully understand WebKit behavior at this point, but the flag doesn't make tests use integrated GPU, it just makes them use whatever happens to be enabled, and if WebKit chooses to enable discrete GPU for any tests, that will also be allowed.

I think that the intention is to add "--enable-discrete-gpu-if-available" with default of true.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:332
> +            "--use-integrated-gpu", action="store_true", default=False,
> +            help=("Attempt to use only the integrated GPU on a dual-GPU system.")),

This also seems misleading.

> Tools/Scripts/webkitpy/port/base.py:952
> +    def start_helper(self, pixel_tests=False, use_integrated_gpu=False):

Interesting that this doesn't affect any tests. Tools/Scripts/webkitpy/port/mock_drt.py has a version of this.
Comment 12 Dean Jackson 2019-12-24 10:43:38 PST
Comment on attachment 386349 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=386349&action=review

>> Tools/DumpRenderTree/mac/LayoutTestHelper.m:265
>> +        { "use-integrated-gpu", no_argument, &useIntegratedGPU, true },
> 
> The name is misleading. I'm not sure if I fully understand WebKit behavior at this point, but the flag doesn't make tests use integrated GPU, it just makes them use whatever happens to be enabled, and if WebKit chooses to enable discrete GPU for any tests, that will also be allowed.
> 
> I think that the intention is to add "--enable-discrete-gpu-if-available" with default of true.

You're right that no app can stop the discrete GPU from being used, and any test might trigger it (or any other app on the system).

 If I did it with "--enable-discrete-gpu-if-available" then to turn it off I'd need to use "--no-enable-discrete-gpu-if-available" which is cumbersome, but I don't really care.

(I'm putting aside my suggestion that we default to the IG everywhere, and just trying to give myself the ability to not trigger the DG all the time)
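The naming awkwardness Dean mentions can be avoided at the option-parser level. As a hypothetical sketch (using plain `argparse` rather than webkitpy's actual option machinery, and with illustrative flag names), a default-true option can get an explicit, readable opt-out flag sharing the same destination instead of an auto-generated `--no-` prefix:

```python
import argparse

# Sketch only: an explicit opt-out flag sharing one destination, so the
# default-true behavior doesn't force a clumsy "--no-enable-..." spelling.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--enable-discrete-gpu-if-available", dest="discrete_gpu",
                   action="store_true",
                   help="Lock the discrete GPU on when present (the default).")
group.add_argument("--disable-discrete-gpu-if-available", dest="discrete_gpu",
                   action="store_false",
                   help="Leave the GPU mux alone.")
parser.set_defaults(discrete_gpu=True)

default_run = parser.parse_args([])
opted_out = parser.parse_args(["--disable-discrete-gpu-if-available"])
print(default_run.discrete_gpu, opted_out.discrete_gpu)  # True False
```

Both spellings set the same `discrete_gpu` attribute, so the rest of the harness only ever checks one boolean.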

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:332
>> +            help=("Attempt to use only the integrated GPU on a dual-GPU system.")),
> 
> This also seems misleading.

I'll rename it to your suggestion. Note that I'm still going to request we swap the default, so that our developers running tests on their laptops are exercising what most customers have. If the bots work better with the DG enabled, then they should be configured to turn it on locally.

>> Tools/Scripts/webkitpy/port/base.py:952
>> +    def start_helper(self, pixel_tests=False, use_integrated_gpu=False):
> 
> Interesting that this doesn't affect any tests. Tools/Scripts/webkitpy/port/mock_drt.py has a version of this.

I'll check.
Comment 13 Dean Jackson 2019-12-24 11:28:12 PST
Created attachment 386392 [details]
Patch
Comment 14 Dean Jackson 2019-12-24 11:28:59 PST
I'm using "--prefer-integrated-gpu" which more accurately represents what will happen (and what the tester is asking for).
Comment 15 Alexey Proskuryakov 2019-12-24 12:40:54 PST
Comment on attachment 386392 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=386392&action=review

> Tools/ChangeLog:9
> +        Add an option "--prefer-integrated-gpu" to run-webkit-test

I'm not a fan of this name either, because the preference is not about impacting what WebKit prefers to use. WebKit will do its thing, which is now something different from what it was before, and the policy can certainly change in the future. I want this to more closely match what run-webkit-tests code does, not what WebKit does.

Maybe allow-integrated-gpu?
Comment 16 Dean Jackson 2019-12-25 09:43:12 PST
Comment on attachment 386392 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=386392&action=review

>> Tools/ChangeLog:9
>> +        Add an option "--prefer-integrated-gpu" to run-webkit-test
> 
> I'm not a fan of this name either, because the preference is not about impacting what WebKit prefers to use. WebKit will do its thing, which is now something different from it was before, and the policy can certainly change in the future. I want this to more closely match what run-webkit-tests code does, not what WebKit does.
> 
> Maybe allow-integrated-gpu?

What WebKit does is not optional as far as WebKit is concerned: it will always prefer the integrated GPU. Content may request the discrete GPU, but it isn't guaranteed to get it.

This parameter is a message to the testing system, telling it not to override WebKit's ability to use the integrated GPU. That is the problem: LayoutTestHelper is removing WebKit's ability to use the integrated GPU.

In order to get more realistic testing, I think that LayoutTestHelper should stop doing this by default. Instead, there should be a parameter, something like --activate-discrete-gpu, that will turn the discrete GPU on, and thus force all WebKit content to be on that GPU, even if it didn't want to be. If doing this makes our bots more stable then we can enable the flag there, but our developers should not get it by default.
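Dean's proposed flipped default could be sketched at the run-webkit-tests level like this. This is hypothetical: `--activate-discrete-gpu` is his suggested name, not a shipped option, and the real script uses webkitpy's option plumbing rather than bare `argparse`:

```python
import argparse

# Sketch of the proposed flipped default: the harness leaves the GPU mux
# alone unless the tester explicitly opts in to holding the discrete GPU on.
parser = argparse.ArgumentParser()
parser.add_argument("--activate-discrete-gpu", action="store_true", default=False,
                    help="Hold the discrete GPU active for the whole test run, "
                         "forcing all content onto it on dual-GPU machines.")

default_run = parser.parse_args([])
opted_in = parser.parse_args(["--activate-discrete-gpu"])
print(default_run.activate_discrete_gpu, opted_in.activate_discrete_gpu)  # False True
```

With this shape, developers testing locally get the integrated-GPU behavior most users see, and bots that need the old behavior pass the flag in their configuration.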
Comment 17 Dean Jackson 2019-12-25 09:51:26 PST
Committed r253910: <https://trac.webkit.org/changeset/253910>
Comment 18 Alexey Proskuryakov 2019-12-25 10:33:58 PST
> > Maybe allow-integrated-gpu?
> 
> What WebKit does is not optional as far as WebKit is concerned: it will
> always prefer the integrated GPU. Content may request the discrete GPU, but
> it isn't guaranteed to get it.
> 
> This parameter is a message to the testing system, telling it to not dictate
> WebKit's ability to use the integrated GPU. This is the problem.
> LayoutTestHelper is removing WebKit's ability to use the integrated GPU.

It almost sounds like you are agreeing that "allow-integrated-gpu" would be more accurate :-)

> In order to get more realistic testing <...>
> If doing this makes our bots more stable then we can enable the flag there,
> but our developers should not get it by default.

Can you define "realistic" in this context, and explain the value of this realism in local testing? I'm not sure if you are talking about the ease of reproducing issues found by bots, or something else.

We have bugs that are specific to versions of integrated GPU, not to mention AMD and Nvidia. If there is a plan to work through these in the foreseeable future, we should discuss how to best support such an effort in automation. I'm pretty sure that it will be more complex than switching a default for local testing.
Comment 19 Dean Jackson 2019-12-26 11:58:05 PST
(In reply to Alexey Proskuryakov from comment #18)
> > > Maybe allow-integrated-gpu?
> > 
> > What WebKit does is not optional as far as WebKit is concerned: it will
> > always prefer the integrated GPU. Content may request the discrete GPU, but
> > it isn't guaranteed to get it.
> > 
> > This parameter is a message to the testing system, telling it to not dictate
> > WebKit's ability to use the integrated GPU. This is the problem.
> > LayoutTestHelper is removing WebKit's ability to use the integrated GPU.
> 
> It almost sounds like you are agreeing that "allow-integrated-gpu" would be
> more accurate :-)

Maybe! I don't really mind what it is at the moment, because I strongly think the command line option should be flipped so that you have to explicitly lock to the discrete GPU if you want all WebGL tests to run on that GPU (assuming the user hasn't already done that in System Preferences).

> 
> > In order to get more realistic testing <...>
> > If doing this makes our bots more stable then we can enable the flag there,
> > but our developers should not get it by default.
> 
> Can you define "realistic" in this context, and explain the value of this
> realism in local testing? I'm not sure if you are talking about the ease of
> reproducing issues found by bots, or something else.

Reality is:
- most users have a single GPU that is integrated
- most users with an extra GPU have automated graphics switching enabled (the default)
- all of the above get the integrated GPU when running WebGL (*)

(*) Exceptions:
- The WebGL content explicitly requests "high-performance", in which case we *might* give them the discrete GPU
- The discrete GPU is already active, or becomes active

I think that our default testing environment, both for developers and the bots, should reflect this reality; i.e., LayoutTestHelper should not enable the discrete GPU unless asked.

If a developer, testing locally, wants to exercise the discrete GPU, they can do so either by passing the flag to run-webkit-tests, changing System Preferences, or launching another application that will turn it on.

Same thing for a bot.

If, as you say, some bots kernel panic when switching GPUs, then they have the choice of using the command line flag, setting System Preferences, skipping all tests that would trigger the discrete GPU, or even adding a WebKit preference that disallows the discrete GPU in WebGL.

However, I'm slightly confused about this, because there are no current tests that trigger the discrete GPU. Was this happening before we switched the default to be the integrated GPU?

> We have bugs that are specific to versions of integrated GPU, not to mention
> AMD and Nvidia. If there is a plan to work through these in foreseeable
> future, we should discuss how to best support such effort in automation. I'm
> pretty sure that it will be more complex than switching a default for local
> testing.

I'm not sure what you mean. If there is a bug on specific hardware, and you don't have that hardware, then no default will help. My quest started when I saw EWS was crashing on a test, and I realised that I had not tested locally on any Intel GPU because LayoutTestHelper was doing something unusual (from a user's point of view). It turns out the crash did not reproduce on my Intel GPU, but it might have.

But yes, we should have bots that run on all supported hardware, and in all GPU environments. I think the best thing to do there would be to work with our GPU team.
Comment 20 Alexey Proskuryakov 2019-12-26 12:22:13 PST
> I saw EWS was crashing on a test

Quick note about this: EWS is a 2013 Mac Pro, and thus uses the discrete GPU. Good thing that we caught the issue early!
Comment 21 Alexey Proskuryakov 2019-12-26 13:54:57 PST
> However, I'm slightly confused about this, because there are no current tests that trigger the discrete GPU.

How did this happen? This is a pretty serious omission if we don't test high-performance at all.

It's one thing to avoid wasting battery on fingerprinting or trivial uses of graphics, and another thing to not test the code path needed for games.

> Was this happening before we switched the default to be the integrated GPU?

Panics and freezes were happening before that. But it's not just about panicking. If there is any test that triggers a different mode globally, that means that other tests randomly run with this mode enabled. That's a nightmare scenario for diagnosing flaky tests.

> I think that our default testing environment, both for developers and the bots should reflect this reality. i.e. LayoutTestHelper should not enable the discrete GPU unless asked.

I think that this concept of IG vs. DG is fundamentally confused, because different IG chipsets have different behaviors too. We have a zero regression policy that applies to hardware models, one doesn't get to say things like "this regression only affects laptops" or "this regression only affects iMacs".

There are two purposes for local testing:

1. Minimize the risk of landing something that has to be rolled back. For this, you want to have a configuration that's similar to the bots. Since EWS has DG and post-commit has IG, neither default is preferable from this standpoint.

2. Perform testing that bots cannot do because of their limited set of configurations. Since there is a wider variety of DG behaviors, it is beneficial to test that locally, so that they get at least some testing, and we don't ship completely untested software to customers. This can be revisited once we have a better solution, but we don't have it now.
Comment 22 James Darpinian 2020-01-09 17:12:30 PST
Is this fully fixed? When I run WebGL layout tests, DumpRenderTree is still triggering a switch to the discrete GPU, though LayoutTestHelper doesn't anymore.
Comment 23 Alexey Proskuryakov 2020-01-10 08:53:14 PST
There were no changes to DumpRenderTree or WebKit here (which is why I didn't like the --prefer-integrated-gpu name). Still, it's strange because no tests were triggering discrete GPU according to Dean.

> though LayoutTestHelper doesn't anymore

Presumably when you run it with the new flag, correct? The default behavior shouldn't have changed here.
Comment 24 James Darpinian 2020-01-10 10:02:18 PST
Yes.