Bug 193772

Summary: run-webkit-tests should check for leaks in WebKit processes
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: Tools / TestsAssignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, achristensen, ap, bfulgham, cdumez, ews-watchlist, ggaren, glenn, jbedard, joepeck, lforschler, rniwa, simon.fraser, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=201873
Attachments:
Description Flags
Hack to make rwt --leaks work on com.apple.WebKit.*.Development processes
none
Patch v1
none
Patch v2
none
Archive of layout-test-results from ews105 for mac-highsierra-wk2
none
Patch v3
none
Patch v4
none
Patch v5
none
Patch v6 none

Description David Kilzer (:ddkilzer) 2019-01-24 09:27:04 PST
`run-webkit-tests --leaks` should work with WebKitTestRunner and com.apple.WebKit.* processes.

Currently checking for leaks with WebKitTestRunner (the default) only checks for leaks in the WebKitTestRunner process, skipping all of the com.apple.WebKit.*.Development processes.

<rdar://problem/46526680>
Comment 1 David Kilzer (:ddkilzer) 2019-01-24 09:29:26 PST
Created attachment 360017 [details]
Hack to make rwt --leaks work on com.apple.WebKit.*.Development processes
Comment 2 David Kilzer (:ddkilzer) 2019-03-22 05:42:29 PDT
Created attachment 365715 [details]
Patch v1
Comment 3 David Kilzer (:ddkilzer) 2019-03-22 05:58:17 PDT
Created attachment 365716 [details]
Patch v2

No changes since Patch v1. Just fixed trunk to back out accidental commit of earlier hack.
Comment 4 David Kilzer (:ddkilzer) 2019-03-22 05:59:05 PDT
(In reply to David Kilzer (:ddkilzer) from comment #3)
> Created attachment 365716 [details]
> Patch v2
> 
> No changes since Patch v1. Just fixed trunk to back out accidental commit of
> earlier hack.

See:  <https://trac.webkit.org/changeset/243373/webkit>
Comment 5 Jonathan Bedard 2019-03-22 08:53:42 PDT
Comment on attachment 365716 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=365716&action=review

> Tools/Scripts/webkitpy/port/base.py:146
> +        self._webkit_processes = {}

This architecture doesn't quite feel right.

This list seems like it should belong to the ServerProcess. Actually, I wouldn't expect there to be any changes on the Port object here. If we had this list (probably called children_processes) associated with the ServerProcess, when the ServerProcess stops (this is where we check for leaks, if I'm not mistaken) we could just iterate through our children_processes. Just like the current code, the Driver would set that list. This way, the logic for gather leaks from associated processes would not be explicitly tied to WebKitTestRunner, at least as long as you had a way to define those associated processes.
Comment 6 EWS Watchlist 2019-03-22 10:15:54 PDT
Comment on attachment 365716 [details]
Patch v2

Attachment 365716 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/11614167

New failing tests:
imported/w3c/web-platform-tests/mediacapture-record/MediaRecorder-constructor.html
http/wpt/mediarecorder/MediaRecorder-AV-audio-video-dataavailable.html
Comment 7 EWS Watchlist 2019-03-22 10:15:58 PDT
Created attachment 365740 [details]
Archive of layout-test-results from ews105 for mac-highsierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6
Comment 8 Ryosuke Niwa 2019-03-22 18:49:13 PDT
Comment on attachment 365716 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=365716&action=review

>> Tools/Scripts/webkitpy/port/base.py:146
>> +        self._webkit_processes = {}
> 
> This architecture doesn't quite feel right.
> 
> This list seems like it should belong to the ServerProcess. Actually, I wouldn't expect there to be any changes on the Port object here. If we had this list (probably called children_processes) associated with the ServerProcess, when the ServerProcess stops (this is where we check for leaks, if I'm not mistaken) we could just iterate through our children_processes. Just like the current code, the Driver would set that list. This way, the logic for gather leaks from associated processes would not be explicitly tied to WebKitTestRunner, at least as long as you had a way to define those associated processes.

Yeah, we definitely shouldn't be adding the concept of web processes to base port class.
Comment 9 David Kilzer (:ddkilzer) 2019-03-24 18:07:35 PDT
Created attachment 365840 [details]
Patch v3

Move Port._webkit_processes to ServerProcess._child_processes.
Comment 10 Jonathan Bedard 2019-03-25 08:28:53 PDT
Comment on attachment 365840 [details]
Patch v3

View in context: https://bugs.webkit.org/attachment.cgi?id=365840&action=review

A few small things. Generally, this looks good to me. Architecture seems to fit well now.

> Tools/Scripts/webkitpy/port/driver.py:263
> +            self._server_process.set_child_processes(child_processes)

Not sure if the temporary variable improves clarity here, 

self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content)

might be better.

> Tools/Scripts/webkitpy/port/server_process.py:80
> +        self._child_processes = {}

We need to clear this in _start too, otherwise we could carry over associated processes to the next instance of WebKitTestRunner. That wouldn't break anything now, but I could see it being a problem in the future.

> Tools/Scripts/webkitpy/port/server_process.py:374
> +            if self._child_processes:

We don't need this if statement, an empty dictionary should return an empty list of keys.

> Tools/Scripts/webkitpy/port/simulator_process.py:123
> +            if self._child_processes:

We don't need this if statement, an empty dictionary should return an empty list of keys.
Comment 11 David Kilzer (:ddkilzer) 2019-03-25 09:34:54 PDT
Created attachment 365868 [details]
Patch v4

Addressed Jonathan's feedback. Thanks\!
Comment 12 Alexey Proskuryakov 2019-03-25 10:04:10 PDT
Quite exciting!

Can you explain the new behavior from tool user's point of view? How will we see which leak originated where, and how does this work with LeaksViewer web page?

From tool point of view, how does it make sure that it watches each WebContent process that is created by WebKit? Caching and pre-warming behavior can change without notice, so it seems like a tricky problem.
Comment 13 David Kilzer (:ddkilzer) 2019-03-25 10:40:17 PDT
(In reply to Alexey Proskuryakov from comment #12)
> Quite exciting!
> 
> Can you explain the new behavior from tool user's point of view? How will we
> see which leak originated where, and how does this work with LeaksViewer web
> page?

The goal of this patch is to collect leaks from WK2 processes when an engineer uses `run-webkit-tests --leaks` on the command-line.  Integration with LeaksViewer would be a good next step.  (Have to walk before you can run, and checking for leaks in WK2 processes has been a long time coming.)

Note that this doesn't affect checking for WK1 leaks, so LeaksViewer will still work for that.

> From tool point of view, how does it make sure that it watches each
> WebContent process that is created by WebKit? Caching and pre-warming
> behavior can change without notice, so it seems like a tricky problem.

Trying to run leaks on ephemeral com.apple.WebContent processes was a non-goal for this patch.  (One could argue that if a process doesn't last very long, the leaks are less impactful.)  I suspect running with --singly would probably catch some of them, though.

Do you have suggestions for how to catch ephemeral WebContent processes for leaks?
Comment 14 Alexey Proskuryakov 2019-03-25 11:04:00 PDT
Which process(es) does this patch check when running http tests, which have a number of origins, with a separate process for each?

I'm not sure if thinking of only ephemeral processes as the tricky case is right, as we have so much more process wapping now.

> Do you have suggestions for how to catch ephemeral WebContent processes for leaks?

Not sure how specific you want me to be, but clearly we'll need to communicate process lifetime to python code, and also have a mode where they are not terminated before they can be checked for leaks.
Comment 15 David Kilzer (:ddkilzer) 2019-03-25 12:12:22 PDT
(In reply to Alexey Proskuryakov from comment #14)
> Which process(es) does this patch check when running http tests, which have
> a number of origins, with a separate process for each?

Right now it checks the last process running at the end of each shard of tests.

> I'm not sure if thinking of only ephemeral processes as the tricky case is
> right, as we have so much more process wapping now.
> 
> > Do you have suggestions for how to catch ephemeral WebContent processes for leaks?
> 
> Not sure how specific you want me to be, but clearly we'll need to
> communicate process lifetime to python code, and also have a mode where they
> are not terminated before they can be checked for leaks.

One could also imagine a future enhancement to the sharding algorithm so that only tests running on the same origin are run together.

I don't think we should let the perfect be the enemy of the good here.
Comment 16 Alexey Proskuryakov 2019-03-25 12:22:20 PDT
> One could also imagine a future enhancement to the sharding algorithm so that only tests running on the same origin are run together.

This won't work, because other origins are navigated to during individual tests (e.g. start with http://127.0.0.1:8000, navigate to http://localhost:8000).

> I don't think we should let the perfect be the enemy of the good here.

I agree as long as we don't have the implication of switching bots away from WebKit1, it probably needs to be further polished for automation.
Comment 17 David Kilzer (:ddkilzer) 2019-03-25 12:26:21 PDT
(In reply to Alexey Proskuryakov from comment #16)
> > One could also imagine a future enhancement to the sharding algorithm so that only tests running on the same origin are run together.
> 
> This won't work, because other origins are navigated to during individual
> tests (e.g. start with http://127.0.0.1:8000, navigate to
> http://localhost:8000).

Ah, good to know.

> > I don't think we should let the perfect be the enemy of the good here.
> 
> I agree as long as we don't have the implication of switching bots away from
> WebKit1, it probably needs to be further polished for automation.

Agreed.  It will be really useful if engineers don't have to apply local patches just to get leaks for WK2 tests (to avoid accidental commits like r243373).
Comment 18 Ryosuke Niwa 2019-03-26 16:16:18 PDT
Comment on attachment 365868 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=365868&action=review

> Tools/Scripts/webkitpy/port/driver.py:262
> +            block = self._read_block(deadline, '', wait_for_stderr_eof=True)
> +            self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content))

Driver asking ServerProcess to list child processes then storing child processes is a very strange architecture.
I think one alternative is to move the leak detection code in ServerProcess.stop to Driver
then just store the child processes in Driver directly but that's not necessarily better
if wanted to detect the leaks throughout the testing.

Perhaps this code belongs to ServerProcess, which lazily issues #LIST CHILD PROCESSES command.
It's a weird layering either way though.

> Tools/Scripts/webkitpy/port/driver.py:281
> +            m = re.match('^([^:]+): ([0-9]+)$', line)

Please use named groups as in: r'^(?P<name>[^:]+): (?P<pid>[0-9]+)$'
Also not the leading "r" to escape this string as a regular expression.

> Tools/WebKitTestRunner/TestController.cpp:1053
> +    pid_t webContentPID = WKPageGetProcessIdentifier(TestController::singleton().mainWebView()->page());

Why don't we just add some SPI to WKContext to get the list of PIDs from WebProcessPool directly?
That would probably catch all leaks because we cache up to 30 processes after each use.
Comment 19 Jonathan Bedard 2019-03-26 16:25:29 PDT
(In reply to Ryosuke Niwa from comment #18)
> Comment on attachment 365868 [details]
> Patch v4
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=365868&action=review
> 
> > Tools/Scripts/webkitpy/port/driver.py:262
> > +            block = self._read_block(deadline, '', wait_for_stderr_eof=True)
> > +            self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content))
> 
> Driver asking ServerProcess to list child processes then storing child
> processes is a very strange architecture.
> I think one alternative is to move the leak detection code in
> ServerProcess.stop to Driver
> then just store the child processes in Driver directly but that's not
> necessarily better
> if wanted to detect the leaks throughout the testing.
> 
> Perhaps this code belongs to ServerProcess, which lazily issues #LIST CHILD
> PROCESSES command.
> It's a weird layering either way though.

We can't have WebKitTestRunner specific logic in the ServerProcess, the ServerProcess isn't specific to WebKitTestRunner. For example, we use the ServerProcess to run both API tests and ImageDiff.

> 
> > Tools/Scripts/webkitpy/port/driver.py:281
> > +            m = re.match('^([^:]+): ([0-9]+)$', line)
> 
> Please use named groups as in: r'^(?P<name>[^:]+): (?P<pid>[0-9]+)$'
> Also not the leading "r" to escape this string as a regular expression.
> 
> > Tools/WebKitTestRunner/TestController.cpp:1053
> > +    pid_t webContentPID = WKPageGetProcessIdentifier(TestController::singleton().mainWebView()->page());
> 
> Why don't we just add some SPI to WKContext to get the list of PIDs from
> WebProcessPool directly?
> That would probably catch all leaks because we cache up to 30 processes
> after each use.
Comment 20 Ryosuke Niwa 2019-03-26 16:28:06 PDT
(In reply to Jonathan Bedard from comment #19)
> (In reply to Ryosuke Niwa from comment #18)
> > Comment on attachment 365868 [details]
> > Patch v4
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=365868&action=review
> > 
> > > Tools/Scripts/webkitpy/port/driver.py:262
> > > +            block = self._read_block(deadline, '', wait_for_stderr_eof=True)
> > > +            self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content))
> > 
> > Driver asking ServerProcess to list child processes then storing child
> > processes is a very strange architecture.
> > I think one alternative is to move the leak detection code in
> > ServerProcess.stop to Driver
> > then just store the child processes in Driver directly but that's not
> > necessarily better
> > if wanted to detect the leaks throughout the testing.
> > 
> > Perhaps this code belongs to ServerProcess, which lazily issues #LIST CHILD
> > PROCESSES command.
> > It's a weird layering either way though.
> 
> We can't have WebKitTestRunner specific logic in the ServerProcess, the
> ServerProcess isn't specific to WebKitTestRunner. For example, we use the
> ServerProcess to run both API tests and ImageDiff.

The idea of child processes don't need to be specific to WebKitTestRunner. ServerProcess can have a generic idea of child processes, or a list of PIDs for which post-test tasks need to be performed.

In fact, one of the things I don't like about the current patch is that LIST CHILD PROCESSES isn't added to DumpRenderTree. We absolutely should and return an empty list.
Comment 21 Jonathan Bedard 2019-03-27 09:07:51 PDT
(In reply to Ryosuke Niwa from comment #20)
> (In reply to Jonathan Bedard from comment #19)
> > (In reply to Ryosuke Niwa from comment #18)
> > > Comment on attachment 365868 [details]
> > > Patch v4
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=365868&action=review
> > > 
> > > > Tools/Scripts/webkitpy/port/driver.py:262
> > > > +            block = self._read_block(deadline, '', wait_for_stderr_eof=True)
> > > > +            self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content))
> > > 
> > > Driver asking ServerProcess to list child processes then storing child
> > > processes is a very strange architecture.
> > > I think one alternative is to move the leak detection code in
> > > ServerProcess.stop to Driver
> > > then just store the child processes in Driver directly but that's not
> > > necessarily better
> > > if wanted to detect the leaks throughout the testing.
> > > 
> > > Perhaps this code belongs to ServerProcess, which lazily issues #LIST CHILD
> > > PROCESSES command.
> > > It's a weird layering either way though.
> > 
> > We can't have WebKitTestRunner specific logic in the ServerProcess, the
> > ServerProcess isn't specific to WebKitTestRunner. For example, we use the
> > ServerProcess to run both API tests and ImageDiff.
> 
> The idea of child processes don't need to be specific to WebKitTestRunner.
> ServerProcess can have a generic idea of child processes, or a list of PIDs
> for which post-test tasks need to be performed.
> 
> In fact, one of the things I don't like about the current patch is that LIST
> CHILD PROCESSES isn't added to DumpRenderTree. We absolutely should and
> return an empty list.

The idea of child processes isn't specific to WebKitTestRunner, but the '#LIST CHILD PROCESSES' bit is unique to WebKitTestRunner. That's the code that needs to remain in the Driver, it would break API tests and ImageDiff if it was moved to ServerProcess.

I'm also not sure why you would want to add this to DumpRenderTree, why would we want to do extra IPC just to answer a question you already know the answer to? If DumpRenderTree might have child processes in the future, that would be understandable, but by design, DumpRenderTree will never have child processes.
Comment 22 Ryosuke Niwa 2019-03-27 10:00:59 PDT
(In reply to Jonathan Bedard from comment #21)
> (In reply to Ryosuke Niwa from comment #20)
> > (In reply to Jonathan Bedard from comment #19)
> > > We can't have WebKitTestRunner specific logic in the ServerProcess, the
> > > ServerProcess isn't specific to WebKitTestRunner. For example, we use the
> > > ServerProcess to run both API tests and ImageDiff.
> > 
> > The idea of child processes don't need to be specific to WebKitTestRunner.
> > ServerProcess can have a generic idea of child processes, or a list of PIDs
> > for which post-test tasks need to be performed.
> > 
> > In fact, one of the things I don't like about the current patch is that LIST
> > CHILD PROCESSES isn't added to DumpRenderTree. We absolutely should and
> > return an empty list.
> 
> The idea of child processes isn't specific to WebKitTestRunner, but the
> '#LIST CHILD PROCESSES' bit is unique to WebKitTestRunner. That's the code
> that needs to remain in the Driver, it would break API tests and ImageDiff
> if it was moved to ServerProcess.
> 
> I'm also not sure why you would want to add this to DumpRenderTree, why
> would we want to do extra IPC just to answer a question you already know the
> answer to? If DumpRenderTree might have child processes in the future, that
> would be understandable, but by design, DumpRenderTree will never have child
> processes.

That's really implementation details of WebKitTestRunner. It's better to have the same command interface across the board so that test runner code doesn't have to do different things for each test runner kind.
Comment 23 David Kilzer (:ddkilzer) 2019-03-27 10:42:44 PDT
Comment on attachment 365868 [details]
Patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=365868&action=review

>>>>> Tools/Scripts/webkitpy/port/driver.py:262
>>>>> +            self._server_process.set_child_processes(self._parse_child_processes_output(block.decoded_content))
>>>> 
>>>> Driver asking ServerProcess to list child processes then storing child processes is a very strange architecture.
>>>> I think one alternative is to move the leak detection code in ServerProcess.stop to Driver
>>>> then just store the child processes in Driver directly but that's not necessarily better
>>>> if wanted to detect the leaks throughout the testing.
>>>> 
>>>> Perhaps this code belongs to ServerProcess, which lazily issues #LIST CHILD PROCESSES command.
>>>> It's a weird layering either way though.
>>> 
>>> We can't have WebKitTestRunner specific logic in the ServerProcess, the ServerProcess isn't specific to WebKitTestRunner. For example, we use the ServerProcess to run both API tests and ImageDiff.
>> 
>> The idea of child processes don't need to be specific to WebKitTestRunner. ServerProcess can have a generic idea of child processes, or a list of PIDs for which post-test tasks need to be performed.
>> 
>> In fact, one of the things I don't like about the current patch is that LIST CHILD PROCESSES isn't added to DumpRenderTree. We absolutely should and return an empty list.
> 
> The idea of child processes isn't specific to WebKitTestRunner, but the '#LIST CHILD PROCESSES' bit is unique to WebKitTestRunner. That's the code that needs to remain in the Driver, it would break API tests and ImageDiff if it was moved to ServerProcess.
> 
> I'm also not sure why you would want to add this to DumpRenderTree, why would we want to do extra IPC just to answer a question you already know the answer to? If DumpRenderTree might have child processes in the future, that would be understandable, but by design, DumpRenderTree will never have child processes.

Sorry, I didn't see that "#CHECK WORLD LEAKS" was added to DumpRenderTree, but did nothing, so it's simple to add support for "#LIST CHILD PROCESSES".

I will post a new patch shortly.
Comment 24 David Kilzer (:ddkilzer) 2019-03-27 10:45:30 PDT
Created attachment 366081 [details]
Patch v5

Add support to DumpRenderTree for handling '#LIST CHILD PROCESSES' command.
Comment 25 David Kilzer (:ddkilzer) 2019-03-27 10:48:46 PDT
Created attachment 366084 [details]
Patch v6

Extract common code for getting result.length().
Comment 26 David Kilzer (:ddkilzer) 2019-03-27 13:23:02 PDT
(In reply to David Kilzer (:ddkilzer) from comment #25)
> Created attachment 366084 [details]
> Patch v6
> 
> Extract common code for getting result.length().

Ryosuke said via offline email that adding DumpRenderTree support was enough to land this updated patch.
Comment 27 David Kilzer (:ddkilzer) 2019-03-27 13:25:32 PDT
Committed r243559: <https://trac.webkit.org/changeset/243559>