Bug 186542

Summary: Turn on PSON in WebKitTestRunner
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: Tools / TestsAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, commit-queue, dbates, ddkilzer, dean_johnson, ews-watchlist, ggaren, jbedard, lforschler, realdawei, saam, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=186610
https://bugs.webkit.org/show_bug.cgi?id=178902
Bug Depends on: 186438, 186440, 186441, 186532, 186537, 186545, 186546, 186688    
Bug Blocks:    
Attachments:
Description Flags
WIP Patch
none
Archive of layout-test-results from ews123 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch
none
WIP Patch
none
Patch none

Description Chris Dumez 2018-06-11 15:00:29 PDT
Turn on PSON in WebKitTestRunner.
Comment 1 Chris Dumez 2018-06-11 15:01:19 PDT
Created attachment 342467 [details]
WIP Patch

To get EWS results
Comment 2 EWS Watchlist 2018-06-11 17:03:03 PDT
Comment on attachment 342467 [details]
WIP Patch

Attachment 342467 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/8137203

New failing tests:
http/tests/security/xss-DENIED-script-inject-into-inactive-window3.html
http/tests/security/xss-DENIED-script-inject-into-inactive-window2.html
Comment 3 EWS Watchlist 2018-06-11 17:03:05 PDT
Created attachment 342488 [details]
Archive of layout-test-results from ews123 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews123  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.4
Comment 4 Chris Dumez 2018-06-13 16:23:12 PDT
Created attachment 342705 [details]
Patch
Comment 5 Chris Dumez 2018-06-13 16:33:02 PDT
Brady: We may want to fix <rdar://problem/41060769> first? Things may get worse if we enable PSON for all tests.
Comment 6 Chris Dumez 2018-06-18 08:47:42 PDT
Comment on attachment 342705 [details]
Patch

All dependencies have landed. Ping Review?
Comment 7 Chris Dumez 2018-06-18 12:31:29 PDT
Created attachment 342962 [details]
Patch
Comment 8 Brady Eidson 2018-06-18 14:01:57 PDT
After r+'ing I just had the thought - Maybe we should leave the default and allow it to override PSON to *OFF*, for A/B testing purposes?
Comment 9 Chris Dumez 2018-06-18 14:52:39 PDT
Created attachment 342974 [details]
Patch
Comment 10 Dawei Fenton (:realdawei) 2018-06-21 16:01:04 PDT
looks like EWS is running out of resources on Mac-wk2 while trying to repeatedly run this patch. Can you make sure it gets reviewed before landing?

sample output: 
https://webkit-queues.webkit.org/results/8281705

worker/7: OSError('[Errno 35] Resource temporarily unavailable') raised:
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/message_pool.py", line 257, in run
      worker.handle(message.name, message.src, *message.args)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 270, in handle
      self._run_test(test_input, test_list_name)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 294, in _run_test
      result = self._run_test_with_or_without_timeout(test_input, test_timeout_sec, stop_when_done)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 337, in _run_test_with_or_without_timeout
      return self._run_test_in_this_thread(test_input, stop_when_done)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 418, in _run_test_in_this_thread
      return self._run_single_test(self._driver, test_input, stop_when_done)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 422, in _run_single_test
      self._name, driver, test_input, stop_when_done)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 46, in run_single_test
      return runner.run()
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 102, in run
      return self._run_reftest()
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 310, in _run_reftest
      test_result = self._compare_output_with_reference(reference_output, test_output, reference_filename, expectation == '!=')
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 341, in _compare_output_with_reference
      diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/base.py", line 336, in diff_image
      return self._image_differ.diff_image(expected_contents, actual_contents, tolerance)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/image_diff.py", line 58, in diff_image
      self._start(tolerance)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/image_diff.py", line 73, in _start
      self._process.start()
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/server_process.py", line 359, in start
      self._start()
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/server_process.py", line 124, in _start
      universal_newlines=self._universal_newlines)
    File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/system/executive.py", line 481, in popen
      return subprocess.Popen(string_args, **kwargs)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
      errread, errwrite)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1231, in _execute_child
      self.pid = os.fork()
Comment 11 Chris Dumez 2018-06-27 14:59:47 PDT
(In reply to David Fenton (:realdawei) from comment #10)
> looks like EWS is running out of resources on Mac-wk2 while trying to
> repeatedly run this patch. Can you make sure it gets reviewed before landing?
> 
> sample output: 
> https://webkit-queues.webkit.org/results/8281705
> 
> worker/7: OSError('[Errno 35] Resource temporarily unavailable') raised:
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/message_pool.py",
> line 257, in run
>       worker.handle(message.name, message.src, *message.args)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> layout_test_runner.py", line 270, in handle
>       self._run_test(test_input, test_list_name)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> layout_test_runner.py", line 294, in _run_test
>       result = self._run_test_with_or_without_timeout(test_input,
> test_timeout_sec, stop_when_done)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> layout_test_runner.py", line 337, in _run_test_with_or_without_timeout
>       return self._run_test_in_this_thread(test_input, stop_when_done)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> layout_test_runner.py", line 418, in _run_test_in_this_thread
>       return self._run_single_test(self._driver, test_input, stop_when_done)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> layout_test_runner.py", line 422, in _run_single_test
>       self._name, driver, test_input, stop_when_done)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> single_test_runner.py", line 46, in run_single_test
>       return runner.run()
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> single_test_runner.py", line 102, in run
>       return self._run_reftest()
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> single_test_runner.py", line 310, in _run_reftest
>       test_result = self._compare_output_with_reference(reference_output,
> test_output, reference_filename, expectation == '!=')
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/layout_tests/controllers/
> single_test_runner.py", line 341, in _compare_output_with_reference
>       diff_result = self._port.diff_image(reference_driver_output.image,
> actual_driver_output.image, tolerance=0)
>     File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/base.py",
> line 336, in diff_image
>       return self._image_differ.diff_image(expected_contents,
> actual_contents, tolerance)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/image_diff.py", line
> 58, in diff_image
>       self._start(tolerance)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/image_diff.py", line
> 73, in _start
>       self._process.start()
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/server_process.py",
> line 359, in start
>       self._start()
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/port/server_process.py",
> line 124, in _start
>       universal_newlines=self._universal_newlines)
>     File
> "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/common/system/executive.py",
> line 481, in popen
>       return subprocess.Popen(string_args, **kwargs)
>     File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> subprocess.py", line 710, in __init__
>       errread, errwrite)
>     File
> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
> subprocess.py", line 1231, in _execute_child
>       self.pid = os.fork()

Jonathan seems to think this is likely due to orphaned Web processes. Given the feature, I could be the old Web processes after we do a process swap on navigation.
Comment 12 Jonathan Bedard 2018-06-27 17:38:53 PDT
This is pretty easy to detect, actually.

I built on Mojave and ran 'run-webkit-tests --child-processes 1 fast' and checked the number of WebProcesses running with 'ps aux | grep WebContent | wc -l.' It started ballooning almost immediately, from 3 to 15 in the first 30ish seconds.  We can probably narrow this down to a pair of tests which reproduce the problem.

I will note that I actually think 'orphaning' may be the wrong term. Once the test running closes, it seems like the 'orphaned' process go away. Perhaps something in WebKit is erroneously keeping around a reference to the stale processes?
Comment 13 Chris Dumez 2018-06-27 17:54:30 PDT
(In reply to Jonathan Bedard from comment #12)
> This is pretty easy to detect, actually.
> 
> I built on Mojave and ran 'run-webkit-tests --child-processes 1 fast' and
> checked the number of WebProcesses running with 'ps aux | grep WebContent |
> wc -l.' It started ballooning almost immediately, from 3 to 15 in the first
> 30ish seconds.  We can probably narrow this down to a pair of tests which
> reproduce the problem.
> 
> I will note that I actually think 'orphaning' may be the wrong term. Once
> the test running closes, it seems like the 'orphaned' process go away.
> Perhaps something in WebKit is erroneously keeping around a reference to the
> stale processes?

Brady knows more about this but I believe we intentionally keep some old web processes around for supporting page cache back navigation after process swapping.
Comment 14 Jonathan Bedard 2018-06-28 08:49:16 PDT
(In reply to Chris Dumez from comment #13)
> ...
> 
> Brady knows more about this but I believe we intentionally keep some old web
> processes around for supporting page cache back navigation after process
> swapping.

It seems clear that the exact technique we're using to 'keep some old web processes' is of great concern when running layout tests.

I'm not entirely convinced that this is just a layout test problem, though. What kind of
Comment 15 Brady Eidson 2018-06-28 08:53:57 PDT
(In reply to Jonathan Bedard from comment #14)
> (In reply to Chris Dumez from comment #13)
> > ...
> > 
> > Brady knows more about this but I believe we intentionally keep some old web
> > processes around for supporting page cache back navigation after process
> > swapping.
> 
> It seems clear that the exact technique we're using to 'keep some old web
> processes' is of great concern when running layout tests.
> 
> I'm not entirely convinced that this is just a layout test problem, though.
> What kind of

... What kind of ...?
Comment 16 Jonathan Bedard 2018-06-28 09:09:20 PDT
(Continuing comment 14, accidental posting)

metric are we using to determine when to close a WebProcess.

Talking with Brady on IRC, it seems like we are NOT deliberately keeping WebProcesses around. I observed ~30 WebProcesses when running a single test runner on my local machine, which is definitely too many.
Comment 17 Jonathan Bedard 2018-06-29 10:07:08 PDT
The leak seems to happen when are required to re-create the main WebView during testing. The root cause may be some poor memory management in the test runner.

Running the following 2 tests reproduces the leak:
fast/animation/css-animation-resuming-when-visible-with-style-change.html fast/animation/css-animation-resuming-when-visible-with-style-change2.html
Comment 18 Jonathan Bedard 2018-06-29 15:36:12 PDT
I've linked this to <https://bugs.webkit.org/show_bug.cgi?id=178902>, it seems to be a similar problem. We're leaking a WKProcessPool through WKWebViewConfiguration. I suspect this problem is not limited to the test runner and occurs any time a WKWebView is reconfigured with PSON turned on.
Comment 19 Chris Dumez 2018-07-16 16:51:12 PDT
The leaking of WebContent processes is due to process prewarming. All pre-warmed process that do not get used end up leaking, as do their WebProcessProxy and their WebProcessPool :(

WebProcessPool refs WebProcessProxy and WebProcessProxy refs WebProcessPool.
This reference cycle normally gets broken when a WebProcessProxy's number of pages falls to 0 and WebProcessProxy::maybeShutDown() gets called. if Number of pages is 0, maybeShutDown() calls WebProcessProxy::shutDown() which calls WebProcessPool::disconnectProcess() to break the reference cycle.

For pre-warmed web processes that never get any page, we never call maybeShutDown(), therefore we leak them and their process pool. Not quite sure how to fix either. When is it OK to drop a Pre-warmed web process?
Comment 20 Chris Dumez 2018-07-16 16:57:19 PDT
(In reply to Chris Dumez from comment #19)
> The leaking of WebContent processes is due to process prewarming. All
> pre-warmed process that do not get used end up leaking, as do their
> WebProcessProxy and their WebProcessPool :(
> 
> WebProcessPool refs WebProcessProxy and WebProcessProxy refs WebProcessPool.
> This reference cycle normally gets broken when a WebProcessProxy's number of
> pages falls to 0 and WebProcessProxy::maybeShutDown() gets called. if Number
> of pages is 0, maybeShutDown() calls WebProcessProxy::shutDown() which calls
> WebProcessPool::disconnectProcess() to break the reference cycle.
> 
> For pre-warmed web processes that never get any page, we never call
> maybeShutDown(), therefore we leak them and their process pool. Not quite
> sure how to fix either. When is it OK to drop a Pre-warmed web process?

I guess we can try and get rid of this reference cycle altogether by making WebProcessProxy::m_processPool a raw pointer, I'll try that.
Comment 21 Chris Dumez 2018-07-17 12:00:26 PDT
Created attachment 345177 [details]
WIP Patch
Comment 22 Chris Dumez 2018-07-17 13:52:46 PDT
Created attachment 345188 [details]
Patch
Comment 23 WebKit Commit Bot 2018-07-17 14:50:15 PDT
Comment on attachment 345188 [details]
Patch

Clearing flags on attachment: 345188

Committed r233897: <https://trac.webkit.org/changeset/233897>
Comment 24 WebKit Commit Bot 2018-07-17 14:50:17 PDT
All reviewed patches have been landed.  Closing bug.
Comment 25 Radar WebKit Bug Importer 2018-07-17 14:54:00 PDT
<rdar://problem/42304094>