Bug 188036 - Consecutive DumpRenderTree crash on WinCairo BuildBots
Summary: Consecutive DumpRenderTree crash on WinCairo BuildBots
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 188160
Blocks:
Reported: 2018-07-25 22:04 PDT by Fujii Hironori
Modified: 2018-12-06 18:31 PST
CC List: 3 users

See Also:


Description Fujii Hironori 2018-07-25 22:04:26 PDT
Consecutive DumpRenderTree crash on WinCairo BuildBots

It happens occasionally on both test bots:

- WinCairo 64-bit WKL Release (Tests)
- WinCairo 64-bit WKL Debug (Tests)

For example:
https://build.webkit.org/builders/WinCairo%2064-bit%20WKL%20Release%20%28Tests%29/builds/738/steps/layout-test/logs/stdio

> 17:42:28.538 108568 Looking at [u'C:\\WebKit-BuildWorker\\wincairo-wkl-release-tests\\build\\WebKitBuild\\Release\\bin64\\DumpRenderTree.exe', '-']
> 17:42:28.637 108568 This test marked as a crash because of failure to poll the server process.
> 17:42:28.638 108568 looking for crash log for DumpRenderTree:104304
> 17:42:28.640 108568 Looking at [u'C:\\WebKit-BuildWorker\\wincairo-wkl-release-tests\\build\\WebKitBuild\\Release\\bin64\\DumpRenderTree.exe', '-']
> 17:42:28.743 108568 This test marked as a crash because of failure to poll the server process.
> 17:42:28.743 108568 looking for crash log for DumpRenderTree:89312
> 17:42:28.743 108568 worker/7 fast\css\variables\env\safe-area-inset-env-zero.html crashed, (no stderr)
> 17:42:28.752 118408 [8638/15300] fast\css\variables\env\safe-area-inset-env-zero.html failed unexpectedly (DumpRenderTree crashed [pid=104304])
> 17:42:28.751 108568 worker/7 killing driver
> 17:42:28.753 108568 worker/7 fast\css\variables\env\safe-area-inset-env-zero.html failed:
> 17:42:28.753 108568 worker/7  DumpRenderTree crashed [pid=104304]
> 17:42:28.767 108568 Looking at [u'C:\\WebKit-BuildWorker\\wincairo-wkl-release-tests\\build\\WebKitBuild\\Release\\bin64\\DumpRenderTree.exe', '-']
> 17:42:28.882 108568 This test marked as a crash because of failure to poll the server process.
> 17:42:28.882 108568 looking for crash log for DumpRenderTree:105088
> 17:42:28.885 108568 Looking at [u'C:\\WebKit-BuildWorker\\wincairo-wkl-release-tests\\build\\WebKitBuild\\Release\\bin64\\DumpRenderTree.exe', '-']
> 17:42:28.996 108568 This test marked as a crash because of failure to poll the server process.
> 17:42:28.996 108568 looking for crash log for DumpRenderTree:67916
> 17:42:28.996 108568 worker/7 fast\css\will-change\will-change-creates-stacking-context-inline.html crashed, (no stderr)
> 17:42:29.015 118408 [8645/15300] fast\css\will-change\will-change-creates-stacking-context-inline.html failed unexpectedly (DumpRenderTree crashed [pid=105088])
> 17:42:29.004 108568 worker/7 killing driver
> 17:42:29.005 108568 worker/7 fast\css\will-change\will-change-creates-stacking-context-inline.html failed:
> 17:42:29.005 108568 worker/7  DumpRenderTree crashed [pid=105088]
Comment 1 Fujii Hironori 2018-07-25 22:14:10 PDT
I can't reproduce this issue on my bare-metal Windows PC.
However, it's easy to reproduce in a Windows Docker container on the same PC.

> docker run --name build -it --cpu-count=8 --memory=16g webkitdev/msbuild powershell
> 
> Select-VSEnvironment
> 
> $env:http_proxy = "http://..."
> $env:https_proxy = $env:http_proxy
> 
> git clone --depth=1 https://git.webkit.org/git/WebKit.git
> cd WebKit
> perl Tools\Scripts\build-webkit --wincairo
> 
> rm env:http_proxy
> rm env:https_proxy
> $env:WEBKIT_LIBRARIES = (gl).path + "\WebKitLibraries\win"
> python ./Tools/Scripts/run-webkit-tests --no-build --no-show-results --no-new-test-results --exit-after-n-crashes-or-timeouts 50 --exit-after-n-failures 500 --release --dump-render-tree --wincairo --debug-rwt-logging


"docker run" frequently fails with an "hcsshim: timeout waiting" error on my PC, so I have to retry it several times before a Docker container starts.

Cannot run container with more than 3GB memory · Issue #1094 · docker/for-win
https://github.com/docker/for-win/issues/1094

hcsshim: timeout waiting for notification extra info · Issue #152 · Microsoft/hcsshim
https://github.com/Microsoft/hcsshim/issues/152
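Because "docker run" has to be retried by hand, a small retry wrapper can save some typing. This is only a sketch: `run_with_retries` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions. The commented usage drops `--name build` from the command above, on the assumption that a failed attempt may leave a container behind whose name would collide with the next try.

```python
import subprocess
import sys
import time

def run_with_retries(args, attempts=5, delay=10.0):
    """Run a command until it exits with status 0, up to `attempts` times.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails. (Hypothetical helper.)
    """
    for attempt in range(1, attempts + 1):
        if subprocess.call(args) == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give Docker a moment before retrying
    raise RuntimeError("command failed {} times: {}".format(attempts, args))

# Hypothetical usage with the docker command from the steps above:
# run_with_retries(["docker", "run", "-it", "--cpu-count=8",
#                   "--memory=16g", "webkitdev/msbuild", "powershell"])

# Self-check with a command that always succeeds; prints 1.
print(run_with_retries([sys.executable, "-c", "pass"], delay=0))
```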
Comment 2 Fujii Hironori 2018-07-25 22:19:41 PDT
It seems that one possible workaround is to limit the number of CPUs to 1:
pass --cpu-count=1 to docker, or --child-processes=1 to run-webkit-tests.
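For the run-webkit-tests side of this workaround, a sketch of the argument list is below. The flags other than `--child-processes` are copied from the reproduction steps in comment 1; `rwt_command` is a hypothetical helper, and the list is meant to be passed to `subprocess.call()` from the root of a WebKit checkout.

```python
import os

def rwt_command(child_processes=1):
    """Build a run-webkit-tests argument list that caps the worker count.

    Only --child-processes is the workaround; the other flags come from
    the reproduction steps. (Hypothetical helper.)
    """
    return [
        "python",
        os.path.join("Tools", "Scripts", "run-webkit-tests"),
        "--no-build",
        "--release",
        "--dump-render-tree",
        "--wincairo",
        "--child-processes={}".format(child_processes),
    ]

print(" ".join(rwt_command()))
```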
Comment 3 Fujii Hironori 2018-07-26 22:43:52 PDT
https://github.com/WebKit/webkit/blob/master/Tools/Scripts/webkitpy/port/server_process.py#L325

poll() returns -4 in that case.
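For context, this is roughly how a polled child reports its status in Python. It is a simplified illustration, not the code in server_process.py: `Popen.poll()` returns None while the child is running and its exit status (also stored in `returncode`) once it has exited, so a crashed DumpRenderTree surfaces there as a return code such as -4. `wait_for_exit` and its timeout values are hypothetical.

```python
import subprocess
import sys
import time

def wait_for_exit(args, timeout=10.0, interval=0.05):
    """Start a child process and poll it until it exits.

    Popen.poll() returns None while the child is still running and the
    exit status once it has finished. (Hypothetical helper.)
    """
    proc = subprocess.Popen(args)
    deadline = time.monotonic() + timeout
    while True:
        code = proc.poll()
        if code is not None:
            return code
        if time.monotonic() > deadline:
            proc.kill()
            proc.wait()
            raise TimeoutError("child did not exit in time")
        time.sleep(interval)

# A child that exits with status 3 is reported as 3.
print(wait_for_exit([sys.executable, "-c", "import sys; sys.exit(3)"]))  # prints 3
```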
Comment 4 Fujii Hironori 2018-07-27 00:13:57 PDT
I applied a patch to trunk@234190 that logs the returncode.
Here are the patch and the log:
https://gist.github.com/fujii/92f167fe59c231247d39ded503320bd8

Two returncode values appear: -1073741819 and -4.

> 15:48:38.313 3196 This test marked as a crash because of failure to poll the server process. returncode=-1073741819
> 15:48:47.289 24964 This test marked as a crash because of failure to poll the server process. returncode=-1073741819
> 15:50:09.235 9780 This test marked as a crash because of failure to poll the server process. returncode=-4
> 15:50:09.471 9780 This test marked as a crash because of failure to poll the server process. returncode=-4

-4 is the returncode of the consecutive DumpRenderTree crashes.

returncode should be the value returned by GetExitCodeProcess,
so it's odd that it is negative.
Looking through the Python 2.7.15 source code, I can't find any code that makes it negative.
Comment 5 Fujii Hironori 2018-07-27 00:33:13 PDT
https://github.com/python/cpython/blob/d098098ce1dcb02d18571551654cbe7b92d291a4/PC/_subprocess.c#L549
https://github.com/python/cpython/blob/d098098ce1dcb02d18571551654cbe7b92d291a4/Include/intobject.h#L38

> return PyInt_FromLong(exit_code);

This code converts a DWORD to a long. Because a C long is only 32 bits on Windows, DWORD values of 0x80000000 and above come out negative.
If that is true, the exit codes returned by GetExitCodeProcess were 0xfffffffc (-4) and 0xc0000005 (-1073741819).
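The signed values in the log can be mapped back to the original DWORDs by masking with 0xFFFFFFFF, which undoes the 32-bit two's-complement wrap. This is a sketch; `to_dword` is a hypothetical helper name, and the mask gives the same result whether the return code arrives signed or unsigned.

```python
def to_dword(returncode):
    """Reinterpret a returncode as the 32-bit unsigned DWORD that
    GetExitCodeProcess produced. (Hypothetical helper.)

    Python ints use two's-complement semantics for bitwise operators,
    so masking a negative value recovers the original unsigned bits.
    """
    return returncode & 0xFFFFFFFF

for rc in (-4, -1073741819):
    print("{} -> 0x{:08x}".format(rc, to_dword(rc)))
# prints:
# -4 -> 0xfffffffc
# -1073741819 -> 0xc0000005
```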
Comment 6 Fujii Hironori 2018-07-27 00:48:09 PDT
https://stackoverflow.com/questions/17168982/exception-error-c0000005-in-vc

According to the above page, c0000005 means access violation.

Exit code -4 (0xfffffffc) - BOINC Wiki
https://boinc.mundayweb.com/wiki/index.php?title=Exit_code_-4_(0xfffffffc)

> This error notifies you of problems with your page file. It may be accompanied by "Project Application Name" error -4 Can't allocate memory.

It seems that -4 means a page-file (memory allocation) failure.
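When triaging bot logs, the two codes could be labeled automatically. In this sketch, 0xC0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION; the description for 0xFFFFFFFC is only the tentative reading from the BOINC wiki above, not a documented Windows meaning. `describe_exit_code` and `KNOWN_EXIT_CODES` are hypothetical names.

```python
KNOWN_EXIT_CODES = {
    # NTSTATUS code defined in ntstatus.h.
    0xC0000005: "STATUS_ACCESS_VIOLATION (access violation)",
    # Tentative: interpretation from the BOINC wiki, not Windows docs.
    0xFFFFFFFC: "-4: possibly a page-file / memory allocation failure",
}

def describe_exit_code(returncode):
    """Label a (possibly negative) returncode. (Hypothetical helper.)"""
    code = returncode & 0xFFFFFFFF  # normalize signed Python 2.7 values
    return KNOWN_EXIT_CODES.get(code, "unknown exit code 0x{:08x}".format(code))

print(describe_exit_code(-1073741819))  # prints "STATUS_ACCESS_VIOLATION (access violation)"
print(describe_exit_code(-4))
```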
Comment 7 Fujii Hironori 2018-08-15 04:01:19 PDT
The environment variable WEBKIT_TEST_CHILD_PROCESSES=2 is now set on the test bots to avoid this issue.
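To reproduce the bot setting locally, the variable can be placed in the child's environment when launching run-webkit-tests. This is a sketch of one way to do it from Python; it is not how the buildbot configuration sets it, and `rwt_env` is a hypothetical helper. Copying os.environ keeps the setting scoped to the launched process.

```python
import os

def rwt_env(child_processes=2):
    """Return a copy of the current environment with
    WEBKIT_TEST_CHILD_PROCESSES set, leaving os.environ untouched.
    (Hypothetical helper.)"""
    env = dict(os.environ)
    env["WEBKIT_TEST_CHILD_PROCESSES"] = str(child_processes)
    return env

# Hypothetical invocation from the WebKit checkout root:
# import subprocess
# subprocess.call(["python", os.path.join("Tools", "Scripts", "run-webkit-tests"),
#                  "--no-build", "--release", "--dump-render-tree", "--wincairo"],
#                 env=rwt_env())
```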
Comment 8 Fujii Hironori 2018-10-30 18:17:32 PDT
(In reply to Fujii Hironori from comment #7)
> A env var WEBKIT_TEST_CHILD_PROCESSES=2 is set on test bots to avoid this
> issue.

This change has solved the issue.