Bug 230513 - Broken pipes during iOS simulator testing
Summary: Broken pipes during iOS simulator testing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: iPhone / iPad Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
: 230060 230099 230325 230418 (view as bug list)
Depends on:
Blocks: 226658
  Show dependency treegraph
 
Reported: 2021-09-20 14:46 PDT by ayumi_kojima
Modified: 2022-02-16 18:01 PST (History)
12 users (show)

See Also:


Attachments
test lists (90.77 KB, text/plain)
2021-09-21 18:07 PDT, ayumi_kojima
no flags Details
Revert Patch (7.73 KB, patch)
2021-09-23 13:54 PDT, ayumi_kojima
no flags Details | Formatted Diff | Diff
Patch (4.31 KB, patch)
2022-02-14 08:21 PST, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (1.58 KB, patch)
2022-02-14 11:09 PST, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (1.92 KB, patch)
2022-02-15 16:50 PST, Jonathan Bedard
ews-feeder: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description ayumi_kojima 2021-09-20 14:46:40 PDT
fast/viewport/scroll-delegates-switch-on-page-with-no-composition-mode-asserts.html 

Is flaky crashing on iOS.

History: https://results.webkit.org/?suite=layout-tests&test=fast%2Fviewport%2Fscroll-delegates-switch-on-page-with-no-composition-mode-asserts.html

Crash log:

No crash log found for WebKitTestRunnerApp:39389.
Comment 1 Radar WebKit Bug Importer 2021-09-20 14:46:57 PDT
<rdar://problem/83325409>
Comment 2 ayumi_kojima 2021-09-20 14:51:58 PDT
Marked test expectations: https://trac.webkit.org/changeset/282786/webkit
Comment 3 Alexey Proskuryakov 2021-09-21 13:55:49 PDT
13:33:36.641 57433 worker/4 fast/url/user-visible/rf.html passed
13:33:36.760 57433 worker/4 fast/url/user-visible/srb.html passed
13:33:36.760 57433 worker/4 finished test group
13:33:36.887 57433 worker/4 killed pid 60933
13:33:36.888 57433 worker/4 This test marked as a crash because of a broken pipe when writing to stdin of the server process.
13:34:16.905 57433 worker/4 worker/4 fast/viewport/scroll-delegates-switch-on-page-with-no-composition-mode-asserts.html crashed, (no stderr)
13:34:16.906 57433 worker/4 killing driver
13:34:16.918 57433 worker/4 fast/viewport/scroll-delegates-switch-on-page-with-no-composition-mode-asserts.html failed:
13:34:16.918 57433 worker/4 WebKitTestRunnerApp crashed [pid=60933]
Comment 4 ayumi_kojima 2021-09-21 18:06:52 PDT
Though, it might not be able to reproduce the issue locally, tried it to make sure.

First, I tried to reproduce it by running all tests with multiple workers (run-webkit-tests --force --exit-after-n-crashes-or-timeouts 1000 --exit-after-n-failures 1000 --no-build --no-show-results --no-new-test-results --clobber-old-results --ios-simulator --debug --child-processes 8), but got (most likely) unrelated error, "OSError: [Errno 9] Bad file descriptor (from worker/6)", and it stopped running tests.

Next, tried with test lists (See attached list) with a single worker. Got the same error (OSError: [Errno 9] Bad file descriptor).

Next, tried with test lists with multiple workers. Tests ran successfully, but the test in question (fast/viewport/scroll-delegates-switch-on-page-with-no-composition-mode-asserts.html) didn't "crash".
Comment 5 ayumi_kojima 2021-09-21 18:07:18 PDT
Created attachment 438898 [details]
test lists
Comment 6 ayumi_kojima 2021-09-22 09:50:56 PDT
Another tests that "crashed" with broken pipes in worker/3,4,5:

fast/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html
imported/blink/fast/multicol/basic-rtl.html
imported/w3c/i18n/bidi/bidi-isolate-override-004.html
imported/w3c/web-platform-tests/css/css-backgrounds/background-size/vector/background-size-vector-001.html
imported/w3c/web-platform-tests/css/css-backgrounds/background-size/vector/zero-height-ratio-contain.html
imported/w3c/web-platform-tests/css/css-sizing/aspect-ratio/abspos-001.html
imported/w3c/web-platform-tests/css/css-sizing/aspect-ratio/grid-aspect-ratio-035.html
imported/w3c/web-platform-tests/css/css-transforms/transform-input-012.html
imported/w3c/web-platform-tests/css/css-ui/appearance-auto-001.html
imported/w3c/web-platform-tests/fetch/api/redirect/redirect-mode.any.worker.html
imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/module/currentScript-null.html

https://build.webkit.org/#/builders/28/builds/3310
Comment 7 ayumi_kojima 2021-09-22 09:51:19 PDT
Another one:

worker/0,1,3

fast/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html
imported/w3c/web-platform-tests/IndexedDB/abort-in-initial-upgradeneeded.html
imported/w3c/web-platform-tests/css/css-transforms/preserve3d-and-filter-no-perspective.html
imported/w3c/web-platform-tests/css/css-transforms/transform-inherit-001.html

https://build.webkit.org/#/builders/28/builds/3295
Comment 8 ayumi_kojima 2021-09-22 11:10:57 PDT
*** Bug 230099 has been marked as a duplicate of this bug. ***
Comment 9 ayumi_kojima 2021-09-22 11:12:27 PDT
*** Bug 230060 has been marked as a duplicate of this bug. ***
Comment 10 ayumi_kojima 2021-09-22 11:14:20 PDT
*** Bug 230418 has been marked as a duplicate of this bug. ***
Comment 11 ayumi_kojima 2021-09-22 11:14:33 PDT
*** Bug 230325 has been marked as a duplicate of this bug. ***
Comment 12 ayumi_kojima 2021-09-22 11:14:43 PDT
Pasting comments from Bug 230060

Alexey Proskuryakov 2021-09-12 13:55:30 PDT
Broken pipe is weird, but also not sure why it took 40 seconds between "This test marked as a crash because of a broken pipe" and "crashed".


Simon Fraser (smfr) 2021-09-22 10:24:58 PDT
Seems like some kinds of process unresponsiveness/lack of output result in us calling something a "crash" when it's not. Would be nice to fix this.
Comment 13 ayumi_kojima 2021-09-23 13:53:26 PDT
I was able to reproduce the bug by the following steps:

1. Apply the attached patch locally that reverts all of the protection placed for the OSError exception. 

2. Run all tests using Python2 (python OpenSource/Tools/Scripts/run-webkit-tests --exit-after-n-crashes-or-timeouts 1000 --exit-after-n-failures 1000 --no-build --no-show-results --no-new-test-results --clobber-old-results --ios-simulator --debug --debug-rwt-logging --child-processes 8 ;sudo sysdiagnose -l)

3. Test run stops at around fast/css-intrinsic-dimensions. Observe many timed-out tests.

4. Run tests again using run-webkit-tests <same build> --ios-simulator fast

5. Receives stdout:

[1/12583] fast/check-layout-error-no-attributes.html failed unexpectedly (test timed out, no output from test)
[2/12583] fast/js-promise-from-detached-iframe.html failed unexpectedly (test timed out, no output from test)
[3/12583] fast/animation/animation-display-style-adjustment.html failed unexpectedly (test timed out)

OSError raised: [Errno 9] Bad file descriptor

...
Comment 14 ayumi_kojima 2021-09-23 13:54:53 PDT
Created attachment 439088 [details]
Revert Patch
Comment 15 Jonathan Bedard 2022-02-10 11:33:32 PST
This is caused by some of our http servers, trying to determine which ones
Comment 16 Jonathan Bedard 2022-02-11 13:15:12 PST
It's the WebSocket server, and we don't even need to use the server to run into the problem.
Comment 17 Jonathan Bedard 2022-02-11 13:25:16 PST
This is basically https://bugs.webkit.org/show_bug.cgi?id=206546. Apparently an explicit `python2` invocation is what breaks things.
Comment 18 Jonathan Bedard 2022-02-11 14:34:10 PST
Was able to get a reproduction of this without any web servers. I think we've found some problems with our websocket server, but it doesn't appear to be the only problem.
Comment 19 Jonathan Bedard 2022-02-14 08:21:27 PST
Created attachment 451903 [details]
Patch
Comment 20 Jonathan Bedard 2022-02-14 11:09:12 PST
Created attachment 451925 [details]
Patch
Comment 21 Jonathan Bedard 2022-02-14 11:24:39 PST
(In reply to Jonathan Bedard from comment #20)
> Created attachment 451925 [details]
> Patch

I borrowed one of our bots and was able to reliably reproduce this issue before this change, and unable to do so after. I don't know what I was thinking with the comment I removed, all of my new testing indicates that comment is just wrong.
Comment 22 dewei_zhu 2022-02-15 15:23:21 PST
Comment on attachment 451925 [details]
Patch

r=me
Comment 23 Alexey Proskuryakov 2022-02-15 16:33:41 PST
Comment on attachment 451925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451925&action=review

> Tools/Scripts/webkitpy/port/simulator_process.py:76
>                  # If the file descriptor is bad, we don't have to worry about closing it

Don't we still need to close the socket when self._file.close() raises an exception?
Comment 24 Jonathan Bedard 2022-02-15 16:36:10 PST
Comment on attachment 451925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451925&action=review

>> Tools/Scripts/webkitpy/port/simulator_process.py:76
>>                  # If the file descriptor is bad, we don't have to worry about closing it
> 
> Don't we still need to close the socket when self._file.close() raises an exception?

Although hopefully a rare circumstance, yes, we probably should.
Comment 25 Jonathan Bedard 2022-02-15 16:50:13 PST
Created attachment 452111 [details]
Patch
Comment 26 Sam Sneddon [:gsnedders] 2022-02-16 07:33:38 PST
Comment on attachment 452111 [details]
Patch

Can you also fix the comment above (L59, referencing Python 2?).

That said, I think this might still be unsafe: I think we need to ensure it's unbuffered otherwise the non-blocking behaviour might end up with nonsense being returned from the buffer? (i.e., buffered=0 to fdopen)
Comment 27 Jonathan Bedard 2022-02-16 08:23:50 PST
(In reply to Sam Sneddon [:gsnedders] from comment #26)
> Comment on attachment 452111 [details]
> Patch
> 
> Can you also fix the comment above (L59, referencing Python 2?).

Will fix it!

> 
> That said, I think this might still be unsafe: I think we need to ensure
> it's unbuffered otherwise the non-blocking behaviour might end up with
> nonsense being returned from the buffer? (i.e., buffered=0 to fdopen)

Third argument to `fdopen` is `buffering`, which is set to 0 (I think for the reason you've outlined, but it's been years since I looked at this code, so I don't totally remember)
Comment 28 EWS 2022-02-16 18:01:10 PST
Committed r289988 (247374@main): <https://commits.webkit.org/247374@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 452111 [details].