RESOLVED FIXED 175269
WPT harness errors on leaks and iOS-sim EWS bots
https://bugs.webkit.org/show_bug.cgi?id=175269
Summary WPT harness errors on leaks and iOS-sim EWS bots
Alexey Proskuryakov
Reported 2017-08-07 10:22:11 PDT
A large number of tests are failing on leaks bot with something that looks like a harness error. See e.g. <https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&revision=220122&showExpectations=true&tests=imported%2Fw3c%2Fweb-platform-tests%2FWebCryptoAPI%2FgenerateKey%2Ffailures_RSASSA-PKCS1-v1_5.worker.html>. Here are some of the failures it hits: -------------------------------------- Harness Error (FAIL), message = Error in remote http://localhost:8800/WebCryptoAPI/generateKey/failures_RSASSA-PKCS1-v1_5.worker.js: NetworkError: A network error occurred. -------------------------------------- layer at (0,0) size 800x600 RenderView at (0,0) size 800x600 layer at (0,0) size 800x8 RenderBlock {HTML} at (0,0) size 800x8 RenderBody {BODY} at (8,8) size 784x0 RenderBlock {DIV} at (0,0) size 784x0 -------------------------------------- CONSOLE MESSAGE: line 1677: TypeError: null is not an object (evaluating 'this.message_target.removeEventListener') CONSOLE MESSAGE: line 3: NetworkError: A network error occurred. FAIL Untitled NetworkError: A network error occurred. -------------------------------------- Harness Error (FAIL), message = Error in remote: [object Event] --------------------------------------
Attachments
add logging (4.21 KB, patch)
2017-08-14 22:17 PDT, youenn fablet
no flags
Results from ios-sim EWS (1.35 MB, application/zip)
2017-08-15 17:57 PDT, Ryan Haddad
no flags
Adding more logs (4.67 KB, patch)
2017-08-24 09:03 PDT, youenn fablet
no flags
Skipping imported/w3c/web-platform-tests/eventsource/event-data.html (1.42 KB, patch)
2017-08-24 16:20 PDT, youenn fablet
no flags
Patch (1.35 KB, patch)
2017-09-15 09:33 PDT, youenn fablet
no flags
youenn fablet
Comment 1 2017-08-07 14:27:29 PDT
Are these errors all related to testing workers in WPT?
youenn fablet
Comment 3 2017-08-07 22:40:20 PDT
Looking at some tests, it seems as if testharnessreport.js is not sent correctly to the browser (dump as render tree and not as text). For some other tests, it is as if testharness.js is not sent correctly (function test undefined). Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220343%20(2511)/wptwk_process_log.out.txt, there is no clear error from the WPT server. Looking at one failing test (/XMLHttpRequest/abort-after-receive.htm) vs a successful test (/XMLHttpRequest/abort-after-send.htm), it seems as if the unsuccessful test is not making any HTTP request to get testharnessreport.js. Any idea why retrieval of some HTTP resources would cause issues in that particular configuration? How slow are these bots? We could try to add more wpt logging.
Alexey Proskuryakov
Comment 4 2017-08-08 10:27:32 PDT
> Any idea why retrieval of some HTTP resources would cause issues in that particular configuration? No. If there no request in the log, it could be a WebKit cache bug of some sort, but I don't know how much we can trust WPT logging. > How slow are these bots? Not too slow, the tests run in just 1 hour.
Alexey Proskuryakov
Comment 5 2017-08-12 18:09:55 PDT
This appears to be quite reproducible locally with a release build. run-webkit-tests --no-build --exit-after-n-crashes-or-timeouts 50 --exit-after-n-failures 500 --release --leaks --dump-render-tree --no-retry-failures --no-sample-on-timeout imported/w3c/web-platform-tests
Alexey Proskuryakov
Comment 6 2017-08-12 18:37:15 PDT
...and one needs to delete test results directory before running the tests for this to reproduce! rm -rf WebKitBuild/Release/layout-test-results
youenn fablet
Comment 7 2017-08-14 12:27:30 PDT
I am also able to reproduce the issue fairly consistently when delete the output folder. I filed https://bugs.webkit.org/show_bug.cgi?id=175543 for that particular issue.
Alexey Proskuryakov
Comment 8 2017-08-14 20:36:29 PDT
Hrmpf, this is still happening after the fix for bug 175543.
youenn fablet
Comment 9 2017-08-14 22:17:47 PDT
Created attachment 318112 [details] add logging
youenn fablet
Comment 10 2017-08-14 22:19:45 PDT
(In reply to youenn fablet from comment #9) > Created attachment 318112 [details] > Patch wpt log does not show anything abnormal. It is as if testharnessreport.js is not executed correctly on DRT. Adding some WPT logging to check whether responses have the expected size. Do we have other bots than WK1 having this issue?
Ryan Haddad
Comment 11 2017-08-15 10:46:02 PDT
(In reply to youenn fablet from comment #10) > (In reply to youenn fablet from comment #9) > > Created attachment 318112 [details] > > Patch > > wpt log does not show anything abnormal. > It is as if testharnessreport.js is not executed correctly on DRT. > Adding some WPT logging to check whether responses have the expected size. > > Do we have other bots than WK1 having this issue? I think I was seeing a similar issue with the ios-sim EWS bots last week. They seem to be working right now, but I will monitor them and update this bug if I find any examples.
WebKit Commit Bot
Comment 12 2017-08-15 13:21:29 PDT
Comment on attachment 318112 [details] add logging Clearing flags on attachment: 318112 Committed r220754: <http://trac.webkit.org/changeset/220754>
WebKit Commit Bot
Comment 13 2017-08-15 13:21:31 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 14 2017-08-15 13:22:21 PDT
Alexey Proskuryakov
Comment 15 2017-08-15 13:24:07 PDT
Re-opening, as this was only logging.
youenn fablet
Comment 16 2017-08-15 17:24:40 PDT
Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220760%20(2656)/wptwk_process_log.out.txt and imported/w3c/web-platform-tests/user-timing/test_user_timing_measure_navigation_timing.html test that fails for instance. We got: DEBUG:web-platform-tests:200 GET /resources/testharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0 DEBUG:web-platform-tests:200 GET /user-timing/resources/webperftestharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0 DEBUG:web-platform-tests:Wrote content of size 4880 for /user-timing/resources/webperftestharness.js DEBUG:web-platform-tests:Wrote content of size 100163 for /resources/testharness.js DEBUG:web-platform-tests:Got exception when writing response Traceback (most recent call last): File "/Volumes/Data/slave/sierra-leaks/build/LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/response.py", line 446, in write self._wfile.write(buffer) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 328, in write self.flush() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 307, in flush self._sock.sendall(view[write_offset:write_offset+buffer_size]) error: [Errno 32] Broken pipe
Ryan Haddad
Comment 17 2017-08-15 17:57:05 PDT
Created attachment 318207 [details] Results from ios-sim EWS Attached an example that I think exhibits this issue on ios-sim EWS from bug https://bugs.webkit.org/show_bug.cgi?id=173457
Alexey Proskuryakov
Comment 18 2017-08-16 11:08:17 PDT
> error: [Errno 32] Broken pipe What process is this pipe connected to?
youenn fablet
Comment 19 2017-08-16 11:12:30 PDT
(In reply to Alexey Proskuryakov from comment #18) > > error: [Errno 32] Broken pipe > > What process is this pipe connected to? DumpRenderTree I would guess.
Alexey Proskuryakov
Comment 20 2017-08-16 12:32:04 PDT
DumpRenderTree uses TCP to talk to the server, not a pipe. So the pipe has to be between processes in the WPT harness. A SIGPIPE means that the other process closes the pipe too early.
youenn fablet
Comment 21 2017-08-16 12:44:08 PDT
Cannot SIGPIPE be sent to WPT if the socket WPT uses to write is not connected anymore, for instance if DRT decided to close the TCP connection?
Alexey Proskuryakov
Comment 22 2017-08-16 12:59:43 PDT
It could technically, but I'm not sure if that's likely. WebKit won't close the connection until it has the response, so the scenario is not very plausible.
youenn fablet
Comment 23 2017-08-16 13:33:49 PDT
(In reply to Alexey Proskuryakov from comment #22) > It could technically, but I'm not sure if that's likely. WebKit won't close > the connection until it has the response, so the scenario is not very > plausible. Do you have any other idea? I was planning to add some more logging to check precisely which resources were failing to be sent and add timestamps so that we can check if somehow the WPT server is hanging.
Robin Morisset
Comment 24 2017-08-17 16:51:26 PDT
Could this bug also explain the failing test result for iOS-sim on https://bugs.webkit.org/show_bug.cgi?id=175610 ?
youenn fablet
Comment 25 2017-08-17 16:55:36 PDT
(In reply to Robin Morisset from comment #24) > Could this bug also explain the failing test result for iOS-sim on > https://bugs.webkit.org/show_bug.cgi?id=175610 ? would need to look at the wpt log but it seems so.
youenn fablet
Comment 26 2017-08-21 08:32:32 PDT
Since it only happens on some bots, could it be something similar to bug 164651?
Alexey Proskuryakov
Comment 27 2017-08-21 09:03:03 PDT
It's similar in that the WPT server isn't working well in both cases, but I don't see a reason to believe that the root causes are the same.
Ryan Haddad
Comment 28 2017-08-22 08:40:18 PDT
This continues to affect ios-sim EWS bots. They often don't finish tests or produce false positives on patches. We need to fix this ASAP.
Ryan Haddad
Comment 29 2017-08-23 17:32:16 PDT
Youenn, have you had a chance to look in to adding additional logging?
youenn fablet
Comment 30 2017-08-24 09:03:36 PDT
Created attachment 318992 [details] Adding more logs
Alexey Proskuryakov
Comment 31 2017-08-24 09:05:49 PDT
Comment on attachment 318992 [details] Adding more logs rs=me
WebKit Commit Bot
Comment 32 2017-08-24 09:49:39 PDT
Comment on attachment 318992 [details] Adding more logs Clearing flags on attachment: 318992 Committed r221141: <http://trac.webkit.org/changeset/221141>
WebKit Commit Bot
Comment 33 2017-08-24 09:49:41 PDT
All reviewed patches have been landed. Closing bug.
Ryan Haddad
Comment 34 2017-08-24 09:52:48 PDT
Reopening since the patch only added logging.
youenn fablet
Comment 35 2017-08-24 16:17:37 PDT
I looked at some logs like https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221152%20(2804)/wptwk_process_log.out.txt where tests do not run well and https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221155%20(2805)/wptwk_process_log.out.txt which runs well. When everything is fine, the server got requests and returns responses steadily. When the run is wrong, there are periods like 10 seconds where no requests or responses are logged. The server continues to log "Broken pipe" errors but that is all. I will skip a test that uses message2.py to reduce the number of these "Broken pipe" errors.
youenn fablet
Comment 36 2017-08-24 16:20:49 PDT
Created attachment 319033 [details] Skipping imported/w3c/web-platform-tests/eventsource/event-data.html
WebKit Commit Bot
Comment 37 2017-08-24 16:52:17 PDT
Comment on attachment 319033 [details] Skipping imported/w3c/web-platform-tests/eventsource/event-data.html Clearing flags on attachment: 319033 Committed r221170: <http://trac.webkit.org/changeset/221170>
WebKit Commit Bot
Comment 38 2017-08-24 16:52:19 PDT
All reviewed patches have been landed. Closing bug.
Ryan Haddad
Comment 39 2017-08-25 09:20:24 PDT
Reopening because the issue is not resolved.
youenn fablet
Comment 40 2017-08-25 10:03:36 PDT
On the EWS leak bot, I can see some TCP level errors like: default 09:43:54.343176 -0700 DumpRenderTree TIC TCP Conn Start [21:0x7fbbca7948d0] default 09:43:54.351226 -0700 DumpRenderTree TIC TCP Conn Event [21:0x7fbbca7948d0]: 1 Err(0) default 09:43:54.351343 -0700 DumpRenderTree TIC TCP Conn Connected [21:0x7fbbca7948d0]: Err(0) default 09:43:54.354371 -0700 DumpRenderTree TIC TCP Conn Event [21:0x7fbbca7948d0]: 2 Err(0) error 09:43:54.354614 -0700 DumpRenderTree TIC Read Error [21:0x7fbbca7948d0]: 1:57 default 09:43:54.355162 -0700 DumpRenderTree TIC TCP Conn Cancel [21:0x7fbbca7948d0] default 09:43:54.356212 -0700 DumpRenderTree TIC TCP Conn Destroyed [21:0x7fbbca7948d0] default 09:43:54.378052 -0700 DumpRenderTree TIC TCP Conn Start [22:0x7fbbca79ad40] default 09:43:54.381394 -0700 DumpRenderTree TIC TCP Conn Start [23:0x7fbbca601770] default 09:43:54.381732 -0700 DumpRenderTree TIC TCP Conn Start [24:0x7fbbca1016d0] default 09:43:54.397447 -0700 DumpRenderTree TIC TCP Conn Event [22:0x7fbbca79ad40]: 1 Err(0) default 09:43:54.397536 -0700 DumpRenderTree TIC TCP Conn Connected [22:0x7fbbca79ad40]: Err(0) default 09:43:54.399839 -0700 DumpRenderTree TIC TCP Conn Event [24:0x7fbbca1016d0]: 1 Err(0) default 09:43:54.399951 -0700 DumpRenderTree TIC TCP Conn Connected [24:0x7fbbca1016d0]: Err(0) default 09:43:54.404178 -0700 DumpRenderTree TIC TCP Conn Event [24:0x7fbbca1016d0]: 2 Err(0) default 09:43:54.404477 -0700 DumpRenderTree TIC TCP Conn Event [22:0x7fbbca79ad40]: 2 Err(0) error 09:43:54.405014 -0700 DumpRenderTree TIC Read Error [22:0x7fbbca79ad40]: 1:57 error 09:43:54.405163 -0700 DumpRenderTree TIC Read Error [24:0x7fbbca1016d0]: 1:57 default 09:43:54.405692 -0700 DumpRenderTree TIC TCP Conn Cancel [22:0x7fbbca79ad40] default 09:43:54.406162 -0700 DumpRenderTree TIC TCP Conn Cancel [24:0x7fbbca1016d0] default 09:43:54.406647 -0700 DumpRenderTree TIC TCP Conn Destroyed [22:0x7fbbca79ad40] default 09:43:54.406970 -0700 DumpRenderTree TIC TCP Conn Destroyed [24:0x7fbbca1016d0] default 09:44:20.479517 -0700 DumpRenderTree TIC TCP Conn Event [23:0x7fbbca601770]: 3 Err(60) error 09:44:20.479702 -0700 DumpRenderTree TIC TCP Conn Failed [23:0x7fbbca601770]: 1:60 Err(60) default 09:44:20.479859 -0700 DumpRenderTree TIC TCP Conn Cancel [23:0x7fbbca601770] default 09:44:20.480167 -0700 DumpRenderTree TIC TCP Conn Destroyed [23:0x7fbbca601770] error 09:44:20.481225 -0700 DumpRenderTree HTTP load failed (error code: -1001 [1:60]) error 09:44:20.481645 -0700 DumpRenderTree NSURLConnection finished with error - code -1001 It seems that in some cases, we got some ENOTCON/ETIMEDOUT errors leading to NSURLConnection failing with timeout.
Alexey Proskuryakov
Comment 41 2017-09-07 15:25:56 PDT
Is there anything to undo here (remove logging, unskip tests)?
youenn fablet
Comment 42 2017-09-07 15:47:13 PDT
We should unskip the event-data.html test. The logging will be removed next time we resync WPT tests
youenn fablet
Comment 43 2017-09-15 09:33:34 PDT
WebKit Commit Bot
Comment 44 2017-09-15 10:03:03 PDT
Comment on attachment 320912 [details] Patch Clearing flags on attachment: 320912 Committed r222094: <http://trac.webkit.org/changeset/222094>
WebKit Commit Bot
Comment 45 2017-09-15 10:03:05 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.