Summary: | WPT harness errors on leaks and iOS-sim EWS bots | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alexey Proskuryakov <ap> | ||||||||||||
Component: | Tools / Tests | Assignee: | youenn fablet <youennf> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | cdumez, commit-queue, jlewis3, lforschler, rmorisset, ryanhaddad, webkit-bug-importer, youennf | ||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||
Version: | Other | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=175543 | ||||||||||||||
Attachments: |
|
Description
Alexey Proskuryakov
2017-08-07 10:22:11 PDT
Are these errors all related to testing workers in WPT? I don't think so, failures on many non-worker tests seem to be similar. See e.g. <https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220343%20(2511)/imported/w3c/web-platform-tests/IndexedDB/idbobjectstore_add5-pretty-diff.html> All results: https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220343%20(2511)/results.html Looking at some tests, it seems as if testharnessreport.js is not sent correctly to the browser (dump as render tree and not as text). For some other tests, it is as if testharness.js is not sent correctly (function test undefined). Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220343%20(2511)/wptwk_process_log.out.txt, there is no clear error from the WPT server. Looking at one failing test (/XMLHttpRequest/abort-after-receive.htm) vs a successful test (/XMLHttpRequest/abort-after-send.htm), it seems as if the unsuccessful test is not making any HTTP request to get testharnessreport.js. Any idea why retrieval of some HTTP resources would cause issues in that particular configuration? How slow are these bots? We could try to add more wpt logging. > Any idea why retrieval of some HTTP resources would cause issues in that particular configuration? No. If there no request in the log, it could be a WebKit cache bug of some sort, but I don't know how much we can trust WPT logging. > How slow are these bots? Not too slow, the tests run in just 1 hour. This appears to be quite reproducible locally with a release build. run-webkit-tests --no-build --exit-after-n-crashes-or-timeouts 50 --exit-after-n-failures 500 --release --leaks --dump-render-tree --no-retry-failures --no-sample-on-timeout imported/w3c/web-platform-tests ...and one needs to delete test results directory before running the tests for this to reproduce! rm -rf WebKitBuild/Release/layout-test-results I am also able to reproduce the issue fairly consistently when delete the output folder. I filed https://bugs.webkit.org/show_bug.cgi?id=175543 for that particular issue. Hrmpf, this is still happening after the fix for bug 175543. Created attachment 318112 [details]
add logging
(In reply to youenn fablet from comment #9) > Created attachment 318112 [details] > Patch wpt log does not show anything abnormal. It is as if testharnessreport.js is not executed correctly on DRT. Adding some WPT logging to check whether responses have the expected size. Do we have other bots than WK1 having this issue? (In reply to youenn fablet from comment #10) > (In reply to youenn fablet from comment #9) > > Created attachment 318112 [details] > > Patch > > wpt log does not show anything abnormal. > It is as if testharnessreport.js is not executed correctly on DRT. > Adding some WPT logging to check whether responses have the expected size. > > Do we have other bots than WK1 having this issue? I think I was seeing a similar issue with the ios-sim EWS bots last week. They seem to be working right now, but I will monitor them and update this bug if I find any examples. Comment on attachment 318112 [details] add logging Clearing flags on attachment: 318112 Committed r220754: <http://trac.webkit.org/changeset/220754> All reviewed patches have been landed. Closing bug. Re-opening, as this was only logging. Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220760%20(2656)/wptwk_process_log.out.txt and imported/w3c/web-platform-tests/user-timing/test_user_timing_measure_navigation_timing.html test that fails for instance. We got: DEBUG:web-platform-tests:200 GET /resources/testharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0 DEBUG:web-platform-tests:200 GET /user-timing/resources/webperftestharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0 DEBUG:web-platform-tests:Wrote content of size 4880 for /user-timing/resources/webperftestharness.js DEBUG:web-platform-tests:Wrote content of size 100163 for /resources/testharness.js DEBUG:web-platform-tests:Got exception when writing response Traceback (most recent call last): File "/Volumes/Data/slave/sierra-leaks/build/LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/response.py", line 446, in write self._wfile.write(buffer) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 328, in write self.flush() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 307, in flush self._sock.sendall(view[write_offset:write_offset+buffer_size]) error: [Errno 32] Broken pipe Created attachment 318207 [details] Results from ios-sim EWS Attached an example that I think exhibits this issue on ios-sim EWS from bug https://bugs.webkit.org/show_bug.cgi?id=173457 > error: [Errno 32] Broken pipe
What process is this pipe connected to?
(In reply to Alexey Proskuryakov from comment #18) > > error: [Errno 32] Broken pipe > > What process is this pipe connected to? DumpRenderTree I would guess. DumpRenderTree uses TCP to talk to the server, not a pipe. So the pipe has to be between processes in the WPT harness. A SIGPIPE means that the other process closes the pipe too early. Cannot SIGPIPE be sent to WPT if the socket WPT uses to write is not connected anymore, for instance if DRT decided to close the TCP connection? It could technically, but I'm not sure if that's likely. WebKit won't close the connection until it has the response, so the scenario is not very plausible. (In reply to Alexey Proskuryakov from comment #22) > It could technically, but I'm not sure if that's likely. WebKit won't close > the connection until it has the response, so the scenario is not very > plausible. Do you have any other idea? I was planning to add some more logging to check precisely which resources were failing to be sent and add timestamps so that we can check if somehow the WPT server is hanging. Could this bug also explain the failing test result for iOS-sim on https://bugs.webkit.org/show_bug.cgi?id=175610 ? (In reply to Robin Morisset from comment #24) > Could this bug also explain the failing test result for iOS-sim on > https://bugs.webkit.org/show_bug.cgi?id=175610 ? would need to look at the wpt log but it seems so. Since it only happens on some bots, could it be something similar to bug 164651? It's similar in that the WPT server isn't working well in both cases, but I don't see a reason to believe that the root causes are the same. This continues to affect ios-sim EWS bots. They often don't finish tests or produce false positives on patches. We need to fix this ASAP. Youenn, have you had a chance to look in to adding additional logging? Created attachment 318992 [details]
Adding more logs
Comment on attachment 318992 [details]
Adding more logs
rs=me
Comment on attachment 318992 [details] Adding more logs Clearing flags on attachment: 318992 Committed r221141: <http://trac.webkit.org/changeset/221141> All reviewed patches have been landed. Closing bug. Reopening since the patch only added logging. I looked at some logs like https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221152%20(2804)/wptwk_process_log.out.txt where tests do not run well and https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221155%20(2805)/wptwk_process_log.out.txt which runs well. When everything is fine, the server got requests and returns responses steadily. When the run is wrong, there are periods like 10 seconds where no requests or responses are logged. The server continues to log "Broken pipe" errors but that is all. I will skip a test that uses message2.py to reduce the number of these "Broken pipe" errors. Created attachment 319033 [details]
Skipping imported/w3c/web-platform-tests/eventsource/event-data.html
Comment on attachment 319033 [details] Skipping imported/w3c/web-platform-tests/eventsource/event-data.html Clearing flags on attachment: 319033 Committed r221170: <http://trac.webkit.org/changeset/221170> All reviewed patches have been landed. Closing bug. Reopening because the issue is not resolved. On the EWS leak bot, I can see some TCP level errors like: default 09:43:54.343176 -0700 DumpRenderTree TIC TCP Conn Start [21:0x7fbbca7948d0] default 09:43:54.351226 -0700 DumpRenderTree TIC TCP Conn Event [21:0x7fbbca7948d0]: 1 Err(0) default 09:43:54.351343 -0700 DumpRenderTree TIC TCP Conn Connected [21:0x7fbbca7948d0]: Err(0) default 09:43:54.354371 -0700 DumpRenderTree TIC TCP Conn Event [21:0x7fbbca7948d0]: 2 Err(0) error 09:43:54.354614 -0700 DumpRenderTree TIC Read Error [21:0x7fbbca7948d0]: 1:57 default 09:43:54.355162 -0700 DumpRenderTree TIC TCP Conn Cancel [21:0x7fbbca7948d0] default 09:43:54.356212 -0700 DumpRenderTree TIC TCP Conn Destroyed [21:0x7fbbca7948d0] default 09:43:54.378052 -0700 DumpRenderTree TIC TCP Conn Start [22:0x7fbbca79ad40] default 09:43:54.381394 -0700 DumpRenderTree TIC TCP Conn Start [23:0x7fbbca601770] default 09:43:54.381732 -0700 DumpRenderTree TIC TCP Conn Start [24:0x7fbbca1016d0] default 09:43:54.397447 -0700 DumpRenderTree TIC TCP Conn Event [22:0x7fbbca79ad40]: 1 Err(0) default 09:43:54.397536 -0700 DumpRenderTree TIC TCP Conn Connected [22:0x7fbbca79ad40]: Err(0) default 09:43:54.399839 -0700 DumpRenderTree TIC TCP Conn Event [24:0x7fbbca1016d0]: 1 Err(0) default 09:43:54.399951 -0700 DumpRenderTree TIC TCP Conn Connected [24:0x7fbbca1016d0]: Err(0) default 09:43:54.404178 -0700 DumpRenderTree TIC TCP Conn Event [24:0x7fbbca1016d0]: 2 Err(0) default 09:43:54.404477 -0700 DumpRenderTree TIC TCP Conn Event [22:0x7fbbca79ad40]: 2 Err(0) error 09:43:54.405014 -0700 DumpRenderTree TIC Read Error [22:0x7fbbca79ad40]: 1:57 error 09:43:54.405163 -0700 DumpRenderTree TIC Read Error [24:0x7fbbca1016d0]: 1:57 default 09:43:54.405692 -0700 DumpRenderTree TIC TCP Conn Cancel [22:0x7fbbca79ad40] default 09:43:54.406162 -0700 DumpRenderTree TIC TCP Conn Cancel [24:0x7fbbca1016d0] default 09:43:54.406647 -0700 DumpRenderTree TIC TCP Conn Destroyed [22:0x7fbbca79ad40] default 09:43:54.406970 -0700 DumpRenderTree TIC TCP Conn Destroyed [24:0x7fbbca1016d0] default 09:44:20.479517 -0700 DumpRenderTree TIC TCP Conn Event [23:0x7fbbca601770]: 3 Err(60) error 09:44:20.479702 -0700 DumpRenderTree TIC TCP Conn Failed [23:0x7fbbca601770]: 1:60 Err(60) default 09:44:20.479859 -0700 DumpRenderTree TIC TCP Conn Cancel [23:0x7fbbca601770] default 09:44:20.480167 -0700 DumpRenderTree TIC TCP Conn Destroyed [23:0x7fbbca601770] error 09:44:20.481225 -0700 DumpRenderTree HTTP load failed (error code: -1001 [1:60]) error 09:44:20.481645 -0700 DumpRenderTree NSURLConnection finished with error - code -1001 It seems that in some cases, we got some ENOTCON/ETIMEDOUT errors leading to NSURLConnection failing with timeout. Is there anything to undo here (remove logging, unskip tests)? We should unskip the event-data.html test. The logging will be removed next time we resync WPT tests Created attachment 320912 [details]
Patch
Comment on attachment 320912 [details] Patch Clearing flags on attachment: 320912 Committed r222094: <http://trac.webkit.org/changeset/222094> All reviewed patches have been landed. Closing bug. |