Bug 175269 - WPT harness errors on leaks and iOS-sim EWS bots
Summary: WPT harness errors on leaks and iOS-sim EWS bots
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-08-07 10:22 PDT by Alexey Proskuryakov
Modified: 2017-09-15 10:03 PDT (History)
8 users (show)

See Also:


Attachments
add logging (4.21 KB, patch)
2017-08-14 22:17 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Results from ios-sim EWS (1.35 MB, application/zip)
2017-08-15 17:57 PDT, Ryan Haddad
no flags Details
Adding more logs (4.67 KB, patch)
2017-08-24 09:03 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Skipping imported/w3c/web-platform-tests/eventsource/event-data.html (1.42 KB, patch)
2017-08-24 16:20 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (1.35 KB, patch)
2017-09-15 09:33 PDT, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2017-08-07 10:22:11 PDT
A large number of tests are failing on leaks bot with something that looks like a harness error. See e.g. <https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&revision=220122&showExpectations=true&tests=imported%2Fw3c%2Fweb-platform-tests%2FWebCryptoAPI%2FgenerateKey%2Ffailures_RSASSA-PKCS1-v1_5.worker.html>.

Here are some of the failures it hits:

--------------------------------------
Harness Error (FAIL), message = Error in remote http://localhost:8800/WebCryptoAPI/generateKey/failures_RSASSA-PKCS1-v1_5.worker.js: NetworkError:  A network error occurred.
--------------------------------------
layer at (0,0) size 800x600
  RenderView at (0,0) size 800x600
layer at (0,0) size 800x8
  RenderBlock {HTML} at (0,0) size 800x8
    RenderBody {BODY} at (8,8) size 784x0
      RenderBlock {DIV} at (0,0) size 784x0
--------------------------------------
CONSOLE MESSAGE: line 1677: TypeError: null is not an object (evaluating 'this.message_target.removeEventListener')
CONSOLE MESSAGE: line 3: NetworkError:  A network error occurred.

FAIL Untitled NetworkError:  A network error occurred.

--------------------------------------
Harness Error (FAIL), message = Error in remote: [object Event]
--------------------------------------
Comment 1 youenn fablet 2017-08-07 14:27:29 PDT
Are these errors all related to testing workers in WPT?
Comment 3 youenn fablet 2017-08-07 22:40:20 PDT
Looking at some tests, it seems as if testharnessreport.js is not sent correctly to the browser (dump as render tree and not as text).
For some other tests, it is as if testharness.js  is not sent correctly (function test undefined).

Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220343%20(2511)/wptwk_process_log.out.txt, there is no clear error from the WPT server.

Looking at one failing test (/XMLHttpRequest/abort-after-receive.htm) vs a successful test (/XMLHttpRequest/abort-after-send.htm), it seems as if the unsuccessful test is not making any HTTP request to get testharnessreport.js.

Any idea why retrieval of some HTTP resources would cause issues in that particular configuration?
How slow are these bots?

We could try to add more wpt logging.
Comment 4 Alexey Proskuryakov 2017-08-08 10:27:32 PDT
> Any idea why retrieval of some HTTP resources would cause issues in that particular configuration?

No. If there no request in the log, it could be a WebKit cache bug of some sort, but I don't know how much we can trust WPT logging.

> How slow are these bots?

Not too slow, the tests run in just 1 hour.
Comment 5 Alexey Proskuryakov 2017-08-12 18:09:55 PDT
This appears to be quite reproducible locally with a release build.

run-webkit-tests --no-build --exit-after-n-crashes-or-timeouts 50 --exit-after-n-failures 500 --release --leaks --dump-render-tree --no-retry-failures --no-sample-on-timeout imported/w3c/web-platform-tests
Comment 6 Alexey Proskuryakov 2017-08-12 18:37:15 PDT
...and one needs to delete test results directory before running the tests for this to reproduce!

rm -rf WebKitBuild/Release/layout-test-results
Comment 7 youenn fablet 2017-08-14 12:27:30 PDT
I am also able to reproduce the issue fairly consistently when delete the output folder.

I filed https://bugs.webkit.org/show_bug.cgi?id=175543 for that particular issue.
Comment 8 Alexey Proskuryakov 2017-08-14 20:36:29 PDT
Hrmpf, this is still happening after the fix for bug 175543.
Comment 9 youenn fablet 2017-08-14 22:17:47 PDT
Created attachment 318112 [details]
add logging
Comment 10 youenn fablet 2017-08-14 22:19:45 PDT
(In reply to youenn fablet from comment #9)
> Created attachment 318112 [details]
> Patch

wpt log does not show anything abnormal.
It is as if testharnessreport.js is not executed correctly on DRT.
Adding some WPT logging to check whether responses have the expected size.

Do we have other bots than WK1 having this issue?
Comment 11 Ryan Haddad 2017-08-15 10:46:02 PDT
(In reply to youenn fablet from comment #10)
> (In reply to youenn fablet from comment #9)
> > Created attachment 318112 [details]
> > Patch
> 
> wpt log does not show anything abnormal.
> It is as if testharnessreport.js is not executed correctly on DRT.
> Adding some WPT logging to check whether responses have the expected size.
> 
> Do we have other bots than WK1 having this issue?
I think I was seeing a similar issue with the ios-sim EWS bots last week. They seem to be working right now, but I will monitor them and update this bug if I find any examples.
Comment 12 WebKit Commit Bot 2017-08-15 13:21:29 PDT
Comment on attachment 318112 [details]
add logging

Clearing flags on attachment: 318112

Committed r220754: <http://trac.webkit.org/changeset/220754>
Comment 13 WebKit Commit Bot 2017-08-15 13:21:31 PDT
All reviewed patches have been landed.  Closing bug.
Comment 14 Radar WebKit Bug Importer 2017-08-15 13:22:21 PDT
<rdar://problem/33902404>
Comment 15 Alexey Proskuryakov 2017-08-15 13:24:07 PDT
Re-opening, as this was only logging.
Comment 16 youenn fablet 2017-08-15 17:24:40 PDT
Looking at https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r220760%20(2656)/wptwk_process_log.out.txt and imported/w3c/web-platform-tests/user-timing/test_user_timing_measure_navigation_timing.html	test that fails for instance.

We got:

DEBUG:web-platform-tests:200 GET /resources/testharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0
DEBUG:web-platform-tests:200 GET /user-timing/resources/webperftestharness.js (http://localhost:8800/user-timing/test_user_timing_measure_navigation_timing.html) 0
DEBUG:web-platform-tests:Wrote content of size 4880 for /user-timing/resources/webperftestharness.js
DEBUG:web-platform-tests:Wrote content of size 100163 for /resources/testharness.js
DEBUG:web-platform-tests:Got exception when writing response Traceback (most recent call last):
  File "/Volumes/Data/slave/sierra-leaks/build/LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/response.py", line 446, in write
    self._wfile.write(buffer)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 328, in write
    self.flush()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
Comment 17 Ryan Haddad 2017-08-15 17:57:05 PDT
Created attachment 318207 [details]
Results from ios-sim EWS

Attached an example that I think exhibits this issue on ios-sim EWS from bug https://bugs.webkit.org/show_bug.cgi?id=173457
Comment 18 Alexey Proskuryakov 2017-08-16 11:08:17 PDT
> error: [Errno 32] Broken pipe

What process is this pipe connected to?
Comment 19 youenn fablet 2017-08-16 11:12:30 PDT
(In reply to Alexey Proskuryakov from comment #18)
> > error: [Errno 32] Broken pipe
> 
> What process is this pipe connected to?

DumpRenderTree I would guess.
Comment 20 Alexey Proskuryakov 2017-08-16 12:32:04 PDT
DumpRenderTree uses TCP to talk to the server, not a pipe. So the pipe has to be between processes in the WPT harness. A SIGPIPE means that the other process closes the pipe too early.
Comment 21 youenn fablet 2017-08-16 12:44:08 PDT
Cannot SIGPIPE be sent to WPT if the socket WPT uses to write is not connected anymore, for instance if DRT decided to close the TCP connection?
Comment 22 Alexey Proskuryakov 2017-08-16 12:59:43 PDT
It could technically, but I'm not sure if that's likely. WebKit won't close the connection until it has the response, so the scenario is not very plausible.
Comment 23 youenn fablet 2017-08-16 13:33:49 PDT
(In reply to Alexey Proskuryakov from comment #22)
> It could technically, but I'm not sure if that's likely. WebKit won't close
> the connection until it has the response, so the scenario is not very
> plausible.

Do you have any other idea?
I was planning to add some more logging to check precisely which resources were failing to be sent and add timestamps so that we can check if somehow the WPT server is hanging.
Comment 24 Robin Morisset 2017-08-17 16:51:26 PDT
Could this bug also explain the failing test result for iOS-sim on https://bugs.webkit.org/show_bug.cgi?id=175610 ?
Comment 25 youenn fablet 2017-08-17 16:55:36 PDT
(In reply to Robin Morisset from comment #24)
> Could this bug also explain the failing test result for iOS-sim on
> https://bugs.webkit.org/show_bug.cgi?id=175610 ?

would need to look at the wpt log but it seems so.
Comment 26 youenn fablet 2017-08-21 08:32:32 PDT
Since it only happens on some bots, could it be something similar to bug 164651?
Comment 27 Alexey Proskuryakov 2017-08-21 09:03:03 PDT
It's similar in that the WPT server isn't working well in both cases, but I don't see a reason to believe that the root causes are the same.
Comment 28 Ryan Haddad 2017-08-22 08:40:18 PDT
This continues to affect ios-sim EWS bots. They often don't finish tests or produce false positives on patches. We need to fix this ASAP.
Comment 29 Ryan Haddad 2017-08-23 17:32:16 PDT
Youenn, have you had a chance to look in to adding additional logging?
Comment 30 youenn fablet 2017-08-24 09:03:36 PDT
Created attachment 318992 [details]
Adding more logs
Comment 31 Alexey Proskuryakov 2017-08-24 09:05:49 PDT
Comment on attachment 318992 [details]
Adding more logs

rs=me
Comment 32 WebKit Commit Bot 2017-08-24 09:49:39 PDT
Comment on attachment 318992 [details]
Adding more logs

Clearing flags on attachment: 318992

Committed r221141: <http://trac.webkit.org/changeset/221141>
Comment 33 WebKit Commit Bot 2017-08-24 09:49:41 PDT
All reviewed patches have been landed.  Closing bug.
Comment 34 Ryan Haddad 2017-08-24 09:52:48 PDT
Reopening since the patch only added logging.
Comment 35 youenn fablet 2017-08-24 16:17:37 PDT
I looked at some logs like https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221152%20(2804)/wptwk_process_log.out.txt where tests do not run well and https://build.webkit.org/results/Apple%20Sierra%20(Leaks)/r221155%20(2805)/wptwk_process_log.out.txt which runs well.

When everything is fine, the server got requests and returns responses steadily.
When the run is wrong, there are periods like 10 seconds where no requests or responses are logged. 
The server continues to log "Broken pipe" errors but that is all.
I will skip a test that uses message2.py to reduce the number of these "Broken pipe" errors.
Comment 36 youenn fablet 2017-08-24 16:20:49 PDT
Created attachment 319033 [details]
Skipping imported/w3c/web-platform-tests/eventsource/event-data.html
Comment 37 WebKit Commit Bot 2017-08-24 16:52:17 PDT
Comment on attachment 319033 [details]
Skipping imported/w3c/web-platform-tests/eventsource/event-data.html

Clearing flags on attachment: 319033

Committed r221170: <http://trac.webkit.org/changeset/221170>
Comment 38 WebKit Commit Bot 2017-08-24 16:52:19 PDT
All reviewed patches have been landed.  Closing bug.
Comment 39 Ryan Haddad 2017-08-25 09:20:24 PDT
Reopening because the issue is not resolved.
Comment 40 youenn fablet 2017-08-25 10:03:36 PDT
On the EWS leak bot, I can see some TCP level errors like:


default	09:43:54.343176 -0700	DumpRenderTree	TIC TCP Conn Start [21:0x7fbbca7948d0]
default	09:43:54.351226 -0700	DumpRenderTree	TIC TCP Conn Event [21:0x7fbbca7948d0]: 1 Err(0)
default	09:43:54.351343 -0700	DumpRenderTree	TIC TCP Conn Connected [21:0x7fbbca7948d0]: Err(0)
default	09:43:54.354371 -0700	DumpRenderTree	TIC TCP Conn Event [21:0x7fbbca7948d0]: 2 Err(0)
error	09:43:54.354614 -0700	DumpRenderTree	TIC Read Error [21:0x7fbbca7948d0]: 1:57
default	09:43:54.355162 -0700	DumpRenderTree	TIC TCP Conn Cancel [21:0x7fbbca7948d0]
default	09:43:54.356212 -0700	DumpRenderTree	TIC TCP Conn Destroyed [21:0x7fbbca7948d0]
default	09:43:54.378052 -0700	DumpRenderTree	TIC TCP Conn Start [22:0x7fbbca79ad40]
default	09:43:54.381394 -0700	DumpRenderTree	TIC TCP Conn Start [23:0x7fbbca601770]
default	09:43:54.381732 -0700	DumpRenderTree	TIC TCP Conn Start [24:0x7fbbca1016d0]
default	09:43:54.397447 -0700	DumpRenderTree	TIC TCP Conn Event [22:0x7fbbca79ad40]: 1 Err(0)
default	09:43:54.397536 -0700	DumpRenderTree	TIC TCP Conn Connected [22:0x7fbbca79ad40]: Err(0)
default	09:43:54.399839 -0700	DumpRenderTree	TIC TCP Conn Event [24:0x7fbbca1016d0]: 1 Err(0)
default	09:43:54.399951 -0700	DumpRenderTree	TIC TCP Conn Connected [24:0x7fbbca1016d0]: Err(0)
default	09:43:54.404178 -0700	DumpRenderTree	TIC TCP Conn Event [24:0x7fbbca1016d0]: 2 Err(0)
default	09:43:54.404477 -0700	DumpRenderTree	TIC TCP Conn Event [22:0x7fbbca79ad40]: 2 Err(0)
error	09:43:54.405014 -0700	DumpRenderTree	TIC Read Error [22:0x7fbbca79ad40]: 1:57
error	09:43:54.405163 -0700	DumpRenderTree	TIC Read Error [24:0x7fbbca1016d0]: 1:57
default	09:43:54.405692 -0700	DumpRenderTree	TIC TCP Conn Cancel [22:0x7fbbca79ad40]
default	09:43:54.406162 -0700	DumpRenderTree	TIC TCP Conn Cancel [24:0x7fbbca1016d0]
default	09:43:54.406647 -0700	DumpRenderTree	TIC TCP Conn Destroyed [22:0x7fbbca79ad40]
default	09:43:54.406970 -0700	DumpRenderTree	TIC TCP Conn Destroyed [24:0x7fbbca1016d0]
default	09:44:20.479517 -0700	DumpRenderTree	TIC TCP Conn Event [23:0x7fbbca601770]: 3 Err(60)
error	09:44:20.479702 -0700	DumpRenderTree	TIC TCP Conn Failed [23:0x7fbbca601770]: 1:60 Err(60)
default	09:44:20.479859 -0700	DumpRenderTree	TIC TCP Conn Cancel [23:0x7fbbca601770]
default	09:44:20.480167 -0700	DumpRenderTree	TIC TCP Conn Destroyed [23:0x7fbbca601770]
error	09:44:20.481225 -0700	DumpRenderTree	HTTP load failed (error code: -1001 [1:60])
error	09:44:20.481645 -0700	DumpRenderTree	NSURLConnection finished with error - code -1001

It seems that in some cases, we got some ENOTCON/ETIMEDOUT errors leading to NSURLConnection failing with timeout.
Comment 41 Alexey Proskuryakov 2017-09-07 15:25:56 PDT
Is there anything to undo here (remove logging, unskip tests)?
Comment 42 youenn fablet 2017-09-07 15:47:13 PDT
We should unskip the event-data.html test.
The logging will be removed next time we resync WPT tests
Comment 43 youenn fablet 2017-09-15 09:33:34 PDT
Created attachment 320912 [details]
Patch
Comment 44 WebKit Commit Bot 2017-09-15 10:03:03 PDT
Comment on attachment 320912 [details]
Patch

Clearing flags on attachment: 320912

Committed r222094: <http://trac.webkit.org/changeset/222094>
Comment 45 WebKit Commit Bot 2017-09-15 10:03:05 PDT
All reviewed patches have been landed.  Closing bug.