Bug 159166 - LayoutTests intermittently failing to run due to issues starting Web Platform Test server
Summary: LayoutTests intermittently failing to run due to issues starting Web Platform...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2016-06-27 13:52 PDT by Ryan Haddad
Modified: 2016-11-16 09:31 PST (History)
8 users (show)

See Also:


Attachments
Removing ws port from config (1.24 KB, patch)
2016-06-28 01:36 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (11.08 KB, patch)
2016-06-30 00:19 PDT, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Haddad 2016-06-27 13:52:43 PDT
This has been happening intermittently across Mac testers for the past 2 days or so.

https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20WK2%20(Tests)/builds/7160
https://build.webkit.org/builders/Apple%20Yosemite%20Release%20WK1%20(Tests)/builds/15787/

12:52:04.076 91713 Starting Web Platform Test server ...
12:52:04.083 91713 Copying WebKit resources files
12:52:04.084 91713 Copying WebKit web platform server config.json
12:52:05.096 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:05.096 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:06.104 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:06.104 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:07.112 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:07.112 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:08.114 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:08.114 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:09.123 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:09.123 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:10.129 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:10.129 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:11.141 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:11.141 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:12.146 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:12.146 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:13.158 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:13.158 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:14.162 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:14.162 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:15.165 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:15.165 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:16.177 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:16.177 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:17.179 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:17.179 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:18.181 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:18.181 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:19.193 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:19.194 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:20.196 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:20.197 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:21.208 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:21.208 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:22.212 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:22.212 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:23.215 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:23.216 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:24.226 91713 Server NOT running on 49001: [Errno 61] Connection refused
12:52:24.226 91713 Waiting for action: <bound method WebPlatformTestServer._is_server_running_on_all_ports of <webkitpy.layout_tests.servers.web_platform_test_server.WebPlatformTestServer object at 0x111e34890>>
12:52:25.231 91713 Stopping wptwk server
12:52:25.231 91713 Cleaning WPT resources files
12:52:25.233 91713 Cleaning WPT web platform server config.json
12:52:25.234 91713 Flushing stdout
12:52:25.234 91713 Flushing stderr
12:52:25.234 91713 Stopping helper
12:52:25.234 91713 Stopping LayoutTestHelper
12:52:25.240 91713 Cleaning up port

ServerError raised: Failed to start wptwk server
Traceback (most recent call last):
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 77, in main
    run_details = run(port, options, args, stderr)
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py", line 409, in run
    run_details = manager.run(args)
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 203, in run
    int(self._options.child_processes), retrying=False)
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py", line 262, in _run_tests
    return self._runner.run_tests(self._expectations, test_inputs, tests_to_skip, num_workers, needs_http, needs_websockets, needs_web_platform_test_server, retrying)
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 111, in run_tests
    self.start_servers()
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py", line 199, in start_servers
    self._port.start_web_platform_test_server()
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/port/base.py", line 952, in start_web_platform_test_server
    self._web_platform_test_server.start()
  File "/Volumes/Data/slave/elcapitan-release-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/servers/http_server_base.py", line 95, in start
    raise ServerError('Failed to start %s server' % self._name)
ServerError: Failed to start wptwk server
Comment 1 Ryan Haddad 2016-06-27 13:53:54 PDT
Not sure if this could be related to <https://trac.webkit.org/changeset/202471>
Comment 2 youenn fablet 2016-06-28 01:35:38 PDT
I am not sure either but it also happens on iOS sim EWS bots, this is rather annoying :(

It seems the web socket server is not able to start properly.
https://trac.webkit.org/changeset/202471 is not supposed to change the web socket server implementation so I am not sure where that comes from.

Do you know where I can find the full wpt_wk log?
Comment 3 youenn fablet 2016-06-28 01:36:40 PDT
Created attachment 282223 [details]
Removing ws port from config
Comment 4 Radar WebKit Bug Importer 2016-06-28 02:50:38 PDT
<rdar://problem/27051759>
Comment 5 Alexey Proskuryakov 2016-06-28 10:02:03 PDT
You can get to the logs by going to test results, and removing "results.html" from the URL, e.g. <https://build.webkit.org/results/Apple%20Yosemite%20Release%20WK1%20(Tests)/r202508%20(15787)/>.

The proposed patch doesn't seem related to this error.
Comment 6 Alexey Proskuryakov 2016-06-29 20:57:01 PDT
This is getting very frequent, and a real problem. Youenn, could you take another look?
Comment 7 Alexey Proskuryakov 2016-06-29 20:58:43 PDT
E.g. in this patch, two EWS bubbles are orange at once because the bots can't start wtpserve: <https://bugs.webkit.org/show_bug.cgi?id=159287>.
Comment 8 youenn fablet 2016-06-29 23:53:36 PDT
(In reply to comment #7)
> E.g. in this patch, two EWS bubbles are orange at once because the bots
> can't start wtpserve: <https://bugs.webkit.org/show_bug.cgi?id=159287>.

From the error log, it seems bots have difficulties generating certificates from openssl. We can try regenerating the certificates and I will prepare a patch accordingly. But there may be an underlying issue...
Comment 9 youenn fablet 2016-06-30 00:19:30 PDT
Created attachment 282416 [details]
Patch
Comment 10 youenn fablet 2016-06-30 00:21:05 PDT
(In reply to comment #9)
> Created attachment 282416 [details]
> Patch

Hopefully this should work.
I think there may be a bug in some of openssl wpt scripts.
Comment 11 youenn fablet 2016-06-30 00:22:00 PDT
The culprit might be https://github.com/w3c/wpt-tools/commit/b388d6b14df8acc3fad070c74b435cd7aff056d1
Comment 12 youenn fablet 2016-06-30 00:26:31 PDT
Raised issue here: https://github.com/w3c/wpt-tools/issues/86
Comment 13 youenn fablet 2016-06-30 07:36:58 PDT
(In reply to comment #12)
> Raised issue here: https://github.com/w3c/wpt-tools/issues/86

OK, the issue is already fixed in w3c repo.
Instead of this patch, we could reimport the latest wpt repo.
Comment 14 Alexey Proskuryakov 2016-06-30 08:51:54 PDT
Comment on attachment 282416 [details]
Patch

I don't quite understand this change, nor whether it's practical to reimport quickly.

Does this prevent re-generating the certificates on every run, and thus sidestep the problem in automatic generation?

Either way, r=me, as we need to resolve this ASAP.
Comment 15 youenn fablet 2016-06-30 08:56:13 PDT
> I don't quite understand this change, nor whether it's practical to reimport
> quickly.

Reimporting would fix the bug without us having to change our current setup.

> Does this prevent re-generating the certificates on every run, and thus
> sidestep the problem in automatic generation?

You are correct.
I don't think serial is used in wpt except for generating certificates when launching servers, hence why the problem should not appear anymore.
Comment 16 WebKit Commit Bot 2016-06-30 09:15:08 PDT
Comment on attachment 282416 [details]
Patch

Clearing flags on attachment: 282416

Committed r202681: <http://trac.webkit.org/changeset/202681>
Comment 17 WebKit Commit Bot 2016-06-30 09:15:13 PDT
All reviewed patches have been landed.  Closing bug.
Comment 18 Chris Dumez 2016-08-01 13:17:27 PDT
Comment on attachment 282416 [details]
Patch

After this change, LayoutTests/imported/w3c/resources/_wpt_certs/* keep getting overridden locally and showing up in my local diff / patches. This is very annoying.
Comment 19 youenn fablet 2016-08-01 13:31:09 PDT
Ah, it might be due to recent changes to wpt python scripts.

We can go back to the previous strategy since not-closed-handle bug is now fixed.
Comment 21 Alexey Proskuryakov 2016-11-15 16:45:21 PST
I looked at one of the bots hitting this, and it has lots of connections that are not properly closed ("sunwebadmin" is port 8800):

tcp4       0      0  localhost.sunwebadmin  localhost.62045        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62049        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62048        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61980        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61984        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61982        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61986        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61988        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.61990        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62051        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62055        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62054        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62057        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62059        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62061        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62063        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62067        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62066        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62069        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62073        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62072        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62075        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62079        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62078        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62081        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62086        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62087        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62085        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62093        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62097        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62096        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62099        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62102        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62103        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62105        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62108        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62109        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62111        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62114        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62115        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62117        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62121        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62120        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62123        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62127        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62126        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62129        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62133        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62132        TIME_WAIT  
tcp4       0      0  localhost.sunwebadmin  localhost.62135        TIME_WAIT
Comment 22 Alexey Proskuryakov 2016-11-15 16:48:08 PST
(over 4000 connections in TIMED_WAIT on that bot)
Comment 23 youenn fablet 2016-11-16 08:37:28 PST
(In reply to comment #22)
> (over 4000 connections in TIMED_WAIT on that bot)

I run 50 times LayoutTests/imported/w3c/web-platform-tests/fetch/api/request/request-structure.html which has two sub-resources.
We end up with 151 TIME_WAIT sockets.

Looking at the server, it seems to support HTTP keep-alive and socket reuse
Comment 24 youenn fablet 2016-11-16 08:43:36 PST
(In reply to comment #23)
> (In reply to comment #22)
> > (over 4000 connections in TIMED_WAIT on that bot)
> 
> I run 50 times
> LayoutTests/imported/w3c/web-platform-tests/fetch/api/request/request-
> structure.html which has two sub-resources.
> We end up with 151 TIME_WAIT sockets.
> 
> Looking at the server, it seems to support HTTP keep-alive and socket reuse

But most responses do not have content-length headers