Bug 158137 - W3C Web Platform Tests occasionally flake on the bots (Web Platform Test server fails?)
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 158253
Blocks:
Reported: 2016-05-26 16:57 PDT by Brady Eidson
Modified: 2020-07-24 10:15 PDT

See Also:


Attachments
wptwk process log from ews103 (1.14 MB, text/plain)
2016-05-27 14:35 PDT, Ryan Haddad

Description Brady Eidson 2016-05-26 16:57:15 PDT
W3C Web Platform Test server fails to work correctly sometimes.

We've noticed this a handful of times on the bots, in at least a few different bugs.

It happened in both of these bugs today:
https://bugs.webkit.org/show_bug.cgi?id=158093
https://bugs.webkit.org/show_bug.cgi?id=158111

Sometimes it dumps an empty render tree; other times it clearly dumps the 404 page (presumably for the main resource of the test).
Comment 1 Chris Dumez 2016-05-26 19:21:49 PDT
Are you sure this is what's happening? If you look at the attachment at:
https://bugs.webkit.org/attachment.cgi?id=279916

There is a wptwk_process_log.out.txt file, which seems to be the log file for the W3C Web Platform Test server. It is pretty big and consists mostly of successful (HTTP 200) responses.
Comment 2 Brady Eidson 2016-05-27 14:12:11 PDT
(In reply to comment #1)
> Are you sure this is what's happening? 

No.

But it's the only theory we have so far.

I updated the bug summary.
Comment 3 Ryan Haddad 2016-05-27 14:35:17 PDT
Created attachment 279997
wptwk process log from ews103

Associated EWS log:
https://webkit-queues.webkit.org/results/1393177
Comment 4 youenn fablet 2016-05-28 09:03:49 PDT
The error does indeed seem to be in the WPT server.

Looking at the log, at some point the server starts returning 404 for URLs pointing to files that should exist (resources/testharnessreport.js, existing IDB test files).

The request handler is FileHandler (https://github.com/w3c/wptserve/blob/22c2f947fafa07ce26d381bf5215714067c41ae6/wptserve/handlers.py)
Whenever this part of the code raises OSError or IOError, the error is converted to a 404 HTTP response.

Logging these errors may help us get additional information.
I can prepare a patch to add that logging, something like "self.logger.error(traceback.format_exc())" after line 148 in "handlers.py".
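
The logging idea can be illustrated with a minimal Python sketch. This is not the wptserve code: `NotFound` and `read_file_or_404` are hypothetical names, and the logger name is only borrowed from the prefix seen in the server log. It shows the pattern of recording the traceback before the error is turned into a 404.

```python
import logging
import traceback

logger = logging.getLogger("web-platform-tests")


class NotFound(Exception):
    """Hypothetical stand-in for the server's 404 error type."""
    status = 404


def read_file_or_404(path):
    """Return the file's bytes; log the real cause before raising a 404."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except (OSError, IOError):
        # Without this line the underlying errno is hidden behind the 404.
        logger.error(traceback.format_exc())
        raise NotFound(path)
```

With such a log line in place, a 404 caused by a missing file and a 404 caused by some other I/O failure become distinguishable in the server log.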
Comment 5 youenn fablet 2016-05-28 11:34:05 PDT
(In reply to comment #4)
> The error seems indeed located in wpt server.
> 
> Looking at the log, the server starts at some point to return 404 to URLs
> pointing to files that should exist (resources/testharnessreport.js,
> existing IDB test files).
> 
> The request handler is FileHandler
> (https://github.com/w3c/wptserve/blob/
> 22c2f947fafa07ce26d381bf5215714067c41ae6/wptserve/handlers.py)
> Whenever this part of code is raising OSError or IOError, it is converted to
> a 404 HTTP response.
> 
> Logging these errors may help getting additional information.
> I can prepare a patch to add that logging, something like
> "self.logger.error(traceback.format_exc())" after line 148 in "handlers.py"

Filed bug 158183 for that purpose.

I am also wondering whether this issue happens for all bots.
Comment 6 youenn fablet 2016-05-31 06:30:12 PDT
It seems that wptserve cannot serve some files because too many files are open at the same time.
Here is a log excerpt from a Mac bot processing a patch for bug 158222.

DEBUG:web-platform-tests:Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/handlers.py", line 133, in __call__
    data = self.get_data(response, path, byte_ranges)
  File "/Volumes/Data/EWS/WebKit/LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/handlers.py", line 183, in get_data
    return open(path, 'rb')
IOError: [Errno 24] Too many open files: '/Volumes/Data/EWS/WebKit/LayoutTests/imported/w3c/web-platform-tests/resources/testharness.js'
Comment 7 Alexey Proskuryakov 2016-05-31 09:25:51 PDT
We used to sometimes have runaway DumpRenderTree processes, which would cause running out of system resources, but this doesn't seem to be the case on any of the bots hitting this issue that I checked. Perhaps wptserve itself leaks file handles?
Comment 8 youenn fablet 2016-05-31 13:32:13 PDT
(In reply to comment #7)
> We used to sometimes have runaway DumpRenderTree processes, which would
> cause running out of system resources, but this doesn't seem to be the case
> on any of the bots hitting this issue that I checked. Perhaps wptserve
> itself leaks file handles?

It might be the case that wptserve is relying on the garbage collector to close the file handles.

As far as I can see, only the Mac EWS Yosemite bots are hitting this error.
What is the max number of file handles for these bots?
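
One way to look at the per-process limit from Python is sketched below. `fd_usage` is a hypothetical helper, it assumes a Unix-like system where `/dev/fd` lists the current process's descriptors (macOS and Linux), and it reports only the `RLIMIT_NOFILE` limit, not any system-wide limits.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /dev/fd lists this process's descriptors; the count is approximate
    # because listing the directory may itself use a descriptor.
    open_fds = len(os.listdir("/dev/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    print(fd_usage())
```

Calling this periodically inside a long-running server process would show whether the descriptor count keeps growing toward the soft limit.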
Comment 9 Alexey Proskuryakov 2016-05-31 14:29:47 PDT
It's the default (and since there are multiple limits at play, there is no simple answer).

We can't change it anyway, as tests need to work on engineers' machines too.
Comment 10 youenn fablet 2016-06-01 00:23:01 PDT
I filed https://bugs.webkit.org/show_bug.cgi?id=158253 to close file handles explicitly as a temporary patch.

I also filed https://github.com/w3c/wptserve/pull/83 so that it gets handled directly in w3c wptserve.
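
As a rough illustration of closing handles explicitly instead of waiting for the garbage collector, here is a minimal Python sketch. It is not the patch from bug 158253 or the wptserve pull request; `send_and_close` is a hypothetical helper that copies a file object to an output stream and always closes it afterwards.

```python
def send_and_close(out, fileobj, chunk_size=8192):
    """Copy fileobj to out in chunks, then close fileobj even on error."""
    try:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fileobj.read(chunk_size), b""):
            out.write(chunk)
    finally:
        # Release the descriptor now rather than at garbage collection time.
        fileobj.close()
```

For example, `send_and_close(out, open(path, "rb"))` releases the descriptor as soon as the body has been written, so the number of open files stays bounded by the number of requests in flight.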
Comment 11 youenn fablet 2016-06-07 08:23:16 PDT
Is the problem still happening?
Comment 12 Alexey Proskuryakov 2016-06-07 09:16:01 PDT
I haven't seen it after the patch landed.