WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 215829
[ iOS Release ] imported/w3c/web-platform-tests/* is flaky failing 200+ tests
https://bugs.webkit.org/show_bug.cgi?id=215829
Summary
[ iOS Release ] imported/w3c/web-platform-tests/* is flaky failing 200+ tests
Hector Lopez
Reported
2020-08-25 16:45:05 PDT
imported/w3c/web-platform-tests/* A large group of these tests are flaky failing as a group on Apple iOS 13 Simulator Release WK2(Tests) and Apple iPadOS 13 Simulator Release WK2(Tests) bots. The group number is different every time but is consistently around 200 tests. No failures to APIs accompany these failures. Occurrence is happening on average 1 in 20 builds. Revisions may build successfully on one set of testers but fail on the other group. Almost in every instance the group failures will not happen again on immediate build with no indication that revision was used to clear failures. One commonality that all builds on each tester is this warning: (view as text) warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/ImageDiff.o unable to open object file: No such file or directory warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/PlatformImage.o unable to open object file: No such file or directory warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/PlatformImageCG.o unable to open object file: No such file or directory warning: no debug symbols in executable (-arch x86_64) Access to Apple iOS 13 Simulator Release WK2(Tests):
https://build.webkit.org/builders/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)?numbuilds=50
Access to Apple iPadOS 13 Simulator Release WK2(Tests)
https://build.webkit.org/builders/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)?numbuilds=50
Log for latest occurrence on iPadOS 13 Simulator Release WK2(Tests) please see attached. For build 4417 at
r266107
with failures gone on next build at
r266112
but no revision seems to be cause of fix and happens again on different revision.
Attachments
Stdio for iPadOS 13 Simulator Release WK2(Tests) build 4427 at r266107
(7.00 MB, text/html)
2020-08-25 16:57 PDT
,
Hector Lopez
no flags
Details
Warnings for iPadOS 13 Simulator Release WK2(Tests) build 4427 at r266107
(1.07 KB, text/html)
2020-08-25 16:59 PDT
,
Hector Lopez
no flags
Details
Patch
(3.23 KB, patch)
2020-09-09 15:05 PDT
,
Sam Sneddon [:gsnedders]
no flags
Details
Formatted Diff
Diff
Patch for landing
(5.41 KB, patch)
2020-09-10 00:39 PDT
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Patch
(1.73 KB, patch)
2020-09-16 11:00 PDT
,
Sam Sneddon [:gsnedders]
no flags
Details
Formatted Diff
Diff
Patch
(2.08 KB, patch)
2020-09-18 07:35 PDT
,
Sam Sneddon [:gsnedders]
youennf
: review+
youennf
: commit-queue-
Details
Formatted Diff
Diff
Patch
(2.60 KB, patch)
2020-09-21 10:56 PDT
,
Sam Sneddon [:gsnedders]
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2020-08-25 16:45:43 PDT
<
rdar://problem/67765880
>
Hector Lopez
Comment 2
2020-08-25 16:57:04 PDT
Created
attachment 407250
[details]
Stdio for iPadOS 13 Simulator Release WK2(Tests) build 4427 at
r266107
Hector Lopez
Comment 3
2020-08-25 16:59:46 PDT
Created
attachment 407251
[details]
Warnings for iPadOS 13 Simulator Release WK2(Tests) build 4427 at
r266107
Ryan Haddad
Comment 4
2020-08-26 17:20:23 PDT
Here is a link to the wptwk process log from the most recent failure:
https://build.webkit.org/results/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(4445)/wptwk_process_log.out.txt
https://build.webkit.org/builders/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4445
Ryan Haddad
Comment 5
2020-08-26 17:25:15 PDT
Here is one that for a test run that had no failures for comparison:
https://build.webkit.org/results/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(6397)/wptwk_process_log.out.txt
Ryan Haddad
Comment 6
2020-09-09 13:09:15 PDT
Looks like this has popped up on EWS, too:
https://ews-build.s3-us-west-2.amazonaws.com/iOS-13-Simulator-WK2-Tests-EWS/r408353-25299/results.html
Sam Sneddon [:gsnedders]
Comment 7
2020-09-09 14:34:00 PDT
(In reply to Ryan Haddad from
comment #4
)
> Here is a link to the wptwk process log from the most recent failure: >
https://build.webkit.org/results/
> Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(4445)/ > wptwk_process_log.out.txt > >
https://build.webkit.org/builders/
> Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4445
This doesn't seem to give any obvious clue, it just starts returning 404 for files that should definitely exist, and then relatively quickly goes back to operating normally. Let's try adding some extra debugging info there. :shrug:
Sam Sneddon [:gsnedders]
Comment 8
2020-09-09 15:05:26 PDT
Created
attachment 408370
[details]
Patch
EWS Watchlist
Comment 9
2020-09-09 15:06:36 PDT
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see
https://trac.webkit.org/wiki/WPTExportProcess
Sam Sneddon [:gsnedders]
Comment 10
2020-09-09 16:00:02 PDT
Comment on
attachment 408370
[details]
Patch oh no this broke things
Jonathan Bedard
Comment 11
2020-09-09 16:15:02 PDT
(In reply to Sam Sneddon [:gsnedders] from
comment #10
)
> Comment on
attachment 408370
[details]
> Patch > > oh no this broke things
That's pretty surprising, but the test it apparently broke does seem pretty green on trunk infrastructure.
youenn fablet
Comment 12
2020-09-10 00:32:14 PDT
Breakage seems fine, you might just have to rebase the corresponding expected.txt file for now.
youenn fablet
Comment 13
2020-09-10 00:39:08 PDT
Created
attachment 408418
[details]
Patch for landing
EWS
Comment 14
2020-09-10 01:09:23 PDT
Committed
r266816
: <
https://trac.webkit.org/changeset/266816
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 408418
[details]
.
Sam Sneddon [:gsnedders]
Comment 15
2020-09-10 06:42:48 PDT
This hopefully gives us enough to debug this, but doesn't fix it.
youenn fablet
Comment 16
2020-09-11 13:19:54 PDT
Error message is: {\"error\": {\"message\": \"Too many open files\", \"code\": 404}}"
Jonathan Bedard
Comment 17
2020-09-11 13:29:19 PDT
(In reply to youenn fablet from
comment #16
)
> Error message is: > {\"error\": {\"message\": \"Too many open files\", \"code\": 404}}"
Maybe we haven't bumped thee file limit, or maybe we need to run less simulators.
Alexey Proskuryakov
Comment 18
2020-09-11 13:49:27 PDT
Ideally, we'd collect ps and lsof output in state.
Aakash Jain
Comment 19
2020-09-15 11:09:20 PDT
(In reply to youenn fablet from
comment #16
)
> Error message is: > {\"error\": {\"message\": \"Too many open files\", \"code\": 404}}"
Where exactly does this error message appear? Any link?
Sam Sneddon [:gsnedders]
Comment 20
2020-09-15 11:24:28 PDT
(In reply to Aakash Jain from
comment #19
)
> (In reply to youenn fablet from
comment #16
) > > Error message is: > > {\"error\": {\"message\": \"Too many open files\", \"code\": 404}}" > Where exactly does this error message appear? Any link?
Look at many of the actual results in
https://build.webkit.org/results/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)/r267004%20(6734)/results.html
for example
Aakash Jain
Comment 21
2020-09-15 13:49:51 PDT
(In reply to Jonathan Bedard from
comment #17
)
> Maybe we haven't bumped thee file limit, or maybe we need to run less simulators.
Yeah, we haven't bumped the limits on these bots. Process limit seems to be good enough by default (11136 limit on ews122 - ews126). However, file limit is default: ews122:LaunchAgents buildbot$ launchctl limit maxfiles maxfiles 256 unlimited Maybe we should simply increase the file limit to 100,000 by using the method in
https://trac.webkit.org/wiki/IncreasingKernelLimits#no1
Sam Sneddon [:gsnedders]
Comment 22
2020-09-16 11:00:16 PDT
Created
attachment 408936
[details]
Patch
youenn fablet
Comment 23
2020-09-16 11:09:18 PDT
Typically, in parallel to uploading a patch doing changes in LayoutTests/imported/w3c/web-platform-tests, we generate a PR (for instance using export-w3c-tests). Then we get a review, merge the PR in WPT and commit the change here. Do you plan to do so? I guess we could also first land this change and verify it is a full fix.
Sam Sneddon [:gsnedders]
Comment 24
2020-09-17 05:27:00 PDT
Was planning on verifying this works first before submitting upstream.
Aakash Jain
Comment 25
2020-09-17 12:47:23 PDT
Comment on
attachment 408936
[details]
Patch Breaks layout-tests e.g.:
https://ews-build.webkit.org/#/builders/30/builds/18734
From /Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/layout-test-results/wptwk_process_log.out.txt on ews101: WARNING:web-platform-tests:Status of subprocess "http on port 8801": failed. Exit with non-zero status: 1 INFO:web-platform-tests:Status of subprocess "https on port 9443": running self.run() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py", line 422, in create_daemon resource.setrlimit(resource.RLIMIT_NOFILE, (nofile_hard, nofile_hard)) ValueError: current limit exceeds maximum limit Process https on port 9443: Traceback (most recent call last): File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py", line 422, in create_daemon resource.setrlimit(resource.RLIMIT_NOFILE, (nofile_hard, nofile_hard)) ValueError: current limit exceeds maximum limit
Sam Sneddon [:gsnedders]
Comment 26
2020-09-18 07:35:04 PDT
Created
attachment 409127
[details]
Patch This seems to work in a VM without erroring everywhere, at least
youenn fablet
Comment 27
2020-09-18 07:46:27 PDT
Comment on
attachment 409127
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:429 > + resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
Should we do it here or in Tools/Scripts/webkitpy/layout_tests/servers/web_platform_test_server.py?
Aakash Jain
Comment 28
2020-09-18 08:05:23 PDT
Comment on
attachment 409127
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:427 > + new_soft = min(soft + 1024, maxfilesperproc, hard)
Is increasing soft limit by 1024 enough? It seems pretty less. on ews101 these are the numbers: ews101:~ buildbot$ sysctl -n kern.maxfilesperproc 98304
>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(256, 9223372036854775807) so increasing soft limit from 256 to 1280 would be enough? Previous patch was setting this limit to 9223372036854775807, this one is setting to 1280.
Sam Sneddon [:gsnedders]
Comment 29
2020-09-18 09:08:25 PDT
(In reply to youenn fablet from
comment #27
)
> Comment on
attachment 409127
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> > > LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:429 > > + resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard)) > > Should we do it here or in > Tools/Scripts/webkitpy/layout_tests/servers/web_platform_test_server.py?
People have occasionally run into this upstream too, so it makes sense to fix it in the upstream code. (In reply to Aakash Jain from
comment #28
)
> so increasing soft limit from 256 to 1280 would be enough? Previous patch > was setting this limit to 9223372036854775807, this one is setting to 1280.
I think so. A quick look on Linux box I had to hand showed an initial soft limit of 1024.
Darin Adler
Comment 30
2020-09-19 14:59:28 PDT
Comment on
attachment 409127
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
>> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:427 >> + new_soft = min(soft + 1024, maxfilesperproc, hard) > > Is increasing soft limit by 1024 enough? It seems pretty less. > > on ews101 these are the numbers: > > ews101:~ buildbot$ sysctl -n kern.maxfilesperproc > 98304
After a tiny bit of investigation I am wondering the same thing. On my desktop machine I typed the command and got this: % sysctl -n kern.maxfilesperproc 49152 I don’t think the intent here is to bump from 49152 to 50176; unlikely to be valuable. So there must be something else going on, on the servers where this is helpful. Sam, can you clarify?
youenn fablet
Comment 31
2020-09-21 06:34:13 PDT
Comment on
attachment 409127
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> LayoutTests/imported/w3c/ChangeLog:9 > + (ServerProc.create_daemon):
As per our direct discussion, please add more comments explaining the default value on various platforms, and the value you are selecting here.
> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:421 > + if sys.platform == "darwin":
Maybe we should verify that for any given system, we have a large enough value.
youenn fablet
Comment 32
2020-09-21 06:35:58 PDT
Really great to see that fixed. I am surprised Chrome or Firefox did not have that issue. Maybe they fixed in a different way though.
youenn fablet
Comment 33
2020-09-21 06:41:30 PDT
Also, even 256+1024 might be a bit small as we more and more run WTR in parallel on the same server. Can we increase it a bit further than that?
Sam Sneddon [:gsnedders]
Comment 34
2020-09-21 06:44:03 PDT
(In reply to Darin Adler from
comment #30
)
> Comment on
attachment 409127
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> > >> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:427 > >> + new_soft = min(soft + 1024, maxfilesperproc, hard) > > > > Is increasing soft limit by 1024 enough? It seems pretty less. > > > > on ews101 these are the numbers: > > > > ews101:~ buildbot$ sysctl -n kern.maxfilesperproc > > 98304 > > After a tiny bit of investigation I am wondering the same thing. On my > desktop machine I typed the command and got this: > > % sysctl -n kern.maxfilesperproc > 49152 > > I don’t think the intent here is to bump from 49152 to 50176; unlikely to be > valuable. So there must be something else going on, on the servers where > this is helpful. > > Sam, can you clarify?
There's three limits in play here: * the processes' current limit (this starts at 256 on macOS and 1024 on the first Linux shell I had to hand), * the kernel limit for how many one process can have (this appears to be 49152 locally, this is kern.maxfilesperproc), and * the kernel limit for how many the system can have (which appears to be 98304 locally, this is kern.maxfiles). What we're doing is bumping the first of those, not the other two. 256 to 1280 is a pretty big jump; we probably don't want to jump all the way to kern.maxfilesperproc (as that then means the one process can get pretty close to kern.maxfiles, and it shouldn't be anywhere near it). Youenn pointed out directly that as we increase parallelism we might hit issues, though Servo was running pretty wide and didn't have any issues, so we'll see. (In reply to youenn fablet from
comment #31
)
> Comment on
attachment 409127
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=409127&action=review
> > > LayoutTests/imported/w3c/ChangeLog:9 > > + (ServerProc.create_daemon): > > As per our direct discussion, please add more comments explaining the > default value on various platforms, and the value you are selecting here. > > > LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:421 > > + if sys.platform == "darwin": > > Maybe we should verify that for any given system, we have a large enough > value.
Getting the system limits is a per-platform endeavour, however, and when it hasn't been observed elsewhere I don't think it's worthwhile trying that immediately? (In reply to youenn fablet from
comment #32
)
> Really great to see that fixed. I am surprised Chrome or Firefox did not > have that issue. Maybe they fixed in a different way though.
I don't think they run wptserve with more than one client hitting it at a time on macOS, rather just sharding across multiple machines each with their own wptserve instance. (In reply to youenn fablet from
comment #33
)
> Also, even 256+1024 might be a bit small as we more and more run WTR in > parallel on the same server. Can we increase it a bit further than that?
Certainly. I was mostly trying to avoid making a gigantic jump in one go and seeing whether this suffices.
Sam Sneddon [:gsnedders]
Comment 35
2020-09-21 10:56:09 PDT
Created
attachment 409283
[details]
Patch
youenn fablet
Comment 36
2020-09-21 12:05:59 PDT
Comment on
attachment 409283
[details]
Patch We cq it before landing the patch in WPT upstream, to validate this will be working as expected. @Sam, can you make the upstream PR right way so that we can land it in a couple of days? If these changes are your last commit, you should be able to do something like: Tools/Scripts/export-w3c-test-changes -g HEAD~1..HEAD -c
EWS
Comment 37
2020-09-21 12:06:36 PDT
ChangeLog entry in LayoutTests/imported/w3c/ChangeLog contains OOPS!.
youenn fablet
Comment 38
2020-09-21 12:07:55 PDT
Comment on
attachment 409283
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409283&action=review
> LayoutTests/imported/w3c/ChangeLog:6 > + Reviewed by NOBODY (OOPS!).
Ah right, you need to change it to something like: Reviewed by Youenn Fablet.
Aakash Jain
Comment 39
2020-09-21 12:11:43 PDT
Comment on
attachment 409283
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=409283&action=review
> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:428 > + # 2048 is somewhat arbitrary, but gives us some headroom for wptrunner --parallel
Why such a small number (2048)? Why not increase it to something like 100000? Note that in some parts of our infrastructure we do set this file limit to 100000 on various bots (using method in
https://trac.webkit.org/wiki/IncreasingKernelLimits#no1
) and it has been working fine for years. Also, were you able to reproduce the issue, and verify that by increasing the limit, the problem went away?
EWS
Comment 40
2020-09-21 12:31:40 PDT
Committed
r267354
: <
https://trac.webkit.org/changeset/267354
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 409283
[details]
.
Sam Sneddon [:gsnedders]
Comment 41
2020-09-24 07:59:50 PDT
Upstreamed in
https://github.com/web-platform-tests/wpt/pull/25773
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug