imported/w3c/web-platform-tests/*

A large group of these tests is flaky, failing as a group on the Apple iOS 13 Simulator Release WK2 (Tests) and Apple iPadOS 13 Simulator Release WK2 (Tests) bots. The set of tests in the group is different every time but is consistently around 200 tests, and no API test failures accompany these failures. The failures occur in roughly 1 in 20 builds on average. A revision may pass on one set of testers but fail on the other, and in almost every instance the group failure does not recur on the immediately following build, with no indication that any particular revision cleared it.

One commonality across all affected builds on each tester is this warning:

warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/ImageDiff.o unable to open object file: No such file or directory
warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/PlatformImage.o unable to open object file: No such file or directory
warning: (x86_64) /Volumes/Data/slave/ios-simulator-13-release/build/WebKitBuild/ImageDiff.build/Release/ImageDiff.build/Objects-normal/x86_64/PlatformImageCG.o unable to open object file: No such file or directory
warning: no debug symbols in executable (-arch x86_64)

Access to Apple iOS 13 Simulator Release WK2 (Tests): https://build.webkit.org/builders/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)?numbuilds=50
Access to Apple iPadOS 13 Simulator Release WK2 (Tests): https://build.webkit.org/builders/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)?numbuilds=50

The log for the latest occurrence on iPadOS 13 Simulator Release WK2 (Tests) is attached; it is for build 4417 at r266107. The failures were gone on the next build at r266112, but no revision appears to be the cause of the fix, and the problem recurs again at different revisions.
<rdar://problem/67765880>
Created attachment 407250 [details] Stdio for iPadOS 13 Simulator Release WK2(Tests) build 4427 at r266107
Created attachment 407251 [details] Warnings for iPadOS 13 Simulator Release WK2(Tests) build 4427 at r266107
Here is a link to the wptwk process log from the most recent failure: https://build.webkit.org/results/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(4445)/wptwk_process_log.out.txt https://build.webkit.org/builders/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4445
Here is one from a test run that had no failures, for comparison: https://build.webkit.org/results/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(6397)/wptwk_process_log.out.txt
Looks like this has popped up on EWS, too: https://ews-build.s3-us-west-2.amazonaws.com/iOS-13-Simulator-WK2-Tests-EWS/r408353-25299/results.html
(In reply to Ryan Haddad from comment #4)
> Here is a link to the wptwk process log from the most recent failure:
> https://build.webkit.org/results/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20(Tests)/r266190%20(4445)/wptwk_process_log.out.txt
> https://build.webkit.org/builders/Apple%20iPadOS%2013%20Simulator%20Release%20WK2%20%28Tests%29/builds/4445

This doesn't seem to give any obvious clue; it just starts returning 404 for files that should definitely exist, and then relatively quickly goes back to operating normally. Let's try adding some extra debugging info there. :shrug:
Created attachment 408370 [details] Patch
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment on attachment 408370 [details]
Patch

oh no, this broke things
(In reply to Sam Sneddon [:gsnedders] from comment #10) > Comment on attachment 408370 [details] > Patch > > oh no this broke things That's pretty surprising, but the test it apparently broke does seem pretty green on trunk infrastructure.
Breakage seems fine, you might just have to rebase the corresponding expected.txt file for now.
Created attachment 408418 [details] Patch for landing
Committed r266816: <https://trac.webkit.org/changeset/266816> All reviewed patches have been landed. Closing bug and clearing flags on attachment 408418 [details].
This hopefully gives us enough to debug this, but doesn't fix it.
Error message is: {"error": {"message": "Too many open files", "code": 404}}
(In reply to youenn fablet from comment #16)
> Error message is: {"error": {"message": "Too many open files", "code": 404}}

Maybe we haven't bumped the file limit, or maybe we need to run fewer simulators.
Ideally, we'd collect ps and lsof output in that state.
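As a rough illustration of that idea (this helper and its log path are hypothetical, not part of any landed patch), a debugging hook could snapshot process and open-file state when the server starts misbehaving:

import datetime
import subprocess

def dump_diagnostics(log_path="/tmp/wptwk_diagnostics.log"):
    # Hypothetical debugging helper: capture ps and lsof output when wptserve
    # starts returning "Too many open files" errors.
    timestamp = datetime.datetime.now().isoformat()
    with open(log_path, "a") as log:
        log.write("=== diagnostics at %s ===\n" % timestamp)
        for cmd in (["ps", "aux"], ["lsof", "-n", "-P"]):
            log.write("--- %s ---\n" % " ".join(cmd))
            try:
                output = subprocess.check_output(cmd)
            except (OSError, subprocess.CalledProcessError) as error:
                log.write("failed to run %s: %s\n" % (cmd[0], error))
            else:
                log.write(output.decode("utf-8", "replace"))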
(In reply to youenn fablet from comment #16)
> Error message is: {"error": {"message": "Too many open files", "code": 404}}

Where exactly does this error message appear? Any link?
(In reply to Aakash Jain from comment #19)
> Where exactly does this error message appear? Any link?

Look at many of the actual results in https://build.webkit.org/results/Apple%20iOS%2013%20Simulator%20Release%20WK2%20(Tests)/r267004%20(6734)/results.html, for example.
(In reply to Jonathan Bedard from comment #17)
> Maybe we haven't bumped the file limit, or maybe we need to run fewer simulators.

Yeah, we haven't bumped the limits on these bots. The process limit seems to be good enough by default (11136 limit on ews122 - ews126). However, the file limit is the default:

ews122:LaunchAgents buildbot$ launchctl limit maxfiles
	maxfiles    256            unlimited

Maybe we should simply increase the file limit to 100,000 using the method in https://trac.webkit.org/wiki/IncreasingKernelLimits#no1
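For reference, the limits under discussion can be inspected from Python on the macOS bots; this is only an illustrative sketch of what the values look like, not part of any patch:

import resource
import subprocess

# Per-process file-descriptor limits (what resource.setrlimit adjusts).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft=%d hard=%d" % (soft, hard))  # e.g. 256 and RLIM_INFINITY on macOS

# Kernel-wide caps on macOS; the soft limit cannot be raised past these.
maxfilesperproc = int(subprocess.check_output(["sysctl", "-n", "kern.maxfilesperproc"]))
maxfiles = int(subprocess.check_output(["sysctl", "-n", "kern.maxfiles"]))
print("kern.maxfilesperproc=%d kern.maxfiles=%d" % (maxfilesperproc, maxfiles))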
Created attachment 408936 [details] Patch
Typically, in parallel with uploading a patch that changes LayoutTests/imported/w3c/web-platform-tests, we generate a PR (for instance using export-w3c-tests). Then we get a review, merge the PR in WPT, and commit the change here. Do you plan to do so? I guess we could also land this change first and verify that it is a full fix.
Was planning on verifying this works first before submitting upstream.
Comment on attachment 408936 [details]
Patch

Breaks layout-tests, e.g.: https://ews-build.webkit.org/#/builders/30/builds/18734

From /Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/layout-test-results/wptwk_process_log.out.txt on ews101:

WARNING:web-platform-tests:Status of subprocess "http on port 8801": failed. Exit with non-zero status: 1
INFO:web-platform-tests:Status of subprocess "https on port 9443": running
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py", line 422, in create_daemon
    resource.setrlimit(resource.RLIMIT_NOFILE, (nofile_hard, nofile_hard))
ValueError: current limit exceeds maximum limit
Process https on port 9443:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/Volumes/Data/worker/macOS-Mojave-Release-WK1-Tests-EWS/build/LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py", line 422, in create_daemon
    resource.setrlimit(resource.RLIMIT_NOFILE, (nofile_hard, nofile_hard))
ValueError: current limit exceeds maximum limit
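For context (my reading of that ValueError, not something stated in the log): macOS reports the RLIMIT_NOFILE hard limit as RLIM_INFINITY, but the kernel rejects any soft value above kern.maxfilesperproc with EINVAL, which Python surfaces as this error. A minimal reproduction sketch of what the first patch effectively did:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)  # typically (256, 9223372036854775807) on macOS

# Raising the soft limit all the way to the reported hard limit fails on
# macOS, because that value is above kern.maxfilesperproc; the kernel
# returns EINVAL and Python raises
# "ValueError: current limit exceeds maximum limit".
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))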
Created attachment 409127 [details] Patch This seems to work in a VM without erroring everywhere, at least
Comment on attachment 409127 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409127&action=review

> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:429
> +        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

Should we do it here or in Tools/Scripts/webkitpy/layout_tests/servers/web_platform_test_server.py?
Comment on attachment 409127 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409127&action=review

> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:427
> +        new_soft = min(soft + 1024, maxfilesperproc, hard)

Is increasing the soft limit by 1024 enough? It seems pretty low. On ews101 these are the numbers:

ews101:~ buildbot$ sysctl -n kern.maxfilesperproc
98304

>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(256, 9223372036854775807)

So increasing the soft limit from 256 to 1280 would be enough? The previous patch was setting this limit to 9223372036854775807; this one sets it to 1280.
(In reply to youenn fablet from comment #27)
> Should we do it here or in Tools/Scripts/webkitpy/layout_tests/servers/web_platform_test_server.py?

People have occasionally run into this upstream too, so it makes sense to fix it in the upstream code.

(In reply to Aakash Jain from comment #28)
> So increasing the soft limit from 256 to 1280 would be enough? The previous patch was setting this limit to 9223372036854775807; this one sets it to 1280.

I think so. A quick look on a Linux box I had to hand showed an initial soft limit of 1024.
Comment on attachment 409127 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409127&action=review

>> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:427
>> +        new_soft = min(soft + 1024, maxfilesperproc, hard)
>
> Is increasing the soft limit by 1024 enough? It seems pretty low.
>
> on ews101 these are the numbers:
>
> ews101:~ buildbot$ sysctl -n kern.maxfilesperproc
> 98304

After a tiny bit of investigation I am wondering the same thing. On my desktop machine I typed the command and got this:

% sysctl -n kern.maxfilesperproc
49152

I don't think the intent here is to bump from 49152 to 50176; that's unlikely to be valuable. So there must be something else going on on the servers where this is helpful.

Sam, can you clarify?
Comment on attachment 409127 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409127&action=review

> LayoutTests/imported/w3c/ChangeLog:9
> +        (ServerProc.create_daemon):

As per our direct discussion, please add more comments explaining the default value on various platforms and the value you are selecting here.

> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:421
> +    if sys.platform == "darwin":

Maybe we should verify that, for any given system, we have a large enough value.
Really great to see that fixed. I am surprised Chrome and Firefox did not have that issue. Maybe they fixed it in a different way, though.
Also, even 256 + 1024 might be a bit small as we increasingly run WTR in parallel on the same server. Can we increase it a bit further than that?
(In reply to Darin Adler from comment #30)
> I don't think the intent here is to bump from 49152 to 50176; that's unlikely to be valuable. So there must be something else going on on the servers where this is helpful.
>
> Sam, can you clarify?

There are three limits in play here:

* the process's current limit (this starts at 256 on macOS and 1024 on the first Linux shell I had to hand),
* the kernel limit on how many one process can have (this appears to be 49152 locally; this is kern.maxfilesperproc), and
* the kernel limit on how many the whole system can have (which appears to be 98304 locally; this is kern.maxfiles).

What we're doing is bumping the first of those, not the other two. 256 to 1280 is a pretty big jump; we probably don't want to jump all the way to kern.maxfilesperproc, as that would let one process get pretty close to kern.maxfiles, and it shouldn't be anywhere near it. Youenn pointed out directly that as we increase parallelism we might hit issues, though Servo was running pretty wide and didn't have any issues, so we'll see.

(In reply to youenn fablet from comment #31)
> As per our direct discussion, please add more comments explaining the default value on various platforms and the value you are selecting here.
>
> Maybe we should verify that, for any given system, we have a large enough value.

Getting the system limits is a per-platform endeavour, however, and when this hasn't been observed elsewhere I don't think it's worthwhile trying that immediately.

(In reply to youenn fablet from comment #32)
> Really great to see that fixed. I am surprised Chrome and Firefox did not have that issue. Maybe they fixed it in a different way, though.

I don't think they run wptserve with more than one client hitting it at a time on macOS; rather, they shard across multiple machines, each with its own wptserve instance.

(In reply to youenn fablet from comment #33)
> Also, even 256 + 1024 might be a bit small as we increasingly run WTR in parallel on the same server. Can we increase it a bit further than that?

Certainly. I was mostly trying to avoid making a gigantic jump in one go, and to see whether this suffices.
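To make the trade-off concrete, the shape of the fix being discussed, pieced together from the review excerpts above (the headroom value and the exact clamping follow those quotes; this is a sketch, not the verbatim patch), is roughly:

import resource
import subprocess
import sys

def bump_nofile_limit(headroom=2048):
    # Raise the per-process soft open-file limit by a modest amount, clamped
    # to the macOS kern.maxfilesperproc cap and to the hard limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft + headroom
    if sys.platform == "darwin":
        maxfilesperproc = int(subprocess.check_output(["sysctl", "-n", "kern.maxfilesperproc"]))
        new_soft = min(new_soft, maxfilesperproc)
    new_soft = min(new_soft, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))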
Created attachment 409283 [details] Patch
Comment on attachment 409283 [details]
Patch

We cq it before landing the patch in WPT upstream, to validate that this works as expected.

@Sam, can you make the upstream PR right away so that we can land it in a couple of days? If these changes are your last commit, you should be able to do something like:

Tools/Scripts/export-w3c-test-changes -g HEAD~1..HEAD -c
ChangeLog entry in LayoutTests/imported/w3c/ChangeLog contains OOPS!.
Comment on attachment 409283 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409283&action=review

> LayoutTests/imported/w3c/ChangeLog:6
> +        Reviewed by NOBODY (OOPS!).

Ah right, you need to change it to something like: Reviewed by Youenn Fablet.
Comment on attachment 409283 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=409283&action=review

> LayoutTests/imported/w3c/web-platform-tests/tools/serve/serve.py:428
> +            # 2048 is somewhat arbitrary, but gives us some headroom for wptrunner --parallel

Why such a small number (2048)? Why not increase it to something like 100000? Note that in some parts of our infrastructure we do set this file limit to 100000 on various bots (using the method in https://trac.webkit.org/wiki/IncreasingKernelLimits#no1) and it has been working fine for years.

Also, were you able to reproduce the issue and verify that increasing the limit made the problem go away?
Committed r267354: <https://trac.webkit.org/changeset/267354> All reviewed patches have been landed. Closing bug and clearing flags on attachment 409283 [details].
Upstreamed in https://github.com/web-platform-tests/wpt/pull/25773