This just logs the running processes and open files, to try to see why we're running into "too many open files".
Created attachment 408814 [details] Patch
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment on attachment 408814 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408814&action=review

> LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/response.py:254
> + self.logger.debug(subprocess.check_output(["ps", "aux"])

Missing closing parenthesis.
Created attachment 408824 [details] Patch
Where are we running into "too many open files"? Any bug/radar with the details?
See the "See Also" bug: 215829 :)
Created attachment 408834 [details] Patch
Created attachment 408838 [details] Patch
This entire approach is doomed to failure, because lsof can block and make everything worse.
Comment on attachment 408838 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408838&action=review

> LayoutTests/imported/w3c/web-platform-tests/tools/wptserve/wptserve/response.py:257
> + self.logger.debug(subprocess.check_output(["ps", "aux"]))

These commands ('ps aux' and 'lsof -P +c0') will produce a huge amount of output (tens of thousands of lines). Are you sure you want to add that much output to the logs? Maybe you just need a count of the number of lines of each.

Also, when the machine is already running out of file handles, these commands might slow down or hang the machine even further. This might explain why https://ews-build.webkit.org/#/builders/30/builds/17561/steps/11/logs/stdio hung with this patch.
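For illustration, the count-only idea could look roughly like the sketch below. It assumes Python 3 and a hypothetical helper named `count_output_lines`, which is not part of wptserve. The 5-second timeout is an arbitrary value chosen so that a blocked `lsof` cannot stall the caller indefinitely.

```python
import subprocess


def count_output_lines(command, timeout_seconds=5):
    """Return the number of lines `command` prints, or None on failure.

    Hypothetical helper, not part of wptserve. The timeout is an assumed
    value; it bounds how long a blocked command can hold up the caller.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all
        # (missing binary, or no file descriptors left for the pipe).
        return None
    # The exit status is deliberately ignored: only the line count matters.
    return len(result.stdout.splitlines())
```

A call site could then log something like `count_output_lines(["ps", "aux"])` and `count_output_lines(["lsof", "-P", "+c0"])` instead of the full output. Both counts include the header line that each command prints.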
We can manually run a job on one of the machines that dumps lsof every few seconds; this way we could catch it right before the condition occurs. As long as it's not a super quick explosion, that could be good enough.
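A rough sketch of such a manual job, assuming Python 3 is available on the machine. The function name `dump_periodically`, the log path, the interval and the timeout are all hypothetical choices, not anything that exists in the WebKit tooling.

```python
import subprocess
import time


def dump_periodically(command, log_path, interval_seconds=5,
                      max_samples=None, timeout_seconds=30):
    """Append timestamped output of `command` to `log_path` at intervals.

    Hypothetical helper. Runs until `max_samples` dumps have been written,
    or forever when `max_samples` is None. Returns the number of dumps.
    """
    taken = 0
    # The log is opened once, so later samples need no new descriptor for it.
    with open(log_path, "ab") as log:
        while max_samples is None or taken < max_samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            try:
                result = subprocess.run(
                    command,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.DEVNULL,
                    timeout=timeout_seconds,
                )
                body = result.stdout
            except subprocess.TimeoutExpired:
                # Record the hang instead of blocking the sampler.
                body = b"<timed out>\n"
            except OSError as error:
                # For example, no descriptors left to create the pipe.
                body = ("<failed: %s>\n" % error).encode("utf-8", "replace")
            log.write(("=== %s ===\n" % stamp).encode("ascii"))
            log.write(body)
            log.flush()
            taken += 1
            time.sleep(interval_seconds)
    return taken
```

Running `dump_periodically(["lsof", "-P", "+c0"], "/tmp/lsof-dumps.log")` would then leave the last dump before the failure at the end of the log. Each dump can be tens of thousands of lines, so the log grows quickly at short intervals.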