RESOLVED FIXED 315073
[Tools] linux_get_crash_log: Revamp crash log generation for Linux layout tests
https://bugs.webkit.org/show_bug.cgi?id=315073
Carlos Alberto Lopez Perez
Reported 2026-05-18 20:59:48 PDT
The current linux_get_crash_log driver has several issues, especially when generating crash logs with coredumpctl. It is not reliable: it defaults to picking the last coredump available. That is a lottery when using several workers, and on the bots of the current infra it is even worse because the coredumpctl directory is shared between all the pods in the node. Coredumpctl doesn't record the pid inside the namespace (inside the container); it records the pid on the host, which is different. So the webkit tooling is unable to find the coredump by pid (the number coredumpctl records is different from what the webkit tooling sees) and has to fall back to searching for the coredump by timestamp. See:

17:43:55.962 204 worker/8 Test imported/w3c/web-platform-tests/WebCryptoAPI/sign_verify/eddsa_small_order_points.https.any.worker.html crashed, we will now gather a crash log.
17:43:55.962 204 worker/8 Running "coredumpctl"...
17:43:55.980 204 worker/13 imported/w3c/web-platform-tests/WebCryptoAPI/generateKey/successes_RSA-OAEP.https.any.html?91-100 passed
17:43:55.993 204 worker/8 "coredumpctl" took 0.03s.
17:43:55.993 204 worker/8 Running "coredumpctl info --since=@1771983821.761148"...
17:43:56.004 204 worker/8 "coredumpctl info --since=@1771983821.761148" took 0.01s.
17:43:56.004 204 worker/8 Running "coredumpctl dump 1963618 --output /tmp/tmpsyos3cet"

But that is racy: the code tries to match the coredump just by picking the most recent one since the timestamp of the crash, but the list of coredumps that coredumpctl returns is not limited to the current container. It also includes the coredumps of all the other containers on the same host (and of the host itself). On the bots of the infra we run 16 parallel workers and 4 bot containers on the same node, all of them trying at the same time to match the coredump with something like "just give me the first coredump since=@crash_timestamp". That is a problem.

It would also be nice to be able to limit the number of parallel gdb processes that can run, to avoid OOM issues.
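To make the race concrete, here is a minimal sketch in Python. It does not call coredumpctl and it is not the approach taken by the fix; `CoreEntry`, `first_since` and `unique_match` are hypothetical names used only for illustration. It assumes the coredump list has already been parsed into records, and contrasts "first coredump since the crash timestamp" with a stricter matcher that gives up when more than one candidate fits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoreEntry:
    # Hypothetical record for one coredump in a list shared by the whole host.
    timestamp: float  # when the coredump was recorded (seconds since epoch)
    host_pid: int     # PID as seen by the host, not by the container
    exe: str          # path of the crashed executable


def first_since(entries: List[CoreEntry], crash_time: float) -> Optional[CoreEntry]:
    """Racy strategy: take the earliest entry recorded at or after crash_time."""
    candidates = sorted(
        (e for e in entries if e.timestamp >= crash_time),
        key=lambda e: e.timestamp,
    )
    return candidates[0] if candidates else None


def unique_match(entries: List[CoreEntry], crash_time: float,
                 exe_name: str, window: float = 5.0) -> Optional[CoreEntry]:
    """Stricter strategy: require the executable name to match and the entry
    to fall in a short window after the crash; return None if ambiguous."""
    candidates = [
        e for e in entries
        if crash_time <= e.timestamp <= crash_time + window
        and e.exe.endswith("/" + exe_name)
    ]
    return candidates[0] if len(candidates) == 1 else None
```

With one unrelated coredump from another container recorded just before ours, `first_since` returns the unrelated one, while `unique_match` returns ours. If two workers crash the same executable within the window, `unique_match` returns None instead of guessing, so filtering alone only narrows the race and does not remove it.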
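One possible way to cap the number of concurrent debugger processes is a bounded semaphore around the subprocess call. The sketch below assumes the callers are threads in a single process; `LimitedRunner` is a hypothetical name, and separate worker processes would need a cross-process primitive instead (for example `multiprocessing.BoundedSemaphore` or a lock file). It is an illustration of the idea, not the code that landed.

```python
import subprocess
import threading


class LimitedRunner:
    """Runs commands while allowing at most max_parallel of them at once."""

    def __init__(self, max_parallel: int):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of commands observed running at once

    def run(self, argv, timeout=None):
        # Block here until a slot is free, so excess callers wait in line
        # instead of starting another memory-hungry process.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return subprocess.run(
                    argv, capture_output=True, text=True, timeout=timeout
                )
            finally:
                with self._lock:
                    self._active -= 1
```

A caller would create one shared `LimitedRunner(2)` and pass it the gdb command line as `argv`; the limit value and the exact gdb arguments are left to the tooling.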
Carlos Alberto Lopez Perez
Comment 1 2026-05-18 21:45:06 PDT
EWS
Comment 2 2026-05-19 16:28:52 PDT
Committed 313530@main (aaa99edbf8c0): <https://commits.webkit.org/313530@main> Reviewed commits have been landed. Closing PR #65162 and removing active labels.