This keeps being retried by EWS, and I'm not sure why. It should be able to determine that the patch is a failure. https://webkit-queues.appspot.com/patch/239920
Looks like this is the culprit (patchanalysistask.py:208):

    if (not first_results.did_exceed_test_failure_limit() and
        not second_results.did_exceed_test_failure_limit() and
        self._results_failed_different_tests(first_results, second_results)):
        # We could report flaky tests here, but we would need to be careful
        # to use similar checks to ExpectedFailures._can_trust_results
        # to make sure we don't report constant failures as flakes when
        # we happen to hit the --exit-after-N-failures limit.
        # See https://bugs.webkit.org/show_bug.cgi?id=51272
        return False

Neither run exceeds the failure limit, and they do fail different tests, so it defers. It should probably report whatever flakiness it can and fall through. In general I think that's just a band-aid, though: this whole function needs some streamlining. It's difficult to reason about because it has so many special cases that get filtered out one by one. I think we can do better.
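To make the "report the flakiness and fall through" idea concrete, here is a rough standalone sketch. It is not the webkitpy code: `FakeResults`, `should_defer`, `flaky_candidates`, and `consistent_failures` are hypothetical names, the limit of 30 is an assumed value, and I'm assuming "failed different tests" means the two sets of failing tests are not equal.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResults:
    """Hypothetical stand-in for a layout test results object."""
    failing_tests: frozenset = field(default_factory=frozenset)
    failure_limit: int = 30  # assumed value for --exit-after-N-failures

    def did_exceed_test_failure_limit(self):
        # Assumption: reaching the limit counts as exceeding it.
        return len(self.failing_tests) >= self.failure_limit


def results_failed_different_tests(first, second):
    # Assumption: "different" means the failing sets are not equal.
    return first.failing_tests != second.failing_tests


def should_defer(first, second):
    """Mirrors the quoted condition: defer when neither run hit the
    failure limit and the runs disagree about which tests failed."""
    return (not first.did_exceed_test_failure_limit()
            and not second.did_exceed_test_failure_limit()
            and results_failed_different_tests(first, second))


def flaky_candidates(first, second):
    """Tests that failed in exactly one of the two runs."""
    return first.failing_tests ^ second.failing_tests


def consistent_failures(first, second):
    """Tests that failed in both runs."""
    return first.failing_tests & second.failing_tests


first = FakeResults(frozenset({"a.html", "b.html"}))
second = FakeResults(frozenset({"a.html", "c.html"}))
print(should_defer(first, second))
print(sorted(flaky_candidates(first, second)))
print(sorted(consistent_failures(first, second)))
```

With this split, the tests that failed in only one run could be reported as possible flakes, while the tests that failed in both runs could still be compared against a clean tree instead of returning early. Whether that is safe depends on the same trust checks the original comment mentions.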
But were these different tests? Looks like these runs failed the same tests: https://webkit-queues.appspot.com/results/5528053639806976 https://webkit-queues.appspot.com/results/4704366959263744
Fixed in a different bug: https://bugs.webkit.org/show_bug.cgi?id=138184