Since we enabled the threaded compositor, we now have a bunch of tests that occasionally pass when they usually fail. In build #18023 we have 84 unexpected passes, for example; that never happened before the threaded compositor.

What's interesting is that, within a particular run of run-webkit-tests, these tests either always pass or always fail; they are clearly flaky, but they don't contribute to the flakiness count.

Carlos Garcia suggests in bug #161242#c20:

> I have theory, though. When they
> unexpectedly pass, they don't actually pass, we just fail to render both the
> actual and expected files in the same way (for example a fully white image
> in both cases). That's a problem of the reftests. So, more interesting to
> see what's failing, which can be also reproduced locally more easily, would
> be to see what we render when they pass.
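
One way to check that theory would be to flag reftest "passes" where the actual and expected images match only because both are blank. The following is a minimal sketch, not anything run-webkit-tests does today: it assumes the two images have already been decoded to raw RGBA byte buffers of the same size, and the names `is_uniform` and `classify_reftest_result` are hypothetical.

```python
def is_uniform(pixels: bytes, bytes_per_pixel: int = 4) -> bool:
    """Return True if every pixel in the raw buffer has the same value."""
    if len(pixels) % bytes_per_pixel != 0:
        raise ValueError("buffer length is not a multiple of the pixel size")
    if not pixels:
        return True
    first = pixels[:bytes_per_pixel]
    # A uniform image is just the first pixel repeated for the whole buffer.
    return pixels == first * (len(pixels) // bytes_per_pixel)


def classify_reftest_result(actual: bytes, expected: bytes) -> str:
    """Classify a reftest comparison (hypothetical helper).

    Returns "mismatch" if the buffers differ, "suspicious-pass" if they
    match but are a single flat color (e.g. fully white), else "pass".
    """
    if actual != expected:
        return "mismatch"
    if is_uniform(actual):
        return "suspicious-pass"
    return "pass"
```

A result of "suspicious-pass" would not prove the theory, since some reftests may legitimately render a flat color, but it would narrow down which of the unexpected passes are worth inspecting by hand.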