REGRESSION: WK2 tests have 500+ failures causing an early exit
Much discussion on #webkit. We think it was http://trac.webkit.org/changeset/124958 or http://trac.webkit.org/changeset/124581 or both.
To add more context from the #webkit discussion: it looks like all the WK2 bots are failing fairly reliably (though not always) with 500+ failures, while the WK1 bots seem happy. Things looked pretty happy before ~r124581 (where we changed to passing --pixel-tests per-test for reftests to work). A partial revert of that change, however, did not seem to fix things.
In r124958, I changed errors from ImageDiff to be treated as test failures (previously we would ignore the error and treat the test as if it had passed, i.e., a false positive). As reported in https://bugs.webkit.org/show_bug.cgi?id=81962, I get lots of ImageDiff warnings *only when running WK2* (I don't know why yet), so I suspect that r124958 has pushed the problems over the edge; i.e., it has made bug 81962 a lot more serious (arguably, as it should be). I am continuing to do more testing.
So, with my ImageDiff change reverted, I still get 200+ failures near tip-of-tree using WK2 (Release) on Lion. If you actually look at the failures, many (most?) of them appear to be render trees for what should be text-only tests, as if dumpAsText() is having no effect. I don't know how that could be happening, or if there were any changes to WTR recently that might cause that?
Also, it's worth trying to repro these issues when running the tests serially (I'm trying this now, but it's obviously much slower so I don't have results yet), to see if WTR is just interfering w/ itself.
(In reply to comment #2)
> So, with my ImageDiff change reverted, I still get 200+ failures near tip-of-tree using WK2 (Release) on Lion.
>
> If you actually look at the failures, many (most?) of them appear to be render trees for what should be text-only tests, as if dumpAsText() is having no effect. I don't know how that could be happening, or if there were any changes to WTR recently that might cause that?
That's terrible... I hope to look at the log to WTR today and see if I can spot anything out of the blue.
(In reply to comment #3)
> Also, it's worth trying to repro these issues when running the tests serially (I'm trying this now, but it's obviously much slower so I don't have results yet), to see if WTR is just interfering w/ itself.
Any word on that effort?
(In reply to comment #4)
> (In reply to comment #2)
> > So, with my ImageDiff change reverted, I still get 200+ failures near tip-of-tree using WK2 (Release) on Lion.
> >
> > If you actually look at the failures, many (most?) of them appear to be render trees for what should be text-only tests, as if dumpAsText() is having no effect. I don't know how that could be happening, or if there were any changes to WTR recently that might cause that?
>
> That's terrible... I hope to look at the log to WTR today and see if I can spot anything out of the blue.
>
> (In reply to comment #3)
> > Also, it's worth trying to repro these issues when running the tests serially (I'm trying this now, but it's obviously much slower so I don't have results yet), to see if WTR is just interfering w/ itself.
>
> Any word on that effort?
Well, running serially took an hour and produced ~400 failures at r124580. I don't know why this is so much worse than what I saw on the bots around that time range, unless there's something wrong with my local configuration.
My best guess at this point is that there are multiple serious issues with WTR:
1) dumpAsText isn't working in some cases
2) something is broken in how WTR generates pixel dumps, which causes ImageDiff to fail, and as a result we're failing a *lot* of reftests (which do pixel compares even when pixel tests are disabled).
I don't see anything obviously wrong with NRWT; I could revert (or disable) r124958, but that would seem to just ignore the real problem.
Also, for what it's worth, since I don't know all that much about WTR (and have other tasks), it is probably more useful for someone else to pick up the ball and run with it. If I can be of further help please let me know :).
So, I did a little test bisecting, and it looks like the following command runs reasonably well for me:
rwt -2 -i fast/repaint -i fast/canvas -i fast/inspector-support -i accessibility -i compositing -i css3 -i http/tests/inspector -i inspector -i http/tests/inspector-enabled
In theory we reverted a change to the inspector that was causing all the inspector failures, so you're left with hunting down something causing instability in repaint, compositing, filters, and/or canvas. I imagine further bisecting should track things down pretty quickly, but I'm done for the night/weekend :).
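For whoever picks this up, the further bisecting could be automated roughly as follows. This is only a sketch in Python: `bisect_culprit` and the `run_is_stable` callback are hypothetical names, not part of NRWT, and it assumes exactly one directory is responsible for the instability, which may well not hold here.

```python
from typing import Callable, List, Sequence


def bisect_culprit(
    suspects: Sequence[str],
    run_is_stable: Callable[[List[str]], bool],
) -> str:
    """Narrow a list of suspect test directories down to a single one.

    run_is_stable(ignored) is a hypothetical callback: it should run the
    tests with the given directories ignored and return True if the run
    was reasonably stable. Assumes exactly one directory is to blame.
    """
    candidates = list(suspects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Ignore only the first half. If the run is stable, the culprit
        # must be among the ignored directories; otherwise it is in the rest.
        if run_is_stable(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    suspects = [
        "fast/repaint", "fast/canvas", "fast/inspector-support",
        "accessibility", "compositing", "css3",
        "http/tests/inspector", "inspector", "http/tests/inspector-enabled",
    ]
    # Stand-in for a real test run: pretend "compositing" is the culprit.
    fake_run = lambda ignored: "compositing" in ignored
    print(bisect_culprit(suspects, fake_run))
```

In practice the callback would wrap an actual test run with the given directories ignored, so each step costs one full run; if more than one directory is unstable, this simple halving will only find one of them at best.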
We've identified a concrete off-by-one issue in comparing test results. It is tracked by https://bugs.webkit.org/show_bug.cgi?id=94505, which is now a blocking subtask of this bug. (It might be the only issue.)
Fixed in http://trac.webkit.org/changeset/126418 a long time ago.