http/tests/xmlhttprequest/workers/methods-async.html occasionally timing out on Tiger bot
Sadly I don't have any more information. If I see it again, I'll note so here. We need to get some flakey-test monitoring setup like Chromium has: http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/layout_tests/flakiness_dashboard.html
I've often wished that DumpRenderTree could dump its current output when a test times out, so it's easier to figure out which test case is failing/dying. That would make cases like this much easier to debug. Maybe there's some way to have run-webkit-tests send a signal to DRT when it's timing out to force it to dump the current contents instead of just closing the fd.
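To make the "send a signal before giving up" idea concrete, here is a minimal sketch in Python. It is not how run-webkit-tests or DRT actually work: the child script, the choice of SIGUSR1, and every name below are hypothetical stand-ins, and it only runs on POSIX systems.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for DRT: writes some (buffered) output, then hangs.
# On SIGUSR1 it flushes whatever it has so the harness can see how far it got.
CHILD = r"""
import signal, sys, time

def dump_and_exit(signum, frame):
    sys.stdout.write("#TIMED OUT: dumping partial output\n")
    sys.stdout.flush()
    sys.exit(1)

signal.signal(signal.SIGUSR1, dump_and_exit)
sys.stdout.write("PASS case 1\n")   # stays in the buffer: stdout is a pipe
sys.stderr.write("ready\n")         # tell the harness the handler is installed
sys.stderr.flush()
time.sleep(60)                      # simulate the hang
"""

def run_test(timeout, send_dump_signal):
    """Run the fake test; on timeout optionally ask for a dump before killing."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            text=True)
    proc.stderr.readline()  # wait until the child can handle the signal
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        if send_dump_signal:
            proc.send_signal(signal.SIGUSR1)  # ask nicely first
            try:
                out, _ = proc.communicate(timeout=5)
                return out
            except subprocess.TimeoutExpired:
                pass  # child ignored the request; fall through to kill
        proc.kill()  # the "just close the fd" path: buffered output is lost
        out, _ = proc.communicate()
        return out
```

Calling `run_test(0.3, True)` returns the partial output plus the timeout marker, while `run_test(0.3, False)` kills the child outright and the buffered output never reaches the harness.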
(In reply to comment #1) > I've often wished that DumpRenderTree could dump its current output when a test > times out I really like that idea. We -- not you :) -- should totally implement that and do something similar in Chromium land as well. (In reply to Description) > We need to get some flakey-test monitoring setup I also like the idea of building a flakiness monitoring tool. I have to admit I've had a hard time interpreting the output of that dashboard, but I'm starting to grok it. So far I found an instance of a similar hang as far back as this: http://build.webkit.org/results/Tiger%20Intel%20Release/r48111%20(4125)/results.html and there isn't too much history in the buildbot before that.
http://build.webkit.org/results/Tiger%20Intel%20Release/r48230%20(4213)/results.html Another worker test just timed out on the Tiger bots. Maybe this is more common than I thought. Thanks for the quick response!
http/tests/xmlhttprequest/workers/shared-worker-methods.html was the test which just timed out, in case the results link goes away.
shared-workers-methods timing out on tiger bots too. :( http://build.webkit.org/results/Tiger%20Intel%20Release/r48301%20(4269)/results.html
Turns out that DRT already has a graceful timeout period built in to it which flushes the current output. The problem is that run-webkit-tests has its own (less-graceful) timeout, and the timeout value is set too low so we're getting the less-graceful timeout behavior. I've got a patch out for review (https://bugs.webkit.org/show_bug.cgi?id=29223) to address this behavior, and hopefully once that lands we can figure out what's causing this bug.
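The interaction between the two timeouts can be sketched as follows. This is only an illustration in Python under stated assumptions: the child script stands in for DRT's built-in graceful timeout, the wrapper stands in for run-webkit-tests, and none of the names or timeout values come from the real tools.

```python
import subprocess
import sys

# Hypothetical stand-in for DRT: an internal watchdog flushes the partial
# output after `graceful` seconds, then exits.
CHILD = r"""
import os, sys, threading, time

graceful = float(sys.argv[1])

def watchdog():
    sys.stdout.write("#TIMED OUT (graceful)\n")
    sys.stdout.flush()
    os._exit(1)

threading.Timer(graceful, watchdog).start()
sys.stdout.write("PASS step 1\n")   # buffered, because stdout is a pipe
time.sleep(60)                      # simulate the hang
"""

def run_with_harness_timeout(graceful, harness_timeout):
    """Return whatever output the harness manages to collect."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD, str(graceful)],
                            stdout=subprocess.PIPE, text=True)
    try:
        out, _ = proc.communicate(timeout=harness_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                  # less-graceful path: nothing was flushed
        out, _ = proc.communicate()
    return out
```

With `run_with_harness_timeout(0.5, 10)` the child's own watchdog fires first and the partial output survives. With `run_with_harness_timeout(10, 0.5)` the outer timeout wins and the graceful marker never appears, which mirrors the situation described above where the outer timeout is set too low.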
http/tests/workers/worker-importScripts.html from http://build.webkit.org/results/Tiger%20Intel%20Release/r48881%20(4756)/results.html and http/tests/xmlhttprequest/workers/methods.html from http://build.webkit.org/results/Tiger%20Intel%20Release/r48882%20(4757)/results.html just timed out on the Tiger bots as well. So it looks like this bug is not yet resolved. :(
> just timed out on the Tiger bots as well. So it looks like this bug is not yet > resolved. :( Yeah, we never did anything to address this bug, I just enabled us to debug it more by fixing the timeout handling. Dave, any ideas here? The error in worker-importScripts is really bizarre, since it's happening *between* the execution of the two scripts in an importScripts(script1, script2) call.
http/tests/workers/shared-worker-importScripts.html timeout seen in: http://build.webkit.org/results/Tiger%20Intel%20Release/r48927%20(4789)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r48936%20(4796)/results.html and http/tests/xmlhttprequest/workers/shared-worker-xhr-file-not-found.html in: http://build.webkit.org/results/Tiger%20Intel%20Release/r48944%20(4802)/results.html I assume these are all related. But I'm happy to file individual bugs if that would be helpful.
http/tests/workers/worker-importScripts.html in: http://build.webkit.org/results/Tiger%20Intel%20Release/r48923%20(4785)/results.html
More tiger failures from this evening: http/tests/xmlhttprequest/workers/methods.html: http://build.webkit.org/results/Tiger%20Intel%20Release/r48956%20(4813)/results.html http/tests/xmlhttprequest/workers/shared-worker-xhr-file-not-found.html: http://build.webkit.org/results/Tiger%20Intel%20Release/r48944%20(4802)/results.html http/tests/workers/worker-importScripts.html: http://build.webkit.org/results/Tiger%20Intel%20Release/r48936%20(4796)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r48923%20(4785)/results.html http/tests/workers/shared-worker-importScripts.html: http://build.webkit.org/results/Tiger%20Intel%20Release/r48927%20(4789)/results.html I wish we had any idea why these worker tests were timing out on the Tiger bot. :(
This is definitely the worst bug on the Tiger bots.
Do we believe these are timing out because the test simply runs out of time, or because of some deadlock/hang in WebCore/DRT?
methods.html timed out again this morning: http://build.webkit.org/results/Tiger%20Intel%20Release/r48971%20(4826)/results.html They seem to all fail in the same place, and all fail pretty early in the test.
It is suspicious that they are all network-related tests, and that we've seen at least one actual crash down in that code lately. I think that Dmitry was looking at it a bit - I have been looking at it as well, but my free cycles are limited today and tomorrow due to sheriff duties. He has some suspicions that the synchronous network request code might have some bustage.
Indeed, I was looking at it although w/o results so far. So far I see that all the tests that fail do ThreadableLoader::loadResourceSynchronously on the worker thread. Trying to get this to reproduce locally.
It looks like it only reproduces on Tiger. So you may need to acquire a Tiger box. It may be an interaction with Tiger's CFNetwork calls.
http/tests/xmlhttprequest/workers/shared-worker-xhr-file-not-found.html just crashed on Leopard: http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r49009%20(5653)/results.html Perhaps this is not a Tiger-only bug. Or perhaps that's a completely unrelated bug. Sadly the buildbots don't spit out crash logs. :(
Another crash from the leopard bot: http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r49068%20(5705)/results.html And 5 more timeouts from Tiger in the last 24 hours. :( http://build.webkit.org/results/Tiger%20Intel%20Release/r49079%20(4904)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r49081%20(4906)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r49083%20(4908)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r49088%20(4912)/results.html http://build.webkit.org/results/Tiger%20Intel%20Release/r49102%20(4924)/results.html We should be able to reproduce this locally using the --iterations and --repeat-each flags on run-webkit-tests.
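A local repro attempt along those lines might be driven like this. It is a sketch only: the two flag names come from the comment above, but the repeat counts, the space-separated argument form, and the bare `run-webkit-tests` script name (it may need a path in a real checkout) are assumptions.

```python
import shlex

def build_repro_command(test_path, iterations, repeat_each,
                        script="run-webkit-tests"):
    """Build an argv list that reruns a single layout test many times.

    `script` is a placeholder; the counts are arbitrary illustration values.
    """
    return [script,
            "--iterations", str(iterations),
            "--repeat-each", str(repeat_each),
            test_path]

cmd = build_repro_command("http/tests/xmlhttprequest/workers/methods.html",
                          iterations=20, repeat_each=5)
print(shlex.join(cmd))  # copy-pasteable shell form of the command
```

The resulting list can be passed to `subprocess.run(cmd)` from the top of a checkout, assuming the script is on the path.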
I'm beginning to think this has more to do with xmlhttprequest tests and less to do with workers. I think we have a random corruption problem, similar to what ap solved with the CString null termination issue, since this most often produces some sort of network hang, and occasional crashes on both Leopard and Tiger. The fact that the hangs all look similar but occur in different tests leads me to believe this is all one root cause. I'm not sure yet why this happens more often on Tiger than Leopard. This is definitely our worst test flakiness bug for Mac at the moment. From this evening: http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r49284%20(5872)/http/tests/xmlhttprequest/xmlhttprequest-onProgress-open-should-zero-length-pretty-diff.html http://build.webkit.org/results/Tiger%20Intel%20Release/r49260%20(5031)/http/tests/xmlhttprequest/workers/shared-worker-methods-async-pretty-diff.html Bug 30194 (also seen this evening) might also be from the same root cause. Not sure.
Another just now: http://build.webkit.org/results/Tiger%20Intel%20Release/r49314%20(5069)/http/tests/xmlhttprequest/workers/shared-worker-methods-pretty-diff.html
The Tiger bot was removed several months ago, and my understanding is that the code would not compile on Tiger anymore. Closing as WONTFIX; feel free to reopen it if I am mistaken.