Bug 29090 - http/tests/xmlhttprequest/workers/methods-async.html occasionally timing out on Tiger bot
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
Importance: P1 Normal
Assignee: Nobody
URL: http://build.webkit.org/results/Tiger...
Depends on:
Reported: 2009-09-09 09:58 PDT by Eric Seidel (no email)
Modified: 2011-07-01 06:43 PDT (History)
Description Eric Seidel (no email) 2009-09-09 09:58:20 PDT
http/tests/xmlhttprequest/workers/methods-async.html occasionally timing out on Tiger bot

Sadly I don't have any more information.  If I see it again, I'll note so here.  We need to get some flakey-test monitoring setup like Chromium has:
Comment 1 Andrew Wilson 2009-09-09 10:20:43 PDT
I've often wished that DumpRenderTree could dump its current output when a test times out, so it's easier to figure out which test case is failing/dying. That would make cases like this much easier to debug.

Maybe there's some way to have run-webkit-tests send a signal to DRT when it's timing out to force it to dump the current contents instead of just closing the fd.
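
The signal idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how run-webkit-tests or DumpRenderTree work: the child script, the choice of SIGUSR1, the READY handshake, and the "TIMED OUT" marker are all hypothetical, and the sketch is POSIX-only.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in for DumpRenderTree: it holds partial results in memory
# and only writes them out when it receives SIGUSR1.
CHILD_SOURCE = textwrap.dedent("""
    import signal, sys, time

    buffered = ["step 1 ok", "step 2 ok"]

    def dump_and_exit(signum, frame):
        for line in buffered:
            print(line)
        print("TIMED OUT (partial output above)", flush=True)
        sys.exit(1)

    signal.signal(signal.SIGUSR1, dump_and_exit)
    print("READY", flush=True)  # tell the harness the handler is installed
    time.sleep(60)              # simulate a hung test
""")


def run_with_graceful_timeout(child_source, timeout_s):
    """Run the child; on timeout, ask it to dump its output instead of
    just closing the pipe. Returns (timed_out, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait for the READY handshake line
    try:
        proc.wait(timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGUSR1)  # request a dump of partial output
        timed_out = True
    output = proc.stdout.read()  # read until the child exits and closes stdout
    proc.wait()
    return timed_out, output
```

Calling `run_with_graceful_timeout(CHILD_SOURCE, 0.5)` should report a timeout and return the buffered lines, which is the information that is lost when the harness simply closes the fd. A real harness would also need to handle a child that ignores the signal or fills the pipe.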
Comment 2 David Levin 2009-09-09 10:31:55 PDT
(In reply to comment #1)
> I've often wished that DumpRenderTree could dump its current output when a test
> times out

I really like that idea.  We -- not you :) -- should totally implement that and do something similar in Chromium land as well.
(In reply to Description)

> We need to get some flakey-test monitoring setup

I also like the idea of a flakiness monitoring tool. I have to admit I've had a hard time interpreting the output of Chromium's, but I'm starting to grok it.

So far I found an instance of a similar hang as far back as this:

and there isn't too much history in the buildbot before that.
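
As a rough illustration of the flakiness monitoring idea discussed above, the sketch below flags tests whose outcome varies across builds. It is a minimal sketch, not Chromium's tool or any existing WebKit script: the input layout (build id mapped to per-test outcomes), the outcome strings, and the function name are all assumptions made for the example.

```python
from collections import defaultdict


def find_flaky_tests(results_by_build):
    """Return {test_name: failure_rate} for tests that both passed and
    failed at least once across the given builds.

    results_by_build is a hypothetical structure:
        {build_id: {test_name: "pass" | "timeout" | "crash" | "fail"}}
    """
    outcomes = defaultdict(list)
    for build_id in sorted(results_by_build):
        for test_name, outcome in results_by_build[build_id].items():
            outcomes[test_name].append(outcome)

    flaky = {}
    for test_name, seen in outcomes.items():
        failures = sum(1 for outcome in seen if outcome != "pass")
        # Always-passing and always-failing tests are not flaky.
        if 0 < failures < len(seen):
            flaky[test_name] = failures / len(seen)
    return flaky
```

Fed with made-up data in which a test times out in one of four builds, the function reports that test with a failure rate of 0.25 and omits tests that always pass or always fail.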
Comment 3 Eric Seidel (no email) 2009-09-09 16:01:38 PDT
Another worker test just timed out on the Tiger bots.  Maybe this is more common than I thought.  Thanks for the quick response!
Comment 4 Eric Seidel (no email) 2009-09-09 16:02:04 PDT
http/tests/xmlhttprequest/workers/shared-worker-methods.html was the test which just timed out, in case the results link goes away.
Comment 5 Eric Seidel (no email) 2009-09-11 09:32:25 PDT
shared-workers-methods timing out on Tiger bots too. :(
Comment 6 Andrew Wilson 2009-09-12 10:22:43 PDT
Turns out that DRT already has a graceful timeout period built into it, which flushes the current output.
The problem is that run-webkit-tests has its own (less-graceful) timeout, and the timeout value is set too low so we're getting the less-graceful timeout behavior.

I've got a patch out for review (https://bugs.webkit.org/show_bug.cgi?id=29223) to address this behavior, and hopefully once that lands we can figure out what's causing this bug.
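
The interaction between the two timeouts described above can be made concrete with a small sketch. The function names, the margin, and the numbers are hypothetical and are not taken from DRT, run-webkit-tests, or the patch in bug 29223; the point is only that the harness timeout must be longer than the inner graceful one for the flush to happen.

```python
def first_timeout_to_fire(graceful_timeout_s, harness_timeout_s):
    """Report which timeout fires first for a hung test.

    "graceful" means the inner tool times out first and can flush its
    partial output; "hard" means the outer harness gives up first and the
    partial output is lost.
    """
    if graceful_timeout_s < harness_timeout_s:
        return "graceful"
    return "hard"


def safe_harness_timeout(graceful_timeout_s, flush_margin_s=5.0):
    """Pick a harness timeout that leaves the inner tool time to flush.

    flush_margin_s is an assumed safety margin, not a measured value.
    """
    return graceful_timeout_s + flush_margin_s
```

With an assumed inner timeout of 30 seconds, a harness timeout of 20 seconds yields `"hard"` (the behavior described in this comment), while `safe_harness_timeout(30)` gives 35 seconds and yields `"graceful"`.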
Comment 7 Eric Seidel (no email) 2009-09-29 11:06:23 PDT
from http://build.webkit.org/results/Tiger%20Intel%20Release/r48881%20(4756)/results.html
from http://build.webkit.org/results/Tiger%20Intel%20Release/r48882%20(4757)/results.html

just timed out on the Tiger bots as well.  So it looks like this bug is not yet resolved. :(
Comment 8 Andrew Wilson 2009-09-29 11:17:16 PDT

> just timed out on the Tiger bots as well.  So it looks like this bug is not yet
> resolved. :(

Yeah, we never did anything to address this bug; I just enabled us to debug it more by fixing the timeout handling.

Dave, any ideas here? The error in worker-importScripts is really bizarre, since it's happening *between* the execution of the two scripts in an importScripts(script1, script2) call.
Comment 9 Eric Seidel (no email) 2009-09-30 15:54:52 PDT
http/tests/workers/shared-worker-importScripts.html timeout seen in:
and http/tests/xmlhttprequest/workers/shared-worker-xhr-file-not-found.html in:

I assume these are all related.  But I'm happy to file individual bugs if that would be helpful.
Comment 10 Eric Seidel (no email) 2009-09-30 15:55:31 PDT
http/tests/workers/worker-importScripts.html in:
Comment 11 Eric Seidel (no email) 2009-09-30 22:29:25 PDT
More tiger failures from this evening:




I wish we had any idea why these worker tests were timing out on the Tiger bot. :(
Comment 12 Eric Seidel (no email) 2009-10-01 12:18:04 PDT
This is definitely the worst bug on the Tiger bots.
Comment 13 Eric Seidel (no email) 2009-10-01 12:18:52 PDT
Do we believe these are timing out because the test simply runs out of time, or because of some deadlock/hang in WebCore/DRT?
Comment 14 Eric Seidel (no email) 2009-10-01 12:22:32 PDT
methods.html timed out again this morning:

They seem to all fail in the same place, and all fail pretty early in the test.
Comment 15 Andrew Wilson 2009-10-01 12:44:09 PDT
It is suspicious that they are all network-related tests, and that we've seen at least one actual crash down in that code lately.

I think that Dmitry was looking at it a bit - I have been looking at it as well, but my free cycles are limited today and tomorrow due to sheriff duties. He has some suspicions that the synchronous network request code might have some bustage.
Comment 16 Dmitry Titov 2009-10-01 15:07:58 PDT
Indeed, I was looking at it although w/o results so far. So far I see that all the tests that fail do ThreadableLoader::loadResourceSynchronously on the worker thread. Trying to get this to reproduce locally.
Comment 17 Eric Seidel (no email) 2009-10-01 15:14:05 PDT
It looks like it only reproduces on Tiger.  So you may need to acquire a Tiger box.  It may be an interaction with Tiger's CFNetwork calls.
Comment 18 Eric Seidel (no email) 2009-10-02 00:31:20 PDT
http/tests/xmlhttprequest/workers/shared-worker-xhr-file-not-found.html just crashed on Leopard:

Perhaps this is not a Tiger-only bug.  Or perhaps that's a completely unrelated bug.  Sadly the buildbots don't spit out crash logs. :(
Comment 20 Eric Seidel (no email) 2009-10-07 21:36:35 PDT
I'm beginning to think this has more to do with xmlhttprequest tests and less to do with workers.

I think we have a random corruption problem, similar to what ap solved with the CString null termination issue, since this most often produces some sort of network hang, plus occasional crashes on both Leopard and Tiger.  The fact that the hangs all look similar, but occur in different tests, leads me to believe this is all one root cause.  I'm not sure yet why this happens more often on Tiger than on Leopard.

This is definitely our worst test flakiness bug for Mac at the moment.

From this evening:

Bug 30194 (also seen this evening) might also be from the same root cause.  Not sure.
Comment 22 Julien Chaffraix 2011-07-01 06:43:53 PDT
The Tiger bot was removed several months ago and my understanding is that the code would not compile on Tiger anymore. Closing WONTFIX, feel free to reopen it if I am mistaken.