RESOLVED FIXED35819
REGRESSION(55593?): fast/workers/worker-cloneport.html is timing out on Leopard
https://bugs.webkit.org/show_bug.cgi?id=35819
Summary REGRESSION(55593?): fast/workers/worker-cloneport.html is timing out on Leopard
Eric Seidel (no email)
Reported 2010-03-05 16:28:45 PST
fast/workers/worker-cloneport.html is timing out on Leopard Commit Bot Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 12449 test cases. fast/workers/worker-cloneport.html -> timed out Sampling process 89159 for 10 seconds with 10 milliseconds of run time between samples Sampling completed, processing symbols... Sample analysis of process 89159 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt Exiting early after 1 failures. 8774 tests run. 343.58s total testing time 8773 test cases (99%) succeeded 1 test case (<1%) timed out 5 test cases (<1%) had stderr output Unabled to successfully build and test I've not seen this time out on the bots recently, but it seems to be consistently failing for the commit-bot. This is a new failure from today as far as I can tell.
Attachments
Hang report from Leopard Commit Bot (91.86 KB, text/plain)
2010-03-05 16:41 PST, Eric Seidel (no email)
no flags
Skip the test until this bug can be looked at (1.09 KB, patch)
2010-03-05 17:07 PST, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2010-03-05 16:29:23 PST
Qt has also seen crashes related to this test. bug 33653.
Eric Seidel (no email)
Comment 2 2010-03-05 16:40:17 PST
I'm forcing the commit-queue bot to do a clean build. We may just have to skip this test for the time being since: 1. This is definitely a recent regression. 2. There are currently 18 patches blocked on this bug.
Eric Seidel (no email)
Comment 3 2010-03-05 16:41:12 PST
Created attachment 50142 [details] Hang report from Leopard Commit Bot
Eric Seidel (no email)
Comment 4 2010-03-05 16:43:01 PST
I can't really tell what's going on from that hang report. Almost the entire time is spent in: 900 WebCore::ScriptExecutionContext::dispatchMessagePortEvents()
Eric Seidel (no email)
Comment 5 2010-03-05 16:55:03 PST
Eric Seidel (no email)
Comment 6 2010-03-05 16:57:53 PST
Eric Seidel (no email)
Comment 7 2010-03-05 17:07:33 PST
Created attachment 50143 [details] Skip the test until this bug can be looked at
Eric Seidel (no email)
Comment 8 2010-03-05 17:09:19 PST
Eric Seidel (no email)
Comment 9 2010-03-05 17:10:19 PST
I forgot to pass --no-close to "webkit-patch land", sorry.
Eric Seidel (no email)
Comment 11 2010-03-05 22:57:08 PST
I'm really not sure that it's caused by r55593. But we could certainly revert it and see if that fixes things.
Dumitru Daniliuc
Comment 12 2010-03-06 00:50:35 PST
I don't see how r55593 could've made this test fail, since it doesn't change any worker-related code, and the test doesn't seem to have any DB-related code. If you need somebody to take a look at that test, I could do it on Monday. But please don't rollback r55593 only because of this test -- sometimes the test is bad, not the patch (see fast/frames/sandboxed-iframe-storage.html in r55593).
Andrew Wilson
Comment 13 2010-03-06 08:40:48 PST
This test does a bunch of work (sends thousands of messages to a port that is being cloned). Looking at the hang report, it *looks* like work is still being done, as expected - so I'm wondering if maybe a patch just introduced a performance regression that's causing this test to time out now. I don't know if the test is necessarily broken, just too slow to complete now?
Andrew Wilson
Comment 14 2010-03-06 18:19:17 PST
(In reply to comment #13) > This test does a bunch of work (sends thousands of messages to a port that is > being cloned). > > Looking at the hang report, it *looks* like work is still being done, as > expected - so I'm wondering if maybe a patch just introduced a performance > regression that's causing this test to time out now. I don't know if the test > is necessarily broken, just too slow to complete now? It seems that I'm able to repro this locally. I'l investigate (at the very least I can walk back to see if r55593 is really at fault).
Andrew Wilson
Comment 15 2010-03-06 21:50:20 PST
OK, this test times out repeatedly when I'm synced to 55593 (about 30% of the time). When I'm synced to 55592, I was able to run it 100 times without any timeouts. So it must be r55593. Assigning to Dumi to investigate further - we probably want to roll r55593 back in the meantime.
Andrew Wilson
Comment 16 2010-03-06 22:12:39 PST
(In reply to comment #15) > Assigning to Dumi to investigate further - we probably want to roll r55593 back > in the meantime. OK, the problem is the change to Document.postTask() in r55593: http://trac.webkit.org/changeset/55593/trunk/WebCore/dom/Document.cpp Unfortunately, there doesn't seem to be any documentation either in the file or in the ChangeLog describing why we changed the behavior of postTask(), but when I revert that change the test passes again. Adam/Dumi - any insight here as to why we changed the Document.postTask() behavior? If all tests pass without that change, I'll revert just that file and you guys can investigate re-introducing that change later.
Dmitry Titov
Comment 17 2010-03-06 23:13:31 PST
I confirm reverting just a Document::postTask change fixes the test back. Dumi is the best to answer, but basically the reason for postTask change was that it did not preserve the order in which tasks were posted, if they were posted from different threads. Dumi contacted me offline and I was worried about this because it would change timing in some worker cases and break them, but couldn't see what could be actually broken. We should revert r55593 and investigate. It's unknown what else it'll break, and now we have a nice repro to work with. I will revert it in a few hours and reinstate the test in a few hours if nobody comes up with a fix.
Andrew Wilson
Comment 18 2010-03-07 08:06:47 PST
(In reply to comment #17) > I confirm reverting just a Document::postTask change fixes the test back. Dumi > is the best to answer, but basically the reason for postTask change was that it > did not preserve the order in which tasks were posted, if they were posted from > different threads. In any case, reverting postTask() breaks open-database-creation-callback.html, so it's not an option to just revert that one method - we have to revert the entire CL.
Dmitry Titov
Comment 20 2010-03-07 09:01:19 PST
Note You need to log in before you can comment on or make changes to this bug.