Bug 35819 - REGRESSION(55593?): fast/workers/worker-cloneport.html is timing out on Leopard
Summary: REGRESSION(55593?): fast/workers/worker-cloneport.html is timing out on Leopard
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Dumitru Daniliuc
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-05 16:28 PST by Eric Seidel (no email)
Modified: 2010-03-08 11:19 PST (History)
6 users (show)

See Also:


Attachments
Hang report from Leopard Commit Bot (91.86 KB, text/plain)
2010-03-05 16:41 PST, Eric Seidel (no email)
no flags Details
Skip the test until this bug can be looked at (1.09 KB, patch)
2010-03-05 17:07 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-03-05 16:28:45 PST
fast/workers/worker-cloneport.html is timing out on Leopard Commit Bot

Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 12449 test cases.
fast/workers/worker-cloneport.html -> timed out
Sampling process 89159 for 10 seconds with 10 milliseconds of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 89159 written to file /Users/eseidel/Library/Logs/DumpRenderTree/HangReport.txt

Exiting early after 1 failures. 8774 tests run.
343.58s total testing time

8773 test cases (99%) succeeded
1 test case (<1%) timed out
5 test cases (<1%) had stderr output
Unabled to successfully build and test

I've not seen this time out on the bots recently, but it seems to be consistently failing for the commit-bot.  This is a new failure from today as far as I can tell.
Comment 1 Eric Seidel (no email) 2010-03-05 16:29:23 PST
Qt has also seen crashes related to this test.  bug 33653.
Comment 2 Eric Seidel (no email) 2010-03-05 16:40:17 PST
I'm forcing the commit-queue bot to do a clean build.

We may just have to skip this test for the time being since:
1.  This is definitely a recent regression.
2.  There are currently 18 patches blocked on this bug.
Comment 3 Eric Seidel (no email) 2010-03-05 16:41:12 PST
Created attachment 50142 [details]
Hang report from Leopard Commit Bot
Comment 4 Eric Seidel (no email) 2010-03-05 16:43:01 PST
I can't really tell what's going on from that hang report.  Almost the entire time is spent in:
                          900 WebCore::ScriptExecutionContext::dispatchMessagePortEvents()
Comment 5 Eric Seidel (no email) 2010-03-05 16:55:03 PST
Ha!  This is timing out on the bots too!
http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r55600%20(11106)/results.html
Comment 6 Eric Seidel (no email) 2010-03-05 16:57:53 PST
http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r55593%20(11102)/results.html
is the first timeout I could find on the bots.

http://trac.webkit.org/changeset/55593
Comment 7 Eric Seidel (no email) 2010-03-05 17:07:33 PST
Created attachment 50143 [details]
Skip the test until this bug can be looked at
Comment 8 Eric Seidel (no email) 2010-03-05 17:09:19 PST
Committed r55604: <http://trac.webkit.org/changeset/55604>
Comment 9 Eric Seidel (no email) 2010-03-05 17:10:19 PST
I forgot to pass --no-close to "webkit-patch land", sorry.
Comment 11 Eric Seidel (no email) 2010-03-05 22:57:08 PST
I'm really not sure that it's caused by r55593.  But we could certainly revert it and see if that fixes things.
Comment 12 Dumitru Daniliuc 2010-03-06 00:50:35 PST
I don't see how r55593 could've made this test fail, since it doesn't change any worker-related code, and the test doesn't seem to have any DB-related code. If you need somebody to take a look at that test, I could do it on Monday. But please don't rollback r55593 only because of this test -- sometimes the test is bad, not the patch (see fast/frames/sandboxed-iframe-storage.html in r55593).
Comment 13 Andrew Wilson 2010-03-06 08:40:48 PST
This test does a bunch of work (sends thousands of messages to a port that is being cloned).

Looking at the hang report, it *looks* like work is still being done, as expected - so I'm wondering if maybe a patch just introduced a performance regression that's causing this test to time out now. I don't know if the test is necessarily broken, just too slow to complete now?
Comment 14 Andrew Wilson 2010-03-06 18:19:17 PST
(In reply to comment #13)
> This test does a bunch of work (sends thousands of messages to a port that is
> being cloned).
> 
> Looking at the hang report, it *looks* like work is still being done, as
> expected - so I'm wondering if maybe a patch just introduced a performance
> regression that's causing this test to time out now. I don't know if the test
> is necessarily broken, just too slow to complete now?

It seems that I'm able to repro this locally. I'l investigate (at the very least I can walk back to see if r55593 is really at fault).
Comment 15 Andrew Wilson 2010-03-06 21:50:20 PST
OK, this test times out repeatedly when I'm synced to 55593 (about 30% of the time).

When I'm synced to 55592, I was able to run it 100 times without any timeouts. So it must be r55593.

Assigning to Dumi to investigate further - we probably want to roll r55593 back in the meantime.
Comment 16 Andrew Wilson 2010-03-06 22:12:39 PST
(In reply to comment #15)

> Assigning to Dumi to investigate further - we probably want to roll r55593 back
> in the meantime.

OK, the problem is the change to Document.postTask() in r55593:

http://trac.webkit.org/changeset/55593/trunk/WebCore/dom/Document.cpp

Unfortunately, there doesn't seem to be any documentation either in the file or in the ChangeLog describing why we changed the behavior of postTask(), but when I revert that change the test passes again.

Adam/Dumi - any insight here as to why we changed the Document.postTask() behavior? If all tests pass without that change, I'll revert just that file and you guys can investigate re-introducing that change later.
Comment 17 Dmitry Titov 2010-03-06 23:13:31 PST
I confirm reverting just a Document::postTask change fixes the test back. Dumi is the best to answer, but basically the reason for postTask change was that it did not preserve the order in which tasks were posted, if they were posted from different threads.
Dumi contacted me offline and I was worried about this because it would change timing in some worker cases and break them, but couldn't see what could be actually broken. We should revert r55593 and investigate. It's unknown what else it'll break, and now we have a nice repro to work with. I will revert it in a few hours and reinstate the test in a few hours if nobody comes up with a fix.
Comment 18 Andrew Wilson 2010-03-07 08:06:47 PST
(In reply to comment #17)
> I confirm reverting just a Document::postTask change fixes the test back. Dumi
> is the best to answer, but basically the reason for postTask change was that it
> did not preserve the order in which tasks were posted, if they were posted from
> different threads.

In any case, reverting postTask() breaks open-database-creation-callback.html, so it's not an option to just revert that one method - we have to revert the entire CL.
Comment 20 Dmitry Titov 2010-03-07 09:01:19 PST
Revert of r55593 landed: http://trac.webkit.org/changeset/55635