Bug 51572 - new-run-webkit-tests: shut up and exit when all the threads are wedged
Summary: new-run-webkit-tests: shut up and exit when all the threads are wedged
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Dirk Pranke
URL:
Keywords:
: 46320 46833 (view as bug list)
Depends on: 51600
Blocks:
  Show dependency treegraph
 
Reported: 2010-12-23 15:57 PST by Dirk Pranke
Modified: 2011-02-17 17:34 PST (History)
8 users (show)

See Also:


Attachments
Patch (4.17 KB, patch)
2010-12-23 16:00 PST, Dirk Pranke
no flags Details | Formatted Diff | Diff
use MockTestRunner() (4.27 KB, patch)
2010-12-23 18:55 PST, Dirk Pranke
no flags Details | Formatted Diff | Diff
update w/ fix in r76073, try again (5.02 KB, patch)
2011-02-02 18:18 PST, Dirk Pranke
no flags Details | Formatted Diff | Diff
remove duplicated code (5.78 KB, patch)
2011-02-03 19:12 PST, Dirk Pranke
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Pranke 2010-12-23 15:57:18 PST
shut up and exit when all the threads are wedged
Comment 1 Dirk Pranke 2010-12-23 16:00:18 PST
Created attachment 77380 [details]
Patch
Comment 2 Mihai Parparita 2010-12-23 17:06:20 PST
Comment on attachment 77380 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=77380&action=review

> Tools/Scripts/webkitpy/layout_tests/layout_package/message_broker_unittest.py:148
> +    def update(self):

There's a MockTestRunner in this file, can you use that?
Comment 3 Dirk Pranke 2010-12-23 17:24:54 PST
(In reply to comment #2)
> (From update of attachment 77380 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=77380&action=review
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/message_broker_unittest.py:148
> > +    def update(self):
> 
> There's a MockTestRunner in this file, can you use that?

Ah, good point. That's actually in a different class, but there's no compelling need to have two separate classes here, so I'll combine them.
Comment 4 Dirk Pranke 2010-12-23 18:55:39 PST
Created attachment 77397 [details]
use MockTestRunner()
Comment 5 Dirk Pranke 2010-12-23 18:57:01 PST
Adding some more reviewers ... one of you want to review and/or rubber-stamp this? It's very small and should help the bots out (and anyone who runs the layout tests manually ...)
Comment 6 Adam Barth 2010-12-23 18:59:52 PST
Comment on attachment 77397 [details]
use MockTestRunner()

I love the name of this bug.
Comment 7 Eric Seidel (no email) 2010-12-23 19:00:21 PST
Comment on attachment 77397 [details]
use MockTestRunner()

I'm not sure I understand what this does.
Comment 8 Eric Seidel (no email) 2010-12-23 19:01:07 PST
Comment on attachment 77397 [details]
use MockTestRunner()

I think the cq will try to land this if it has a cq+, even if it still has r?  (and will fail).  Setting to cq? to prevent that.
Comment 9 Kenneth Russell 2010-12-23 19:41:12 PST
Comment on attachment 77397 [details]
use MockTestRunner()

I gather that this causes us to log only once per wedged thread and to abort if all threads get wedged. Two questions: (1) if this happens, is bailing out the right thing to do? Or should some other action (like spawning more threads) be taken? (2) Will the warning and early exit be properly handled as a failure?
Comment 10 Dirk Pranke 2010-12-23 20:36:48 PST
(In reply to comment #9)
> (From update of attachment 77397 [details])
> I gather that this causes us to log only once per wedged thread and to abort if all threads get wedged. 

Ken's analysis is right. Right now if we get wedged we will log very frequently and run for a very long time before eventually either getting killed by the bot or we make enough progress to exit (although I'm not sure why the bot doesn't always get stuck). This causes things to fail fast.

 > Two questions: (1) if this happens, is bailing out the right thing to do? Or should some other action (like spawning more threads) be taken?

If threads get wedged, then the process will never exit cleanly and the test run will be considered a failure. In addition, one or more tests will never complete. Given the way the code currently handles sharding the tests, large numbers of tests might not get to run.

This is by no means a real fix, it is just intended to make the behavior less bad. The fix is the other set of patches I've been working on.

>  (2) Will the warning and early exit be properly handled as a failure?

Yes.
Comment 11 Kenneth Russell 2010-12-23 21:15:43 PST
Comment on attachment 77397 [details]
use MockTestRunner()

Even if this isn't the final fix it's still a major improvement over the current situation. r=me
Comment 12 WebKit Commit Bot 2010-12-24 04:12:44 PST
Comment on attachment 77397 [details]
use MockTestRunner()

Clearing flags on attachment: 77397

Committed r74632: <http://trac.webkit.org/changeset/74632>
Comment 13 WebKit Commit Bot 2010-12-24 04:12:52 PST
All reviewed patches have been landed.  Closing bug.
Comment 14 Philippe Normand 2010-12-24 13:45:39 PST
Sorry but I had to rollout this patch. The unittest makes test-webkitpy hang forever on the GTK bots.

See also bug 51600
Comment 15 Dirk Pranke 2011-01-04 14:37:54 PST
Sigh. Sorry for the brokenness - I have no idea what's different about your config that is causing it to hang, but Python will wait for every spawned thread to exit before exiting the process, so that's probably what's going on here.

What version of Linux runs on the GTK bots?
Comment 16 Eric Seidel (no email) 2011-01-05 19:03:58 PST
I just saw a commit-queue node hang in the python tests 9 hours ago, so unless this was re-landed, this was not the only cause.
Comment 17 Philippe Normand 2011-01-10 07:04:40 PST
(In reply to comment #15)
> Sigh. Sorry for the brokenness - I have no idea what's different about your config that is causing it to hang, but Python will wait for every spawned thread to exit before exiting the process, so that's probably what's going on here.
> 
> What version of Linux runs on the GTK bots?

I could reproduce the issue on my laptop running Debian Sid. The GTK Debug bots run on Debian Sid as well, IIRC.
Comment 18 Dirk Pranke 2011-01-10 12:29:41 PST
Hmm. I don't know that I have an easy way to get an instance of Debian Sid to test this on. After I poke into this some more, and if it works for me on the Ubuntu instances I have, I may need your help to figure this one out.
Comment 19 Dirk Pranke 2011-02-02 18:18:44 PST
Created attachment 81020 [details]
update w/ fix in r76073, try again
Comment 20 Kenneth Russell 2011-02-02 18:54:36 PST
Comment on attachment 81020 [details]
update w/ fix in r76073, try again

OK.
Comment 21 Dirk Pranke 2011-02-03 19:12:19 PST
Created attachment 81173 [details]
remove duplicated code
Comment 22 Dirk Pranke 2011-02-03 19:12:45 PST
Committed r77586: <http://trac.webkit.org/changeset/77586>
Comment 23 Dirk Pranke 2011-02-03 20:54:05 PST
Committed r77595: <http://trac.webkit.org/changeset/77595>
Comment 24 Dirk Pranke 2011-02-17 17:30:19 PST
*** Bug 46833 has been marked as a duplicate of this bug. ***
Comment 25 Dirk Pranke 2011-02-17 17:34:02 PST
*** Bug 46320 has been marked as a duplicate of this bug. ***