Bug 106838

Summary: kill whole lighttpd process tree for chromium win
Product: WebKit Reporter: Dirk Pranke <dpranke>
Component: Tools / TestsAssignee: Dirk Pranke <dpranke>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cmp, eric, kerz, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
patch for landing none

Description Dirk Pranke 2013-01-14 16:04:04 PST
This is a speculative bug fix for http://code.google.com/p/chromium/issues/detail?id=169530 . My guess is that while we are able to kill the initial lighttp process, some child process is getting hung (as evidenced by a bunch of tests timing out) and killing the parent outright doesn't kill the children.
Comment 1 Dirk Pranke 2013-01-14 16:32:55 PST
Created attachment 182658 [details]
Patch
Comment 2 Dirk Pranke 2013-01-14 16:33:48 PST
Tony, what do you think of this? I can add lighttpd.exe to the list of processes to kill as well, but this seems like a "more correct" change.
Comment 3 Tony Chang 2013-01-14 16:43:13 PST
Comment on attachment 182658 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=182658&action=review

I think the change is a fine way to prevent the bot from hanging, but the test run will still be junk because of too many timeouts.  It'll probably confuse garden-o-matic a bit.

> Tools/ChangeLog:9
> +        Land a speculative fix for lighttpd.exe hanging on some google
> +        internal win bots; I think a test is causing a httpd server child process to

I would remove internal since this is failing on the public build.chromium.org too (the Win bots used by the gardening tools).

> Tools/Scripts/webkitpy/layout_tests/servers/http_server_unittest.py:36
> +from webkitpy.thirdparty.autoinstalled import unittest2 as unittest

I would make this change separately.
Comment 4 Dirk Pranke 2013-01-14 16:52:03 PST
(In reply to comment #3)
> (From update of attachment 182658 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=182658&action=review
> 
> I think the change is a fine way to prevent the bot from hanging, but the test run will still be junk because of too many timeouts.  It'll probably confuse garden-o-matic a bit.
>

Yeah, but that's no worse than any other too-many-timeouts build run failure, right?

Note for the record that at the moment it's not clear what's causing the too many timeouts, possibly it's a server bug caused by a particular test, or possibly its the parallelism in the test harness ...
 
> > Tools/ChangeLog:9
> > +        Land a speculative fix for lighttpd.exe hanging on some google
> > +        internal win bots; I think a test is causing a httpd server child process to
> 
> I would remove internal since this is failing on the public build.chromium.org too (the Win bots used by the gardening tools).
> 

It is? I looked earlier today and didn't see any hangs on either the XP or Win7 canaries. I thought I saw one report of a hang earlier on a webkit.org bot, but I didn't see anyting recent there, either.

At any rate, I'm happy to remove the comment.

> > Tools/Scripts/webkitpy/layout_tests/servers/http_server_unittest.py:36
> > +from webkitpy.thirdparty.autoinstalled import unittest2 as unittest
> 
> I would make this change separately.

This was needed for the self.assertIn() method I'm using in the test (since that's new in 2.7 and we need unittest2 for 2.6 compatibility). Does that make this okay or do you want me to switch to a 2.6-compatible method like assertEquals() ?
Comment 5 Tony Chang 2013-01-14 17:00:03 PST
(In reply to comment #4)
> (In reply to comment #3)
> > > Tools/ChangeLog:9
> > > +        Land a speculative fix for lighttpd.exe hanging on some google
> > > +        internal win bots; I think a test is causing a httpd server child process to
> > 
> > I would remove internal since this is failing on the public build.chromium.org too (the Win bots used by the gardening tools).
> > 
> 
> It is? I looked earlier today and didn't see any hangs on either the XP or Win7 canaries. I thought I saw one report of a hang earlier on a webkit.org bot, but I didn't see anyting recent there, either.

WebKit Win7 has been hung since 5am and WebKit Win7 (dbg)(2) has been hung for days.
http://build.chromium.org/p/chromium.webkit/waterfall?show=WebKit%20Win7&show=WebKit%20Win7%20(dbg)(2)

The dbg hang might not be the same. I don't see lots of tests timing out.

> > > Tools/Scripts/webkitpy/layout_tests/servers/http_server_unittest.py:36
> > > +from webkitpy.thirdparty.autoinstalled import unittest2 as unittest
> > 
> > I would make this change separately.
> 
> This was needed for the self.assertIn() method I'm using in the test (since that's new in 2.7 and we need unittest2 for 2.6 compatibility). Does that make this okay or do you want me to switch to a 2.6-compatible method like assertEquals() ?

I would make the switch to unittest2 first, then land this patch.
Comment 6 Dirk Pranke 2013-01-14 17:02:24 PST
(In reply to comment #5)
> > It is? I looked earlier today and didn't see any hangs on either the XP or Win7 canaries. I thought I saw one report of a hang earlier on a webkit.org bot, but I didn't see anyting recent there, either.
> 
> WebKit Win7 has been hung since 5am and WebKit Win7 (dbg)(2) has been hung for days.
> http://build.chromium.org/p/chromium.webkit/waterfall?show=WebKit%20Win7&show=WebKit%20Win7%20(dbg)(2)
> 

Ah, you're right. I misread that one.
Comment 7 Dirk Pranke 2013-01-14 17:05:31 PST
Created attachment 182663 [details]
patch for landing
Comment 8 WebKit Review Bot 2013-01-14 17:42:20 PST
Comment on attachment 182663 [details]
patch for landing

Clearing flags on attachment: 182663

Committed r139697: <http://trac.webkit.org/changeset/139697>
Comment 9 WebKit Review Bot 2013-01-14 17:42:24 PST
All reviewed patches have been landed.  Closing bug.
Comment 10 Tony Chang 2013-01-15 11:21:10 PST
Comment on attachment 182663 [details]
patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=182663&action=review

> Tools/Scripts/webkitpy/layout_tests/servers/http_server.py:224
> +                self._executive.run_command(["taskkill.exe", "/f", "/t", self._pid], error_handler=self._executive.ignore_error)

This command is wrong:
D:\src\src>taskkill.exe /f /t 4608
ERROR: Invalid argument/option - '4608'.
Type "TASKKILL /?" for usage.

It's missing a /pid before the number.
Comment 11 Eric Seidel (no email) 2013-01-15 11:25:05 PST
Executive has a kill() function which does taskkill.exe on windows iirc.
Comment 12 Eric Seidel (no email) 2013-01-15 11:25:35 PST
(In reply to comment #11)
> Executive has a kill() function which does taskkill.exe on windows iirc.

I see the patch now.  nevermind.
Comment 13 Tony Chang 2013-01-15 11:37:16 PST
Fixed in http://trac.webkit.org/changeset/139764 and tested on my Windows machine using new-run-webkit-httpd --server {start,stop}.  I'll fix all the hung Chromium Windows bots :(
Comment 14 Dirk Pranke 2013-01-15 12:40:29 PST
whoops. sigh :(.