Bug 53220

Summary: back-during-onload-hung-page.php causes Chromium WebKit bot to fail
Product: WebKit Reporter: Charles Reis <creis>
Component: Tools / Tests Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: eric, mihaip, ojan, rniwa, tony
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows 7   
Attachments:
Description Flags
Patch none

Description Charles Reis 2011-01-26 19:49:06 PST
One of the Chromium WebKit bots is frequently failing because the PHP process for back-during-onload-hung-page.php doesn't exit for over 600 seconds.  The tests that rely on this file are passing; they only need the script not to respond for the duration of the test.  They would still pass if we reduced the script's timeout from 1000 seconds to something more reasonable, like 60.

URL to the bot:
http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Win%20(deps)
Comment 1 Charles Reis 2011-01-26 19:51:24 PST
Created attachment 80290
Patch
Comment 2 Eric Seidel (no email) 2011-01-26 22:33:53 PST
I take it we don't have code in run-webkit-tests/new-run-webkit-tests to kill run-away http servers (or in this case php processes?)  Should we?
Comment 3 Charles Reis 2011-01-27 09:57:59 PST
(In reply to comment #2)
> I take it we don't have code in run-webkit-tests/new-run-webkit-tests to kill run-away http servers (or in this case php processes?)  Should we?

I don't know much about those scripts, but they appear to try to kill the process but fail in this case.  Here's one of the error messages from the bot, which looks like it tried and failed to kill the PHP process (according to Nicolas, who found out which process it was after the fact):

command timed out: 600 seconds without output, killing pid 19192
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1

remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
]


(From: http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Win%20(deps)/builds/334/steps/webkit_tests/logs/stdio)

Maybe we just need a more effective approach for killing such processes?  We only see the problem on one of the bots, so it might be specific to something in the config.
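
For reference, the basic "wait with a deadline, then kill" pattern discussed here can be sketched in Python as follows. This is only an illustration, not the code used by buildbot or run-webkit-tests; the function name run_with_deadline is hypothetical. Note that it only kills the direct child, so a grandchild process (such as a PHP process spawned by an HTTP server) could still survive.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline_seconds):
    """Run argv and kill it if it outlives the deadline.

    Hypothetical helper for illustration only.
    Returns (exit_code, timed_out).
    """
    child = subprocess.Popen(argv)
    try:
        return child.wait(timeout=deadline_seconds), False
    except subprocess.TimeoutExpired:
        # SIGKILL on POSIX, TerminateProcess on Windows. Only the direct
        # child is targeted; its own children are not touched.
        child.kill()
        return child.wait(), True


if __name__ == "__main__":
    # Stand-in for the hung PHP script: a child that sleeps far longer
    # than the deadline we are willing to wait.
    hung = [sys.executable, "-c", "import time; time.sleep(1000)"]
    print(run_with_deadline(hung, deadline_seconds=1))
```
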
Comment 4 Ryosuke Niwa 2011-01-27 11:12:02 PST
Comment on attachment 80290
Patch

Clearing flags on attachment: 80290

Committed r76816: <http://trac.webkit.org/changeset/76816>
Comment 5 Ryosuke Niwa 2011-01-27 11:12:07 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Tony Chang 2011-01-27 11:54:31 PST
(In reply to comment #3)
> (In reply to comment #2)
> > I take it we don't have code in run-webkit-tests/new-run-webkit-tests to kill run-away http servers (or in this case php processes?)  Should we?
> 
> I don't know much about those scripts, but they appear to try to kill the process but fail in this case.  Here's one of the error messages from the bot, which looks like it tried and failed to kill the PHP process (according to Nicolas, who found out which process it was after the fact):
> 
> command timed out: 600 seconds without output, killing pid 19192
> SIGKILL failed to kill process
> using fake rc=-1
> program finished with exit code -1

Maybe I misunderstand, but it looks like the buildbot process is trying to kill the NRWT process.  Maybe it's failing to kill the NRWT because it's holding on to the lighttpd process which holds on to the php process?  I bet it wouldn't be too hard to add a "kill all php.exe processes" when stopping the httpd, but it might cause collateral damage.
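
One way to limit the collateral damage would be to kill only the process tree rooted at the http server rather than every php.exe on the machine. A rough Python sketch, assuming the PHP process really is a descendant of the server process and (on POSIX) that the server was started as its own process-group leader; the helper names are hypothetical and this is not what NRWT currently does:

```python
import os
import signal
import subprocess
import sys


def windows_tree_kill_argv(root_pid):
    """Build a taskkill command line for root_pid and its descendants.

    /F forces termination, /T also terminates child processes.
    """
    return ["taskkill", "/F", "/T", "/PID", str(root_pid)]


def stop_server_tree(root_pid):
    """Kill the server and whatever it spawned (hypothetical helper).

    Assumption: on POSIX the server leads its own process group
    (for example, started with start_new_session=True).
    """
    if sys.platform == "win32":
        subprocess.call(windows_tree_kill_argv(root_pid))
    else:
        os.killpg(root_pid, signal.SIGKILL)
```

Unlike a blanket "kill all php.exe", this leaves unrelated PHP processes on the bot alone, but it does not help if the hung process has been detached from the server's tree.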

I bet apache handles this better.  Another reason to try and switch to apache (although I'm not volunteering).

Nice detective work!