Bug 64898 - Make kill-old-processes kill httpd on mac
Summary: Make kill-old-processes kill httpd on mac
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-20 15:52 PDT by Eric Seidel (no email)
Modified: 2011-07-27 10:50 PDT
CC List: 7 users

See Also:


Attachments
Patch (7.11 KB, patch)
2011-07-20 15:53 PDT, Eric Seidel (no email)
no flags

Description Eric Seidel (no email) 2011-07-20 15:52:27 PDT
Make kill-old-processes kill httpd on mac
Comment 1 Eric Seidel (no email) 2011-07-20 15:53:38 PDT
Created attachment 101522 [details]
Patch
Comment 2 Eric Seidel (no email) 2011-07-20 15:55:05 PDT
Snow Leopard is stuck again:
http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29/builds/31595/steps/layout-test/logs/stdio

There was a typo in http_lock.py earlier this afternoon, which was corrected shortly afterwards, but I think it left an httpd process running without the corresponding lock file.

I believe the bots will be more robust if we just kill httpd as an "old process", as is already done on Windows.

We can remove this line if we ever believe that NRWT's locking is bulletproof.
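
For illustration only, a minimal sketch of the idea: treating httpd as one more "old process" to kill before a test run. This is not the attached patch. The function names, the process list, and the use of `killall -9` are assumptions; only DumpRenderTree and httpd are named in this bug.

```python
import subprocess

# Hypothetical list: only DumpRenderTree and httpd are mentioned in this bug.
OLD_PROCESS_NAMES = ("DumpRenderTree", "httpd")


def build_kill_commands(names):
    """Return one `killall -9 <name>` argument list per process name."""
    return [["killall", "-9", name] for name in names]


def kill_old_processes(names=OLD_PROCESS_NAMES, dry_run=True):
    """Run the kill commands, or only return them when dry_run is set."""
    commands = build_kill_commands(names)
    if not dry_run:
        for argv in commands:
            # killall exits non-zero when no process matched; that is
            # expected on a clean machine, so the status is ignored.
            subprocess.call(argv)
    return commands
```

With `dry_run=True` (the default here) nothing is killed and the commands are only returned, which makes the sketch safe to try. As noted in the following comments, killing by name affects every instance on the machine, so it does not suit boxes that run several test runs at once.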
Comment 3 Adam Barth 2011-07-20 15:56:34 PDT
Comment on attachment 101522 [details]
Patch

Is this going to cause a problem for folks who run multiple slaves on the same box?
Comment 4 Adam Barth 2011-07-20 15:57:14 PDT
Comment on attachment 101522 [details]
Patch

IMHO, this whole locking business isn't worth the hassle.  We shouldn't support running multiple instances of the tests on the same machine at the same time.
Comment 5 Eric Seidel (no email) 2011-07-20 15:58:12 PDT
(In reply to comment #3)
> (From update of attachment 101522 [details])
> Is this going to cause a problem for folks who run multiple slaves on the same box?

The only people who do that currently are the Qt bots, I believe.

But yes, it would.  Then again, killing "DumpRenderTree" (which the script already does) would do that too, so there must be no Mac bots running multiple copies of RWT at this time.
Comment 6 WebKit Review Bot 2011-07-20 16:28:39 PDT
Comment on attachment 101522 [details]
Patch

Clearing flags on attachment: 101522

Committed r91421: <http://trac.webkit.org/changeset/91421>
Comment 7 WebKit Review Bot 2011-07-20 16:28:43 PDT
All reviewed patches have been landed.  Closing bug.
Comment 8 Csaba Osztrogonác 2011-07-26 07:40:10 PDT
(In reply to comment #4)
> (From update of attachment 101522 [details])
> IMHO, this whole locking business isn't worth the hassle.  We shouldn't support running multiple instances of the tests on the same machine at the same time.

We should support running multiple instances of RWT, because we don't have separate physical machines for all bots. We run 8 tester bots on 4 machines.

I hate this locking thing, but the root of the problem is the TCP port numbers hard-coded into the layout tests and expected files. That's why we can't run multiple httpd instances on the same machine.
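
A minimal sketch of why a fixed port rules out a second server on the same machine: once one listener holds a TCP port, a second bind to the same address and port fails. The function name is hypothetical, and an OS-chosen port stands in for the port numbers hard-coded in the tests.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Bind one listener to a port, then try to bind a second to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError:
            return True  # typically "Address already in use"
        return False
    finally:
        first.close()
        second.close()
```

If the tests did not fix the port, each instance could bind its own free port in the same way the first socket does here.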
Comment 9 Adam Barth 2011-07-26 10:41:45 PDT
There's a trade-off with complexity.  IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.
Comment 10 Andras Becsi 2011-07-26 10:52:28 PDT
(In reply to comment #9)
> There's a trade-off with complexity.  IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.

We already had this discussion, is this going to turn up over and over again?
https://bugs.webkit.org/show_bug.cgi?id=33153#c10

Virtual machines are absolutely not cheap; they are a huge overhead when only used for running tests on them. ORWT has http locking which turned out to be really simple, but it seems that the NRWT infrastructure is getting more and more complex and is not able to do simple things ORWT did.

Our whole testing infrastructure for Qt bots needs the http locking, and switching to VMs and maintaining them is far more complex than fixing http locking to work correctly.
Comment 11 Adam Barth 2011-07-26 11:09:19 PDT
> We already had this discussion, is this going to turn up over and over again?
> https://bugs.webkit.org/show_bug.cgi?id=33153#c10

Probably.  :)

In any case, I stand by what I've said above.  Including this functionality in the test harness has some costs and some benefits.  Whether we should support this configuration is a matter of weighing the costs against the benefits.

The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing.  Fighting that trend is a losing battle.
Comment 12 Andras Becsi 2011-07-26 11:24:51 PDT
(In reply to comment #11)
> > We already had this discussion, is this going to turn up over and over again?
> > https://bugs.webkit.org/show_bug.cgi?id=33153#c10
> 
> Probably.  :)
> 
> In any case, I stand by what I've said above.  Including this functionality in the test harness has some costs and some benefits.  Whether we should support this configuration is a matter of weighing the costs against the benefits.
>
> The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing.  Fighting that trend is a losing battle.

I agree to some extent, but I think running layout tests still consumes far fewer resources than running a virtual machine with a complete Linux distribution, and is absolutely not comparable to big-data computing.
Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT; moving to VMs would make it another 5x slower. This is a waste of resources.
Once NRWT can reliably run tests in multiple threads, by sharding tests accordingly or by fixing the inter-test dependencies, we might want to consider moving to a hypervisor-based system.

So rather than fighting the trends, I personally want to prevent throwing out the baby with the bath water :)
Comment 13 Adam Barth 2011-07-26 11:39:15 PDT
> Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT

Really?  That shouldn't be the case.  If that's true, we have a bug that we need to fix.
Comment 14 Andras Becsi 2011-07-26 12:34:52 PDT
(In reply to comment #13)
> > Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT
> 
> Really?  That shouldn't be the case.  If that's true, we have a bug that we need to fix.

3x slower was an extreme case from when we switched to NRWT, but if you are concerned that it is caused by a bug, I can measure the current slowdown tomorrow, which should be about half of that.
I think it is caused by the way NRWT works.
NRWT runs the failing and flaky tests multiple times to be sure they are flaky, and acquires and releases the httpd lock for each individual http test, if I'm correct. ORWT, by contrast, did not run any tests twice and only acquired the lock at the end of the testing session, when all the http tests were run at once; then the lock was released. Furthermore, NRWT produces much more stdio output than ORWT did, which also makes our bot slower and would make a VM a server killer because of the crappy IO of VMware. Once running NRWT with multiple threads is reliably and reproducibly possible, I think this should improve.
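
To make the locking difference concrete, here is a small sketch that models only the two strategies as described above (per-test locking versus one lock around all http tests). It is not the real NRWT or ORWT code; `CountingLock` and the runner functions are hypothetical stand-ins that count acquisitions instead of touching a lock file.

```python
class CountingLock:
    """Stand-in for the httpd lock that only counts acquisitions."""

    def __init__(self):
        self.acquisitions = 0

    def __enter__(self):
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions


def run_locking_per_test(http_tests, lock):
    """Acquire and release the lock around every single http test."""
    for _test in http_tests:
        with lock:
            pass  # one http test would run here


def run_locking_once(http_tests, lock):
    """Acquire the lock once and run all http tests as a batch."""
    with lock:
        for _test in http_tests:
            pass  # one http test would run here


def count_acquisitions(runner, http_tests):
    """Return how many times `runner` takes the lock for `http_tests`."""
    lock = CountingLock()
    runner(http_tests, lock)
    return lock.acquisitions
```

For N http tests the first strategy takes the lock N times and the second takes it once, so any per-acquisition cost (file system access, waiting on another instance) is paid N times in the first case.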
Comment 15 Adam Barth 2011-07-26 13:05:59 PDT
NRWT was slow when we first turned it on, and we changed a few things to make it faster.  If it's still slow, please let me know and we'll make it faster.  In single-child mode, NRWT should be about 5% slower than ORWT.  Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.
Comment 16 Andras Becsi 2011-07-27 07:20:46 PDT
(In reply to comment #15)
> NRWT was slow when we first turned it on, and we changed a few things to make it faster.  If it's still slow, please let me know and we'll make it faster.  In single-child mode, NRWT should be about 5% slower than ORWT.  Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.

You can see a good comparison between

ORWT: http://build.webkit.sed.hu/waterfall?show=x86-32%20Linux%20Qt-4.8.x%20Release

NRWT: http://build.webkit.org/waterfall?show=Qt%20Linux%20Release

The first bot is still using ORWT (NRWT does not understand qt-4.8) and runs the tests with a few failing tests (and 3 additionally skipped) in approximately 700 seconds, whereas the release bot runs NRWT in approximately 1100 seconds, which is almost 40% slower on average.
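
The two approximate timings can be compared in more than one way, and the percentage depends on which run is taken as the baseline. A small sketch (the function names are hypothetical; 700 and 1100 are the rough figures quoted above):

```python
def percent_longer(baseline_seconds, slower_seconds):
    """Extra time of the slower run, relative to the faster (baseline) run."""
    return (slower_seconds - baseline_seconds) / baseline_seconds * 100.0


def percent_shorter(baseline_seconds, slower_seconds):
    """Time saved by the faster run, relative to the slower run."""
    return (slower_seconds - baseline_seconds) / slower_seconds * 100.0


orwt_seconds = 700.0   # approximate ORWT run time
nrwt_seconds = 1100.0  # approximate NRWT run time
```

Measured against the ORWT run, the NRWT run takes about 57% longer (`percent_longer(orwt_seconds, nrwt_seconds)`); measured against the NRWT run, the ORWT run is about 36% shorter (`percent_shorter(orwt_seconds, nrwt_seconds)`), which is close to the "almost 40%" figure.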
Comment 17 Adam Barth 2011-07-27 10:50:21 PDT
Ok.  I assume these are comparable machines.  I've created https://bugs.webkit.org/show_bug.cgi?id=65268 to track that issue.