Bug 64898

Summary:

Make kill-old-processes kill httpd on mac

Product:

WebKit

Reporter:

Eric Seidel (no email) <eric>

Component:

New Bugs

Assignee:

Eric Seidel (no email) <eric>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, abecsi, aroben, loki, ojan, ossy, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Patch	none

Eric Seidel (no email)

Reported 2011-07-20 15:52:27 PDT

Make kill-old-processes kill httpd on mac

Attachments
Patch (7.11 KB, patch), 2011-07-20 15:53 PDT, Eric Seidel (no email), no flags

Eric Seidel (no email)

Comment 1 2011-07-20 15:53:38 PDT

Created attachment 101522 [details] Patch

Eric Seidel (no email)

Comment 2 2011-07-20 15:55:05 PDT

Snow Leopard is stuck again: http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29/builds/31595/steps/layout-test/logs/stdio

There was a typo in http_lock.py earlier this afternoon. It was corrected shortly afterwards, but I think it left an httpd process running without the corresponding lock file. I believe the bots will be more robust if we just kill httpd as an "old process", the way the script already does on Windows. We can remove this line if we ever believe that NRWT's locking is bulletproof.
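For readers unfamiliar with the approach, the idea of killing httpd as an "old process" can be sketched roughly as below. This is only an illustration, not the attached patch: the task list, the function names `kill_commands` and `kill_old_processes`, and the injectable `run` parameter are all hypothetical, and the sketch assumes the standard macOS `killall` utility is available.

```python
import subprocess
import sys

# Hypothetical task list for illustration; the real list in the script is not
# shown in this bug. DumpRenderTree and httpd are the two names mentioned here.
MAC_TASKS_TO_KILL = ("DumpRenderTree", "httpd")


def kill_commands(platform, tasks=MAC_TASKS_TO_KILL):
    """Return the commands that would be run to kill stale processes."""
    if platform == "darwin":
        return [["killall", "-9", task] for task in tasks]
    # Other platforms are deliberately left out of this sketch.
    return []


def kill_old_processes(platform=sys.platform, run=subprocess.call):
    """Run each kill command; `run` is injectable so it can be tested safely."""
    for command in kill_commands(platform):
        # A nonzero exit status usually just means no such process was running.
        run(command)
```

Calling `kill_old_processes()` before a test run would remove any httpd left behind by a previous run, at the cost of also killing an httpd that belongs to another test run on the same machine.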

Adam Barth

Comment 3 2011-07-20 15:56:34 PDT

Comment on attachment 101522 [details] Patch Is this going to cause a problem for folks who run multiple slaves on the same box?

Adam Barth

Comment 4 2011-07-20 15:57:14 PDT

Comment on attachment 101522 [details] Patch IMHO, this whole locking business isn't worth the hassle. We shouldn't support running multiple instances of the tests on the same machine at the same time.

Eric Seidel (no email)

Comment 5 2011-07-20 15:58:12 PDT

(In reply to comment #3)
> (From update of attachment 101522 [details])
> Is this going to cause a problem for folks who run multiple slaves on the same box?

The only people who do that currently are the Qt bots, I believe. But yes, it would. Then again, killing "DumpRenderTree" (which the script already does) would cause the same problem, so there must be no Mac bots running multiple copies of RWT at this time.

WebKit Review Bot

Comment 6 2011-07-20 16:28:39 PDT

Comment on attachment 101522 [details]
Patch

Clearing flags on attachment: 101522

Committed r91421: <http://trac.webkit.org/changeset/91421>

WebKit Review Bot

Comment 7 2011-07-20 16:28:43 PDT

All reviewed patches have been landed. Closing bug.

Csaba Osztrogonác

Comment 8 2011-07-26 07:40:10 PDT

(In reply to comment #4)
> (From update of attachment 101522 [details])
> IMHO, this whole locking business isn't worth the hassle. We shouldn't support running multiple instances of the tests on the same machine at the same time.

We should support running multiple instances of RWT, because we don't have separate physical machines for all bots. We run 8 tester bots on 4 machines. I hate this locking thing, but the root of the problem is the TCP port numbers hard-coded into the layout tests and expected files. That's why we can't run multiple httpd instances on the same machine.

Adam Barth

Comment 9 2011-07-26 10:41:45 PDT

There's a trade-off with complexity. IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.

Andras Becsi

Comment 10 2011-07-26 10:52:28 PDT

(In reply to comment #9)
> There's a trade-off with complexity. IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.

We already had this discussion; is this going to turn up over and over again?
https://bugs.webkit.org/show_bug.cgi?id=33153#c10

Virtual machines are absolutely not cheap; they are a huge overhead when used only for running tests. ORWT has http locking, which turned out to be really simple, but it seems that the NRWT infrastructure is getting more and more complex and is not able to do simple things that ORWT did.

Our whole testing infrastructure for the Qt bots needs the http locking, and switching to VMs and maintaining them is far more complex than fixing http locking to work correctly.

Adam Barth

Comment 11 2011-07-26 11:09:19 PDT

> We already had this discussion, is this going to turn up over and over again?
> https://bugs.webkit.org/show_bug.cgi?id=33153#c10

Probably. :)

In any case, I stand by what I've said above. Including this functionality in the test harness has some costs and some benefits. Whether we should support this configuration is a matter of weighing the costs against the benefits.

The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing. Fighting that trend is a losing battle.

Andras Becsi

Comment 12 2011-07-26 11:24:51 PDT

(In reply to comment #11)
> > We already had this discussion, is this going to turn up over and over again?
> > https://bugs.webkit.org/show_bug.cgi?id=33153#c10
>
> Probably. :)
>
> In any case, I stand by what I've said above. Including this functionality in the test harness has some costs and some benefits. Whether we should support this configuration is a matter of weighing the costs against the benefits.
>
> The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing. Fighting that trend is a losing battle.

I agree to some extent, but I think running layout tests still consumes far fewer resources than running a virtual machine with a complete Linux distribution, and is absolutely not comparable to big-data computing.

Moving to NRWT (currently running single-threaded for known reasons) made layout testing almost 3x slower than it was with ORWT; moving to VMs would make it another 5x slower. This is a waste of resources.

Once NRWT can reliably run tests in multiple threads, by sharding tests accordingly or by fixing the inter-test dependencies, we might want to consider moving to a hypervisor-based system.

So rather than fighting the trends, I personally want to prevent throwing out the baby with the bath water :)

Adam Barth

Comment 13 2011-07-26 11:39:15 PDT

> Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT

Really? That shouldn't be the case. If that's true, we have a bug that we need to fix.

Andras Becsi

Comment 14 2011-07-26 12:34:52 PDT

(In reply to comment #13)
> > Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT
>
> Really? That shouldn't be the case. If that's true, we have a bug that we need to fix.

3x slower was an extreme from when we switched to NRWT. If you are concerned that it is caused by a bug, I can measure the current slowdown tomorrow; it should be about half of that.

I think it is caused by the way NRWT works. NRWT runs the failing and flaky tests multiple times to be sure they are flaky, and, if I'm correct, acquires and releases the httpd lock for each individual http test. ORWT, by contrast, did not run any test twice and acquired the lock only once, at the end of the testing session: all the http tests were run together and then the lock was released.

Furthermore, NRWT produces much more stdio output than ORWT did, which also makes our bots slower, and would make a VM a server killer because of the poor IO of VMware.

I think this should improve once NRWT can be run with multiple threads reliably and reproducibly.
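The difference between the two locking strategies described in this comment can be illustrated with a toy model. This is not code from NRWT or ORWT: `CountingLock`, `is_http_test`, and both runner functions are hypothetical names, the lock only counts acquisitions instead of really locking, and identifying http tests by an "http/" prefix is an assumption made for the sketch.

```python
from contextlib import contextmanager


class CountingLock:
    """Stand-in for the httpd lock that only counts how often it is taken."""

    def __init__(self):
        self.acquisitions = 0

    @contextmanager
    def held(self):
        self.acquisitions += 1  # a real lock would block here until it is free
        yield                   # a real lock would be released after this point


def is_http_test(name):
    # Assumption for this sketch: http tests live under an "http/" directory.
    return name.startswith("http/")


def run_locking_each_test(tests, lock, run_test):
    """Take and release the lock around every individual http test."""
    for name in tests:
        if is_http_test(name):
            with lock.held():
                run_test(name)
        else:
            run_test(name)


def run_locking_once(tests, lock, run_test):
    """Run non-http tests first, then all http tests under a single lock."""
    http_tests = [name for name in tests if is_http_test(name)]
    for name in tests:
        if not is_http_test(name):
            run_test(name)
    if http_tests:
        with lock.held():
            for name in http_tests:
                run_test(name)
```

With N http tests, the first strategy acquires the lock N times and the second acquires it once, so any waiting on a lock held by another bot is paid per test in the first case and per session in the second.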

Adam Barth

Comment 15 2011-07-26 13:05:59 PDT

NRWT was slow when we first turned it on, and we changed a few things to make it faster. If it's still slow, please let me know and we'll make it faster. In single-child mode, NRWT should be about 5% slower than ORWT. Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.

Andras Becsi

Comment 16 2011-07-27 07:20:46 PDT

(In reply to comment #15)
> NRWT was slow when we first turned it on, and we changed a few things to make it faster. If it's still slow, please let me know and we'll make it faster. In single-child mode, NRWT should be about 5% slower than ORWT. Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.

You can see a good comparison between
ORWT: http://build.webkit.sed.hu/waterfall?show=x86-32%20Linux%20Qt-4.8.x%20Release
NRWT: http://build.webkit.org/waterfall?show=Qt%20Linux%20Release

The first bot is still using ORWT (NRWT does not understand qt-4.8) and runs the tests, with few failing tests (and 3 additionally skipped), in approximately 700 seconds, whereas the release bot runs NRWT in approximately 1100s, which is almost 40% slower on average.
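As a quick check of the figures quoted in this comment, the two approximate run times can be compared in either direction. The variable names are illustrative, and the inputs are the rounded 700 s and 1100 s values from the comment rather than fresh measurements.

```python
orwt_seconds = 700   # approximate ORWT run time quoted above
nrwt_seconds = 1100  # approximate NRWT run time quoted above

difference = nrwt_seconds - orwt_seconds

# Extra time NRWT needs, relative to the ORWT run.
nrwt_extra_vs_orwt = difference / orwt_seconds

# Time ORWT saves, relative to the NRWT run.
orwt_saving_vs_nrwt = difference / nrwt_seconds

print(f"NRWT takes {nrwt_extra_vs_orwt:.0%} longer than ORWT")
print(f"ORWT takes {orwt_saving_vs_nrwt:.0%} less time than NRWT")
```

Measured against the ORWT run, NRWT takes about 57% longer; measured against the NRWT run, ORWT takes about 36% less time. The "almost 40%" figure matches the second reading, and both are well above the roughly 5% overhead expected in comment 15.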

Adam Barth

Comment 17 2011-07-27 10:50:21 PDT

Ok. I assume these are comparable machines. I've created https://bugs.webkit.org/show_bug.cgi?id=65268 to track that issue.
