Bug 64898

Summary: Make kill-old-processes kill httpd on mac
Product: WebKit
Reporter: Eric Seidel (no email) <eric>
Component: New Bugs
Assignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED
Severity: Normal
CC: abarth, abecsi, aroben, loki, ojan, ossy, webkit.review.bot
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Attachments:
Description Flags
Patch none

Eric Seidel (no email)
Reported 2011-07-20 15:52:27 PDT
Make kill-old-processes kill httpd on mac
Attachments
Patch (7.11 KB, patch)
2011-07-20 15:53 PDT, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2011-07-20 15:53:38 PDT
Eric Seidel (no email)
Comment 2 2011-07-20 15:55:05 PDT
Snow Leopard is stuck again: http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29/builds/31595/steps/layout-test/logs/stdio
There was a typo in http_lock.py earlier this afternoon, which was quickly corrected, but I think it left an httpd process running without the corresponding lock file. I believe the bots will be more robust if we just kill httpd as an "old process", as Windows already does. We can remove this line if we ever believe that NRWT's locking is bulletproof.
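The idea of treating httpd as one more "old process" to kill can be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of the actual patch: `STALE_PROCESS_NAMES`, `kill_commands`, and `kill_old_processes` are hypothetical names, and the process lists are guesses based on the names mentioned in this bug.

```python
import subprocess
import sys

# Hypothetical per-platform lists; the real script's lists may differ.
STALE_PROCESS_NAMES = {
    "darwin": ["DumpRenderTree", "httpd"],
    "win32": ["DumpRenderTree.exe", "httpd.exe"],
}


def kill_commands(platform, names_by_platform=STALE_PROCESS_NAMES):
    """Return one kill command (as an argv list) per stale process name."""
    names = names_by_platform.get(platform, [])
    if platform == "win32":
        return [["taskkill", "/f", "/im", name] for name in names]
    return [["killall", "-9", name] for name in names]


def kill_old_processes(platform=sys.platform, run=subprocess.call):
    """Run every kill command; a nonzero exit (no such process) is ignored."""
    for command in kill_commands(platform):
        run(command)
```

Because the kill is by process name, it would also take down an httpd that belongs to another test run on the same machine.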
Adam Barth
Comment 3 2011-07-20 15:56:34 PDT
Comment on attachment 101522: Patch
Is this going to cause a problem for folks who run multiple slaves on the same box?
Adam Barth
Comment 4 2011-07-20 15:57:14 PDT
Comment on attachment 101522: Patch
IMHO, this whole locking business isn't worth the hassle. We shouldn't support running multiple instances of the tests on the same machine at the same time.
Eric Seidel (no email)
Comment 5 2011-07-20 15:58:12 PDT
(In reply to comment #3)
> (From update of attachment 101522)
> Is this going to cause a problem for folks who run multiple slaves on the same box?
The only people who do that currently are the Qt bots, I believe. But yes, it would. Then again, killing "DumpRenderTree" (which the script already does) would do that too, so there must be no mac bots running multiple copies of RWT at this time.
WebKit Review Bot
Comment 6 2011-07-20 16:28:39 PDT
Comment on attachment 101522: Patch
Clearing flags on attachment: 101522
Committed r91421: <http://trac.webkit.org/changeset/91421>
WebKit Review Bot
Comment 7 2011-07-20 16:28:43 PDT
All reviewed patches have been landed. Closing bug.
Csaba Osztrogonác
Comment 8 2011-07-26 07:40:10 PDT
(In reply to comment #4)
> (From update of attachment 101522)
> IMHO, this whole locking business isn't worth the hassle. We shouldn't support running multiple instances of the tests on the same machine at the same time.
We should support running multiple instances of RWT, because we don't have separate physical machines for all bots. We run 8 tester bots on 4 machines. I hate this locking thing, but the root of the problem is the TCP port numbers hard-coded in the layout tests and expected files. That's why we can't run multiple httpd instances on the same machine.
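The port conflict behind this can be shown with a small sketch. It is not taken from the test harness; `try_bind` is a hypothetical helper, and the port is chosen by the OS rather than being one of the ports the layout tests actually use.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a listening socket bound to (host, port), or None if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use": another server owns the port.
        sock.close()
        return None


# The first server gets a free port from the OS (port 0 means "pick one").
first = try_bind(0)
port = first.getsockname()[1]

# A second server asking for the same fixed port is refused.
second = try_bind(port)
print("second server started:", second is not None)
first.close()
```

With fixed port numbers baked into the tests, a second httpd cannot simply pick another port, so concurrent runs have to take turns.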
Adam Barth
Comment 9 2011-07-26 10:41:45 PDT
There's a trade-off with complexity. IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.
Andras Becsi
Comment 10 2011-07-26 10:52:28 PDT
(In reply to comment #9)
> There's a trade-off with complexity. IMHO, the complexity isn't worthwhile given the availability of cheap virtual machines.
We already had this discussion, is this going to turn up over and over again?
https://bugs.webkit.org/show_bug.cgi?id=33153#c10
Virtual machines are absolutely not cheap; they are a huge overhead when used only for running tests. ORWT has http locking, which turned out to be really simple, but it seems that the NRWT infrastructure is getting more and more complex and is not able to do simple things that ORWT did. Our whole testing infrastructure for the Qt bots needs the http locking, and switching to VMs and maintaining them is far more complex than fixing http locking to work correctly.
Adam Barth
Comment 11 2011-07-26 11:09:19 PDT
> We already had this discussion, is this going to turn up over and over again?
> https://bugs.webkit.org/show_bug.cgi?id=33153#c10
Probably. :)
In any case, I stand by what I've said above. Including this functionality in the test harness has some costs and some benefits. Whether we should support this configuration is a matter of weighing the costs against the benefits.
The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing. Fighting that trend is a losing battle.
Andras Becsi
Comment 12 2011-07-26 11:24:51 PDT
(In reply to comment #11)
> > We already had this discussion, is this going to turn up over and over again?
> > https://bugs.webkit.org/show_bug.cgi?id=33153#c10
>
> Probably. :)
>
> In any case, I stand by what I've said above. Including this functionality in the test harness has some costs and some benefits. Whether we should support this configuration is a matter of weighing the costs against the benefits.
>
> The whole world is moving to virtual-machine-based hosting, for everything from web servers to databases to big-data computing. Fighting that trend is a losing battle.
I agree to some extent, but I think running layout tests still consumes far fewer resources than running a virtual machine with a complete Linux distribution, and it is absolutely not comparable to big-data computing. Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT; moving to VMs would make it another 5x slower. This is a waste of resources. Once NRWT can reliably run tests in multiple threads, either by sharding tests accordingly or by fixing the inter-test dependencies, we might want to consider moving to a hypervisor-based system. So rather than fighting the trends, I personally want to avoid throwing the baby out with the bath water :)
Adam Barth
Comment 13 2011-07-26 11:39:15 PDT
> Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT
Really? That shouldn't be the case. If that's true, we have a bug that we need to fix.
Andras Becsi
Comment 14 2011-07-26 12:34:52 PDT
(In reply to comment #13)
> > Moving to NRWT (currently running single-thread for known reasons) made layout testing almost 3x slower than it was with ORWT
>
> Really? That shouldn't be the case. If that's true, we have a bug that we need to fix.
3x slower was the extreme case, from when we switched to NRWT; the current slowdown should be about half of that. I can measure it tomorrow if you are concerned that it is caused by a bug. I think it is caused by the way NRWT works. NRWT runs the failing and flaky tests multiple times to be sure they are flaky, and, if I'm correct, acquires and releases the httpd lock for each individual http test. ORWT, by contrast, did not run any test twice and acquired the lock only once, at the end of the testing session, when all the http tests were run in one batch; then the lock was released. Furthermore, NRWT produces much more stdio output than ORWT did, which also makes our bot slower, and would make a VM a server killer because of the crappy IO of VMware. Once running NRWT with multiple threads is reliably and reproducibly possible, I think this should improve.
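The difference in lock granularity described above can be sketched as follows. This is an illustration only: `CountingLock`, `run_locking_per_test`, and `run_locking_once` are hypothetical names, not code from ORWT or NRWT, and whether NRWT really takes the lock once per test is the commenter's recollection rather than something verified here.

```python
from contextlib import contextmanager


class CountingLock:
    """Stand-in for the httpd lock; it only counts how often it is acquired."""

    def __init__(self):
        self.acquisitions = 0

    @contextmanager
    def held(self):
        self.acquisitions += 1
        yield


def run_locking_per_test(http_tests, lock, run_test):
    """Acquire and release the lock around every individual http test."""
    for test in http_tests:
        with lock.held():
            run_test(test)


def run_locking_once(http_tests, lock, run_test):
    """Acquire the lock once and run all http tests as a single batch."""
    with lock.held():
        for test in http_tests:
            run_test(test)
```

For N http tests the first strategy acquires the lock N times and the second acquires it once, so any fixed cost per acquisition (waiting for another bot, starting and stopping httpd) is paid N times in the per-test case.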
Adam Barth
Comment 15 2011-07-26 13:05:59 PDT
NRWT was slow when we first turned it on, and we changed a few things to make it faster. If it's still slow, please let me know and we'll make it faster. In single-child mode, NRWT should be about 5% slower than ORWT. Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.
Andras Becsi
Comment 16 2011-07-27 07:20:46 PDT
(In reply to comment #15)
> NRWT was slow when we first turned it on, and we changed a few things to make it faster. If it's still slow, please let me know and we'll make it faster. In single-child mode, NRWT should be about 5% slower than ORWT. Anything more than that is something we want to fix irrespective of whether we support running multiple instances on one machine.
You can see a good comparison between
ORWT: http://build.webkit.sed.hu/waterfall?show=x86-32%20Linux%20Qt-4.8.x%20Release
NRWT: http://build.webkit.org/waterfall?show=Qt%20Linux%20Release
The first bot is still using ORWT (NRWT does not understand qt-4.8) and runs the tests, with a few failing tests (and 3 additionally skipped), in approximately 700 seconds, whereas the release bot runs NRWT in approximately 1100 seconds, which is almost 40% slower on average.
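As a rough check of these figures, the percentage depends on which run is taken as the baseline. The sketch below uses only the two approximate timings quoted above; `relative_change` is a hypothetical helper name.

```python
def relative_change(baseline, measured):
    """Return (measured - baseline) / baseline as a fraction."""
    return (measured - baseline) / baseline


ORWT_SECONDS = 700   # approximate figure quoted in the comment
NRWT_SECONDS = 1100  # approximate figure quoted in the comment

# NRWT's extra run time relative to ORWT.
nrwt_extra = relative_change(ORWT_SECONDS, NRWT_SECONDS)
# ORWT's run time relative to NRWT (negative means less time).
orwt_saving = relative_change(NRWT_SECONDS, ORWT_SECONDS)

print(f"NRWT takes {nrwt_extra:.0%} more time than ORWT")    # 57%
print(f"ORWT takes {-orwt_saving:.0%} less time than NRWT")  # 36%
```

Read against ORWT as the baseline, the gap is closer to 57% extra run time; the "almost 40%" figure matches the second reading, where NRWT's time is the baseline. Either way it is well above the roughly 5% expected in single-child mode.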
Adam Barth
Comment 17 2011-07-27 10:50:21 PDT
Ok. I assume these are comparable machines. I've created https://bugs.webkit.org/show_bug.cgi?id=65268 to track that issue.