Bug 136722

Summary: Blink Merge: Remove 'http_lock' code from webkitpy
Product: WebKit Reporter: Brent Fulgham <bfulgham>
Component: Tools / TestsAssignee: Brent Fulgham <bfulgham>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, commit-queue, glenn, ossy
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=139264
Bug Depends on: 136711    
Bug Blocks: 138958    
Attachments:
Description Flags
WIP patch
none
Patch none

Brent Fulgham
Reported 2014-09-10 16:25:06 PDT
Attachments
WIP patch (26.48 KB, patch)
2014-11-21 03:24 PST, Csaba Osztrogonác
no flags
Patch (30.22 KB, patch)
2014-12-09 05:03 PST, Csaba Osztrogonác
no flags
Csaba Osztrogonác
Comment 1 2014-09-11 05:46:41 PDT
Why? Did it caused any problem?
Brent Fulgham
Comment 2 2014-09-11 09:19:49 PDT
(In reply to comment #1) > Why? Did it caused any problem? From the original Blink comments: "We used to allow running run-webkit-tests concurrently and they would coordinate amongst themselves when running the http tests using file locks. This was used by the Qt port on their buildbots, but it's not clear that this was needed anywhere else. This patch removes all of that code, and deletes the generic file lock module we had as it was now only being used when parsing ASAN logs, which is only needed on mac and linux (where we can just use flock). This patch also moves starting and stopping the servers from the layout_test_runner to the manager, where we can wait to stop the servers until after retrying any tests. This keeps us from having to start the servers twice if we need to retry an HTTP test. This will speed up HTTP retries a tiny bit, but slow down the overall run-webkit-tests execution a bit in the normal case (no HTTP retries). This tradeoff seems worth it in order to simplify the logic overall." Since we no longer support the Qt port, it make sense to get rid of this complexity if it's not actively being used.
Csaba Osztrogonác
Comment 3 2014-11-20 04:24:51 PST
I know it was introduced for Qt buildbots to be able more run-webkit-tests on the same machine. Know there is no Qt port, and it seems there is no real need to be able to do it. I still think it can be useful if you have only one shared tester machine to be able run tests on a WebKit port you don't work primarily. Or if you work on two different bugs parallely, and you might would like to run a test until you wait a full test run in the background. Without this mechanism, the second RWT would kill/restart the apache started by the first RWT and tests would fail. If this useful mechanism causes problems anywhere, I prefer fixing it. If it isn't possible easily, it's OK for me to get rid of it. But in this case we would need some mechanism to warn or prevent the user to try to run more than one RWT in the same time.
Alexey Proskuryakov
Comment 4 2014-11-20 09:39:01 PST
This code is quite complicated and confusing. Even if it doesn't actively break things, it definitely makes hacking run-webkit-tests harder. Also, I'm not really sure what the benefit of running multiple run-webkit-test instances in parallel is. Why not just run them serially?
Csaba Osztrogonác
Comment 5 2014-11-20 09:54:49 PST
(In reply to comment #4) > This code is quite complicated and confusing. Even if it doesn't actively > break things, it definitely makes hacking run-webkit-tests harder. Is there a concrete issue which can be solved harder because of the existance of httpd locking? > Also, I'm not really sure what the benefit of running multiple > run-webkit-test instances in parallel is. Why not just run them serially? Here is a simple use case for it: I think that I fixed a huge bug, but I want to make sure if all tests pass. Run all tests can take at least 15-30 minutes in release more, more in debug mode. Until the tests run in the background, I'm working an other bug, maybe I would like to run some (1-2-10) tests too during debugging. Without this httpd locking mechanism, a newly started run-webkit-tests simply kills the running apache and breaks the first test run. It isn't good at all. If we stated that running more run-webkit-tests paralelly isn't supported anymore, we should prevent killing the previously started apache accidentally.
Alexey Proskuryakov
Comment 6 2014-11-20 10:17:34 PST
> Is there a concrete issue which can be solved harder because of the existance of httpd locking? I wasted 10 minutes yesterday trying to figure out how this affects what I'm doing to run http tests in parallel. And this all turned out to be unused code. > Here is a simple use case for it This case doesn't work now, because the second instance will be simply blocked waiting for the lock. > we should prevent killing the previously started apache accidentally Do you have a suggestion for how to achieve this? We need to kill runaway httpd processes (most notably when there is a forgotten run-webkit-httpd script in one of the tabs, but also for any other possible problems).
Csaba Osztrogonác
Comment 7 2014-11-20 10:45:00 PST
(In reply to comment #6) > > Is there a concrete issue which can be solved harder because of the > existance of httpd locking? > > I wasted 10 minutes yesterday trying to figure out how this affects what I'm > doing to run http tests in parallel. And this all turned out to be unused > code. As for me, I run http tests parallel long ago on EFL port and they are quit stable. It is hard coded to 1 thread in base.py, I don't know the reason, maybe there were/are issues on other ports. Running parrallel normal and locked (http + svg-perf) threads can be configured with: WEBKIT_TEST_CHILD_PROCESSES env and --child-processes WEBKIT_TEST_MAX_LOCK_SHARDS env and --max-locked-shards It is very good to fine tune parallelism or debug issues related to parallel test running. Or simple set the best and fastest setting locally. (On Linux WEBKIT_TEST_CHILD_PROCESSES is necessary for users who build with icecc on 30 threads, but run tests only on 4-8 threads) > This case doesn't work now, because the second instance will be simply > blocked waiting for the lock. It is blocked if http test running is in progress. After the last test fininshed, the lock is freed and the second instance can start running http tests. > > we should prevent killing the previously started apache accidentally > Do you have a suggestion for how to achieve this? We need to kill runaway > httpd processes (most notably when there is a forgotten run-webkit-httpd > script in one of the tabs, but also for any other possible problems). I can imagine a friendly question before killing the httpd, at least in interactive mode. It can be optional, opt-in or opt-out, it's all the same.
Csaba Osztrogonác
Comment 8 2014-11-20 10:47:08 PST
(In reply to comment #7) > WEBKIT_TEST_MAX_LOCK_SHARDS env and --max-locked-shards typo: WEBKIT_TEST_MAX_LOCK_SHARDS --> WEBKIT_TEST_MAX_LOCKED_SHARDS
Alexey Proskuryakov
Comment 9 2014-11-20 10:50:10 PST
> I don't know the reason, maybe there were/are issues on other ports. Once again, I'm talking about code complexity. This feature is extremely invasive for something that is unused. > It is blocked if http test running is in progress. But aren't they always in progress? On my machine, http tests run longer than all other tests combined.
Csaba Osztrogonác
Comment 10 2014-11-20 12:22:40 PST
(In reply to comment #9) > > I don't know the reason, maybe there were/are issues on other ports. > Once again, I'm talking about code complexity. This feature is extremely > invasive for something that is unused. I see, I don't say we shouldn't remove this complexity. But before doing it, we could increase the maximum number of locked shards in base.py: ---------------------------------------------------------------------------- def default_child_processes(self): """Return the number of DumpRenderTree instances to use for this port.""" return self._executive.cpu_count() def default_max_locked_shards(self): """Return the number of "locked" shards to run in parallel (like the http tests).""" return 1 ---------------------------------------------------------------------------- Let's start experimenting with the same numbers: def default_max_locked_shards(self): return self.default_child_processes() Note: default_child_processes() is hard coded to 2 for win, mac has a sophisticated algorithm, EFL and GTK use this default value. If all ports are happy and stable for a time with same unlocked and locked thread number, than we can remove all of these complex code paths. ;-) > > It is blocked if http test running is in progress. > But aren't they always in progress? On my machine, http tests run longer > than all other tests combined. It can be true if only 1 thread is enabled for http tests. I ran a quick test, https tests took 4.5 minutes (on 4 threads), all tests took 22.5 minutes. (EFL / Linux)
Alexey Proskuryakov
Comment 11 2014-11-20 13:42:49 PST
> But before doing it, we could increase the maximum number of locked shards in base.py I don't understand how this is related to removing the http lock. I am now actively working on making http tests parallel. That doesn't involve any experimenting, I'm fixing actual issues that prevent parallelism. But that is orthogonal to running multiple run-webkit-tests instances, because multiple instances would still fight for port 8000 and for temporary files created by PHP and Perl scripts.
Alexey Proskuryakov
Comment 12 2014-11-20 13:47:19 PST
>> But aren't they always in progress? On my machine, http tests run longer >> than all other tests combined. > It can be true if only 1 thread is enabled for http tests. HTTP tests are sequential now, there is always one thread. Trying to run more will make tests flaky, for multiple reasons that I'm fixing now. You are suggesting a new feature that hasn't been requested before we started talking about removing unused code. This doesn't strike me as a high level of interest from the community.
Csaba Osztrogonác
Comment 13 2014-11-21 03:16:00 PST
(In reply to comment #11) > > But before doing it, we could increase the maximum number of locked shards in base.py > I don't understand how this is related to removing the http lock. I thought you suggested removing the whole locking shards mechanism. It would really decrease the complexity of the code. But removing only http locking wouldn't decrease any complexity. http locking adds only 1-1 lines to this more complex locked sharding mechanism: - self._port.acquire_http_lock() before starting httpd - self._port.release_http_lock() after stopping httpd I still don't have objection against removing it, just asked to have a graceful mechanism not kill running httpds unasked and agressively. And if we are doing cleanups, go forward, and remove the whole locking shards complexity and run https tests parallel when all ports are ready for it and they are very stable. > I am now actively working on making http tests parallel. That doesn't > involve any experimenting, I'm fixing actual issues that prevent > parallelism. But that is orthogonal to running multiple run-webkit-tests > instances, because multiple instances would still fight for port 8000 and > for temporary files created by PHP and Perl scripts. As I said previously, parallel http testing is stable for long time ago, at least on EFL port. I already have WEBKIT_TEST_MAX_LOCK_SHARDS=4 in my environment for 2 years and haven't seen any flakiness related to it. Sharding guarantess that tests in the same directory aren't run parallel. Of course rare flakiness can happen if php and perl scripts are fighting for the same temporary files and full parallelism is enabled. Or if two different tests in different directories want to use the same temporary file in the same time. But this is a bug in the test infrastructure and in the tests too. To avoid all of these possible parallelism issues, we shouldn't let any test to write a hard coded file name on the filesystem. Now the run-webkit-test creates an own temporary directory for each DRT/WTR with <DumpRenderTree|WebkitTestRunner>-XXXXX . DRT/WTR know the path of this temporary directory, it can (and should) pass it to the perl/php scripts and they should use only this path for temporary files. (In reply to comment #12) > >> But aren't they always in progress? On my machine, http tests run longer > >> than all other tests combined. > > It can be true if only 1 thread is enabled for http tests. > HTTP tests are sequential now, there is always one thread. Trying to run > more will make tests flaky, for multiple reasons that I'm fixing now. Just out of curiosity, I tried parallel running on Mac too. (On a week old build, not checked trunk after your fixes.) I didn't get any flakiness with 4 parallel threads and http tests ran in 2.5 minutes. With 8 threads: 2 minutes, no flakiness. > You are suggesting a new feature that hasn't been requested before we > started talking about removing unused code. This doesn't strike me as a high > level of interest from the community. I suggested a new feature (not to kill httpd agressively without asking) after you stated http locking unused code. It is unused on Mac, because you have overriden the acquire_http_lock/release_http_lock methods. But it is used for other ports, and useful in the use case I wrote before.
Csaba Osztrogonác
Comment 14 2014-11-21 03:17:32 PST
(In reply to comment #0) > Merge this webkitpy change into our repository: > <https://chromium.googlesource.com/chromium/blink/+/ > 196f8146a948275c2f1594b13e30ab19a6e6fd66> file lock can't be removed, it is still used by xvfbdriver.py, changed the title accordingly.
Csaba Osztrogonác
Comment 15 2014-11-21 03:24:46 PST
Created attachment 242035 [details] WIP patch WIP patch, it removes only the lock, not move start/stop servers from the layout_test_runner to the manager. In my opinion it is useful to stop httpd when it isn't needed anymore. It doesn't make sense to start the httpd at the beginning and only stop at the end. TODO: find a way not to kill already runnning httpd agressively without asking the user.
Alexey Proskuryakov
Comment 16 2014-12-03 11:22:52 PST
> I thought you suggested removing the whole locking shards mechanism. What I'm saying is that this code is so hopelessly confusing that I can't even figure out how to make http tests parallel. I've been testing with "-f", but how do I make them normal tests, like those in fast/ directory? When I try to change anything, it either breaks locking for perf tests, or refuses to launch httpd (even if I change Manager.needs_servers() to return True). Then again, I'm not sure if locking does the right thing for perf tests either. Do we still run them in parallel with all other tests?
Csaba Osztrogonác
Comment 17 2014-12-03 12:16:20 PST
(In reply to comment #16) > > I thought you suggested removing the whole locking shards mechanism. > > What I'm saying is that this code is so hopelessly confusing that I can't > even figure out how to make http tests parallel. I've been testing with > "-f", but how do I make them normal tests, like those in fast/ directory? > > When I try to change anything, it either breaks locking for perf tests, or > refuses to launch httpd (even if I change Manager.needs_servers() to return > True). > > Then again, I'm not sure if locking does the right thing for perf tests > either. Do we still run them in parallel with all other tests? Making http tests to normal tests should be easy. Just remove the is_http_test from _test_requires_lock in manager.py. But in this case starting servers should be unconditional too. (in layout_test_runner.py) Now httpd are stopped after the last locked test. It would be broken after making http tests normal test. The blink change solves this problem with starting httpd at the beginning and stopping it at the end. And leave locking mechanism for the 24 tests in LayoutTests/perf directory. But locking is absolutely useless here, these tests are in one directory, it is one shard, they can't run in parallel with each other. But they can run parallel with other tests.
Alexey Proskuryakov
Comment 18 2014-12-03 12:38:18 PST
> But they can run parallel with other tests. Can they? I think that's a mistake, perf tests need to run exclusively - it does not matter whether they run in parallel with other tests, or with other perf tests, the impact is the same.
Csaba Osztrogonác
Comment 19 2014-12-03 13:40:34 PST
(In reply to comment #18) > > But they can run parallel with other tests. > > Can they? I think that's a mistake, perf tests need to run exclusively - it > does not matter whether they run in parallel with other tests, or with other > perf tests, the impact is the same. Yes, they can. They are handled as locked test same as http test. I checked the code a little bit deeper now. There are two numbers, number of workers and number of locked shards. Tests are sharded always by leaf subdirectory, and then _resize_shards integrates locked shards not to have more than the max number of locked shard. And then all shards are pushed to the pool, first the locked shard, second the unlocked shards. It doesn't guarantee that locked and unlocked shard can't run in parallel. It guarantees that only max_locked_shards can run parallel. But unlocked shards can run too if the number of workers are bigger than max_locked_shards. LayoutTests/perf tests aren't a real performance tests, their runtime doesn't matter, only their magnitude is important. They are to guarantee a feature has linear/logarithmic/O(n^2) magnitude. But it seems they are flakey and timeouts regularly. They are skipped on mac, many of them are skipped on GTK/EFL/Win. I'm afraid there isn't an easy way to guarantee perf tests run first without any parallelism and then all the other tests. Or what if we don't run them during layout testing, but in a separated buildstep - "run-webkit-tests perf"
Alexey Proskuryakov
Comment 20 2014-12-03 13:52:14 PST
Right. This means that locking is already broken for perf tests, because the idea is that they run exclusively, when the machine is otherwise idle: """Return True if the test needs to be locked when running multiple copies of NRWTs. Perf tests are locked because heavy load caused by running other tests in parallel might cause some of them to timeout.""" The comment isn't helping either, because this affects threading within a single NRWT, not (or not only?) multiple ones. Do you plan to finish the WIP patch? Removing the code would make it more feasible for me to hack on what remains.
Alexey Proskuryakov
Comment 21 2014-12-04 12:16:13 PST
> Right. This means that locking is already broken for perf tests Posted a patch to remove that in bug 139264.
Csaba Osztrogonác
Comment 22 2014-12-09 05:03:52 PST
Created attachment 242909 [details] Patch Sorry for the delay, let's remove it. I updated the patch after r176814 and r176830.
Alexey Proskuryakov
Comment 23 2014-12-09 09:39:17 PST
Comment on attachment 242909 [details] Patch Thank you! rs=me
WebKit Commit Bot
Comment 24 2014-12-09 11:32:19 PST
Comment on attachment 242909 [details] Patch Clearing flags on attachment: 242909 Committed r177028: <http://trac.webkit.org/changeset/177028>
WebKit Commit Bot
Comment 25 2014-12-09 11:32:22 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.