Bug 46453 - [NRWT] Put the http and websocket tests to end of the test list
Summary: [NRWT] Put the http and websocket tests to end of the test list
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Gabor Rapcsanyi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-24 00:09 PDT by Gabor Rapcsanyi
Modified: 2010-10-01 12:20 PDT (History)
7 users (show)

See Also:


Attachments
proposed patch (5.38 KB, patch)
2010-09-24 00:14 PDT, Gabor Rapcsanyi
tony: review-
Details | Formatted Diff | Diff
proposed_patch_v2 (6.20 KB, patch)
2010-09-29 07:16 PDT, Gabor Rapcsanyi
tony: review-
Details | Formatted Diff | Diff
proposed_patch_v3 (9.17 KB, patch)
2010-10-01 06:10 PDT, Gabor Rapcsanyi
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gabor Rapcsanyi 2010-09-24 00:09:27 PDT
This prepare the NRWT for http locking like in ORWT.
Related bug: https://bugs.webkit.org/show_bug.cgi?id=33153
Comment 1 Gabor Rapcsanyi 2010-09-24 00:14:02 PDT
Created attachment 68657 [details]
proposed patch
Comment 2 Tony Chang 2010-09-24 09:58:16 PDT
How does moving the http tests to the end of the queue impact total testing time?  I imagine this would be slower when using more than 1 thread because the slow http tests will start last.

When the http tests run first, how much time is there between the last http test finishing and the whole test suite finishing?  Maybe it would be better to shutdown the http server after the http test queue is empty.
Comment 3 Dirk Pranke 2010-09-24 12:19:00 PDT
Ojan, back when you were managing the sharding of tests in new-run-webkit-tests, you had concerns about this, too, but of course that involved multiple threads in a single run rather than multiple concurrent runs.

Any thoughts now?
Comment 4 Gabor Rapcsanyi 2010-09-24 14:25:39 PDT
(In reply to comment #2)
> How does moving the http tests to the end of the queue impact total testing time?  I imagine this would be slower when using more than 1 thread because the slow http tests will start last.
> 
> When the http tests run first, how much time is there between the last http test finishing and the whole test suite finishing?  Maybe it would be better to shutdown the http server after the http test queue is empty.

I didn't do any time measurements yet. But for the http locking we need to put these test to the end of list.
Because when we start more than one NRWT on the same machine and there is an http lock it can starts with the other tests so it just blocks when reaches the http tests.
So if this slows the NRWT I can put the --wait-for-httpd switch to this patch and when its not enable it will put the http tests to the begin of the queue.
Comment 5 Tony Chang 2010-09-24 14:30:54 PDT
(In reply to comment #4)
> Because when we start more than one NRWT on the same machine and there is an http lock it can starts with the other tests so it just blocks when reaches the http tests.
> So if this slows the NRWT I can put the --wait-for-httpd switch to this patch and when its not enable it will put the http tests to the begin of the queue.

With NRWT, if you're running with multiple threads, won't only the http thread block?  For example:

first NRWT
  a) first thread running http tests
  b) second thread running some other tests

second NRWT
  c) first thread blocks on http tests in first NRWT
  d) second thread running some other tests

So only a single thread will block on the second NRWT.  All the other tests can be run by thread (d) while thread (c) is blocking.  Maybe I don't understand how http locking is implemented?
Comment 6 Gabor Rapcsanyi 2010-09-24 15:45:07 PDT
(In reply to comment #5)
> 
> With NRWT, if you're running with multiple threads, won't only the http thread block?  For example:
> 
> first NRWT
>   a) first thread running http tests
>   b) second thread running some other tests
> 
> second NRWT
>   c) first thread blocks on http tests in first NRWT
>   d) second thread running some other tests
> 
> So only a single thread will block on the second NRWT.  All the other tests can be run by thread (d) while thread (c) is blocking.  Maybe I don't understand how http locking is implemented?

The threads are not selecting from the queue they just get next directory.
If we put all http test directory to the begin of queue then all thread will blocks while they get the lock.
If you say put all http test to the same test directory in the queue then just one thread can run the http tests and it can be slower when we just run http tests and the lock will stay longer anyway.
Comment 7 Tony Chang 2010-09-24 16:24:43 PDT
(In reply to comment #6)
> The threads are not selecting from the queue they just get next directory.
> If we put all http test directory to the begin of queue then all thread will blocks while they get the lock.

Ok, I think I understand what you're saying.  You're saying that the lock happens when trying to acquire the directory name from the queue.

If that's the case, is it possible to move the http lock into the thread that runs the tests?  That way a thread will still get the 'http' directory from the queue, but then lock before running any tests on the thread.

Can you direct me at the http locking code?  It might help if I can see how it's currently implemented.
Comment 8 Gabor Rapcsanyi 2010-09-24 16:46:33 PDT
(In reply to comment #7)
> 
> Ok, I think I understand what you're saying.  You're saying that the lock happens when trying to acquire the directory name from the queue.
> 
> If that's the case, is it possible to move the http lock into the thread that runs the tests?  That way a thread will still get the 'http' directory from the queue, but then lock before running any tests on the thread.
> 
> Can you direct me at the http locking code?  It might help if I can see how it's currently implemented.

I think I just did what you are saying. 
You can check my locking code here: http://gist.github.com/596234
Comment 9 Tony Chang 2010-09-24 17:39:54 PDT
(In reply to comment #8)
> You can check my locking code here: http://gist.github.com/596234

I see, so you need _thread_lock because the websocket thread or the http thread might want to start the web servers.  Can you just force those two sets of tests into the same group and remove _thread_lock?  Then the order of tests no longer matters.

I think you can also get more parallelism if you release _http_server_lock once the last http and websocket test is done rather than when the thread exits.
Comment 10 Dirk Pranke 2010-09-24 20:13:03 PDT
Ugh. Okay, I've taken a look at both the patch posted here and the patch on GitHub. They're troubling, but then again I find anything that requires locking, especially cross-process locking, to be troubling :)

First, assuming NRWT is fully exploiting the parallelism of the machine -- which it should be doing, ignoring the currently open bugs that are throttling it  -- a single NRWT run should pin the machine and running multiple in parallel will at best slow things down and more likely introduce more flakiness and weird timeouts and failures.

So, I need to step back and ask, why do we want to run more than one NRWT in parallel? Adam, Eric - how do you see what I wrote above interacting with the commit queue? Gabor, are there other reasons you want to run these things in parallel?

Second, in order to fix the currently open bugs that are causing multithreaded NRWT to not really work (see, e.g., bug 36622, bug 38300, bug 43565 ), we need to switch NRWT to a multi-process model instead. Which means that (a) anything you do that relies on inter-thread locking will need to be reworked, (b) it's likely that the shard-by-directory mechanism currently used will be reworked as well. I would really prefer not to have this change and that change going on at the same time.

Third, I do not want to do anything that slows down the "normal" execution path of running one NRWT on a machine at a time or introduces more flakiness (see above). Given that, it's unclear what the best way to interleave http- and websocket-requiring tests is with other tests. In a single-threaded model, it doesn't really matter, since the total execution time is fixed. But, in a multi-threaded model, it may be better to try and ensure that we distribute http- and websocket- based tests evenly over the entire test run in order to minimize the load on the servers and reduce flakiness. We sort of get that today by starting http tests at the beginning of the run in (one of) the threads while running most of the tests in others. I would want to see either that (a) we determine that the servers can handle whatever concurrency we throw at them (unlikely), or (b) we come up with some sharding mechanism here that allows us to continue to smooth the load as much as possible without introducing too much additional complexity. Of course, if it turns out that we really do want to run multiple NRWTs in parallel, then we want to minimize the amount of time we hold the lock :(

Fourth, now looking at the particulars of this patch, I would really like to see the algorithm abstracted better and the intent captured more clearly. I'm thinking of things along the lines of:

1) hide whether or not a test requires a lock behind an interface or method, e.g. test_requires_lock() that encapsulates all of the path splitting and parsing.

2) Hang the server lock on some fairly obvious global object like TestRunner (which you already have in your GitHub patch, but make them private), and instead of passing the locks to each thread, expose methods on TestRunner to acquire and release the lock as possible. As Tony suggested, it might be helpful here to track how many lock-requiring tests remain and release the lock automatically when you're done. Of course, any shared-memory approach like this will still break when I switch to a multi-process model, but if we pick the right methods to expose on TestRunner to signal when we need the lock, it might be possible to keep this relatively clean.

Lastly, as a nit, the past tense of "split" in English is "split", so "splitted_dir_path" is a bit awkward. I would suggest "split_path", "split_dir_path", "separated_path", or "separated_dir_path" as alternatives, probably in that order. But, again, if you hide the logic of testing whether or not the file needs the lock behind a method, that makes it better as well.
Comment 11 Dirk Pranke 2010-09-27 16:50:35 PDT
Another couple of comments, after having thought about this for a while ... there are at least two alternatives to taking out a global lock here.

The first would be to change the layout test infrastructure and DRT and TestShell to use dynamically selected ports. The idea would be to have the start_http_server() code dynamically select a free port (or set of ports if we're using SSL) and then pass that port to DRT. Inside DRT we could add a method to layoutTestController() to get the port to use for the specific kind of test.

Another would be to reference count the web server instead of insisting on a global lock, since AFAIK all of the tests are stateless on the server-side, so there's not a real reason why two different NRWT runs couldn't point at the same web server. This might be a bit buggier and harder to clean up, but it would also be less prone to deadlocking.
Comment 12 Gabor Rapcsanyi 2010-09-27 17:01:41 PDT
(In reply to comment #10)
> Ugh. Okay, I've taken a look at both the patch posted here and the patch on GitHub. They're troubling, but then again I find anything that requires locking, especially cross-process locking, to be troubling :)
>
> First, assuming NRWT is fully exploiting the parallelism of the machine -- which it should be doing, ignoring the currently open bugs that are throttling it  -- a single NRWT run should pin the machine and running multiple in parallel will at best slow things down and more likely introduce more flakiness and weird timeouts and failures.
>
> So, I need to step back and ask, why do we want to run more than one NRWT in parallel? Adam, Eric - how do you see what I wrote above interacting with the commit queue? Gabor, are there other reasons you want to run these things in parallel?
>

We have some multicore servers with several build and try bots, so it would be useful if we could run more than one layout test per machine.

> Second, in order to fix the currently open bugs that are causing multithreaded NRWT to not really work (see, e.g., bug 36622, bug 38300, bug 43565 ), we need to switch NRWT to a multi-process model instead. Which means that (a) anything you do that relies on inter-thread locking will need to be reworked, (b) it's likely that the shard-by-directory mechanism currently used will be reworked as well. I would really prefer not to have this change and that change going on at the same time.
>

If I put all tests what needs locking to a same group as Tony suggested then we don't need thread locking anymore. Just one thread will run the http tests and do the http server locking locally.

> Third, I do not want to do anything that slows down the "normal" execution path of running one NRWT on a machine at a time or introduces more flakiness (see above). Given that, it's unclear what the best way to interleave http- and websocket-requiring tests is with other tests. In a single-threaded model, it doesn't really matter, since the total execution time is fixed. But, in a multi-threaded model, it may be better to try and ensure that we distribute http- and websocket- based tests evenly over the entire test run in order to minimize the load on the servers and reduce flakiness. We sort of get that today by starting http tests at the beginning of the run in (one of) the threads while running most of the tests in others. I would want to see either that (a) we determine that the servers can handle whatever concurrency we throw at them (unlikely), or (b) we come up with some sharding mechanism here that allows us to continue to smooth the load as much as possible without introducing too much additional complexity. Of course, if it turns out that we really do want to run multiple NRWTs in parallel, then we want to minimize the amount of time we hold the lock :(
>

I can make an option check and put the http tests to first when we don't need locking.

> Fourth, now looking at the particulars of this patch, I would really like to see the algorithm abstracted better and the intent captured more clearly. I'm thinking of things along the lines of:
>
> 1) hide whether or not a test requires a lock behind an interface or method, e.g. test_requires_lock() that encapsulates all of the path splitting and parsing.
>

Thats a good idea, I will put this method to run_webkit_tests.TestRunner.

> 2) Hang the server lock on some fairly obvious global object like TestRunner (which you already have in your GitHub patch, but make them private), and instead of passing the locks to each thread, expose methods on TestRunner to acquire and release the lock as possible. As Tony suggested, it might be helpful here to track how many lock-requiring tests remain and release the lock automatically when you're done. Of course, any shared-memory approach like this will still break when I switch to a multi-process model, but if we pick the right methods to expose on TestRunner to signal when we need the lock, it might be possible to keep this relatively clean.
>

As I said above the server lock will be moved to the thread.

> Lastly, as a nit, the past tense of "split" in English is "split", so "splitted_dir_path" is a bit awkward. I would suggest "split_path", "split_dir_path", "separated_path", or "separated_dir_path" as alternatives, probably in that order. But, again, if you hide the logic of testing whether or not the file needs the lock behind a method, that makes it better as well.

My bad, I will correct :)
Comment 13 Dirk Pranke 2010-09-27 17:48:27 PDT
(In reply to comment #12)
> (In reply to comment #10)
> > Ugh. Okay, I've taken a look at both the patch posted here and the patch on GitHub. They're troubling, but then again I find anything that requires locking, especially cross-process locking, to be troubling :)
> >
> > First, assuming NRWT is fully exploiting the parallelism of the machine -- which it should be doing, ignoring the currently open bugs that are throttling it  -- a single NRWT run should pin the machine and running multiple in parallel will at best slow things down and more likely introduce more flakiness and weird timeouts and failures.
> >
> > So, I need to step back and ask, why do we want to run more than one NRWT in parallel? Adam, Eric - how do you see what I wrote above interacting with the commit queue? Gabor, are there other reasons you want to run these things in parallel?
> >
> 
> We have some multicore servers with several build and try bots, so it would be useful if we could run more than one layout test per machine.
> 


This may seem like a dumb question, but why do you run more than one bot per machine? Are the bots not always busy at the same time, or are the machines not pinned? It seems like, if the infrastructure is working properly, trying to run multiple bots on the same machine should just make things slower.

Of course, if you can only afford one machine, things are different, but that doesn't sound like the case from what you say?

> 
> I can make an option check and put the http tests to first when we don't need locking.
>

As I described above, I'm not sure that moving when the tests are executed will actually improve things either way; I honestly don't know what will perform best, either when there's only a single NRWT running, or when we're running more than one at the same time. 

I would like to see some benchmark numbers or test results to support whatever changes we make before we land anything.

Please don't take the above as a sign that I don't want to make changes - I'm happy to do anything that makes developers more productive, reduces cycle times, or allows you to test things you can't do now. I just want to make sure we're doing the right things and do see real benefit, rather than just changing things randomly in the hopes that it'll make things better.
Comment 14 Andras Becsi 2010-09-28 00:12:46 PDT
> > We have some multicore servers with several build and try bots, so it would be useful if we could run more than one layout test per machine.
> > 
> 
> 
> This may seem like a dumb question, but why do you run more than one bot per machine? Are the bots not always busy at the same time, or are the machines not pinned? It seems like, if the infrastructure is working properly, trying to run multiple bots on the same machine should just make things slower.

Actually we have a couple of 8 core an 12 core high performance servers with 32 GiB of RAM. Currently with old-run-webkit-tests --wait-for-httpd (where one instance does not run tests in parallel) we run 4 (or more) bots in average on each machine, and in sum we are a bit over half load and the Qt bots are among the fastest bots on bugs.webkit.org.

We think we could distribute load more effective if the tests sessions would run parallel since now there is only one thread in each test session (with ORWT).

I ceased to work on https://bugs.webkit.org/show_bug.cgi?id=33153 for now, because NRWT seems much more capable to suit our needs.
Please read https://bugs.webkit.org/show_bug.cgi?id=33153#c10, where I had a similar discussion with Alexey concerning this issue.
Comment 15 Tony Chang 2010-09-28 09:47:39 PDT
Comment on attachment 68657 [details]
proposed patch

r- for now because it sounds like Gabor is working on a new patch without the thread lock.
Comment 16 Dirk Pranke 2010-09-28 20:13:21 PDT
(In reply to comment #14)
> > > We have some multicore servers with several build and try bots, so it would be useful if we could run more than one layout test per machine.
> > > 
> > 
> > 
> > This may seem like a dumb question, but why do you run more than one bot per machine? Are the bots not always busy at the same time, or are the machines not pinned? It seems like, if the infrastructure is working properly, trying to run multiple bots on the same machine should just make things slower.
> 
> Actually we have a couple of 8 core an 12 core high performance servers with 32 GiB of RAM. Currently with old-run-webkit-tests --wait-for-httpd (where one instance does not run tests in parallel) we run 4 (or more) bots in average on each machine, and in sum we are a bit over half load and the Qt bots are among the fastest bots on bugs.webkit.org.
> 
> We think we could distribute load more effective if the tests sessions would run parallel since now there is only one thread in each test session (with ORWT).
> 

It is certainly true that you'd want to be able to run multiple ORWTs in parallel, since they had no intra-test parallelism. That doesn't change my concerns about what might happen if we run multiple NRWTs in parallel (e.g., more flakiness).

That said, I can certainly understand that if you have a 16-core machine available you might prefer to have two bots using 8 cores each, rather than using VMs or trying to serialize at a buildbot-layer.

I can also see some advantages to being able to do concurrent development or testing on a machine, assuming we don't see flakiness get introduced.

I do think removing hardcoded dependencies on ports in the tests is the right long-term answer, but enabling a migration path would probably be a good thing in the meantime.

So, I guess I bless pursuing this idea further, :) subject to the initial sets of caveats I've added:

*) we should figure out how much additional flakiness gets introduced by running 2 NRWTs concurrently

*) we should make sure we can abstract the locking mechanism as much as possible to prepare for the multi-process NRWT

and I would still like to pursue the idea of reference-counting the HTTP server instead of using an exclusive lock.




> I ceased to work on https://bugs.webkit.org/show_bug.cgi?id=33153 for now, because NRWT seems much more capable to suit our needs.
> Please read https://bugs.webkit.org/show_bug.cgi?id=33153#c10, where I had a similar discussion with Alexey concerning this issue.
Comment 17 Gabor Rapcsanyi 2010-09-29 07:16:48 PDT
Created attachment 69191 [details]
proposed_patch_v2

Look at the actual code from run_webkit_tests.TestRunner:

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:514
>        if (self._options.experimental_fully_parallel or
>            self._is_single_threaded()):
>            filename_queue = Queue.Queue()
>            for test_file in test_files:
>                filename_queue.put(
>                    ('.', [self._get_test_info_for_file(test_file)]))
>            return filename_queue

This will put all tests to separate groups to the queue one by one, I think it is a better way if we put all tests to the same group, because the separate groups cause overhead in dump_render_tree_thread.

>
>        tests_by_dir = {}
>        for test_file in test_files:
>            directory = self._get_dir_for_test_file(test_file)
>            tests_by_dir.setdefault(directory, [])
>            tests_by_dir[directory].append(
>                self._get_test_info_for_file(test_file))
>
>        # Sort by the number of tests in the dir so that the ones with the
>        # most tests get run first in order to maximize parallelization.
>        # Number of tests is a good enough, but not perfect, approximation
>        # of how long that set of tests will take to run. We can't just use
>        # a PriorityQueue until we move # to Python 2.6.
>        test_lists = []
>        http_tests = None
>        for directory in tests_by_dir:
>            test_list = tests_by_dir[directory]
>            # Keep the tests in alphabetical order.
>            # TODO: Remove once tests are fixed so they can be run in any
>            # order.
>            test_list.reverse()
>            test_list_tuple = (directory, test_list)
>            if directory == 'LayoutTests' + os.sep + 'http':
>                http_tests = test_list_tuple

This section is unreachable because directory will never contain the 'LayoutTests' word.
The comment says it will put the http tests first but it won't do anything because the http_tests will always be empty.

>            else:
>                test_lists.append(test_list_tuple)
>        test_lists.sort(lambda a, b: cmp(len(b[1]), len(a[1])))
>
>        # Put the http tests first. There are only a couple hundred of them,
>        # but each http test takes a very long time to run, so sorting by the
>        # number of tests doesn't accurately capture how long they take to run.
>        if http_tests:
>            test_lists.insert(0, http_tests)


In this patch I put the http and websocket tests to the end of the list if we use NRWT single threaded like in ORWT.
In multi-threaded mode it will put these tests first in the queue.
Comment 18 Tony Chang 2010-09-29 10:51:29 PDT
Comment on attachment 69191 [details]
proposed_patch_v2

In general, this looks fine to me.  Just some minor issues with naming and comments.  Also, is it possible to write a unit test for this method?  It could verify that http tests are actually at the beginning of the list (good catch!).  r- for no tests, but you might be able to convince me that it's really hard to test.

View in context: https://bugs.webkit.org/attachment.cgi?id=69191&action=review

> WebKitTools/ChangeLog:6
> +        [NRWT] Put the http and websocket tests to end of the test list.
> +        https://bugs.webkit.org/show_bug.cgi?id=46453

I think this description is no longer correct.

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:503
> +    def _test_requires_lock(self, test_dir):
> +        """Return True if a test needs locking."""

Can you expand this docstring a bit?  It's not clear what locking you're talking about.  Something like, "Return True if the test needs to be locked when running multiple copies of NRWTs."

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:507
> +        if 'http' in split_path or 'websocket' in split_path:
> +            return True
> +        return False

Maybe "return 'http' in split_path or 'websocket' in split_path" is better?

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:522
> +        tests_to_lock = []

Nit: Maybe name this tests_to_http_lock so it's clear what lock we're talking about.

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:551
> +            # of how long that set of tests will take to run. We can't just use
> +            # a PriorityQueue until we move # to Python 2.6.

Nit: Can you remove the extra # in the middle of the line?  Thanks.

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:555
> +                # TODO: Remove once tests are fixed so they can be run in any

Nit: Can you convert this to a FIXME instead of TODO?
Comment 19 Dirk Pranke 2010-09-29 17:24:46 PDT
(In reply to comment #17)
> Created an attachment (id=69191) [details]
> proposed_patch_v2
> 
> Look at the actual code from run_webkit_tests.TestRunner:
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:514
> >        if (self._options.experimental_fully_parallel or
> >            self._is_single_threaded()):
> >            filename_queue = Queue.Queue()
> >            for test_file in test_files:
> >                filename_queue.put(
> >                    ('.', [self._get_test_info_for_file(test_file)]))
> >            return filename_queue
> 
> This will put all tests to separate groups to the queue one by one, I think it is a better way if we put all tests to the same group, because the separate groups cause overhead in dump_render_tree_thread.

putting them in a single group would work if you were single threaded, but not in the experimental_fully_parallel mode - all the tests would end up on one thread. The overhead of reading groups of one file each is inconsequential (I've tested it).

> >        tests_by_dir = {}
> >        for test_file in test_files:
> >            directory = self._get_dir_for_test_file(test_file)
> >            tests_by_dir.setdefault(directory, [])
> >            tests_by_dir[directory].append(
> >                self._get_test_info_for_file(test_file))
> >
> >        # Sort by the number of tests in the dir so that the ones with the
> >        # most tests get run first in order to maximize parallelization.
> >        # Number of tests is a good enough, but not perfect, approximation
> >        # of how long that set of tests will take to run. We can't just use
> >        # a PriorityQueue until we move # to Python 2.6.
> >        test_lists = []
> >        http_tests = None
> >        for directory in tests_by_dir:
> >            test_list = tests_by_dir[directory]
> >            # Keep the tests in alphabetical order.
> >            # TODO: Remove once tests are fixed so they can be run in any
> >            # order.
> >            test_list.reverse()
> >            test_list_tuple = (directory, test_list)
> >            if directory == 'LayoutTests' + os.sep + 'http':
> >                http_tests = test_list_tuple
> 
> This section is unreachable because directory will never contain the 'LayoutTests' word.
> The comment says it will put the http tests first but it won't do anything because the http_tests will always be empty.

Hmm. I wondered about that the other day as well. Good catch.
Comment 20 Dirk Pranke 2010-09-29 17:26:15 PDT
(In reply to comment #16)
> and I would still like to pursue the idea of reference-counting the HTTP server instead of using an exclusive lock.

It occurred to me last night that this would work in the special case of a debug and release bot running the same build, but not in the general case of two completely different checkouts, since each checkout may have different files on the server. So, it's probably not worth pursuing, unfortunately.
Comment 21 Dirk Pranke 2010-09-29 17:35:43 PDT
Comment on attachment 69191 [details]
proposed_patch_v2

View in context: https://bugs.webkit.org/attachment.cgi?id=69191&action=review

Is there a reason to put the tests requiring locks first in the multi-threaded case and last in the single-threaded, multi-bot case? Why not put it first in both cases? I would think that the bot throughput and contention would be the same either way.

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:504
> +        split_path = test_dir.split(os.sep)

Also, the routine name is "test_requires", but you're actually passing in a directory. I would either call it "directory_to_lock", or, preferably, pass in the test name instead of the directory. The latter requires you to split the file twice, but it makes for a better abstraction and I suspect the performance hit is inconsequential.

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:-521
> -

As I noted in the other comment, this breaks --experimental-fully-parallel by putting everything in a single item in the queue.
Comment 22 Gabor Rapcsanyi 2010-10-01 06:10:14 PDT
Created attachment 69459 [details]
proposed_patch_v3

I've put the test what need http lock first in the queue in both case and fixed what you said:

make unittest for it 
 - done

breaks --experimental-fully-parallel
 - fixed

_test_requires_lock method get directory
 - fixed

rename tests_to_lock
 - fixed

comments
 - fixed
Comment 23 Tony Chang 2010-10-01 10:40:24 PDT
Comment on attachment 69459 [details]
proposed_patch_v3

View in context: https://bugs.webkit.org/attachment.cgi?id=69459&action=review

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:560
> +            test_lists.insert(0, ("tests_to_http_lock", tests_to_http_lock))

Nit: In a separate change, can you update dump_render_tree_thread.py?  TestShellThread._current_dir is a little confusing since in this case it's not a directory.  We can probably rename it to test_group or group_name or something.
Comment 24 Gabor Rapcsanyi 2010-10-01 11:09:44 PDT
(In reply to comment #23)
> (From update of attachment 69459 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=69459&action=review
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:560
> > +            test_lists.insert(0, ("tests_to_http_lock", tests_to_http_lock))
> 
> Nit: In a separate change, can you update dump_render_tree_thread.py?  TestShellThread._current_dir is a little confusing since in this case it's not a directory.  We can probably rename it to test_group or group_name or something.

Sure, the next bug/patch of http locking is hit dump_render_tree_thread.py so I will correct this.
Comment 25 Csaba Osztrogonác 2010-10-01 11:17:51 PDT
Comment on attachment 69459 [details]
proposed_patch_v3

Clearing flags on attachment: 69459

Committed r68903: <http://trac.webkit.org/changeset/68903>
Comment 26 Csaba Osztrogonác 2010-10-01 11:18:04 PDT
All reviewed patches have been landed.  Closing bug.
Comment 27 Tony Chang 2010-10-01 12:20:12 PDT
(In reply to comment #26)
> All reviewed patches have been landed.  Closing bug.

BTW, I checked on the chromium bots and there appears to be a tiny bug that the http tests are run in reverse alphabetical order.  I think we just need to reverse tests_to_http_lock.  Should be a simple cleanup fix.