60605 – [NRWT] Reftests should run even when pixel tests are disabled.

Hayato Ito

Reported 2011-05-10 20:18:58 PDT

In current implementation of reftests, reftests don't run if pixel tests are disabled. Which was introduced in http://trac.webkit.org/changeset/86126 as a temporary fix to avoid giving wrong test results. It is confusing for end-users that '--pixel-tests' command line option affects the behavior of reftests. We should run reftests regardless of whether 'pixel tests' are enabled or not.

Eric Seidel (no email)

Comment 1 2011-10-12 14:35:16 PDT

Do we still have ref tests?

Dirk Pranke

Comment 2 2011-10-12 14:50:38 PDT

Yes.

Eric Seidel (no email)

Comment 3 2011-10-12 14:52:15 PDT

How many reftests do we have, and can we have more of them? :)

Dirk Pranke

Comment 4 2011-10-12 14:54:25 PDT

I believe less than a handful, and I believe we were gating reftests on converting everyone over to NRWT (or at least getting critical mass). I would suggest that as soon as the WK2 bots are running NRWT, that's good enough (since I think that would just leave the win bot running ORWT).

Eric Seidel (no email)

Comment 5 2011-10-12 14:56:32 PDT

Sounds good. I'll polish up my -2 patch for NRWT once I fix bug 69899.

Ojan Vafai

Comment 6 2011-11-02 18:58:11 PDT

*** Bug 71158 has been marked as a duplicate of this bug. ***

Ryosuke Niwa

Comment 7 2011-11-07 20:17:16 PST

I'm concerned that running reftests as implemented now might be as slow as running pixel tests. If that were the case, then always enabling reftests defeats the point of not running pixel tests (fast cycle time). Do we have data to suggest otherwise? (i.e. running reftests is as fast as running render tree dump tests) Or are we going to compare render tree dumps when -p is not passed?

Dirk Pranke

Comment 8 2011-11-07 20:38:06 PST

(In reply to comment #7) > I'm concerned that running reftests as implemented now might be as slow as running pixel tests. If that were the case, then always enabling reftests defeats the point of not running pixel tests (fast cycle time). > > Do we have data to suggest otherwise? (i.e. running reftests is as fast as running render tree dump tests) Or are we going to compare render tree dumps when -p is not passed? I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change. It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue. That said, I assume running reftests are significantly slower than either pixel tests or text-only tests, since we have to do double the work.

Ryosuke Niwa

Comment 9 2011-11-07 21:01:46 PST

(In reply to comment #8) > I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change. The last time I checked generating/comparing pixel tests were significantly slower than generating/comparing render tree dumps. > It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue. Possible. It'll be nice to get numbers.

Dirk Pranke

Comment 10 2011-11-07 22:17:57 PST

(In reply to comment #9) > (In reply to comment #8) > > I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change. > > The last time I checked generating/comparing pixel tests were significantly slower than generating/comparing render tree dumps. Yes, I am sure that this is true. > > > It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue. > > Possible. It'll be nice to get numbers. I think you're missing my point; I've never heard anyone say that they skip pixel tests w/ NRWT because they're too slow. Possibly this was true with ORWT, but given how much faster NRWT can be than ORWT, I don't think it matters all that much. If someone is worried about cycle time, there are lots of things we can do to improve the cycle time; we should not let that be an argument against getting people to write reftests instead of pixel tests, given the other benefits that they will bring.

Ryosuke Niwa

Comment 11 2011-11-07 22:22:08 PST

(In reply to comment #10) > (In reply to comment #9) > > Possible. It'll be nice to get numbers. > > I think you're missing my point; I've never heard anyone say that they skip pixel tests w/ NRWT because they're too slow. I do that because multi-processing isn't enabled on Mac port by default. > we should not let that be an argument against getting people to write reftests instead of pixel tests, given the other benefits that they will bring. I'm not but we certainly need to enable ref tests when -p is not passed because bots don't use -p. Also, I'm inclined to think that we might be better off generating and comparing render tree dump when -p is not passed.

Ojan Vafai

Comment 12 2011-11-07 23:30:19 PST

Pixel tests were disabled because they were hard to maintain due to subtle platform differences and lots of pixel results changing for small changes. Neither of these problems is true for reftests. Reftests are likely slower than non-reftests, but I don't expect that this would block turning them on by default. Certainly, noone on webkit-dev seems opposed to it.

Hayato Ito

Comment 13 2011-11-07 23:55:49 PST

Let me loop in. Apart from the discussion, I guess no one is working on to run reftests even if '-p' is not passed, right? If so, let me start to implement it. If '-p' is given, we run *traditional* pixel tests and run also reftests. If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run. Reftests should be run in both cases. Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we? (In reply to comment #12) > Pixel tests were disabled because they were hard to maintain due to subtle platform differences and lots of pixel results changing for small changes. Neither of these problems is true for reftests. > > Reftests are likely slower than non-reftests, but I don't expect that this would block turning them on by default. Certainly, noone on webkit-dev seems opposed to it.

Ryosuke Niwa

Comment 14 2011-11-07 23:58:16 PST

(In reply to comment #13) > If '-p' is given, we run *traditional* pixel tests and run also reftests. > If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run. > Reftests should be run in both cases. Right. > Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we? Agreed. But I think it's worth hearing what other ports' contributors feel about doing png comparison vs. render tree dump comparison.

Ojan Vafai

Comment 15 2011-11-08 08:40:29 PST

(In reply to comment #14) > (In reply to comment #13) > > If '-p' is given, we run *traditional* pixel tests and run also reftests. > > If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run. > > Reftests should be run in both cases. > > Right. Yup. > > Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we? > > Agreed. > > But I think it's worth hearing what other ports' contributors feel about doing png comparison vs. render tree dump comparison. I think my email to webkit-dev about turning reftests on by default was sufficient. Everyone who chimed in supports doing it and noone objected. Is there a port your concerned about?

Martin Robinson

Comment 16 2011-11-08 08:43:38 PST

(In reply to comment #15) > I think my email to webkit-dev about turning reftests on by default was sufficient. Everyone who chimed in supports doing it and noone objected. Is there a port your concerned about? Speaking from the perspective of one of the ports that has pixel tests support, but does not have pixel tests enabled, I like the idea of reftests by default.

Ryosuke Niwa

Comment 17 2011-11-08 21:21:19 PST

Per IRC discusion, we've agreed to run "traditional" reftests even when -p is not passed.

Tony Chang

Comment 18 2011-11-09 12:23:58 PST

Created attachment 114341 [details] Patch

Ryosuke Niwa

Comment 19 2011-11-09 12:31:53 PST

Comment on attachment 114341 [details] Patch Don't you need to pass -p to DRT?

Tony Chang

Comment 20 2011-11-09 14:09:54 PST

Comment on attachment 114341 [details] Patch Ryosuke is right, this says pass but doesn't run the test.

Ryosuke Niwa

Comment 21 2011-11-09 14:22:47 PST

(In reply to comment #20) > (From update of attachment 114341 [details]) > Ryosuke is right, this says pass but doesn't run the test. We should probably return MISSING result in such cases.

Hayato Ito

Comment 22 2011-11-09 22:59:34 PST

The problem is that we have to pass an option, like '--pixel-tests=.../png_result_x.png', to DRT for reftests, but we should not pass such an option to other non-pixel tests because it impacts the performance. Because DRT requires to take such option at startup, we cannot share one DRT instance between reftests and other non-pixel tests. Is there any way to solve this issue without modifying each DRT port? I am thinking that we should separate DRT instance so that some DRT instances are only used for reftests. But I am not sure this is a good idea...

Hayato Ito

Comment 23 2011-11-09 23:55:23 PST

Maybe I am too worrying about the performance since chromium port always use '--pixel-tests' and no one doesn't complain about the performance with NRWT in chromium port. Does anyone have data about other ports? (In reply to comment #22) > The problem is that we have to pass an option, like '--pixel-tests=.../png_result_x.png', to DRT for reftests, but we should not pass such an option to other non-pixel tests because it impacts the performance. > Because DRT requires to take such option at startup, we cannot share one DRT instance between reftests and other non-pixel tests. > Is there any way to solve this issue without modifying each DRT port? > I am thinking that we should separate DRT instance so that some DRT instances are only used for reftests. But I am not sure this is a good idea...

Tony Chang

Comment 24 2011-11-10 09:40:37 PST

Sorry, here's my plan: 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker. 2) When a Worker is created, create 2 Drivers (one with pixel tests one without). 3) Pass both drivers to single_test_runner. 4) In single_test_runner, use the correct driver depending on which test is needed. This means that each worker may start 2 DRT instances, but only use one at a time. This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK. If running with pixel tests enabled, we would only have 1 DRT instance per worker. I'm going to split this work into 3 patches.

Ryosuke Niwa

Comment 25 2011-11-10 09:52:25 PST

(In reply to comment #24) > 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker. > 2) When a Worker is created, create 2 Drivers (one with pixel tests one without). > 3) Pass both drivers to single_test_runner. > 4) In single_test_runner, use the correct driver depending on which test is needed. > > This means that each worker may start 2 DRT instances, but only use one at a time. Does this mean that those 2 instances are created lazily?

Tony Chang

Comment 26 2011-11-10 09:58:16 PST

(In reply to comment #25) > (In reply to comment #24) > > 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker. > > 2) When a Worker is created, create 2 Drivers (one with pixel tests one without). > > 3) Pass both drivers to single_test_runner. > > 4) In single_test_runner, use the correct driver depending on which test is needed. > > > > This means that each worker may start 2 DRT instances, but only use one at a time. > > Does this mean that those 2 instances are created lazily? Correct. If a worker doesn't have any ref tests, it will only have 1 DRT instance.

Dirk Pranke

Comment 27 2011-11-10 12:20:09 PST

(In reply to comment #24) > Sorry, here's my plan: > > 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker. > 2) When a Worker is created, create 2 Drivers (one with pixel tests one without). > 3) Pass both drivers to single_test_runner. > 4) In single_test_runner, use the correct driver depending on which test is needed. > > This means that each worker may start 2 DRT instances, but only use one at a time. This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK. If running with pixel tests enabled, we would only have 1 DRT instance per worker. > > I'm going to split this work into 3 patches. Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower.

Eric Seidel (no email)

Comment 28 2011-11-10 12:33:38 PST

We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests. pixel testing isn't stateful. Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)

Dirk Pranke

Comment 29 2011-11-10 12:37:53 PST

(In reply to comment #28) > We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests. pixel testing isn't stateful. Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :) True. We already pass the image has we're expecting for each test, so either passing in a magic value or another parameter would do the trick.

Tony Chang

Comment 30 2011-11-10 12:39:49 PST

(In reply to comment #27) > Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? Correct, but only if all workers hit a ref test. Currently, that means a few more DRTs, but in the future, it probably means 2x instances. > Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower. I considered doing this first, but the abstraction makes it hard to do this. Worker creates DRT instances, but single_test_runner makes decisions about whether we have a ref test or not. It seemed more complicated (both conceptually and in code) to move this around. Also, the slowness could be pretty extreme since ref tests are mixed with non-ref tests. The extra memory use here should be minimal and memory is not currently a limiting factor in running layout tests. (In reply to comment #28) > We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests. pixel testing isn't stateful. Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :) I considered this as well, but updating 4+ DRTs seems like more effort than it was worth.

Dirk Pranke

Comment 31 2011-11-10 12:48:33 PST

(In reply to comment #30) > I considered doing this first, but the abstraction makes it hard to do this. Worker creates DRT instances, but single_test_runner makes decisions about whether we have a ref test or not. It seemed more complicated (both conceptually and in code) to move this around. > Sorry, I meant that you could make the change in bug 72038, so that worker and single_test_runner get out of the start() business, and then you could decide whether you needed to restart DRT inside run_test as necessary. I think w/ the change in 72038, we can then play with the logic we want to use purely inside run_test(), so I don't feel strongly about the approach now. We can change it later if it turns out to matter.

Eric Seidel (no email)

Comment 32 2011-11-10 13:06:16 PST

> (In reply to comment #28) > > We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests. pixel testing isn't stateful. Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :) > > I considered this as well, but updating 4+ DRTs seems like more effort than it was worth. This is a huge barrier to entry we have. We need to continue to share more code between DRT instances. There is no reason they shouldn't all share the same main loop, in which case changes like this would be absolutely trivial.

Ryosuke Niwa

Comment 33 2011-11-10 13:28:12 PST

(In reply to comment #27) > Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? I don't think this would be an issue. My machine isn't certainly going out of memory even if I had 30+ chrome renderers open. If we can't have 16 DRT instances on 8 core machines, we're definitely doing something wrong. Also DRT loads about:blank as soon as the test is finished to clear states. > Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower. I'm unsupportive of this approach given it might slow down nrwt. (In reply to comment #28) > We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests. pixel testing isn't stateful. Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :) Mac port's DRT changes the color profile if pixel tests are enabled. This probably doesn't matter for reftests unless the test is dependent on the color profile configuration provided under --pixel.(In reply to comment #30) (In reply to comment #30) > I considered this as well, but updating 4+ DRTs seems like more effort than it was worth. And WebKitTestRunner for WebKit2. (In reply to comment #32) > This is a huge barrier to entry we have. We need to continue to share more code between DRT instances. There is no reason they shouldn't all share the same main loop, in which case changes like this would be absolutely trivial. I believe WebKit2 does that in WebKitTestRunner.

Hayato Ito

Comment 34 2011-11-10 23:12:50 PST

Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it? (In reply to comment #24) > Sorry, here's my plan: > > 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker. > 2) When a Worker is created, create 2 Drivers (one with pixel tests one without). > 3) Pass both drivers to single_test_runner. > 4) In single_test_runner, use the correct driver depending on which test is needed. > > This means that each worker may start 2 DRT instances, but only use one at a time. This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK. If running with pixel tests enabled, we would only have 1 DRT instance per worker. > > I'm going to split this work into 3 patches.

Ryosuke Niwa

Comment 35 2011-11-10 23:38:28 PST

(In reply to comment #34) > Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it? That sounds like a good idea. It also makes the transition easier if we ever wanted to make DRT natively support switching between two modes.

Tony Chang

Comment 36 2011-11-11 16:03:08 PST

Created attachment 114793 [details] Patch

Tony Chang

Comment 37 2011-11-11 16:05:35 PST

(In reply to comment #34) > Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it? This is a good idea. I made a new driver called RefTestDriver that handles this logic. I made a new driver to avoid having to make duplicate changes in ChromiumDriver and WebKitDriver which don't share that much right now. If the patch is too big to review but the design is OK, let me know and I can break it out into some smaller patches.

Hayato Ito

Comment 38 2011-11-13 18:28:43 PST

Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior: Suppose these two conditions are met: 1. command line option, '--pixel-tests', is given 2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'. In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test. I guess this affects the performance. Is this an intended behavior? If this is not a desired behavior, it might be good opportunity to change this behavior because now we can choose correct DRT at runtime for each test.

Ryosuke Niwa

Comment 39 2011-11-13 19:48:56 PST

(In reply to comment #38) > Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior: > > Suppose these two conditions are met: > 1. command line option, '--pixel-tests', is given > 2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'. > In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test. > I guess this affects the performance. Is this an intended behavior? I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?

Hayato Ito

Comment 40 2011-11-13 20:04:47 PST

I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test. (In reply to comment #39) > (In reply to comment #38) > > Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior: > > > > Suppose these two conditions are met: > > 1. command line option, '--pixel-tests', is given > > 2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'. > > In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test. > > I guess this affects the performance. Is this an intended behavior? > > I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?

Ryosuke Niwa

Comment 41 2011-11-13 20:14:10 PST

(In reply to comment #40) > I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test. That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.

Hayato Ito

Comment 42 2011-11-13 20:23:29 PST

Yeah. Actually, it seems that such accessing many different directories is already done in single_test_runner.driver_input(), which calls port.expected_checksum(), which checks each directories to retrieve correct 'expected' checksum. So this shouldn't increase the runtime cost. (In reply to comment #41) > (In reply to comment #40) > > I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test. > > That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.

Hayato Ito

Comment 43 2011-11-13 20:32:45 PST

The point is: If ((test.is_reftest) or (('options.pixel_tests is true') and ('we can find an expected checksum for that test'))), we should use pixel_driver. The condition will be more complex because we should consider also 'options.new_baseline' and 'options.result_resutls'... (In reply to comment #42) > Yeah. Actually, it seems that such accessing many different directories is already done in single_test_runner.driver_input(), which calls port.expected_checksum(), which checks each directories to retrieve correct 'expected' checksum. So this shouldn't increase the runtime cost. > > (In reply to comment #41) > > (In reply to comment #40) > > > I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test. > > > > That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.

Tony Chang

Comment 44 2011-11-14 09:38:14 PST

(In reply to comment #38) > Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior: > > Suppose these two conditions are met: > 1. command line option, '--pixel-tests', is given > 2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'. > In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test. > I guess this affects the performance. Is this an intended behavior? I don't think this affects performance. If layoutTestController.dumpAsText() is called, we don't compute the pixel results or write the png file. It shouldn't matter which DRT we pass a dumpAsText() test to.

Dirk Pranke

Comment 45 2011-11-14 12:47:35 PST

(In reply to comment #39) > I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right? Correct. In the absence of being able to determine that this is a text-only test, we have to assume that this is a pixel test and pass the correct flags to DRT (and assume that the PNG is simply MISSING).

Dirk Pranke

Comment 46 2011-11-14 13:09:13 PST

Comment on attachment 114793 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=114793&action=review > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:130 > + return self._run_reftest() Can you restructure this function so that we test for reftests first, and use the same dummy/skip logic for both the case where we're rebasing or we're skipping the reftests? I think that can eliminate at least one branch and make the logic a little clearer. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311 > + assert(driver_output1.image_hash or driver_output2.image_hash) Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND? > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:257 > + return RefTestDriver(self, worker_number, ChromiumDriver) I think it might make more sense to make the "driver_instance_constructor" argument a separate property of the Port object, and then pull create_driver() up into base.Port, since the rest of the code is the same. Then you can insulate the port subclasses from the existence of the RefTestDriver (or whatever we end up calling it). > Tools/Scripts/webkitpy/layout_tests/port/driver.py:75 > + self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests')) Nit: I think port should always be non-NULL, so you probably don't need that check? > Tools/Scripts/webkitpy/layout_tests/port/driver.py:104 > +class RefTestDriver: Nit: subclass from object. I think this design makes sense, but I'm not wild about the class name; it makes me think that this driver is geared specifically to ref tests, as opposed to a delegating proxy that picks the right driver depending on the test. Maybe rename this to something like "DriverProxy" (not that that is a great name either) and add a comment as to why it's necessary? > Tools/Scripts/webkitpy/layout_tests/port/driver.py:108 > + self._pixel_driver = self._driver I might rename this _reftest_driver instead of _pixel_driver, since the code is written to only use this field if the test is a reftest. > Tools/Scripts/webkitpy/layout_tests/port/driver.py:127 > + cmd_line += ['; '] + self._pixel_driver.cmd_line() This doesn't seem right (or perhaps safe) ... cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead. > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:444 > + Driver.__init__(self, port, worker_number, force_pixel_results) It would be good to add a comment or a docstring explaining what force_pixel_results is for.

Tony Chang

Comment 47 2011-11-14 15:46:27 PST

Created attachment 115047 [details] Patch

Tony Chang

Comment 48 2011-11-14 15:51:59 PST

Comment on attachment 114793 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=114793&action=review I made all the suggested changes except those mentioned below. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311 >> + assert(driver_output1.image_hash or driver_output2.image_hash) > > Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND? Another way to write it would be "driver_output1.image_hash is not None and driver_output2.image_hash is not None". This was why it used to say PASS previously, we were getting no image hashes but None == None, so the test was passing. I could also return a test failure in this case, but I'm not sure that would be any clearer. >> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:257 >> + return RefTestDriver(self, worker_number, ChromiumDriver) > > I think it might make more sense to make the "driver_instance_constructor" argument a separate property of the Port object, and then pull create_driver() up into base.Port, since the rest of the code is the same. Then you can insulate the port subclasses from the existence of the RefTestDriver (or whatever we end up calling it). I called it driver_class. >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75 >> + self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests')) > > Nit: I think port should always be non-NULL, so you probably don't need that check? There were tests in base_unittests.py that passed in None for the port. I fixed the tests instead. >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:127 >> + cmd_line += ['; '] + self._pixel_driver.cmd_line() > > This doesn't seem right (or perhaps safe) ... cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead. cmd_line() is only used to spawn DRTs within a driver (e.g., WebKitDriver uses it for that). It's safe to change it here because DriverProxy never calls cmd_line() directly. AFAICT, the API is only for logging. >> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:444 >> + Driver.__init__(self, port, worker_number, force_pixel_results) > > It would be good to add a comment or a docstring explaining what force_pixel_results is for. Do you think we could name this param so no comment is needed? Maybe force_drt_to_dump_pixel_results or something?

Dirk Pranke

Comment 49 2011-11-14 16:21:23 PST

Comment on attachment 115047 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115047&action=review (In reply to comment #48) > >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311 > >> + assert(driver_output1.image_hash or driver_output2.image_hash) > > > > Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND? > > Another way to write it would be "driver_output1.image_hash is not None and driver_output2.image_hash is not None". This was why it used to say PASS previously, we were getting no image hashes but None == None, so the test was passing. > Got it. > >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75 > >> + self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests')) > > > > Nit: I think port should always be non-NULL, so you probably don't need that check? > > There were tests in base_unittests.py that passed in None for the port. I fixed the tests instead. > Great. > >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:127 > >> + cmd_line += ['; '] + self._pixel_driver.cmd_line() > > > > This doesn't seem right (or perhaps safe) ... cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead. > > cmd_line() is only used to spawn DRTs within a driver (e.g., WebKitDriver uses it for that). It's safe to change it here because DriverProxy never calls cmd_line() directly. AFAICT, the API is only for logging. > Hmm. I'm not quite sure I follow you, but I'll take your word for it. > Tools/Scripts/webkitpy/layout_tests/port/driver.py:75 > + self._pixel_tests = force_pixel_results or self._port.get_option('pixel_tests') Thinking about this further ... can we rearrange this slightly so that the Driver doesn't need to know about self._port.get_option('pixel_tests')? I.e., since Drivers are now only constructed by Port.create_driver(), just modify that code to know whether this driver needs pixel_tests (and then we can just call this parameter pixel_tests). > Tools/Scripts/webkitpy/layout_tests/port/driver.py:114 > + self._reftest_driver = driver_instance_constructor(port, worker_number, force_pixel_results=True) See comments above ... I think we can restructure this so that force_pixel_tests can be renamed pixel_tests and we can explicitly tell the constructor whether we want pixel tests or not.

Hayato Ito

Comment 50 2011-11-14 19:36:52 PST

Thank you for the explanation. My understanding is that as long as a text-only test calls dumpAsText(), we don't have to worry about the performance even if we use DRT which starts with 'pixel-tests' option because it doesn't dump pixels at all. We can assume that all existing tests follow this rule. (In reply to comment #45) > (In reply to comment #39) > > I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right? > > Correct. In the absence of being able to determine that this is a text-only test, we have to assume that this is a pixel test and pass the correct flags to DRT (and assume that the PNG is simply MISSING).

Dirk Pranke

Comment 51 2011-11-14 21:44:44 PST

(In reply to comment #50) > Thank you for the explanation. > My understanding is that as long as a text-only test calls dumpAsText(), we don't have to worry about the performance even if we use DRT which starts with 'pixel-tests' option because it doesn't dump pixels at all. We can assume that all existing tests follow this rule. > Right. There is no performance hit caused by merely passing the flag along, and this is how all the existing tests work.

Ryosuke Niwa

Comment 52 2011-11-14 22:03:23 PST

As I talked about this with Tony, there's a subtle implication on color profile setting on Mac port. Mac port's DRT sets up the color profile at start and then tear it down when the program exists. Setting new color profile is extremely annoying and takes a long time (1-2s).

Simon Fraser (smfr)

Comment 53 2011-11-15 10:53:12 PST

We may no longer have to change the entire display profile; we should just be able to set a profile on the DRT window, now that Leopard is no longer an issue.

Ryosuke Niwa

Comment 54 2011-11-15 10:57:51 PST

(In reply to comment #53) > We may no longer have to change the entire display profile; we should just be able to set a profile on the DRT window, now that Leopard is no longer an issue. That's good to know! FYI, Chromium Mac port has a special script to set/unset color profile so resolving this issue on Mac port will let us move forward.

Tony Chang

Comment 55 2011-11-15 11:16:48 PST

Created attachment 115206 [details] Patch

Tony Chang

Comment 56 2011-11-15 11:17:47 PST

Comment on attachment 115047 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115047&action=review >>> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75 >>> + self._pixel_tests = force_pixel_results or self._port.get_option('pixel_tests') >> >> Great. > > Thinking about this further ... can we rearrange this slightly so that the Driver doesn't need to know about self._port.get_option('pixel_tests')? I.e., since Drivers are now only constructed by Port.create_driver(), just modify that code to know whether this driver needs pixel_tests (and then we can just call this parameter pixel_tests). OK.

Dirk Pranke

Comment 57 2011-11-15 12:12:26 PST

Comment on attachment 115206 [details] Patch Almost there. This feels pretty clean to me, now. View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402 > + def __init__(self, port, worker_number, pixel_tests=False): Nit: I think you should remove the default value for pixel_tests. > Tools/Scripts/webkitpy/layout_tests/port/dryrun.py:103 > + return DryrunDriver(self, worker_number, pixel_tests=False) I think you can just delete this method and implement _driver_class(). I'm pretty sure this change would cause pixel tests to be ignored, which would be wrong. > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:443 > + def __init__(self, port, worker_number, pixel_tests=False): Nit: again, remove the default value. > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:155 > + TestDriver.__init__(self, port, worker_number, pixel_tests=False) Nit: pass in pixel_tests instead, maybe?

Tony Chang

Comment 58 2011-11-15 12:34:28 PST

Created attachment 115219 [details] Patch

Tony Chang

Comment 59 2011-11-15 12:37:11 PST

Comment on attachment 115206 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review >> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402 >> + def __init__(self, port, worker_number, pixel_tests=False): > > Nit: I think you should remove the default value for pixel_tests. I kept it here and in WebKitDriver because there are multiple callers that need to be updated that I hope are only tests. I would rather do that in a separate patch since this patch is already large and likely to break something even without making that change. >> Tools/Scripts/webkitpy/layout_tests/port/dryrun.py:103 >> + return DryrunDriver(self, worker_number, pixel_tests=False) > > I think you can just delete this method and implement _driver_class(). I'm pretty sure this change would cause pixel tests to be ignored, which would be wrong. OK. >> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:155 >> + TestDriver.__init__(self, port, worker_number, pixel_tests=False) > > Nit: pass in pixel_tests instead, maybe? OK.

Dirk Pranke

Comment 60 2011-11-15 13:54:23 PST

(In reply to comment #59) > (From update of attachment 115206 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review > > >> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402 > >> + def __init__(self, port, worker_number, pixel_tests=False): > > > > Nit: I think you should remove the default value for pixel_tests. > > I kept it here and in WebKitDriver because there are multiple callers that need to be updated that I hope are only tests. I would rather do that in a separate patch since this patch is already large and likely to break something even without making that change. > Sure. Thanks for making all the changes.

Tony Chang

Comment 61 2011-11-15 14:30:28 PST

Committed r100326: <http://trac.webkit.org/changeset/100326>

Attachments
Patch (4.23 KB, patch) 2011-11-09 12:23 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch (18.83 KB, patch) 2011-11-11 16:03 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch (21.75 KB, patch) 2011-11-14 15:46 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch (22.09 KB, patch) 2011-11-15 11:16 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch (22.13 KB, patch) 2011-11-15 12:34 PST, Tony Chang	dpranke: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.