Bug 60605 - [NRWT] Reftests should run even when pixel tests are disabled.
Summary: [NRWT] Reftests should run even when pixel tests are disabled.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tony Chang
URL:
Keywords:
: 71158 (view as bug list)
Depends on: 72038
Blocks: 36065 71779
  Show dependency treegraph
 
Reported: 2011-05-10 20:18 PDT by Hayato Ito
Modified: 2011-11-15 14:30 PST (History)
13 users (show)

See Also:


Attachments
Patch (4.23 KB, patch)
2011-11-09 12:23 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (18.83 KB, patch)
2011-11-11 16:03 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (21.75 KB, patch)
2011-11-14 15:46 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (22.09 KB, patch)
2011-11-15 11:16 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (22.13 KB, patch)
2011-11-15 12:34 PST, Tony Chang
dpranke: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hayato Ito 2011-05-10 20:18:58 PDT
In current implementation of reftests, reftests don't run if pixel tests are disabled.
Which was introduced in http://trac.webkit.org/changeset/86126 as a temporary fix to avoid giving wrong test results.

It is confusing for end-users that '--pixel-tests' command line option affects the behavior of reftests.

We should run reftests regardless of whether 'pixel tests' are enabled or not.
Comment 1 Eric Seidel (no email) 2011-10-12 14:35:16 PDT
Do we still have ref tests?
Comment 2 Dirk Pranke 2011-10-12 14:50:38 PDT
Yes.
Comment 3 Eric Seidel (no email) 2011-10-12 14:52:15 PDT
How many reftests do we have, and can we have more of them? :)
Comment 4 Dirk Pranke 2011-10-12 14:54:25 PDT
I believe less than a handful, and I believe we were gating reftests on converting everyone over to NRWT (or at least getting critical mass). I would suggest that as soon as the WK2 bots are running NRWT, that's good enough (since I think that would just leave the win bot running ORWT).
Comment 5 Eric Seidel (no email) 2011-10-12 14:56:32 PDT
Sounds good.  I'll polish up my -2 patch for NRWT once I fix bug 69899.
Comment 6 Ojan Vafai 2011-11-02 18:58:11 PDT
*** Bug 71158 has been marked as a duplicate of this bug. ***
Comment 7 Ryosuke Niwa 2011-11-07 20:17:16 PST
I'm concerned that running reftests as implemented now might be as slow as running pixel tests. If that were the case, then always enabling reftests defeats the point of not running pixel tests (fast cycle time).

Do we have data to suggest otherwise? (i.e. running reftests is as fast as running render tree dump tests) Or are we going to compare render tree dumps when -p is not passed?
Comment 8 Dirk Pranke 2011-11-07 20:38:06 PST
(In reply to comment #7)
> I'm concerned that running reftests as implemented now might be as slow as running pixel tests. If that were the case, then always enabling reftests defeats the point of not running pixel tests (fast cycle time).
> 
> Do we have data to suggest otherwise? (i.e. running reftests is as fast as running render tree dump tests) Or are we going to compare render tree dumps when -p is not passed?

I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change.

It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue.

That said, I assume running reftests are significantly slower than either pixel tests or text-only tests, since we have to do double the work.
Comment 9 Ryosuke Niwa 2011-11-07 21:01:46 PST
(In reply to comment #8)
> I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change.

The last time I checked generating/comparing pixel tests were significantly slower than generating/comparing render tree dumps.

> It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue.

Possible. It'll be nice to get numbers.
Comment 10 Dirk Pranke 2011-11-07 22:17:57 PST
(In reply to comment #9)
> (In reply to comment #8)
> > I have always assumed that people don't run pixel tests because the pixel tests fail (the png's aren't correct), not that they were too slow. Hence, I support this change.
> 
> The last time I checked generating/comparing pixel tests were significantly slower than generating/comparing render tree dumps.

Yes, I am sure that this is true.

> 
> > It is possible that they were slower with ORWT, but given NRWT running with more than one thread, this would seem like not much of an issue.
> 
> Possible. It'll be nice to get numbers.

I think you're missing my point; I've never heard anyone say that they skip pixel tests w/ NRWT because they're too slow. Possibly this was true with ORWT, but given how much faster NRWT can be than ORWT, I don't think it matters all that much. If someone is worried about cycle time, there are lots of things we can do to improve the cycle time; we should not let that be an argument against getting people to write reftests instead of pixel tests, given the other benefits that they will bring.
Comment 11 Ryosuke Niwa 2011-11-07 22:22:08 PST
(In reply to comment #10)
> (In reply to comment #9)
> > Possible. It'll be nice to get numbers.
> 
> I think you're missing my point; I've never heard anyone say that they skip pixel tests w/ NRWT because they're too slow.

I do that because multi-processing isn't enabled on Mac port by default.

> we should not let that be an argument against getting people to write reftests instead of pixel tests, given the other benefits that they will bring.

I'm not but we certainly need to enable ref tests when -p is not passed because bots don't use -p. Also, I'm inclined to think that we might be better off generating and comparing render tree dump when -p is not passed.
Comment 12 Ojan Vafai 2011-11-07 23:30:19 PST
Pixel tests were disabled because they were hard to maintain due to subtle platform differences and lots of pixel results changing for small changes. Neither of these problems is true for reftests. 

Reftests are likely slower than non-reftests, but I don't expect that this would block turning them on by default. Certainly, noone on webkit-dev seems opposed to it.
Comment 13 Hayato Ito 2011-11-07 23:55:49 PST
Let me loop in.
Apart from the discussion, I guess no one is working on to run reftests even if '-p' is not passed, right?
If so, let me start to implement it.

If '-p' is given, we run *traditional* pixel tests and run also reftests.
If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run.
Reftests should be run in both cases.

Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we?


(In reply to comment #12)
> Pixel tests were disabled because they were hard to maintain due to subtle platform differences and lots of pixel results changing for small changes. Neither of these problems is true for reftests. 
> 
> Reftests are likely slower than non-reftests, but I don't expect that this would block turning them on by default. Certainly, noone on webkit-dev seems opposed to it.
Comment 14 Ryosuke Niwa 2011-11-07 23:58:16 PST
(In reply to comment #13)
> If '-p' is given, we run *traditional* pixel tests and run also reftests.
> If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run.
> Reftests should be run in both cases.

Right.

> Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we?

Agreed.

But I think it's worth hearing what other ports' contributors feel about doing png comparison vs. render tree dump comparison.
Comment 15 Ojan Vafai 2011-11-08 08:40:29 PST
(In reply to comment #14)
> (In reply to comment #13)
> > If '-p' is given, we run *traditional* pixel tests and run also reftests.
> > If '-p' is not given, we don't run *traditional* pixel tests, but reftests should be run.
> > Reftests should be run in both cases.
> 
> Right.

Yup.

> > Maybe we should have additional cmd line option, e.g. '--run-reftests', but it should be 'on' by default, shouldn't we?
> 
> Agreed.
> 
> But I think it's worth hearing what other ports' contributors feel about doing png comparison vs. render tree dump comparison.

I think my email to webkit-dev about turning reftests on by default was sufficient. Everyone who chimed in supports doing it and noone objected. Is there a port your concerned about?
Comment 16 Martin Robinson 2011-11-08 08:43:38 PST
(In reply to comment #15)

> I think my email to webkit-dev about turning reftests on by default was sufficient. Everyone who chimed in supports doing it and noone objected. Is there a port your concerned about?

Speaking from the perspective of one of the ports that has pixel tests support, but does not have pixel tests enabled, I like the idea of reftests by default.
Comment 17 Ryosuke Niwa 2011-11-08 21:21:19 PST
Per IRC discusion, we've agreed to run "traditional" reftests even when -p is not passed.
Comment 18 Tony Chang 2011-11-09 12:23:58 PST
Created attachment 114341 [details]
Patch
Comment 19 Ryosuke Niwa 2011-11-09 12:31:53 PST
Comment on attachment 114341 [details]
Patch

Don't you need to pass -p to DRT?
Comment 20 Tony Chang 2011-11-09 14:09:54 PST
Comment on attachment 114341 [details]
Patch

Ryosuke is right, this says pass but doesn't run the test.
Comment 21 Ryosuke Niwa 2011-11-09 14:22:47 PST
(In reply to comment #20)
> (From update of attachment 114341 [details])
> Ryosuke is right, this says pass but doesn't run the test.

We should probably return MISSING result in such cases.
Comment 22 Hayato Ito 2011-11-09 22:59:34 PST
The problem is that we have to pass an option, like '--pixel-tests=.../png_result_x.png', to DRT for reftests, but we should not pass such an option to other non-pixel tests because it impacts the performance.
Because DRT requires to take such option at startup, we cannot share one DRT instance between reftests and other non-pixel tests.
Is there any way to solve this issue without modifying each DRT port?
I am thinking that we should separate DRT instance so that some DRT instances are only used for reftests. But I am not sure this is a good idea...
Comment 23 Hayato Ito 2011-11-09 23:55:23 PST
Maybe I am too worrying about the performance since chromium port always use '--pixel-tests' and no one doesn't complain about the performance with NRWT in chromium port.
Does anyone have data about other ports?

(In reply to comment #22)
> The problem is that we have to pass an option, like '--pixel-tests=.../png_result_x.png', to DRT for reftests, but we should not pass such an option to other non-pixel tests because it impacts the performance.
> Because DRT requires to take such option at startup, we cannot share one DRT instance between reftests and other non-pixel tests.
> Is there any way to solve this issue without modifying each DRT port?
> I am thinking that we should separate DRT instance so that some DRT instances are only used for reftests. But I am not sure this is a good idea...
Comment 24 Tony Chang 2011-11-10 09:40:37 PST
Sorry, here's my plan:

1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker.
2) When a Worker is created, create 2 Drivers (one with pixel tests one without).
3) Pass both drivers to single_test_runner.
4) In single_test_runner, use the correct driver depending on which test is needed.

This means that each worker may start 2 DRT instances, but only use one at a time.  This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK.  If running with pixel tests enabled, we would only have 1 DRT instance per worker.

I'm going to split this work into 3 patches.
Comment 25 Ryosuke Niwa 2011-11-10 09:52:25 PST
(In reply to comment #24)
> 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker.
> 2) When a Worker is created, create 2 Drivers (one with pixel tests one without).
> 3) Pass both drivers to single_test_runner.
> 4) In single_test_runner, use the correct driver depending on which test is needed.
> 
> This means that each worker may start 2 DRT instances, but only use one at a time.

Does this mean that those 2 instances are created lazily?
Comment 26 Tony Chang 2011-11-10 09:58:16 PST
(In reply to comment #25)
> (In reply to comment #24)
> > 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker.
> > 2) When a Worker is created, create 2 Drivers (one with pixel tests one without).
> > 3) Pass both drivers to single_test_runner.
> > 4) In single_test_runner, use the correct driver depending on which test is needed.
> > 
> > This means that each worker may start 2 DRT instances, but only use one at a time.
> 
> Does this mean that those 2 instances are created lazily?

Correct. If a worker doesn't have any ref tests, it will only have 1 DRT instance.
Comment 27 Dirk Pranke 2011-11-10 12:20:09 PST
(In reply to comment #24)
> Sorry, here's my plan:
> 
> 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker.
> 2) When a Worker is created, create 2 Drivers (one with pixel tests one without).
> 3) Pass both drivers to single_test_runner.
> 4) In single_test_runner, use the correct driver depending on which test is needed.
> 
> This means that each worker may start 2 DRT instances, but only use one at a time.  This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK.  If running with pixel tests enabled, we would only have 1 DRT instance per worker.
> 
> I'm going to split this work into 3 patches.

Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? 

Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower.
Comment 28 Eric Seidel (no email) 2011-11-10 12:33:38 PST
We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests.  pixel testing isn't stateful.  Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)
Comment 29 Dirk Pranke 2011-11-10 12:37:53 PST
(In reply to comment #28)
> We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests.  pixel testing isn't stateful.  Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)

True. We already pass the image has we're expecting for each test, so either passing in a magic value or another parameter would do the trick.
Comment 30 Tony Chang 2011-11-10 12:39:49 PST
(In reply to comment #27)
> Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? 

Correct, but only if all workers hit a ref test.  Currently, that means a few more DRTs, but in the future, it probably means 2x instances.

> Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower.

I considered doing this first, but the abstraction makes it hard to do this.  Worker creates DRT instances, but single_test_runner makes decisions about whether we have a ref test or not.  It seemed more complicated (both conceptually and in code) to move this around.

Also, the slowness could be pretty extreme since ref tests are mixed with non-ref tests.

The extra memory use here should be minimal and memory is not currently a limiting factor in running layout tests.

(In reply to comment #28)
> We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests.  pixel testing isn't stateful.  Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)

I considered this as well, but updating 4+ DRTs seems like more effort than it was worth.
Comment 31 Dirk Pranke 2011-11-10 12:48:33 PST
(In reply to comment #30)
> I considered doing this first, but the abstraction makes it hard to do this.  Worker creates DRT instances, but single_test_runner makes decisions about whether we have a ref test or not.  It seemed more complicated (both conceptually and in code) to move this around.
>

Sorry, I meant that you could make the change in bug 72038, so that worker and single_test_runner get out of the start() business, and then you could decide whether you needed to restart DRT inside run_test as necessary.

I think w/ the change in 72038, we can then play with the logic we want to use purely inside run_test(), so I don't feel strongly about the approach now. We can change it later if it turns out to matter.
Comment 32 Eric Seidel (no email) 2011-11-10 13:06:16 PST
> (In reply to comment #28)
> > We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests.  pixel testing isn't stateful.  Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)
> 
> I considered this as well, but updating 4+ DRTs seems like more effort than it was worth.

This is a huge barrier to entry we have.  We need to continue to share more code between DRT instances.  There is no reason they shouldn't all share the same main loop, in which case changes like this would be absolutely trivial.
Comment 33 Ryosuke Niwa 2011-11-10 13:28:12 PST
(In reply to comment #27)
> Correct me if I'm wrong, but this isn't "a few" more instances, this is 2x the instances, right? i.e., if I'm running w/ 8 workers, I will have up to 16 DRTs (although at most 8 of them doing something at any one time)? 

I don't think this would be an issue. My machine isn't certainly going out of memory even if I had 30+ chrome renderers open. If we can't have 16 DRT instances on 8 core machines, we're definitely doing something wrong. Also DRT loads about:blank as soon as the test is finished to clear states.

> Rather than doing this, you could perhaps restart DRT if it needs to switch between "wants pixel tests" and doesn't want pixel tests. This seems a bit conceptually simpler, although it might be slightly slower.

I'm unsupportive of this approach given it might slow down nrwt.

(In reply to comment #28)
> We also could simply add a command to DRTs to allow them to dump certain tests as pixel tests.  pixel testing isn't stateful.  Since DRT already is (sorta) speaking http, think of it like requesting a differnet page on the same server. :)

Mac port's DRT changes the color profile if pixel tests are enabled. This probably doesn't matter for reftests unless the test is dependent on the color profile configuration provided under --pixel.(In reply to comment #30)

(In reply to comment #30)
> I considered this as well, but updating 4+ DRTs seems like more effort than it was worth.

And WebKitTestRunner for WebKit2.

(In reply to comment #32)
> This is a huge barrier to entry we have.  We need to continue to share more code between DRT instances.  There is no reason they shouldn't all share the same main loop, in which case changes like this would be absolutely trivial.

I believe WebKit2 does that in WebKitTestRunner.
Comment 34 Hayato Ito 2011-11-10 23:12:50 PST
Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it?

(In reply to comment #24)
> Sorry, here's my plan:
> 
> 1) Refactor Driver so start() is call lazily rather than having to manually call it from Worker.
> 2) When a Worker is created, create 2 Drivers (one with pixel tests one without).
> 3) Pass both drivers to single_test_runner.
> 4) In single_test_runner, use the correct driver depending on which test is needed.
> 
> This means that each worker may start 2 DRT instances, but only use one at a time.  This would result in a few more DRT instances hanging around (and a bit more memory), but that seems OK.  If running with pixel tests enabled, we would only have 1 DRT instance per worker.
> 
> I'm going to split this work into 3 patches.
Comment 35 Ryosuke Niwa 2011-11-10 23:38:28 PST
(In reply to comment #34)
> Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it?

That sounds like a good idea. It also makes the transition easier if we ever wanted to make DRT natively support switching between two modes.
Comment 36 Tony Chang 2011-11-11 16:03:08 PST
Created attachment 114793 [details]
Patch
Comment 37 Tony Chang 2011-11-11 16:05:35 PST
(In reply to comment #34)
> Instead of passing two drivers into single_test_runner, how about driver itself maintains two DRT instances, and use correct one inside of driver.run_test()? Although we have to add additional flag to DriverInput class so that driver can easily know which DRT should be used inside of its run_test, this make the implementation simpler, doesn't it?

This is a good idea.  I made a new driver called RefTestDriver that handles this logic.  I made a new driver to avoid having to make duplicate changes in ChromiumDriver and WebKitDriver which don't share that much right now.

If the patch is too big to review but the design is OK, let me know and I can break it out into some smaller patches.
Comment 38 Hayato Ito 2011-11-13 18:28:43 PST
Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior:

Suppose these two conditions are met:
  1. command line option, '--pixel-tests', is given
  2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'.
In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test.
I guess this affects the performance. Is this an intended behavior?

If this is not a desired behavior, it might be good opportunity to change this behavior because now we can choose correct DRT at runtime for each test.
Comment 39 Ryosuke Niwa 2011-11-13 19:48:56 PST
(In reply to comment #38)
> Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior:
> 
> Suppose these two conditions are met:
>   1. command line option, '--pixel-tests', is given
>   2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'.
> In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test.
> I guess this affects the performance. Is this an intended behavior?

I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?
Comment 40 Hayato Ito 2011-11-13 20:04:47 PST
I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test.

(In reply to comment #39)
> (In reply to comment #38)
> > Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior:
> > 
> > Suppose these two conditions are met:
> >   1. command line option, '--pixel-tests', is given
> >   2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'.
> > In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test.
> > I guess this affects the performance. Is this an intended behavior?
> 
> I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?
Comment 41 Ryosuke Niwa 2011-11-13 20:14:10 PST
(In reply to comment #40)
> I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test.

That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.
Comment 42 Hayato Ito 2011-11-13 20:23:29 PST
Yeah. Actually, it seems that such accessing many different directories is already done in single_test_runner.driver_input(), which calls port.expected_checksum(), which checks each directories to retrieve correct 'expected' checksum. So this shouldn't increase the runtime cost.

(In reply to comment #41)
> (In reply to comment #40)
> > I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test.
> 
> That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.
Comment 43 Hayato Ito 2011-11-13 20:32:45 PST
The point is:
If ((test.is_reftest) or (('options.pixel_tests is true') and ('we can find an expected checksum for that test'))), we should use pixel_driver.

The condition will be more complex because we should consider also 'options.new_baseline' and 'options.result_resutls'...


(In reply to comment #42)
> Yeah. Actually, it seems that such accessing many different directories is already done in single_test_runner.driver_input(), which calls port.expected_checksum(), which checks each directories to retrieve correct 'expected' checksum. So this shouldn't increase the runtime cost.
> 
> (In reply to comment #41)
> > (In reply to comment #40)
> > > I meant that 'foo.html' doesn't have any '-expected.png' anywhere, either in cross-platform or platform-specific dir. We can judge that for each tests before calling driver.run_test for that test.
> > 
> > That sounds like a reasonable idea if neither --rest-results nor --new-baseline is specified. But accessing many different directories itself could be expensive so it might be worth measuring the time.
Comment 44 Tony Chang 2011-11-14 09:38:14 PST
(In reply to comment #38)
> Before reviewing that, I'd like to confirm about new-run-webkit-test's behavior:
> 
> Suppose these two conditions are met:
>   1. command line option, '--pixel-tests', is given
>   2. A test, 'foo.html', doesn't contain 'foo-expected.png', contains only 'foo-expected.txt'.
> In this case, current NRWT always uses DRT which started with '--pixel-tests=.../png_resultx.png' even for such a text-only test.
> I guess this affects the performance. Is this an intended behavior?

I don't think this affects performance.  If layoutTestController.dumpAsText() is called, we don't compute the pixel results or write the png file.  It shouldn't matter which DRT we pass a dumpAsText() test to.
Comment 45 Dirk Pranke 2011-11-14 12:47:35 PST
(In reply to comment #39)
> I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?

Correct. In the absence of being able to determine that this is a text-only test, we have to assume that this is a pixel test and pass the correct flags to DRT (and assume that the PNG is simply MISSING).
Comment 46 Dirk Pranke 2011-11-14 13:09:13 PST
Comment on attachment 114793 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=114793&action=review

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:130
> +            return self._run_reftest()

Can you restructure this function so that we test for reftests first, and use the same dummy/skip logic for both the case where we're rebasing or we're skipping the reftests? I think that can eliminate at least one branch and make the logic a little clearer.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311
> +        assert(driver_output1.image_hash or driver_output2.image_hash)

Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND?

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:257
> +        return RefTestDriver(self, worker_number, ChromiumDriver)

I think it might make more sense to make the "driver_instance_constructor" argument a separate property of the Port object, and then pull create_driver() up into base.Port, since the rest of the code is the same. Then you can insulate the port subclasses from the existence of the RefTestDriver (or whatever we end up calling it).

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75
> +        self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests'))

Nit: I think port should always be non-NULL, so you probably don't need that check?

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:104
> +class RefTestDriver:

Nit: subclass from object.

I think this design makes sense, but I'm not wild about the class name; it makes me think that this driver is geared specifically to ref tests, as opposed to a delegating proxy that picks the right driver depending on the test.

Maybe rename this to something like "DriverProxy" (not that that is a great name either) and add a comment as to why it's necessary?

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:108
> +            self._pixel_driver = self._driver

I might rename this _reftest_driver instead of _pixel_driver, since the code is written to only use this field if the test is a reftest.

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:127
> +            cmd_line += ['; '] + self._pixel_driver.cmd_line()

This doesn't seem right (or perhaps safe) ...  cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:444
> +        Driver.__init__(self, port, worker_number, force_pixel_results)

It would be good to add a comment or a docstring explaining what force_pixel_results is for.
Comment 47 Tony Chang 2011-11-14 15:46:27 PST
Created attachment 115047 [details]
Patch
Comment 48 Tony Chang 2011-11-14 15:51:59 PST
Comment on attachment 114793 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=114793&action=review

I made all the suggested changes except those mentioned below.

>> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311
>> +        assert(driver_output1.image_hash or driver_output2.image_hash)
> 
> Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND?

Another way to write it would be "driver_output1.image_hash is not None and driver_output2.image_hash is not None".  This was why it used to say PASS previously, we were getting no image hashes but None == None, so the test was passing.

I could also return a test failure in this case, but I'm not sure that would be any clearer.

>> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:257
>> +        return RefTestDriver(self, worker_number, ChromiumDriver)
> 
> I think it might make more sense to make the "driver_instance_constructor" argument a separate property of the Port object, and then pull create_driver() up into base.Port, since the rest of the code is the same. Then you can insulate the port subclasses from the existence of the RefTestDriver (or whatever we end up calling it).

I called it driver_class.

>> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75
>> +        self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests'))
> 
> Nit: I think port should always be non-NULL, so you probably don't need that check?

There were tests in base_unittests.py that passed in None for the port.  I fixed the tests instead.

>> Tools/Scripts/webkitpy/layout_tests/port/driver.py:127
>> +            cmd_line += ['; '] + self._pixel_driver.cmd_line()
> 
> This doesn't seem right (or perhaps safe) ...  cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead.

cmd_line() is only used to spawn DRTs within a driver (e.g., WebKitDriver uses it for that).  It's safe to change it here because DriverProxy never calls cmd_line() directly.  AFAICT, the API is only for logging.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:444
>> +        Driver.__init__(self, port, worker_number, force_pixel_results)
> 
> It would be good to add a comment or a docstring explaining what force_pixel_results is for.

Do you think we could name this param so no comment is needed?  Maybe force_drt_to_dump_pixel_results or something?
Comment 49 Dirk Pranke 2011-11-14 16:21:23 PST
Comment on attachment 115047 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115047&action=review

(In reply to comment #48)
> >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:311
> >> +        assert(driver_output1.image_hash or driver_output2.image_hash)
> > 
> > Why is this an OR and not an AND? I guess I'm not sure why you're allowed to assert anything here if it isn't an AND?
> 
> Another way to write it would be "driver_output1.image_hash is not None and driver_output2.image_hash is not None".  This was why it used to say PASS previously, we were getting no image hashes but None == None, so the test was passing.
> 

Got it.

> >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75
> >> +        self._pixel_tests = force_pixel_results or (self._port and self._port.get_option('pixel_tests'))
> > 
> > Nit: I think port should always be non-NULL, so you probably don't need that check?
> 
> There were tests in base_unittests.py that passed in None for the port.  I fixed the tests instead.
>

Great.
 
> >> Tools/Scripts/webkitpy/layout_tests/port/driver.py:127
> >> +            cmd_line += ['; '] + self._pixel_driver.cmd_line()
> > 
> > This doesn't seem right (or perhaps safe) ...  cmd_line() is used for logging but I think it is also to actually spawn the DRTs (e.g., when it's passed to ServerProcess in the webkit ports, and when MockDRT uses it). We might have to return a list of cmd lines or something instead.
> 
> cmd_line() is only used to spawn DRTs within a driver (e.g., WebKitDriver uses it for that).  It's safe to change it here because DriverProxy never calls cmd_line() directly.  AFAICT, the API is only for logging.
> 

Hmm. I'm not quite sure I follow you, but I'll take your word for it.

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75
> +        self._pixel_tests = force_pixel_results or self._port.get_option('pixel_tests')

Thinking about this further ... can we rearrange this slightly so that the Driver doesn't need to know about self._port.get_option('pixel_tests')? I.e., since Drivers are now only constructed by Port.create_driver(), just modify that code to know whether this driver needs pixel_tests (and then we can just call this parameter pixel_tests).

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:114
> +            self._reftest_driver = driver_instance_constructor(port, worker_number, force_pixel_results=True)

See comments above ... I think we can restructure this so that force_pixel_tests can be renamed pixel_tests and we can explicitly tell the constructor whether we want pixel tests or not.
Comment 50 Hayato Ito 2011-11-14 19:36:52 PST
Thank you for the explanation.
My understanding is that as long as a text-only test calls dumpAsText(), we don't have to worry about the performance even if we use DRT which starts with 'pixel-tests' option because it doesn't dump pixels at all. We can assume that all existing tests follow this rule.


(In reply to comment #45)
> (In reply to comment #39)
> > I think it is intended. If the test author check-in .txt to cross-platform dir. and t hen .png to platform-specific dir., we'd still like to get .png out of DRT, right?
> 
> Correct. In the absence of being able to determine that this is a text-only test, we have to assume that this is a pixel test and pass the correct flags to DRT (and assume that the PNG is simply MISSING).
Comment 51 Dirk Pranke 2011-11-14 21:44:44 PST
(In reply to comment #50)
> Thank you for the explanation.
> My understanding is that as long as a text-only test calls dumpAsText(), we don't have to worry about the performance even if we use DRT which starts with 'pixel-tests' option because it doesn't dump pixels at all. We can assume that all existing tests follow this rule.
>

Right. There is no performance hit caused by merely passing the flag along, and this is how all the existing tests work.
Comment 52 Ryosuke Niwa 2011-11-14 22:03:23 PST
As I talked about this with Tony, there's a subtle implication on color profile setting on Mac port. Mac port's DRT sets up the color profile at start and then tear it down when the program exists. Setting new color profile is extremely annoying and takes a long time (1-2s).
Comment 53 Simon Fraser (smfr) 2011-11-15 10:53:12 PST
We may no longer have to change the entire display profile; we should just be able to set a profile on the DRT window, now that Leopard is no longer an issue.
Comment 54 Ryosuke Niwa 2011-11-15 10:57:51 PST
(In reply to comment #53)
> We may no longer have to change the entire display profile; we should just be able to set a profile on the DRT window, now that Leopard is no longer an issue.

That's good to know! FYI, Chromium Mac port has a special script to set/unset color profile so resolving this issue on Mac port will let us move forward.
Comment 55 Tony Chang 2011-11-15 11:16:48 PST
Created attachment 115206 [details]
Patch
Comment 56 Tony Chang 2011-11-15 11:17:47 PST
Comment on attachment 115047 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115047&action=review

>>> Tools/Scripts/webkitpy/layout_tests/port/driver.py:75
>>> +        self._pixel_tests = force_pixel_results or self._port.get_option('pixel_tests')
>> 
>> Great.
> 
> Thinking about this further ... can we rearrange this slightly so that the Driver doesn't need to know about self._port.get_option('pixel_tests')? I.e., since Drivers are now only constructed by Port.create_driver(), just modify that code to know whether this driver needs pixel_tests (and then we can just call this parameter pixel_tests).

OK.
Comment 57 Dirk Pranke 2011-11-15 12:12:26 PST
Comment on attachment 115206 [details]
Patch

Almost there. This feels pretty clean to me, now.

View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402
> +    def __init__(self, port, worker_number, pixel_tests=False):

Nit: I think you should remove the default value for pixel_tests.

> Tools/Scripts/webkitpy/layout_tests/port/dryrun.py:103
> +        return DryrunDriver(self, worker_number, pixel_tests=False)

I think you can just delete this method and implement _driver_class(). I'm pretty sure this change would cause pixel tests to be ignored, which would be wrong.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:443
> +    def __init__(self, port, worker_number, pixel_tests=False):

Nit: again, remove the default value.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:155
> +            TestDriver.__init__(self, port, worker_number, pixel_tests=False)

Nit: pass in pixel_tests instead, maybe?
Comment 58 Tony Chang 2011-11-15 12:34:28 PST
Created attachment 115219 [details]
Patch
Comment 59 Tony Chang 2011-11-15 12:37:11 PST
Comment on attachment 115206 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402
>> +    def __init__(self, port, worker_number, pixel_tests=False):
> 
> Nit: I think you should remove the default value for pixel_tests.

I kept it here and in WebKitDriver because there are multiple callers that need to be updated that I hope are only tests.  I would rather do that in a separate patch since this patch is already large and likely to break something even without making that change.

>> Tools/Scripts/webkitpy/layout_tests/port/dryrun.py:103
>> +        return DryrunDriver(self, worker_number, pixel_tests=False)
> 
> I think you can just delete this method and implement _driver_class(). I'm pretty sure this change would cause pixel tests to be ignored, which would be wrong.

OK.

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:155
>> +            TestDriver.__init__(self, port, worker_number, pixel_tests=False)
> 
> Nit: pass in pixel_tests instead, maybe?

OK.
Comment 60 Dirk Pranke 2011-11-15 13:54:23 PST
(In reply to comment #59)
> (From update of attachment 115206 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=115206&action=review
> 
> >> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:402
> >> +    def __init__(self, port, worker_number, pixel_tests=False):
> > 
> > Nit: I think you should remove the default value for pixel_tests.
> 
> I kept it here and in WebKitDriver because there are multiple callers that need to be updated that I hope are only tests.  I would rather do that in a separate patch since this patch is already large and likely to break something even without making that change.
> 

Sure. Thanks for making all the changes.
Comment 61 Tony Chang 2011-11-15 14:30:28 PST
Committed r100326: <http://trac.webkit.org/changeset/100326>