Bug 36065

Summary: Add Support for Reference Tests (aka reftests) to existing Layout Test harness
Product: WebKit Reporter: Dimitri Glazkov (Google) <dglazkov>
Component: Tools / TestsAssignee: Hayato Ito <hayato>
Status: RESOLVED FIXED    
Severity: Enhancement CC: ap, aroben, darin, diegohcg, dpranke, eric.carlson, fred.wang, hamaji, hayato, krit, mitz, mjs, morrita, ojan, simon.fraser, tonikitoo, vestbo, yuzo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 58302, 49431, 49835, 49951, 50362, 51091, 55457, 55652, 55654, 55657, 55936, 56076, 56450, 59078, 59188, 60605, 73453    
Bug Blocks:    
Attachments:
Description Flags
snapshot
none
snapshot2
none
for-review
none
synced-to-tot none

Description Dimitri Glazkov (Google) 2010-03-12 11:12:15 PST
This is a master bug and a place of discussion for implementing reftests.

After poking at it a bit, I am going to attempt implementing this in a way that would support using Mozilla's existig reftests out of the box, only if a subset at first.

Reading material:

http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/
https://developer.mozilla.org/en/Creating_reftest-based_unit_tests
http://weblogs.mozillazine.org/roc/archives/2009/01/invalidation_re.html
https://library.mozilla.org/@api/deki/files/88/=Creating_a_reftest.pdf
https://lists.webkit.org/pipermail/webkit-dev/2010-March/011856.html
Comment 1 Dimitri Glazkov (Google) 2010-03-12 11:20:32 PST
Also http://mxr.mozilla.org/mozilla-central/source/layout/reftests/
Comment 2 Dimitri Glazkov (Google) 2010-04-15 09:39:25 PDT
Initial approach:

* Use Mozilla's manifest approach to minimize the need to change tests when importing/reusing. This does mean we'll have two different test frameworks, but given that W3C Testing Task Force is going to use Mozilla reftests, it seems counterintuitive to invent our own adaptation of reftests.

* Place reftests into a special directory, like top-level ReferenceTests or LayoutTests/reference/ to avoid testing frameworks stepping on each other's toes.

* Write harness in python, integrated with new-run-webkit-tests, not worrying about old-run-webkit-tests.

* At first, use unmodified DRT (well, maybe pass a flag to mute render tree dump) to produce pixel results in the harness. Later, teach DRT to dump+compare two files to improve performance.

Thoughts?
Comment 3 Alexey Proskuryakov 2010-04-15 11:08:45 PDT
> * Place reftests into a special directory, like top-level ReferenceTests or
> LayoutTests/reference/ to avoid testing frameworks stepping on each other's
> toes.

It seems easy to ignore tests that have expected results in reftest format in scripts that don't support those. Also, it may be a good idea to run those (to at least catch crashes).

There is some use in having tests organized logically - people sometimes run only a subset of tests during iterative development, to run the whole suite before submitting a patch.

> * Write harness in python, integrated with new-run-webkit-tests, not worrying
> about old-run-webkit-tests.

I'm not sure about this. Last time I heard about it, there was no plan to make new-run-webkit-tests run on Tiger, which means that it probably won't obsolete the existing script in foreseeable future.
Comment 4 Dimitri Glazkov (Google) 2010-04-15 13:13:10 PDT
(In reply to comment #3)
> > * Place reftests into a special directory, like top-level ReferenceTests or
> > LayoutTests/reference/ to avoid testing frameworks stepping on each other's
> > toes.
> 
> It seems easy to ignore tests that have expected results in reftest format in
> scripts that don't support those. Also, it may be a good idea to run those (to
> at least catch crashes).

I agree -- I didn't state anywhere (my bad!) that run-webkit-tests always runs both types of tests.

> There is some use in having tests organized logically - people sometimes run
> only a subset of tests during iterative development, to run the whole suite
> before submitting a patch.

That's a good point. I'll think of an idea here.

> > * Write harness in python, integrated with new-run-webkit-tests, not worrying
> > about old-run-webkit-tests.
> 
> I'm not sure about this. Last time I heard about it, there was no plan to make
> new-run-webkit-tests run on Tiger, which means that it probably won't obsolete
> the existing script in foreseeable future.

Well, that stinks :( -- I'll talk w/dpranke+eseidel about this.
Comment 5 Hayato Ito 2010-07-07 03:55:12 PDT
As discussed in other threads, we've decided to add support for RefTest.
Here is a draft plan to support RefTest in WebKit.
https://docs.google.com/document/edit?id=1bwAzqOKfyB97rSIyLQ5ER6YG6-cEUa34xJ8KKgvdc0o&hl=en&authkey=CL39hMMB

Any suggestions and advice are welcome.
Comment 6 Tor Arne Vestbø 2010-07-07 06:02:38 PDT
(In reply to comment #5)
> As discussed in other threads, we've decided to add support for RefTest.
> Here is a draft plan to support RefTest in WebKit.
> https://docs.google.com/document/edit?id=1bwAzqOKfyB97rSIyLQ5ER6YG6-cEUa34xJ8KKgvdc0o&hl=en&authkey=CL39hMMB
> 
> Any suggestions and advice are welcome.

Great effort!

Could we use the "-ref.html"  (or reference) postfix like we now have -expected.txt to allow new tests written as reftests to be placed into the existing hierarchy of tests (together with the -expected.checksum and -expected.png)? 

That would allow easy transition from -expected.txt with render-tree-output to a -ref.html + png/checksum.

Imported tests (mozilla, W3C) would of course have their own folder, possibly in a subfolder of LayoutTests if they only deal with css for example.
Comment 7 Darin Adler 2010-07-08 11:13:11 PDT
(In reply to comment #6)
> Could we use the "-ref.html"  (or reference) postfix like we now have -expected.txt to allow new tests written as reftests to be placed into the existing hierarchy of tests (together with the -expected.checksum and -expected.png)? 
> 
> That would allow easy transition from -expected.txt with render-tree-output to a -ref.html + png/checksum.
> 
> Imported tests (mozilla, W3C) would of course have their own folder, possibly in a subfolder of LayoutTests if they only deal with css for example.

I like these suggestions. I don’t think we want separate testing frameworks.
Comment 8 Dirk Pranke 2010-07-08 12:52:42 PDT
+1 to -ref.html and integrating tightly into new-run-webkit-tests. I don't think we should have a separate tool or a separate top level directory either.
Comment 9 Hayato Ito 2010-07-08 20:24:34 PDT
(In reply to comment #8)
> +1 to -ref.html and integrating tightly into new-run-webkit-tests. I don't think we should have a separate tool or a separate top level directory either.

Thank you for the comments. I agree. In fact, a typical manifest file used in Mozilla is like:

== appendsingle.html appendsingle-ref.html
== appendmultiple.html appendmultiple-ref.html
== insertsingle.html insertsingle-ref.html
== insertmultiple.html insertmultiple-ref.html

They use convention as:
== SOME-TEST.html SOME-TEST-ref.html

We should follow their convention unless there is a strong reason against it.

As for directory location for reftests, it looks good to me that we reuse existing LayoutTest directory  unless there is a significant maintenance issue.
Comment 10 Simon Fraser (smfr) 2010-07-08 21:24:44 PDT
(In reply to comment #9)

> == appendsingle.html appendsingle-ref.html
> == appendmultiple.html appendmultiple-ref.html
> == insertsingle.html insertsingle-ref.html
> == insertmultiple.html insertmultiple-ref.html
> 
> They use convention as:
> == SOME-TEST.html SOME-TEST-ref.html

Note also that you can re-use the same reference for multiple tests, so don't assume that each test has a corresponding ref. Also, it's possible to test that the rendering does NOT match the ref, which can also be useful.
Comment 11 Darin Adler 2010-07-08 23:29:52 PDT
The current tests work without a separate manifest file, and I’d like to see that continue with the ref tests. Or if we switch to a manifest file, I’d like to do it for all the tests.

I don’t think matching Mozilla’s convention is as important as matching our own convention!
Comment 12 Hayato Ito 2010-07-09 00:09:03 PDT
(In reply to comment #11)
> The current tests work without a separate manifest file, and I’d like to see that continue with the ref tests. Or if we switch to a manifest file, I’d like to do it for all the tests.
> 
> I don’t think matching Mozilla’s convention is as important as matching our own convention!

Thank you for the comment.

Okay. I am planning to support both options, a manifest-file based and simple file-naming-convention based.
Let me explain that.

1. file-naming-convention based

Suppose we have the following html files in a directory:

- foo-test-a.html 
- foo-test-b.html 
- foo-test-ref.html
- bar-test-hello.html 
- bar-test-world.html 
- bar-test-fail-ref.html

And we don't have files named 'foo-test-a-expected.html', 'foo-test-b-expected.html' and so on, we should consider them as reftests as if they were explicitly specified with manifest file, like:

== foo-test-a.html foo-test-ref.html
== foo-test-b.html foo-test-ref.html
!= bar-test-hello.html bar-test-fail-ref.html
!= bar-test-world.html bar-test-fail-ref.html

2. Manifest based

There remains the cases in reftests that we cannot express easily by such a simple naming convention. In that case, we use a manifest files as a last resort. We should try to avoid using manifest file as possible as we can.

It might be challenging that making good file-naming convention because reftests used in Mozilla support many kinds of reftests.
But I believe we can cover most cases by simple naming conventions.
Comment 13 Simon Fraser (smfr) 2010-07-09 08:45:14 PDT
(In reply to comment #11)
> I don’t think matching Mozilla’s convention is as important as matching our own convention!

Matching Mozilla's convention may be useful if we get large test suites from W3C working groups which use that convention. It would be nice to be able to just share these unchanged.

I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.
Comment 14 Darin Adler 2010-07-09 09:00:56 PDT
(In reply to comment #13)
> I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.

If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems.
Comment 15 Hayato Ito 2010-07-13 22:47:30 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.
> 
> If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems.

I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'.
I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either 
  == AAAA.html BBBB-ref.html
or
  != CCCC.html DDDD-ref.html.

So I am moving to adding a support for simple filename-convention based reftests at first.
Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.
Comment 16 Hayato Ito 2010-07-26 06:35:34 PDT
I am planning to adapt the following naming convention for reftests:

1. foo.html and foo-ref.html
We use this naming for a reftest which is expected to match. That corresponds to '= foo.html foo-ref.html' in Mozilla's manifest.

2. foo.html and foo-noref.html
We use this naming for a reftest which is expected not to match. That corresponds to '!= foo.html foo-noref.html'.

These conventions are commonly used in Mozilla's reftest.
 

(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.
> > 
> > If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems.
> 
> I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'.
> I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either 
>   == AAAA.html BBBB-ref.html
> or
>   != CCCC.html DDDD-ref.html.
> 
> So I am moving to adding a support for simple filename-convention based reftests at first.
> Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.
Comment 17 Hayato Ito 2010-07-26 06:37:20 PDT
Correction.

(In reply to comment #16)
> I am planning to adapt the following naming convention for reftests:
> 
> 1. foo.html and foo-ref.html
> We use this naming for a reftest which is expected to match. That corresponds to '= foo.html foo-ref.html' in Mozilla's manifest.
> 
> 2. foo.html and foo-noref.html
> We use this naming for a reftest which is expected not to match. That corresponds to '!= foo.html foo-noref.html'.

'2. foo.html and foo-notref.html' is a correct naming.

> 
> These conventions are commonly used in Mozilla's reftest.
> 
> 
> (In reply to comment #15)
> > (In reply to comment #14)
> > > (In reply to comment #13)
> > > > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.
> > > 
> > > If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems.
> > 
> > I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'.
> > I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either 
> >   == AAAA.html BBBB-ref.html
> > or
> >   != CCCC.html DDDD-ref.html.
> > 
> > So I am moving to adding a support for simple filename-convention based reftests at first.
> > Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.
Comment 18 mitz 2010-07-26 09:53:33 PDT
Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?
Comment 19 Adam Roben (:aroben) 2010-07-26 10:18:35 PDT
(In reply to comment #18)
> Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?

Would that make it harder to open them in a browser (e.g., by double-clicking in Finder or Windows Explorer)?
Comment 20 mitz 2010-07-26 10:19:26 PDT
(In reply to comment #19)
> (In reply to comment #18)
> > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?
> 
> Would that make it harder to open them in a browser (e.g., by double-clicking in Finder or Windows Explorer)?

Of course it would :(
Comment 21 Hayato Ito 2010-07-27 01:45:52 PDT
Thank you for the comment.

(In reply to comment #18)
> Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?

Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools.
Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests.

If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.
Comment 22 Yuzo Fujishima 2010-07-27 03:04:03 PDT
It seems that only WebKitTools/Scripts/{old-run-webkit-tests,prepare-ChangeLog}
need to be changed to ignore -{not,}ref.html. (See supportedFileExtensions)

Because the needed change seems to be simple, I'd recommend just changing those files
and going ahead, rather than waiting for new-run-webkit-test.

(In reply to comment #21)
> Thank you for the comment.
> 
> (In reply to comment #18)
> > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?
> 
> Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools.
> Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests.
> 
> If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.
Comment 23 Yuzo Fujishima 2010-07-27 03:07:28 PDT
(In reply to comment #22)
> It seems that only WebKitTools/Scripts/{old-run-webkit-tests,prepare-ChangeLog}

and WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_files.py

> need to be changed to ignore -{not,}ref.html. (See supportedFileExtensions)
> 
> Because the needed change seems to be simple, I'd recommend just changing those files
> and going ahead, rather than waiting for new-run-webkit-test.
> 
> (In reply to comment #21)
> > Thank you for the comment.
> > 
> > (In reply to comment #18)
> > > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?
> > 
> > Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools.
> > Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests.
> > 
> > If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.
Comment 24 Hayato Ito 2010-08-04 20:26:41 PDT
Update on file naming convention:

I've changed my mind to use the following naming conventions for reftests:

1. foo.html and foo-expected.html
   Which are used for reftests where both rendering results are expected to be equal.

2. foo.html and foo-expected-notequal.html
   Which are used for reftests where both rendering results are expected to be different.

The motivation of this naming is to match existing file name conventions. There are already conventions in WebKit Tests such as , 'xxx-expected.txt', 'xxx-expected.png' and 'xxx-expected.checksum'. So it looks natural for me to use 'xxx-expected.YYY' naming pattern for reftests.
Comment 25 Dimitri Glazkov (Google) 2010-08-05 08:39:27 PDT
(In reply to comment #24)
> Update on file naming convention:
> 
> I've changed my mind to use the following naming conventions for reftests:
> 
> 1. foo.html and foo-expected.html
>    Which are used for reftests where both rendering results are expected to be equal.
> 
> 2. foo.html and foo-expected-notequal.html
>    Which are used for reftests where both rendering results are expected to be different.
> 
> The motivation of this naming is to match existing file name conventions. There are already conventions in WebKit Tests such as , 'xxx-expected.txt', 'xxx-expected.png' and 'xxx-expected.checksum'. So it looks natural for me to use 'xxx-expected.YYY' naming pattern for reftests.

Sounds good.
Comment 26 Darin Adler 2010-08-05 11:13:26 PDT
(In reply to comment #24)
> 2. foo.html and foo-expected-notequal.html
>    Which are used for reftests where both rendering results are expected to be different.

I don’t think you should merge the words "not" and "equal" into "notequal".

If you can’t think of a better phrase, then I suggest "expected-not-equal.html" but the phrase "expected not equal" is not good grammar, so maybe we can come up with a better phrase.

Generally speaking, an entire test that only says "these must not be identical" seems like a pretty blunt tool. I mean, it's good to make sure that bolding actually does something, but an entire test of this sort can only test one thing! That seems to be a general downside of ref tests. Each test can only effectively check one thing. Unlike dumpAsText tests, where we can test a lot of things in a single test.
Comment 27 Dirk Pranke 2010-08-05 11:25:47 PDT
(In reply to comment #26)
> (In reply to comment #24)
> > 2. foo.html and foo-expected-notequal.html
> >    Which are used for reftests where both rendering results are expected to be different.
> 
> I don’t think you should merge the words "not" and "equal" into "notequal".
> 
> If you can’t think of a better phrase, then I suggest "expected-not-equal.html" but the phrase "expected not equal" is not good grammar, so maybe we can come up with a better phrase.
> 

Perhaps "expected-mismatch" or "expected-different" ?
Comment 28 Darin Adler 2010-08-05 12:18:27 PDT
(In reply to comment #27)
> Perhaps "expected-mismatch" or "expected-different" ?

I like expected-mismatch.
Comment 29 Dimitri Glazkov (Google) 2010-08-05 12:46:28 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > Perhaps "expected-mismatch" or "expected-different" ?
> 
> I like expected-mismatch.

Me too.
Comment 30 Hayato Ito 2010-08-10 21:11:58 PDT
Thank you for comments. I like xx-expected-mismatch.html too. I'll use that convention.

(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > Perhaps "expected-mismatch" or "expected-different" ?
> > 
> > I like expected-mismatch.
> 
> Me too.
Comment 31 Simon Fraser (smfr) 2010-09-30 14:43:59 PDT
Any progress here?
Comment 32 Hayato Ito 2010-09-30 20:45:57 PDT
In progress. I am actually working on adding reftest support.
I think I can submit an initial patch in a week or later for an early review.
That might have a coule of rooms to be improved,  but I need early feedbacks and reviews.

(In reply to comment #31)
> Any progress here?
Comment 33 Hayato Ito 2010-10-19 21:25:15 PDT
Created attachment 71245 [details]
snapshot
Comment 34 Hayato Ito 2010-10-19 21:28:19 PDT
The attachment is not for the review. That needs to be polished for the review.
I am preparing a patch as an initial effort.

(In reply to comment #33)
> Created an attachment (id=71245) [details]
> snapshot
Comment 35 Shinichiro Hamaji 2010-10-19 22:05:51 PDT
Comment on attachment 71245 [details]
snapshot

View in context: https://bugs.webkit.org/attachment.cgi?id=71245&action=review

The overall approach looks fine. What is still missing? Maybe reftest_diff.py needs more work?

I'm very glad to see reftest is becoming a real!

> WebKitTools/Scripts/old-run-webkit-tests:2254
> +    if ($filename =~ /-expected(-mismatch)?\.html$/) {

expectedTag?

Also, we may want to consider adding mismatchTag for consistency?

> WebKitTools/Scripts/old-run-webkit-tests:2273
> +        if (exists $supportedFileExtensions{$extension}) {

I'd say

exists $supportedFileExtensions($extension) && !isUsedInReftest($filename)

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:90
> +    """Receives the output from a DumpRenderTree process, subjects it to a

It would be better to have a summary in the first line of a docstring. PEP257 says "Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description.".

http://www.python.org/dev/peps/pep-0257/

It's not a strict rule for us, but we are trying to apply style-related PEPs as possible.

> WebKitTools/Scripts/webkitpy/layout_tests/port/test_files.py:54
> +_reftest_expected_suffixes = set(['-expected.html', 'expected-mismatch.html'])

-expected-mismatch maybe?

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:141
> +    def __repr__(self):

Do we need this? For many objects, eval(repr(obj)) should be equal to obj but this definition doesn't satisfy this convention.
Comment 36 Hayato Ito 2010-10-19 23:54:06 PDT
Thank you for the quick review.

I should have added a comment, 'please don't take a look at the attachment in details. The attachment is only for showing an overall approach. That contains a lot of stuff which are only used for debugging' :)
It was my fault.

I am now polishing my patch for the review. Anyway, thank you for the review. That is helpful. I'll reflect your comments in following patch.

(In reply to comment #35)
> (From update of attachment 71245 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=71245&action=review
> 
> The overall approach looks fine. What is still missing? Maybe reftest_diff.py needs more work?
> 
> I'm very glad to see reftest is becoming a real!
> 
> > WebKitTools/Scripts/old-run-webkit-tests:2254
> > +    if ($filename =~ /-expected(-mismatch)?\.html$/) {
> 
> expectedTag?
> 
> Also, we may want to consider adding mismatchTag for consistency?
> 
> > WebKitTools/Scripts/old-run-webkit-tests:2273
> > +        if (exists $supportedFileExtensions{$extension}) {
> 
> I'd say
> 
> exists $supportedFileExtensions($extension) && !isUsedInReftest($filename)
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:90
> > +    """Receives the output from a DumpRenderTree process, subjects it to a
> 
> It would be better to have a summary in the first line of a docstring. PEP257 says "Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description.".
> 
> http://www.python.org/dev/peps/pep-0257/
> 
> It's not a strict rule for us, but we are trying to apply style-related PEPs as possible.
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/port/test_files.py:54
> > +_reftest_expected_suffixes = set(['-expected.html', 'expected-mismatch.html'])
> 
> -expected-mismatch maybe?
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:141
> > +    def __repr__(self):
> 
> Do we need this? For many objects, eval(repr(obj)) should be equal to obj but this definition doesn't satisfy this convention.
Comment 37 Shinichiro Hamaji 2010-10-20 00:02:33 PDT
> I should have added a comment, 'please don't take a look at the attachment in details. The attachment is only for showing an overall approach. That contains a lot of stuff which are only used for debugging' :)
> It was my fault.

Yeah, I knew this patch isn't for review. Actually, I didn't review the patch in detail. I put some comments just because I found them while I'm looking your overall approach.
Comment 38 Hayato Ito 2010-11-04 06:28:37 PDT
Created attachment 72932 [details]
snapshot2
Comment 39 Hayato Ito 2010-11-04 06:54:37 PDT
I've updated my patch.

I am aware that this patch is incomplete, such as missing ChangeLog, including unnecessary sample reftests and so on.

I'd like to keep this patch as small as possible for easy reviews. So I don't plan to add any features to this patch. Following patches should take care of other necessary changes. 

By this patch, we can run reftests by adding '--run-reftest' option to new-run-webkit-test .
Without '--run-reftests', there should be no impacts to existing LayoutTest infrastructure.

I'd like to commit patches incrementally until reftests becomes completely ready status.

Early reviews are highly welcome.

(In reply to comment #38)
> Created an attachment (id=72932) [details]
> snapshot2
Comment 40 Hayato Ito 2010-11-05 06:45:25 PDT
Created attachment 73060 [details]
for-review
Comment 41 Shinichiro Hamaji 2010-11-08 05:07:43 PST
Comment on attachment 73060 [details]
for-review

View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review

Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ?

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79
> +    # TODO(hayato): It might be better to increase the timeout two times when

Please use FIXME instead of TODO

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81
> +    return _milliseconds_to_seconds(

Do we really need this line break?

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108
>            test_types: A list of TestType objects to run the test output

How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer.

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30
> +"""A test runner which runs a single test file."""

I'd say we'll run either a layout test or a reftest.

Runs a single test file (either a layout test or a reftest)

or something like this.

> WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59
> +                 test_types, reftest_types):

It's nice if we have a docstring with an Args section.

> WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> +        if tolerance is not None:

A usual idiom for an optional parameter is

if opt_param is None:
  opt_param = something

So, I'd slightly prefer

if tolerance is None:
  if self.get_option('tolerance') is not None:
    tolerance = self....
  else
    tolerance = 0.1

> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125
> +            self.expected_mismatch_html_filename = None

I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice.

> WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50
> +        if test_info.expected_html_filename:

How about

expected = None
if test_info.expected_html_filename:
  ...
  expected = reference_image_contents
else:
  ...
self.write_output_files(...)

?

> WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96
> +        """Copies result files into the output directory with appropriate

The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?
Comment 42 Hayato Ito 2010-11-09 20:12:52 PST
Created attachment 73454 [details]
synced-to-tot
Comment 43 Hayato Ito 2010-11-09 20:20:03 PST
Thank you for the review.

I've addressed all your comments. I also synched and rebased my patch to ToT.
I've confirmed that all tests run by WebKitTools/Scripts/test-webkit-scripts passed even after applying this patch.

I think I need to add more tests to test-webkit-scripts. But it would be in another patch.


(In reply to comment #41)
> (From update of attachment 73060 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review
> 
> Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ?
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79
> > +    # TODO(hayato): It might be better to increase the timeout two times when
> 
> Please use FIXME instead of TODO
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81
> > +    return _milliseconds_to_seconds(
> 
> Do we really need this line break?
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108
> >            test_types: A list of TestType objects to run the test output
> 
> How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer.
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30
> > +"""A test runner which runs a single test file."""
> 
> I'd say we'll run either a layout test or a reftest.
> 
> Runs a single test file (either a layout test or a reftest)
> 
> or something like this.
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59
> > +                 test_types, reftest_types):
> 
> It's nice if we have a docstring with an Args section.
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > +        if tolerance is not None:
> 
> A usual idiom for an optional parameter is
> 
> if opt_param is None:
>   opt_param = something
> 
> So, I'd slightly prefer
> 
> if tolerance is None:
>   if self.get_option('tolerance') is not None:
>     tolerance = self....
>   else
>     tolerance = 0.1
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125
> > +            self.expected_mismatch_html_filename = None
> 
> I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice.
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50
> > +        if test_info.expected_html_filename:
> 
> How about
> 
> expected = None
> if test_info.expected_html_filename:
>   ...
>   expected = reference_image_contents
> else:
>   ...
> self.write_output_files(...)
> 
> ?
> 
> > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96
> > +        """Copies result files into the output directory with appropriate
> 
> The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?
Comment 44 Hayato Ito 2010-11-09 20:33:50 PST
One more note:

This patch is adding new test_result_type in json result file as follows:

                       test_expectations.REFERENCE_MISMATCH: "R",
                       test_expectations.REFERENCE_MATCH: "M",

So I guess buildbots will need to understand these kinds of test result.

(In reply to comment #43)
> Thank you for the review.
> 
> I've addressed all your comments. I also synched and rebased my patch to ToT.
> I've confirmed that all tests run by WebKitTools/Scripts/test-webkit-scripts passed even after applying this patch.
> 
> I think I need to add more tests to test-webkit-scripts. But it would be in another patch.
> 
> 
> (In reply to comment #41)
> > (From update of attachment 73060 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review
> > 
> > Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ?
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79
> > > +    # TODO(hayato): It might be better to increase the timeout two times when
> > 
> > Please use FIXME instead of TODO
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81
> > > +    return _milliseconds_to_seconds(
> > 
> > Do we really need this line break?
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108
> > >            test_types: A list of TestType objects to run the test output
> > 
> > How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer.
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30
> > > +"""A test runner which runs a single test file."""
> > 
> > I'd say we'll run either a layout test or a reftest.
> > 
> > Runs a single test file (either a layout test or a reftest)
> > 
> > or something like this.
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59
> > > +                 test_types, reftest_types):
> > 
> > It's nice if we have a docstring with an Args section.
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > > +        if tolerance is not None:
> > 
> > A usual idiom for an optional parameter is
> > 
> > if opt_param is None:
> >   opt_param = something
> > 
> > So, I'd slightly prefer
> > 
> > if tolerance is None:
> >   if self.get_option('tolerance') is not None:
> >     tolerance = self....
> >   else
> >     tolerance = 0.1
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125
> > > +            self.expected_mismatch_html_filename = None
> > 
> > I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice.
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50
> > > +        if test_info.expected_html_filename:
> > 
> > How about
> > 
> > expected = None
> > if test_info.expected_html_filename:
> >   ...
> >   expected = reference_image_contents
> > else:
> >   ...
> > self.write_output_files(...)
> > 
> > ?
> > 
> > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96
> > > +        """Copies result files into the output directory with appropriate
> > 
> > The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?
Comment 45 Dirk Pranke 2010-11-10 00:45:01 PST
Hi Ito-san,

It's great that you're getting so close, and I apologize for not getting to look at this earlier. I will post more detailed comments tomorrow afternoon, but here's some initial thoughts on the general approach.

1) this is not going to be a pretty merge with the changes I'm making to restructure the code to use processes instead of threads, but it shouldn't be horrible, either. One amusing side effect is that you've moved all of the real testing work out of dump_render_tree_thread and into single_test_runner; My patch moves all of the non-testing work out of dump_render_tree_thread. It'll probably be easier if you keep the two files combined for now.

2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output.

I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now?

As you correctly note somewhere in the patch, it would be weird to have both reference tests and reference output, so we should make that impossible.

3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities?

Anyway, more tomorrow ...
Comment 46 Dirk Pranke 2010-11-10 00:47:43 PST
Caveat - I didn't actually read through the whole bug again before reading the patch (or any of the referenced links), so I may have asked questions that have already been answered. I will read through everything again tomorrow, so if it makes sense, you can hold off on pointing out my stupidity until I can catch it myself :)
Comment 47 Hayato Ito 2010-11-10 21:33:46 PST
Hi Dirk,

Thank you for comments, I appreciate that!

(In reply to comment #45)
> Hi Ito-san,
> 
> It's great that you're getting so close, and I apologize for not getting to look at this earlier. I will post more detailed comments tomorrow afternoon, but here's some initial thoughts on the general approach.
> 
> 1) this is not going to be a pretty merge with the changes I'm making to restructure the code to use processes instead of threads, but it shouldn't be horrible, either. One amusing side effect is that you've moved all of the real testing work out of dump_render_tree_thread and into single_test_runner; My patch moves all of the non-testing work out of dump_render_tree_thread. It'll probably be easier if you keep the two files combined for now.
> 

Thank you for letting me know it. I'll try to merge that later if needed. That should not be a problem, I think.

> 2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output.
> 
> I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now?

Exactly! I feel happy to hear you thought the same thing as I thought:) In fact, I tried to take that approach at first, which seems a natural expansion. Let me explain briefly.

At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly.
But I abandoned this idea due to the following reasons as far as I remember:

- It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy.
- I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again.
  My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me.

I found that there are some 'gaps' between both tests than I expected initially. So instead of struggling to apply compare_output() function to reftests for consistency, I just introduced new compare function, compare_actual_output_with_reference().
The overall code became simple by early branching codepath between layout tests and reftests. 

I am aware that this patch is not ideal from the point of view in OO design. Actually there is no significant reason that a test type of ReferenceDiff is a subtype of TestBaseType. Sigh.

> 
> As you correctly note somewhere in the patch, it would be weird to have both reference tests and reference output, so we should make that impossible.
> 
> 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities?

Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that.
So until we found that many developers really want to use one test file for both tests and found a nice way to treat them together, I'd like to be conservative and separate existing layout tests and reftests.

I am looking forward to your more feed backs. Thank you!


> 
> Anyway, more tomorrow ...
Comment 48 Shinichiro Hamaji 2010-11-10 21:47:07 PST
> > 2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output.
> > 
> > I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now?
> 
> Exactly! I feel happy to hear you thought the same thing as I thought:) In fact, I tried to take that approach at first, which seems a natural expansion. Let me explain briefly.
> 
> At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly.
> But I abandoned this idea due to the following reasons as far as I remember:
> 
> - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy.
> - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again.
>   My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me.
> 
> I found that there are some 'gaps' between both tests than I expected initially. So instead of struggling to apply compare_output() function to reftests for consistency, I just introduced new compare function, compare_actual_output_with_reference().
> The overall code became simple by early branching codepath between layout tests and reftests. 
> 
> I am aware that this patch is not ideal from the point of view in OO design. Actually there is no significant reason that a test type of ReferenceDiff is a subtype of TestBaseType. Sigh.

I'm not sure if I understand this discussion well, but I'd like to mention supporting "expected-mismatch" would make reusing existing code a bit more difficult.

> > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities?
> 
> Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that.
> So until we found that many developers really want to use one test file for both tests and found a nice way to treat them together, I'd like to be conservative and separate existing layout tests and reftests.

I think text diff is pretty useful and we need text diff soon or later. The images DRT output only have 800x600 pixels so some mismatch could be ignored if an HTML is big.
Comment 49 Dirk Pranke 2010-11-10 23:40:50 PST
Okay, it is later than I had hoped for and I didn't have time to review all of the Mozilla documentation. So, as far as Mozilla-compatibility goes, I will say for the moment that I think we need to burn some serious brain cycles on how we can develop interoperable test suites. But, I think that that should be viewed as a separate effort from getting reftests working initially in WebKit. I plan to follow up on this w/ Hixie and Tab Atkins and can happily involve whomever is cc'ed on this list who is interested - let me know off-bug.
Comment 50 Dirk Pranke 2010-11-11 00:10:14 PST
(In reply to comment #47)
> At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly.
> But I abandoned this idea due to the following reasons as far as I remember:
> 
> - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy.
> - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again.
>   My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me.
>

I don't think I expressed myself well in the previous comment. What I was trying to say is that I think reftests are just a way of generating the expected output that we should compare the actual output to. They aren't actually a different kind of test. So, you don't need to create ReferenceDiffs or add different reference test result types or expectations. I think virtually all, if not all, of the changes can (or perhaps should) be confined to dump_render_tree_thread.py . 

The major implication of this is that the test_types classes need to be refactored to have the expected output passed in to them. I have also been considering pulling all of the --new-baseline logic out of test_types, because it feels wrong to be there as well, but I go off on a tangent.

You are definitely correct that _process_output and _compare_output should be fast and stable, and so we definitely shouldn't call out to DRT again inside them. The method signatures are also really clumsy. 

Part of the refactoring I hope to land in the next couple days will introduce something like a TestResult class that holds many of the arguments passed to process_output() current (and that also becomes a method on TestShellThread, much like you've done in SingleTestRunner). I realize that this'll play substantial havoc with your patch, but it feels pretty strongly to me at the moment that it's the right thing to do. I'll cc you on the other patches so you can see the changes I'm talking about.

Note that what I'm saying implies that, for a given test, we can only do regular tests against checked-in output or tests against an -expected.html file, not both. I think that this is fine and preferable in the long run. I don't think there will be much of an issue here, because we don't have any existing ref tests that we need to support. We can just start adding new ones.
 
Does this sound good? Are there issues that you've run into that I'm not yet seeing?

> > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities?
> 
> Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that.

I'm not sure if we understood each other here, but I'm not sure if it matters. Let me explain why ...
from conversations today with Dimitri and Darin, I think we agreed that there are only two kinds of reftests that matter. The first is a text-only test (dumpAsText or dumpAsMarkup). The second is an image-only test, where we care about the pixels being rendered but don't care at all about the render tree.

I *think* in the first case any two text-only sets of output could probably be rewritten as a single test that produced text only output (you could merge the test.html and test-expected.html into one single test file that produced an -expected.txt text-only output). Conservatively, even if we don't do this, it still reduces to comparing two text files, which of course we support now.

Similarly, comparing two image files is already supported. There's nothing special about the fact that the one image was generated from a -expected.html file instead of checked in as -expected.png.  So I think, to answer my own questions, we don't need to support any possibilities with reftests beyond what we already have (and, in fact, we probably have fewer combinations, since we won't have IMAGE+TEXT).

I will also note that Hamaji-san raises a good point - I can't forget about -expected-mismatch.html, so compare_output probably needs to grow a match/mismatch bool. Or, maybe better, when we create the new TestResult object we pass in the flag and it inverts the result of test_failures.determine_result_type() as necessary.
Comment 51 Hayato Ito 2010-11-11 00:57:18 PST
Hi Dirk, 
Thank you for comments. I appreciate that.

In short answer, I agree.

I agree that a refrest is a just yet another way to generate expected image or text.
I had noticed that indeed. But I also found that it was difficult to realize that in current codebase without an aggressive refactoring.

It seems that I should have refactored the code base more eagerly without *fear*.
Maybe I am a little bit biased to reusing existing codebase as soon as possible.

If such a refactoring is a way to go, I'd like to do it.
Maybe I should wait for your patch in the next couple of days.


(In reply to comment #50)
> (In reply to comment #47)
> > At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly.
> > But I abandoned this idea due to the following reasons as far as I remember:
> > 
> > - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy.
> > - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again.
> >   My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me.
> >
> 
> I don't think I expressed myself well in the previous comment. What I was trying to say is that I think reftests are just a way of generating the expected output that we should compare the actual output to. They aren't actually a different kind of test. So, you don't need to create ReferenceDiffs or add different reference test result types or expectations. I think virtually all, if not all, of the changes can (or perhaps should) be confined to dump_render_tree_thread.py . 
> 
> The major implication of this is that the test_types classes need to be refactored to have the expected output passed in to them. I have also been considering pulling all of the --new-baseline logic out of test_types, because it feels wrong to be there as well, but I go off on a tangent.
> 
> You are definitely correct that _process_output and _compare_output should be fast and stable, and so we definitely shouldn't call out to DRT again inside them. The method signatures are also really clumsy. 
> 
> Part of the refactoring I hope to land in the next couple days will introduce something like a TestResult class that holds many of the arguments passed to process_output() current (and that also becomes a method on TestShellThread, much like you've done in SingleTestRunner). I realize that this'll play substantial havoc with your patch, but it feels pretty strongly to me at the moment that it's the right thing to do. I'll cc you on the other patches so you can see the changes I'm talking about.
> 
> Note that what I'm saying implies that, for a given test, we can only do regular tests against checked-in output or tests against an -expected.html file, not both. I think that this is fine and preferable in the long run. I don't think there will be much of an issue here, because we don't have any existing ref tests that we need to support. We can just start adding new ones.
> 
> Does this sound good? Are there issues that you've run into that I'm not yet seeing?
> 
> > > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities?
> > 
> > Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that.
> 
> I'm not sure if we understood each other here, but I'm not sure if it matters. Let me explain why ...
> from conversations today with Dimitri and Darin, I think we agreed that there are only two kinds of reftests that matter. The first is a text-only test (dumpAsText or dumpAsMarkup). The second is an image-only test, where we care about the pixels being rendered but don't care at all about the render tree.
> 
> I *think* in the first case any two text-only sets of output could probably be rewritten as a single test that produced text only output (you could merge the test.html and test-expected.html into one single test file that produced an -expected.txt text-only output). Conservatively, even if we don't do this, it still reduces to comparing two text files, which of course we support now.
> 
> Similarly, comparing two image files is already supported. There's nothing special about the fact that the one image was generated from a -expected.html file instead of checked in as -expected.png.  So I think, to answer my own questions, we don't need to support any possibilities with reftests beyond what we already have (and, in fact, we probably have fewer combinations, since we won't have IMAGE+TEXT).
> 
> I will also note that Hamaji-san raises a good point - I can't forget about -expected-mismatch.html, so compare_output probably needs to grow a match/mismatch bool. Or, maybe better, when we create the new TestResult object we pass in the flag and it inverts the result of test_failures.determine_result_type() as necessary.
Comment 52 Dirk Pranke 2010-11-11 11:41:34 PST
(In reply to comment #51)
> Hi Dirk, 
> Thank you for comments. I appreciate that.
> 
> In short answer, I agree.
> 
> I agree that a refrest is a just yet another way to generate expected image or text.
> I had noticed that indeed. But I also found that it was difficult to realize that in current codebase without an aggressive refactoring.
> 

It could be that I know this codebase well, or it could be that I'm just overly aggressive, but I have no fear of this :) The good news is that this code is very well tested, so you can make changes without too much concern. run_webkit_tests_unittest.py is your friend.


> If such a refactoring is a way to go, I'd like to do it.

Terrific! 

Having thought abou this overnight, I think you can make most of the changes without colliding too much with the changes I'm working on. I suggest approaching this by working on a completely separate patch that refactors the code as it's currently checked in, and create a TestOutput POD (plain old data) class that is returned from port.Driver.run_test(), and then create a separate instance for the"expected" results, and then pass the two to _process_output() and the test_types routines.

Once you do that, adding in reftest support to generate a different expected output instance should be straightforward.
Comment 53 Hayato Ito 2010-11-12 02:04:50 PST
(In reply to comment #52)

Hi Dirk, Thank you for the commet.

I've refactored the current code as a completely separate patch.
Please see: https://bugs.webkit.org/show_bug.cgi?id=49431

Could you review this patch?

> Having thought abou this overnight, I think you can make most of the changes without colliding too much with the changes I'm working on. I suggest approaching this by working on a completely separate patch that refactors the code as it's currently checked in, and create a TestOutput POD (plain old data) class that is returned from port.Driver.run_test(), and then create a separate instance for the"expected" results, and then pass the two to _process_output() and the test_types routines.
Comment 54 Hayato Ito 2011-03-01 02:28:53 PST
Let me update the current status.

Now, I've finished the necessary refactoring for new-run-webkit-tests, I just started to work on adding reftests support.
I hope I can post a patch in a few days.
Comment 55 Hayato Ito 2011-03-08 23:28:37 PST
I am starting to write Wiki page about how to use reftests.

https://trac.webkit.org/wiki/Writing%20Reftests
Comment 56 Hayato Ito 2011-03-09 02:21:46 PST
I am aiming to add reftest support to new-run-webkit-tests in this quarter.
But I think new-run-webkit-tests has not replaced old-run-webkit-tests yet.

I am updating old-run-webkit-tests so that it ignores reftests: https://bugs.webkit.org/show_bug.cgi?id=55936  

But I am not sure which means that we can check in reftests in LayoutTests directory.
Should we refrain us from checking in reftests until new-run-webkit-tests replaces old-run-webkit-tests?

Ojan gave me an idea:
 .. we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests.

But I am not sure what is the best way.

Could someone summarize where old-run-webkit-tests is used?
Comment 57 Adam Roben (:aroben) 2011-03-09 05:28:38 PST
(In reply to comment #56)
> Could someone summarize where old-run-webkit-tests is used?

No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots.

My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though.
Comment 58 Dirk Pranke 2011-03-09 12:34:32 PST
(In reply to comment #57)
> (In reply to comment #56)
> > Could someone summarize where old-run-webkit-tests is used?
> 
> No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots.

Well, Chromium uses NRWT by default ;) ORWT doesn't even work w/ Chromium.

> 
> My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though.

NRWT works everywhere but on the base Win platform AFAIK. I believe GTK team has a bot that runs NRWT regularly. 

As soon as I wrap up my current project (should happen this week), I plan to start the push to get NRWT working and running everywhere by default. I'd like to use reftests as one of the carrots for this, along with the speed increases and the fact that it's being actively maintained.

So, I suggest we do the minimal amount of work on ORWT necessary to not break things. I'm not expecting a huge influx of reftests over the next few weeks, so hopefully this is the right strategy.
Comment 59 Darin Adler 2011-03-09 12:37:15 PST
(In reply to comment #58)
> NRWT works everywhere but on the base Win platform AFAIK.

I am surprised to hear it has WebKit2 support.
Comment 60 Dirk Pranke 2011-03-09 12:39:58 PST
(In reply to comment #59)
> (In reply to comment #58)
> > NRWT works everywhere but on the base Win platform AFAIK.
> 
> I am surprised to hear it has WebKit2 support.

Ah, right. Yeah, it doesn't work with WebKit2, either. I haven't yet looked into how hard it will be to add that, but I expect it should be pretty trivial.
Comment 61 Hayato Ito 2011-03-09 23:33:53 PST
Thank you for the info.

Giving that new-run-webkit-tests doesn't run on all platforms, it might not be worth adding a 'reftest' mode, which runs only reftests, to new-run-webkit-tests because that won't improve the situations at all.

So for a while, reftests will be only available to platforms which run new-run-webkit-tests.
To choose reftests or not to choose is a test writer's freedom, but they should be aware that old-run-webkit-tests skips reftests when choosing reftests in writing new tests, I think.


(In reply to comment #57)
> (In reply to comment #56)
> > Could someone summarize where old-run-webkit-tests is used?
> 
> No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots.
> 
> My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though.
Comment 62 Darin Adler 2011-03-10 14:04:32 PST
I think it was great when we renamed Chromium’s test runner to new-run-webkit-tests and renamed the old test runner to old-run-webkit-tests, implying that some day we would all switch to it the Chromium one. As long as it’s only used by Chromium, though, that name change isn’t so significant.

When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort?
Comment 63 Mihai Parparita 2011-03-10 14:31:10 PST
(In reply to comment #62)
> When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort?

Bug 34984 is the master bug for switching to NRWT, it depends on a bunch of others.
Comment 64 Dirk Pranke 2011-03-10 16:29:47 PST
(In reply to comment #63)
> (In reply to comment #62)
> > When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort?
> 
> Bug 34984 is the master bug for switching to NRWT, it depends on a bunch of others.

Darin, I've added you to that bug, taken ownership of it, and added a comment. I will be putting together a better list of the remaining tasks soon.
Comment 65 Hayato Ito 2011-03-16 18:58:15 PDT
Let me update the status.

- new-run-webkit-tests now supports reftest.
   https://bugs.webkit.org/show_bug.cgi?id=55457 was landed.

- old-run-webkit-tests now ignores reftests.
  https://bugs.webkit.org/show_bug.cgi?id=55936 was landed.

I think that I had done all required changes so that we can check in reftests to the LayoutTests directory.

All build bots or dashboard shouldn't be upset even if reftests are checked-in. I tried to maintain compatibility with that existing LayoutTest infrastructure as possible as I can.

In order to know what will happen and what additional supports we should add, I'll check-in the first minimum sample reftest here:
  https://bugs.webkit.org/show_bug.cgi?id=56450


If I am missing something, please let me know.
Comment 66 Hayato Ito 2011-03-29 02:02:26 PDT
I confirmed that the first reftest didn't upset any build bots.
So I think it is ready to check-in reftests as long as test writers are aware that reftests work only with new-run-webkit-tests.

I've updated the Wiki, https://trac.webkit.org/wiki/Writing%20Reftests, so that it emphasizes this limitation.

Also I'd like to know what changes are nice to have in flakiness dashboard. That should work as is.
But I think there are some areas to be improved in flakiness dashboard for reftests.

(In reply to comment #65)
> Let me update the status.
> 
> - new-run-webkit-tests now supports reftest.
>    https://bugs.webkit.org/show_bug.cgi?id=55457 was landed.
> 
> - old-run-webkit-tests now ignores reftests.
>   https://bugs.webkit.org/show_bug.cgi?id=55936 was landed.
> 
> I think that I had done all required changes so that we can check in reftests to the LayoutTests directory.
> 
> All build bots or dashboard shouldn't be upset even if reftests are checked-in. I tried to maintain compatibility with that existing LayoutTest infrastructure as possible as I can.
> 
> In order to know what will happen and what additional supports we should add, I'll check-in the first minimum sample reftest here:
>   https://bugs.webkit.org/show_bug.cgi?id=56450
> 
> 
> If I am missing something, please let me know.
Comment 69 Dirk Pranke 2011-12-21 14:06:21 PST
can we close this now?
Comment 70 Darin Adler 2011-12-21 14:06:46 PST
I am going to do that.