36065 – Add Support for Reference Tests (aka reftests) to existing Layout Test harness

Dimitri Glazkov (Google)

Reported 2010-03-12 11:12:15 PST

This is a master bug and a place of discussion for implementing reftests. After poking at it a bit, I am going to attempt implementing this in a way that would support using Mozilla's existig reftests out of the box, only if a subset at first. Reading material: http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/ https://developer.mozilla.org/en/Creating_reftest-based_unit_tests http://weblogs.mozillazine.org/roc/archives/2009/01/invalidation_re.html https://library.mozilla.org/@api/deki/files/88/=Creating_a_reftest.pdf https://lists.webkit.org/pipermail/webkit-dev/2010-March/011856.html

Dimitri Glazkov (Google)

Comment 1 2010-03-12 11:20:32 PST

Also http://mxr.mozilla.org/mozilla-central/source/layout/reftests/

Dimitri Glazkov (Google)

Comment 2 2010-04-15 09:39:25 PDT

Initial approach: * Use Mozilla's manifest approach to minimize the need to change tests when importing/reusing. This does mean we'll have two different test frameworks, but given that W3C Testing Task Force is going to use Mozilla reftests, it seems counterintuitive to invent our own adaptation of reftests. * Place reftests into a special directory, like top-level ReferenceTests or LayoutTests/reference/ to avoid testing frameworks stepping on each other's toes. * Write harness in python, integrated with new-run-webkit-tests, not worrying about old-run-webkit-tests. * At first, use unmodified DRT (well, maybe pass a flag to mute render tree dump) to produce pixel results in the harness. Later, teach DRT to dump+compare two files to improve performance. Thoughts?

Alexey Proskuryakov

Comment 3 2010-04-15 11:08:45 PDT

> * Place reftests into a special directory, like top-level ReferenceTests or > LayoutTests/reference/ to avoid testing frameworks stepping on each other's > toes. It seems easy to ignore tests that have expected results in reftest format in scripts that don't support those. Also, it may be a good idea to run those (to at least catch crashes). There is some use in having tests organized logically - people sometimes run only a subset of tests during iterative development, to run the whole suite before submitting a patch. > * Write harness in python, integrated with new-run-webkit-tests, not worrying > about old-run-webkit-tests. I'm not sure about this. Last time I heard about it, there was no plan to make new-run-webkit-tests run on Tiger, which means that it probably won't obsolete the existing script in foreseeable future.

Dimitri Glazkov (Google)

Comment 4 2010-04-15 13:13:10 PDT

(In reply to comment #3) > > * Place reftests into a special directory, like top-level ReferenceTests or > > LayoutTests/reference/ to avoid testing frameworks stepping on each other's > > toes. > > It seems easy to ignore tests that have expected results in reftest format in > scripts that don't support those. Also, it may be a good idea to run those (to > at least catch crashes). I agree -- I didn't state anywhere (my bad!) that run-webkit-tests always runs both types of tests. > There is some use in having tests organized logically - people sometimes run > only a subset of tests during iterative development, to run the whole suite > before submitting a patch. That's a good point. I'll think of an idea here. > > * Write harness in python, integrated with new-run-webkit-tests, not worrying > > about old-run-webkit-tests. > > I'm not sure about this. Last time I heard about it, there was no plan to make > new-run-webkit-tests run on Tiger, which means that it probably won't obsolete > the existing script in foreseeable future. Well, that stinks :( -- I'll talk w/dpranke+eseidel about this.

Hayato Ito

Comment 5 2010-07-07 03:55:12 PDT

As discussed in other threads, we've decided to add support for RefTest. Here is a draft plan to support RefTest in WebKit. https://docs.google.com/document/edit?id=1bwAzqOKfyB97rSIyLQ5ER6YG6-cEUa34xJ8KKgvdc0o&hl=en&authkey=CL39hMMB Any suggestions and advice are welcome.

Tor Arne Vestbø

Comment 6 2010-07-07 06:02:38 PDT

(In reply to comment #5) > As discussed in other threads, we've decided to add support for RefTest. > Here is a draft plan to support RefTest in WebKit. > https://docs.google.com/document/edit?id=1bwAzqOKfyB97rSIyLQ5ER6YG6-cEUa34xJ8KKgvdc0o&hl=en&authkey=CL39hMMB > > Any suggestions and advice are welcome. Great effort! Could we use the "-ref.html" (or reference) postfix like we now have -expected.txt to allow new tests written as reftests to be placed into the existing hierarchy of tests (together with the -expected.checksum and -expected.png)? That would allow easy transition from -expected.txt with render-tree-output to a -ref.html + png/checksum. Imported tests (mozilla, W3C) would of course have their own folder, possibly in a subfolder of LayoutTests if they only deal with css for example.

Darin Adler

Comment 7 2010-07-08 11:13:11 PDT

(In reply to comment #6) > Could we use the "-ref.html" (or reference) postfix like we now have -expected.txt to allow new tests written as reftests to be placed into the existing hierarchy of tests (together with the -expected.checksum and -expected.png)? > > That would allow easy transition from -expected.txt with render-tree-output to a -ref.html + png/checksum. > > Imported tests (mozilla, W3C) would of course have their own folder, possibly in a subfolder of LayoutTests if they only deal with css for example. I like these suggestions. I don’t think we want separate testing frameworks.

Dirk Pranke

Comment 8 2010-07-08 12:52:42 PDT

+1 to -ref.html and integrating tightly into new-run-webkit-tests. I don't think we should have a separate tool or a separate top level directory either.

Hayato Ito

Comment 9 2010-07-08 20:24:34 PDT

(In reply to comment #8) > +1 to -ref.html and integrating tightly into new-run-webkit-tests. I don't think we should have a separate tool or a separate top level directory either. Thank you for the comments. I agree. In fact, a typical manifest file used in Mozilla is like: == appendsingle.html appendsingle-ref.html == appendmultiple.html appendmultiple-ref.html == insertsingle.html insertsingle-ref.html == insertmultiple.html insertmultiple-ref.html They use convention as: == SOME-TEST.html SOME-TEST-ref.html We should follow their convention unless there is a strong reason against it. As for directory location for reftests, it looks good to me that we reuse existing LayoutTest directory unless there is a significant maintenance issue.

Simon Fraser (smfr)

Comment 10 2010-07-08 21:24:44 PDT

(In reply to comment #9) > == appendsingle.html appendsingle-ref.html > == appendmultiple.html appendmultiple-ref.html > == insertsingle.html insertsingle-ref.html > == insertmultiple.html insertmultiple-ref.html > > They use convention as: > == SOME-TEST.html SOME-TEST-ref.html Note also that you can re-use the same reference for multiple tests, so don't assume that each test has a corresponding ref. Also, it's possible to test that the rendering does NOT match the ref, which can also be useful.

Darin Adler

Comment 11 2010-07-08 23:29:52 PDT

The current tests work without a separate manifest file, and I’d like to see that continue with the ref tests. Or if we switch to a manifest file, I’d like to do it for all the tests. I don’t think matching Mozilla’s convention is as important as matching our own convention!

Hayato Ito

Comment 12 2010-07-09 00:09:03 PDT

(In reply to comment #11) > The current tests work without a separate manifest file, and I’d like to see that continue with the ref tests. Or if we switch to a manifest file, I’d like to do it for all the tests. > > I don’t think matching Mozilla’s convention is as important as matching our own convention! Thank you for the comment. Okay. I am planning to support both options, a manifest-file based and simple file-naming-convention based. Let me explain that. 1. file-naming-convention based Suppose we have the following html files in a directory: - foo-test-a.html - foo-test-b.html - foo-test-ref.html - bar-test-hello.html - bar-test-world.html - bar-test-fail-ref.html And we don't have files named 'foo-test-a-expected.html', 'foo-test-b-expected.html' and so on, we should consider them as reftests as if they were explicitly specified with manifest file, like: == foo-test-a.html foo-test-ref.html == foo-test-b.html foo-test-ref.html != bar-test-hello.html bar-test-fail-ref.html != bar-test-world.html bar-test-fail-ref.html 2. Manifest based There remains the cases in reftests that we cannot express easily by such a simple naming convention. In that case, we use a manifest files as a last resort. We should try to avoid using manifest file as possible as we can. It might be challenging that making good file-naming convention because reftests used in Mozilla support many kinds of reftests. But I believe we can cover most cases by simple naming conventions.

Simon Fraser (smfr)

Comment 13 2010-07-09 08:45:14 PDT

(In reply to comment #11) > I don’t think matching Mozilla’s convention is as important as matching our own convention! Matching Mozilla's convention may be useful if we get large test suites from W3C working groups which use that convention. It would be nice to be able to just share these unchanged. I don't think Manifest maintenance is a burden, and Manifests allow for more testing options.

Darin Adler

Comment 14 2010-07-09 09:00:56 PDT

(In reply to comment #13) > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options. If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems.

Hayato Ito

Comment 15 2010-07-13 22:47:30 PDT

(In reply to comment #14) > (In reply to comment #13) > > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options. > > If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems. I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'. I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either == AAAA.html BBBB-ref.html or != CCCC.html DDDD-ref.html. So I am moving to adding a support for simple filename-convention based reftests at first. Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.

Hayato Ito

Comment 16 2010-07-26 06:35:34 PDT

I am planning to adapt the following naming convention for reftests: 1. foo.html and foo-ref.html We use this naming for a reftest which is expected to match. That corresponds to '= foo.html foo-ref.html' in Mozilla's manifest. 2. foo.html and foo-noref.html We use this naming for a reftest which is expected not to match. That corresponds to '!= foo.html foo-noref.html'. These conventions are commonly used in Mozilla's reftest. (In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #13) > > > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options. > > > > If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems. > > I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'. > I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either > == AAAA.html BBBB-ref.html > or > != CCCC.html DDDD-ref.html. > > So I am moving to adding a support for simple filename-convention based reftests at first. > Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.

Hayato Ito

Comment 17 2010-07-26 06:37:20 PDT

Correction. (In reply to comment #16) > I am planning to adapt the following naming convention for reftests: > > 1. foo.html and foo-ref.html > We use this naming for a reftest which is expected to match. That corresponds to '= foo.html foo-ref.html' in Mozilla's manifest. > > 2. foo.html and foo-noref.html > We use this naming for a reftest which is expected not to match. That corresponds to '!= foo.html foo-noref.html'. '2. foo.html and foo-notref.html' is a correct naming. > > These conventions are commonly used in Mozilla's reftest. > > > (In reply to comment #15) > > (In reply to comment #14) > > > (In reply to comment #13) > > > > I don't think Manifest maintenance is a burden, and Manifests allow for more testing options. > > > > > > If we like manifests, I suggest switching the layout test system to use manifests. What I want to avoid is arbitrary combinations and multiple subtly different testing systems. > > > > I've tried parsing manifest files used in Mozilla and found that most reftests in Mozilla are 'simple'. > > I mean about 90 percent of reftests (4263 out of 4710 reftests) are being specified in either > > == AAAA.html BBBB-ref.html > > or > > != CCCC.html DDDD-ref.html. > > > > So I am moving to adding a support for simple filename-convention based reftests at first. > > Note that I am not saying that we don't support a manifest file. It'll be supported later so that we can run reftests imported from Mozilla as is.

mitz

Comment 18 2010-07-26 09:53:33 PDT

Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ?

Adam Roben (:aroben)

Comment 19 2010-07-26 10:18:35 PDT

(In reply to comment #18) > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ? Would that make it harder to open them in a browser (e.g., by double-clicking in Finder or Windows Explorer)?

mitz

Comment 20 2010-07-26 10:19:26 PDT

(In reply to comment #19) > (In reply to comment #18) > > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ? > > Would that make it harder to open them in a browser (e.g., by double-clicking in Finder or Windows Explorer)? Of course it would :(

Hayato Ito

Comment 21 2010-07-27 01:45:52 PDT

Thank you for the comment. (In reply to comment #18) > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ? Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools. Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests. If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.

Yuzo Fujishima

Comment 22 2010-07-27 03:04:03 PDT

It seems that only WebKitTools/Scripts/{old-run-webkit-tests,prepare-ChangeLog} need to be changed to ignore -{not,}ref.html. (See supportedFileExtensions) Because the needed change seems to be simple, I'd recommend just changing those files and going ahead, rather than waiting for new-run-webkit-test. (In reply to comment #21) > Thank you for the comment. > > (In reply to comment #18) > > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ? > > Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools. > Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests. > > If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.

Yuzo Fujishima

Comment 23 2010-07-27 03:07:28 PDT

(In reply to comment #22) > It seems that only WebKitTools/Scripts/{old-run-webkit-tests,prepare-ChangeLog} and WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_files.py > need to be changed to ignore -{not,}ref.html. (See supportedFileExtensions) > > Because the needed change seems to be simple, I'd recommend just changing those files > and going ahead, rather than waiting for new-run-webkit-test. > > (In reply to comment #21) > > Thank you for the comment. > > > > (In reply to comment #18) > > > Will it upset the tools too much if the ref, noref, etc. were part of the extension instead of part of the filename, e.g. foo.html-ref, foo.html-ref-!= ? > > > > Yes, adding xx-{ref,notref}.html to LayoutTest directory will upsets existing tools. > > Because we will add reftest support to only new-run-webkit-test in the current plan, we have to update other existing tools so that they should ignore such html files related to reftests. > > > > If it will take long time to make new-run-webkit-test default test runner, we might reconsider our plan.

Hayato Ito

Comment 24 2010-08-04 20:26:41 PDT

Update on file naming convention: I've changed my mind to use the following naming conventions for reftests: 1. foo.html and foo-expected.html Which are used for reftests where both rendering results are expected to be equal. 2. foo.html and foo-expected-notequal.html Which are used for reftests where both rendering results are expected to be different. The motivation of this naming is to match existing file name conventions. There are already conventions in WebKit Tests such as , 'xxx-expected.txt', 'xxx-expected.png' and 'xxx-expected.checksum'. So it looks natural for me to use 'xxx-expected.YYY' naming pattern for reftests.

Dimitri Glazkov (Google)

Comment 25 2010-08-05 08:39:27 PDT

(In reply to comment #24) > Update on file naming convention: > > I've changed my mind to use the following naming conventions for reftests: > > 1. foo.html and foo-expected.html > Which are used for reftests where both rendering results are expected to be equal. > > 2. foo.html and foo-expected-notequal.html > Which are used for reftests where both rendering results are expected to be different. > > The motivation of this naming is to match existing file name conventions. There are already conventions in WebKit Tests such as , 'xxx-expected.txt', 'xxx-expected.png' and 'xxx-expected.checksum'. So it looks natural for me to use 'xxx-expected.YYY' naming pattern for reftests. Sounds good.

Darin Adler

Comment 26 2010-08-05 11:13:26 PDT

(In reply to comment #24) > 2. foo.html and foo-expected-notequal.html > Which are used for reftests where both rendering results are expected to be different. I don’t think you should merge the words "not" and "equal" into "notequal". If you can’t think of a better phrase, then I suggest "expected-not-equal.html" but the phrase "expected not equal" is not good grammar, so maybe we can come up with a better phrase. Generally speaking, an entire test that only says "these must not be identical" seems like a pretty blunt tool. I mean, it's good to make sure that bolding actually does something, but an entire test of this sort can only test one thing! That seems to be a general downside of ref tests. Each test can only effectively check one thing. Unlike dumpAsText tests, where we can test a lot of things in a single test.

Dirk Pranke

Comment 27 2010-08-05 11:25:47 PDT

(In reply to comment #26) > (In reply to comment #24) > > 2. foo.html and foo-expected-notequal.html > > Which are used for reftests where both rendering results are expected to be different. > > I don’t think you should merge the words "not" and "equal" into "notequal". > > If you can’t think of a better phrase, then I suggest "expected-not-equal.html" but the phrase "expected not equal" is not good grammar, so maybe we can come up with a better phrase. > Perhaps "expected-mismatch" or "expected-different" ?

Darin Adler

Comment 28 2010-08-05 12:18:27 PDT

(In reply to comment #27) > Perhaps "expected-mismatch" or "expected-different" ? I like expected-mismatch.

Dimitri Glazkov (Google)

Comment 29 2010-08-05 12:46:28 PDT

(In reply to comment #28) > (In reply to comment #27) > > Perhaps "expected-mismatch" or "expected-different" ? > > I like expected-mismatch. Me too.

Hayato Ito

Comment 30 2010-08-10 21:11:58 PDT

Thank you for comments. I like xx-expected-mismatch.html too. I'll use that convention. (In reply to comment #29) > (In reply to comment #28) > > (In reply to comment #27) > > > Perhaps "expected-mismatch" or "expected-different" ? > > > > I like expected-mismatch. > > Me too.

Simon Fraser (smfr)

Comment 31 2010-09-30 14:43:59 PDT

Any progress here?

Hayato Ito

Comment 32 2010-09-30 20:45:57 PDT

In progress. I am actually working on adding reftest support. I think I can submit an initial patch in a week or later for an early review. That might have a coule of rooms to be improved, but I need early feedbacks and reviews. (In reply to comment #31) > Any progress here?

Hayato Ito

Comment 33 2010-10-19 21:25:15 PDT

Created attachment 71245 [details] snapshot

Hayato Ito

Comment 34 2010-10-19 21:28:19 PDT

The attachment is not for the review. That needs to be polished for the review. I am preparing a patch as an initial effort. (In reply to comment #33) > Created an attachment (id=71245) [details] > snapshot

Shinichiro Hamaji

Comment 35 2010-10-19 22:05:51 PDT

Comment on attachment 71245 [details] snapshot View in context: https://bugs.webkit.org/attachment.cgi?id=71245&action=review The overall approach looks fine. What is still missing? Maybe reftest_diff.py needs more work? I'm very glad to see reftest is becoming a real! > WebKitTools/Scripts/old-run-webkit-tests:2254 > + if ($filename =~ /-expected(-mismatch)?\.html$/) { expectedTag? Also, we may want to consider adding mismatchTag for consistency? > WebKitTools/Scripts/old-run-webkit-tests:2273 > + if (exists $supportedFileExtensions{$extension}) { I'd say exists $supportedFileExtensions($extension) && !isUsedInReftest($filename) > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:90 > + """Receives the output from a DumpRenderTree process, subjects it to a It would be better to have a summary in the first line of a docstring. PEP257 says "Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description.". http://www.python.org/dev/peps/pep-0257/ It's not a strict rule for us, but we are trying to apply style-related PEPs as possible. > WebKitTools/Scripts/webkitpy/layout_tests/port/test_files.py:54 > +_reftest_expected_suffixes = set(['-expected.html', 'expected-mismatch.html']) -expected-mismatch maybe? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:141 > + def __repr__(self): Do we need this? For many objects, eval(repr(obj)) should be equal to obj but this definition doesn't satisfy this convention.

Hayato Ito

Comment 36 2010-10-19 23:54:06 PDT

Thank you for the quick review. I should have added a comment, 'please don't take a look at the attachment in details. The attachment is only for showing an overall approach. That contains a lot of stuff which are only used for debugging' :) It was my fault. I am now polishing my patch for the review. Anyway, thank you for the review. That is helpful. I'll reflect your comments in following patch. (In reply to comment #35) > (From update of attachment 71245 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=71245&action=review > > The overall approach looks fine. What is still missing? Maybe reftest_diff.py needs more work? > > I'm very glad to see reftest is becoming a real! > > > WebKitTools/Scripts/old-run-webkit-tests:2254 > > + if ($filename =~ /-expected(-mismatch)?\.html$/) { > > expectedTag? > > Also, we may want to consider adding mismatchTag for consistency? > > > WebKitTools/Scripts/old-run-webkit-tests:2273 > > + if (exists $supportedFileExtensions{$extension}) { > > I'd say > > exists $supportedFileExtensions($extension) && !isUsedInReftest($filename) > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:90 > > + """Receives the output from a DumpRenderTree process, subjects it to a > > It would be better to have a summary in the first line of a docstring. PEP257 says "Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description.". > > http://www.python.org/dev/peps/pep-0257/ > > It's not a strict rule for us, but we are trying to apply style-related PEPs as possible. > > > WebKitTools/Scripts/webkitpy/layout_tests/port/test_files.py:54 > > +_reftest_expected_suffixes = set(['-expected.html', 'expected-mismatch.html']) > > -expected-mismatch maybe? > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:141 > > + def __repr__(self): > > Do we need this? For many objects, eval(repr(obj)) should be equal to obj but this definition doesn't satisfy this convention.

Shinichiro Hamaji

Comment 37 2010-10-20 00:02:33 PDT

> I should have added a comment, 'please don't take a look at the attachment in details. The attachment is only for showing an overall approach. That contains a lot of stuff which are only used for debugging' :) > It was my fault. Yeah, I knew this patch isn't for review. Actually, I didn't review the patch in detail. I put some comments just because I found them while I'm looking your overall approach.

Hayato Ito

Comment 38 2010-11-04 06:28:37 PDT

Created attachment 72932 [details] snapshot2

Hayato Ito

Comment 39 2010-11-04 06:54:37 PDT

I've updated my patch. I am aware that this patch is incomplete, such as missing ChangeLog, including unnecessary sample reftests and so on. I'd like to keep this patch as small as possible for easy reviews. So I don't plan to add any features to this patch. Following patches should take care of other necessary changes. By this patch, we can run reftests by adding '--run-reftest' option to new-run-webkit-test . Without '--run-reftests', there should be no impacts to existing LayoutTest infrastructure. I'd like to commit patches incrementally until reftests becomes completely ready status. Early reviews are highly welcome. (In reply to comment #38) > Created an attachment (id=72932) [details] > snapshot2

Hayato Ito

Comment 40 2010-11-05 06:45:25 PDT

Created attachment 73060 [details] for-review

Shinichiro Hamaji

Comment 41 2010-11-08 05:07:43 PST

Comment on attachment 73060 [details] for-review View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79 > + # TODO(hayato): It might be better to increase the timeout two times when Please use FIXME instead of TODO > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81 > + return _milliseconds_to_seconds( Do we really need this line break? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108 > test_types: A list of TestType objects to run the test output How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30 > +"""A test runner which runs a single test file.""" I'd say we'll run either a layout test or a reftest. Runs a single test file (either a layout test or a reftest) or something like this. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59 > + test_types, reftest_types): It's nice if we have a docstring with an Args section. > WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142 > + if tolerance is not None: A usual idiom for an optional parameter is if opt_param is None: opt_param = something So, I'd slightly prefer if tolerance is None: if self.get_option('tolerance') is not None: tolerance = self.... else tolerance = 0.1 > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125 > + self.expected_mismatch_html_filename = None I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice. > WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50 > + if test_info.expected_html_filename: How about expected = None if test_info.expected_html_filename: ... expected = reference_image_contents else: ... self.write_output_files(...) ? > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96 > + """Copies result files into the output directory with appropriate The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?

Hayato Ito

Comment 42 2010-11-09 20:12:52 PST

Created attachment 73454 [details] synced-to-tot

Hayato Ito

Comment 43 2010-11-09 20:20:03 PST

Thank you for the review. I've addressed all your comments. I also synched and rebased my patch to ToT. I've confirmed that all tests run by WebKitTools/Scripts/test-webkit-scripts passed even after applying this patch. I think I need to add more tests to test-webkit-scripts. But it would be in another patch. (In reply to comment #41) > (From update of attachment 73060 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review > > Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ? > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79 > > + # TODO(hayato): It might be better to increase the timeout two times when > > Please use FIXME instead of TODO > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81 > > + return _milliseconds_to_seconds( > > Do we really need this line break? > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108 > > test_types: A list of TestType objects to run the test output > > How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer. > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30 > > +"""A test runner which runs a single test file.""" > > I'd say we'll run either a layout test or a reftest. > > Runs a single test file (either a layout test or a reftest) > > or something like this. > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59 > > + test_types, reftest_types): > > It's nice if we have a docstring with an Args section. > > > WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142 > > + if tolerance is not None: > > A usual idiom for an optional parameter is > > if opt_param is None: > opt_param = something > > So, I'd slightly prefer > > if tolerance is None: > if self.get_option('tolerance') is not None: > tolerance = self.... > else > tolerance = 0.1 > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125 > > + self.expected_mismatch_html_filename = None > > I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice. > > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50 > > + if test_info.expected_html_filename: > > How about > > expected = None > if test_info.expected_html_filename: > ... > expected = reference_image_contents > else: > ... > self.write_output_files(...) > > ? > > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96 > > + """Copies result files into the output directory with appropriate > > The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?

Hayato Ito

Comment 44 2010-11-09 20:33:50 PST

One more note: This patch is adding new test_result_type in json result file as follows: test_expectations.REFERENCE_MISMATCH: "R", test_expectations.REFERENCE_MATCH: "M", So I guess buildbots will need to understand these kinds of test result. (In reply to comment #43) > Thank you for the review. > > I've addressed all your comments. I also synched and rebased my patch to ToT. > I've confirmed that all tests run by WebKitTools/Scripts/test-webkit-scripts passed even after applying this patch. > > I think I need to add more tests to test-webkit-scripts. But it would be in another patch. > > > (In reply to comment #41) > > (From update of attachment 73060 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=73060&action=review > > > > Overall this patch looks good. Putting r- because I think you might want to address some of my comments. Also, I think you can make the patch be applicable to ToT ? > > > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:79 > > > + # TODO(hayato): It might be better to increase the timeout two times when > > > > Please use FIXME instead of TODO > > > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:81 > > > + return _milliseconds_to_seconds( > > > > Do we really need this line break? > > > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:108 > > > test_types: A list of TestType objects to run the test output > > > > How about adding "layout" for this comment? "A list of TestType objects to run the *layout* test output" It might be clearer. > > > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:30 > > > +"""A test runner which runs a single test file.""" > > > > I'd say we'll run either a layout test or a reftest. > > > > Runs a single test file (either a layout test or a reftest) > > > > or something like this. > > > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:59 > > > + test_types, reftest_types): > > > > It's nice if we have a docstring with an Args section. > > > > > WebKitTools/Scripts/webkitpy/layout_tests/port/webkit.py:142 > > > + if tolerance is not None: > > > > A usual idiom for an optional parameter is > > > > if opt_param is None: > > opt_param = something > > > > So, I'd slightly prefer > > > > if tolerance is None: > > if self.get_option('tolerance') is not None: > > tolerance = self.... > > else > > tolerance = 0.1 > > > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:125 > > > + self.expected_mismatch_html_filename = None > > > > I think it's very easy to check the case where we have both foo-expected.html and foo-expected-mismatch.html ? Using _log.error would be nice. > > > > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/reference_diff.py:50 > > > + if test_info.expected_html_filename: > > > > How about > > > > expected = None > > if test_info.expected_html_filename: > > ... > > expected = reference_image_contents > > else: > > ... > > self.write_output_files(...) > > > > ? > > > > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:96 > > > + """Copies result files into the output directory with appropriate > > > > The first line of a docstring should be one line. I think "with appropriate names" has almost no information so I'd remove this. I guess the Args will tell what name is appropriate?

Dirk Pranke

Comment 45 2010-11-10 00:45:01 PST

Hi Ito-san, It's great that you're getting so close, and I apologize for not getting to look at this earlier. I will post more detailed comments tomorrow afternoon, but here's some initial thoughts on the general approach. 1) this is not going to be a pretty merge with the changes I'm making to restructure the code to use processes instead of threads, but it shouldn't be horrible, either. One amusing side effect is that you've moved all of the real testing work out of dump_render_tree_thread and into single_test_runner; My patch moves all of the non-testing work out of dump_render_tree_thread. It'll probably be easier if you keep the two files combined for now. 2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output. I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now? As you correctly note somewhere in the patch, it would be weird to have both reference tests and reference output, so we should make that impossible. 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities? Anyway, more tomorrow ...

Dirk Pranke

Comment 46 2010-11-10 00:47:43 PST

Caveat - I didn't actually read through the whole bug again before reading the patch (or any of the referenced links), so I may have asked questions that have already been answered. I will read through everything again tomorrow, so if it makes sense, you can hold off on pointing out my stupidity until I can catch it myself :)

Hayato Ito

Comment 47 2010-11-10 21:33:46 PST

Hi Dirk, Thank you for comments, I appreciate that! (In reply to comment #45) > Hi Ito-san, > > It's great that you're getting so close, and I apologize for not getting to look at this earlier. I will post more detailed comments tomorrow afternoon, but here's some initial thoughts on the general approach. > > 1) this is not going to be a pretty merge with the changes I'm making to restructure the code to use processes instead of threads, but it shouldn't be horrible, either. One amusing side effect is that you've moved all of the real testing work out of dump_render_tree_thread and into single_test_runner; My patch moves all of the non-testing work out of dump_render_tree_thread. It'll probably be easier if you keep the two files combined for now. > Thank you for letting me know it. I'll try to merge that later if needed. That should not be a problem, I think. > 2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output. > > I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now? Exactly! I feel happy to hear you thought the same thing as I thought:) In fact, I tried to take that approach at first, which seems a natural expansion. Let me explain briefly. At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly. But I abandoned this idea due to the following reasons as far as I remember: - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy. - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again. My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me. I found that there are some 'gaps' between both tests than I expected initially. So instead of struggling to apply compare_output() function to reftests for consistency, I just introduced new compare function, compare_actual_output_with_reference(). The overall code became simple by early branching codepath between layout tests and reftests. I am aware that this patch is not ideal from the point of view in OO design. Actually there is no significant reason that a test type of ReferenceDiff is a subtype of TestBaseType. Sigh. > > As you correctly note somewhere in the patch, it would be weird to have both reference tests and reference output, so we should make that impossible. > > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities? Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that. So until we found that many developers really want to use one test file for both tests and found a nice way to treat them together, I'd like to be conservative and separate existing layout tests and reftests. I am looking forward to your more feed backs. Thank you! > > Anyway, more tomorrow ...

Shinichiro Hamaji

Comment 48 2010-11-10 21:47:07 PST

> > 2) I'm not sure that we want to expose the concept of a reftest as an alternative to a regular test; I'm wondering if we can instead restructure things so that port.expected_text/expected_checksum/expected_image returns the output of the reftest run if there's a reftest, and the checked-in output otherwise. Of course, we'd have to replace the three routines with a single routine that returned all of the types of output. > > > > I'm guessing you've already thought about this approach and rejected it in favor of the approach you have now? > > Exactly! I feel happy to hear you thought the same thing as I thought:) In fact, I tried to take that approach at first, which seems a natural expansion. Let me explain briefly. > > At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly. > But I abandoned this idea due to the following reasons as far as I remember: > > - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy. > - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again. > My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me. > > I found that there are some 'gaps' between both tests than I expected initially. So instead of struggling to apply compare_output() function to reftests for consistency, I just introduced new compare function, compare_actual_output_with_reference(). > The overall code became simple by early branching codepath between layout tests and reftests. > > I am aware that this patch is not ideal from the point of view in OO design. Actually there is no significant reason that a test type of ReferenceDiff is a subtype of TestBaseType. Sigh. I'm not sure if I understand this discussion well, but I'd like to mention supporting "expected-mismatch" would make reusing existing code a bit more difficult. > > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities? > > Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that. > So until we found that many developers really want to use one test file for both tests and found a nice way to treat them together, I'd like to be conservative and separate existing layout tests and reftests. I think text diff is pretty useful and we need text diff soon or later. The images DRT output only have 800x600 pixels so some mismatch could be ignored if an HTML is big.

Dirk Pranke

Comment 49 2010-11-10 23:40:50 PST

Okay, it is later than I had hoped for and I didn't have time to review all of the Mozilla documentation. So, as far as Mozilla-compatibility goes, I will say for the moment that I think we need to burn some serious brain cycles on how we can develop interoperable test suites. But, I think that that should be viewed as a separate effort from getting reftests working initially in WebKit. I plan to follow up on this w/ Hixie and Tab Atkins and can happily involve whomever is cc'ed on this list who is interested - let me know off-bug.

Dirk Pranke

Comment 50 2010-11-11 00:10:14 PST

(In reply to comment #47) > At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly. > But I abandoned this idea due to the following reasons as far as I remember: > > - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy. > - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again. > My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me. > I don't think I expressed myself well in the previous comment. What I was trying to say is that I think reftests are just a way of generating the expected output that we should compare the actual output to. They aren't actually a different kind of test. So, you don't need to create ReferenceDiffs or add different reference test result types or expectations. I think virtually all, if not all, of the changes can (or perhaps should) be confined to dump_render_tree_thread.py . The major implication of this is that the test_types classes need to be refactored to have the expected output passed in to them. I have also been considering pulling all of the --new-baseline logic out of test_types, because it feels wrong to be there as well, but I go off on a tangent. You are definitely correct that _process_output and _compare_output should be fast and stable, and so we definitely shouldn't call out to DRT again inside them. The method signatures are also really clumsy. Part of the refactoring I hope to land in the next couple days will introduce something like a TestResult class that holds many of the arguments passed to process_output() current (and that also becomes a method on TestShellThread, much like you've done in SingleTestRunner). I realize that this'll play substantial havoc with your patch, but it feels pretty strongly to me at the moment that it's the right thing to do. I'll cc you on the other patches so you can see the changes I'm talking about. Note that what I'm saying implies that, for a given test, we can only do regular tests against checked-in output or tests against an -expected.html file, not both. I think that this is fine and preferable in the long run. I don't think there will be much of an issue here, because we don't have any existing ref tests that we need to support. We can just start adding new ones. Does this sound good? Are there issues that you've run into that I'm not yet seeing? > > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities? > > Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that. I'm not sure if we understood each other here, but I'm not sure if it matters. Let me explain why ... from conversations today with Dimitri and Darin, I think we agreed that there are only two kinds of reftests that matter. The first is a text-only test (dumpAsText or dumpAsMarkup). The second is an image-only test, where we care about the pixels being rendered but don't care at all about the render tree. I *think* in the first case any two text-only sets of output could probably be rewritten as a single test that produced text only output (you could merge the test.html and test-expected.html into one single test file that produced an -expected.txt text-only output). Conservatively, even if we don't do this, it still reduces to comparing two text files, which of course we support now. Similarly, comparing two image files is already supported. There's nothing special about the fact that the one image was generated from a -expected.html file instead of checked in as -expected.png. So I think, to answer my own questions, we don't need to support any possibilities with reftests beyond what we already have (and, in fact, we probably have fewer combinations, since we won't have IMAGE+TEXT). I will also note that Hamaji-san raises a good point - I can't forget about -expected-mismatch.html, so compare_output probably needs to grow a match/mismatch bool. Or, maybe better, when we create the new TestResult object we pass in the flag and it inverts the result of test_failures.determine_result_type() as necessary.

Hayato Ito

Comment 51 2010-11-11 00:57:18 PST

Hi Dirk, Thank you for comments. I appreciate that. In short answer, I agree. I agree that a refrest is a just yet another way to generate expected image or text. I had noticed that indeed. But I also found that it was difficult to realize that in current codebase without an aggressive refactoring. It seems that I should have refactored the code base more eagerly without *fear*. Maybe I am a little bit biased to reusing existing codebase as soon as possible. If such a refactoring is a way to go, I'd like to do it. Maybe I should wait for your patch in the next couple of days. (In reply to comment #50) > (In reply to comment #47) > > At first attempt, I tried to simply add a test type of ReferenceDiff and put it to test_types and run each test_type against both existing layout tests and reftests. I'd like to treat them uniformly. > > But I abandoned this idea due to the following reasons as far as I remember: > > > > - It requires many tweaks to handle each combination of results from each test type (a list of test_failures). The code became messy. > > - I tried to get expected_image for reftest on demand within compare_output(), but that need to run 'driver' again. > > My impression on current code base tells that when we call _process_output (and compare_output() that follows), we should finish all time-consuming tasks by then. I'd like to avoid calling DumpRenderTree again there. It doesn't look good to me. > > > > I don't think I expressed myself well in the previous comment. What I was trying to say is that I think reftests are just a way of generating the expected output that we should compare the actual output to. They aren't actually a different kind of test. So, you don't need to create ReferenceDiffs or add different reference test result types or expectations. I think virtually all, if not all, of the changes can (or perhaps should) be confined to dump_render_tree_thread.py . > > The major implication of this is that the test_types classes need to be refactored to have the expected output passed in to them. I have also been considering pulling all of the --new-baseline logic out of test_types, because it feels wrong to be there as well, but I go off on a tangent. > > You are definitely correct that _process_output and _compare_output should be fast and stable, and so we definitely shouldn't call out to DRT again inside them. The method signatures are also really clumsy. > > Part of the refactoring I hope to land in the next couple days will introduce something like a TestResult class that holds many of the arguments passed to process_output() current (and that also becomes a method on TestShellThread, much like you've done in SingleTestRunner). I realize that this'll play substantial havoc with your patch, but it feels pretty strongly to me at the moment that it's the right thing to do. I'll cc you on the other patches so you can see the changes I'm talking about. > > Note that what I'm saying implies that, for a given test, we can only do regular tests against checked-in output or tests against an -expected.html file, not both. I think that this is fine and preferable in the long run. I don't think there will be much of an issue here, because we don't have any existing ref tests that we need to support. We can just start adding new ones. > > Does this sound good? Are there issues that you've run into that I'm not yet seeing? > > > > 3) I wonder if it makes sense to have reftests that produce only text output, or should they always produce both text and images? Or does it make sense if they *only* produce an image? Will the render tree be different for the reference and the test? The same? Either/or? Do we need to support all of the possibilities? > > > > Nice point. I tried to use one test file for both layout tests and reftest, supporting all of the combinations. But I have no confidence that there are enough use cases for that. > > I'm not sure if we understood each other here, but I'm not sure if it matters. Let me explain why ... > from conversations today with Dimitri and Darin, I think we agreed that there are only two kinds of reftests that matter. The first is a text-only test (dumpAsText or dumpAsMarkup). The second is an image-only test, where we care about the pixels being rendered but don't care at all about the render tree. > > I *think* in the first case any two text-only sets of output could probably be rewritten as a single test that produced text only output (you could merge the test.html and test-expected.html into one single test file that produced an -expected.txt text-only output). Conservatively, even if we don't do this, it still reduces to comparing two text files, which of course we support now. > > Similarly, comparing two image files is already supported. There's nothing special about the fact that the one image was generated from a -expected.html file instead of checked in as -expected.png. So I think, to answer my own questions, we don't need to support any possibilities with reftests beyond what we already have (and, in fact, we probably have fewer combinations, since we won't have IMAGE+TEXT). > > I will also note that Hamaji-san raises a good point - I can't forget about -expected-mismatch.html, so compare_output probably needs to grow a match/mismatch bool. Or, maybe better, when we create the new TestResult object we pass in the flag and it inverts the result of test_failures.determine_result_type() as necessary.

Dirk Pranke

Comment 52 2010-11-11 11:41:34 PST

(In reply to comment #51) > Hi Dirk, > Thank you for comments. I appreciate that. > > In short answer, I agree. > > I agree that a refrest is a just yet another way to generate expected image or text. > I had noticed that indeed. But I also found that it was difficult to realize that in current codebase without an aggressive refactoring. > It could be that I know this codebase well, or it could be that I'm just overly aggressive, but I have no fear of this :) The good news is that this code is very well tested, so you can make changes without too much concern. run_webkit_tests_unittest.py is your friend. > If such a refactoring is a way to go, I'd like to do it. Terrific! Having thought abou this overnight, I think you can make most of the changes without colliding too much with the changes I'm working on. I suggest approaching this by working on a completely separate patch that refactors the code as it's currently checked in, and create a TestOutput POD (plain old data) class that is returned from port.Driver.run_test(), and then create a separate instance for the"expected" results, and then pass the two to _process_output() and the test_types routines. Once you do that, adding in reftest support to generate a different expected output instance should be straightforward.

Hayato Ito

Comment 53 2010-11-12 02:04:50 PST

(In reply to comment #52) Hi Dirk, Thank you for the commet. I've refactored the current code as a completely separate patch. Please see: https://bugs.webkit.org/show_bug.cgi?id=49431 Could you review this patch? > Having thought abou this overnight, I think you can make most of the changes without colliding too much with the changes I'm working on. I suggest approaching this by working on a completely separate patch that refactors the code as it's currently checked in, and create a TestOutput POD (plain old data) class that is returned from port.Driver.run_test(), and then create a separate instance for the"expected" results, and then pass the two to _process_output() and the test_types routines.

Hayato Ito

Comment 54 2011-03-01 02:28:53 PST

Let me update the current status. Now, I've finished the necessary refactoring for new-run-webkit-tests, I just started to work on adding reftests support. I hope I can post a patch in a few days.

Hayato Ito

Comment 55 2011-03-08 23:28:37 PST

I am starting to write Wiki page about how to use reftests. https://trac.webkit.org/wiki/Writing%20Reftests

Hayato Ito

Comment 56 2011-03-09 02:21:46 PST

I am aiming to add reftest support to new-run-webkit-tests in this quarter. But I think new-run-webkit-tests has not replaced old-run-webkit-tests yet. I am updating old-run-webkit-tests so that it ignores reftests: https://bugs.webkit.org/show_bug.cgi?id=55936 But I am not sure which means that we can check in reftests in LayoutTests directory. Should we refrain us from checking in reftests until new-run-webkit-tests replaces old-run-webkit-tests? Ojan gave me an idea: .. we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. But I am not sure what is the best way. Could someone summarize where old-run-webkit-tests is used?

Adam Roben (:aroben)

Comment 57 2011-03-09 05:28:38 PST

(In reply to comment #56) > Could someone summarize where old-run-webkit-tests is used? No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots. My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though.

Dirk Pranke

Comment 58 2011-03-09 12:34:32 PST

(In reply to comment #57) > (In reply to comment #56) > > Could someone summarize where old-run-webkit-tests is used? > > No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots. Well, Chromium uses NRWT by default ;) ORWT doesn't even work w/ Chromium. > > My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though. NRWT works everywhere but on the base Win platform AFAIK. I believe GTK team has a bot that runs NRWT regularly. As soon as I wrap up my current project (should happen this week), I plan to start the push to get NRWT working and running everywhere by default. I'd like to use reftests as one of the carrots for this, along with the speed increases and the fact that it's being actively maintained. So, I suggest we do the minimal amount of work on ORWT necessary to not break things. I'm not expecting a huge influx of reftests over the next few weeks, so hopefully this is the right strategy.

Darin Adler

Comment 59 2011-03-09 12:37:15 PST

(In reply to comment #58) > NRWT works everywhere but on the base Win platform AFAIK. I am surprised to hear it has WebKit2 support.

Dirk Pranke

Comment 60 2011-03-09 12:39:58 PST

(In reply to comment #59) > (In reply to comment #58) > > NRWT works everywhere but on the base Win platform AFAIK. > > I am surprised to hear it has WebKit2 support. Ah, right. Yeah, it doesn't work with WebKit2, either. I haven't yet looked into how hard it will be to add that, but I expect it should be pretty trivial.

Hayato Ito

Comment 61 2011-03-09 23:33:53 PST

Thank you for the info. Giving that new-run-webkit-tests doesn't run on all platforms, it might not be worth adding a 'reftest' mode, which runs only reftests, to new-run-webkit-tests because that won't improve the situations at all. So for a while, reftests will be only available to platforms which run new-run-webkit-tests. To choose reftests or not to choose is a test writer's freedom, but they should be aware that old-run-webkit-tests skips reftests when choosing reftests in writing new tests, I think. (In reply to comment #57) > (In reply to comment #56) > > Could someone summarize where old-run-webkit-tests is used? > > No one uses new-run-webkit-tests by default. See Tools/Scripts/run-webkit-tests, which decides whether NRWT or ORWT is used. I think there are a few configurations that explicitly use NRWT (Chromium?) on their bots. > > My understanding is that NRWT does not work at all on some platforms, and thus shelling out to it is not an option. Dirk Pranke would know more, though.

Darin Adler

Comment 62 2011-03-10 14:04:32 PST

I think it was great when we renamed Chromium’s test runner to new-run-webkit-tests and renamed the old test runner to old-run-webkit-tests, implying that some day we would all switch to it the Chromium one. As long as it’s only used by Chromium, though, that name change isn’t so significant. When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort?

Mihai Parparita

Comment 63 2011-03-10 14:31:10 PST

(In reply to comment #62) > When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort? Bug 34984 is the master bug for switching to NRWT, it depends on a bunch of others.

Dirk Pranke

Comment 64 2011-03-10 16:29:47 PST

(In reply to comment #63) > (In reply to comment #62) > > When will the transition happen? Who is maintaining the list of things we need to do to new-run-webkit-tests before we can switch everyone to it? Is there a bug tracking this effort? > > Bug 34984 is the master bug for switching to NRWT, it depends on a bunch of others. Darin, I've added you to that bug, taken ownership of it, and added a comment. I will be putting together a better list of the remaining tasks soon.

Hayato Ito

Comment 65 2011-03-16 18:58:15 PDT

Let me update the status. - new-run-webkit-tests now supports reftest. https://bugs.webkit.org/show_bug.cgi?id=55457 was landed. - old-run-webkit-tests now ignores reftests. https://bugs.webkit.org/show_bug.cgi?id=55936 was landed. I think that I had done all required changes so that we can check in reftests to the LayoutTests directory. All build bots or dashboard shouldn't be upset even if reftests are checked-in. I tried to maintain compatibility with that existing LayoutTest infrastructure as possible as I can. In order to know what will happen and what additional supports we should add, I'll check-in the first minimum sample reftest here: https://bugs.webkit.org/show_bug.cgi?id=56450 If I am missing something, please let me know.

Hayato Ito

Comment 66 2011-03-29 02:02:26 PDT

I confirmed that the first reftest didn't upset any build bots. So I think it is ready to check-in reftests as long as test writers are aware that reftests work only with new-run-webkit-tests. I've updated the Wiki, https://trac.webkit.org/wiki/Writing%20Reftests, so that it emphasizes this limitation. Also I'd like to know what changes are nice to have in flakiness dashboard. That should work as is. But I think there are some areas to be improved in flakiness dashboard for reftests. (In reply to comment #65) > Let me update the status. > > - new-run-webkit-tests now supports reftest. > https://bugs.webkit.org/show_bug.cgi?id=55457 was landed. > > - old-run-webkit-tests now ignores reftests. > https://bugs.webkit.org/show_bug.cgi?id=55936 was landed. > > I think that I had done all required changes so that we can check in reftests to the LayoutTests directory. > > All build bots or dashboard shouldn't be upset even if reftests are checked-in. I tried to maintain compatibility with that existing LayoutTest infrastructure as possible as I can. > > In order to know what will happen and what additional supports we should add, I'll check-in the first minimum sample reftest here: > https://bugs.webkit.org/show_bug.cgi?id=56450 > > > If I am missing something, please let me know.

Ojan Vafai

Comment 67 2011-03-30 00:11:31 PDT

You'll need to update http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/tools/dashboards/flakiness_dashboard.html&l=2292 to load the -mismatch.html and -expected.html files when showing results, e.g. at http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fharness%2Ffont-weight-bold.html&showExpectations=true

Hayato Ito

Comment 68 2011-04-07 01:28:59 PDT

Hi, Ojan Thank for the info. I filed it as http://code.google.com/p/chromium/issues/detail?id=78698. (In reply to comment #67) > You'll need to update http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/tools/dashboards/flakiness_dashboard.html&l=2292 to load the -mismatch.html and -expected.html files when showing results, e.g. at http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fharness%2Ffont-weight-bold.html&showExpectations=true

Dirk Pranke

Comment 69 2011-12-21 14:06:21 PST

can we close this now?

Darin Adler

Comment 70 2011-12-21 14:06:46 PST

I am going to do that.

Attachments
snapshot (57.60 KB, patch) 2010-10-19 21:25 PDT, Hayato Ito	no flags	Details Formatted Diff Diff
snapshot2 (95.47 KB, patch) 2010-11-04 06:28 PDT, Hayato Ito	no flags	Details Formatted Diff Diff
for-review (50.59 KB, patch) 2010-11-05 06:45 PDT, Hayato Ito	no flags	Details Formatted Diff Diff
synced-to-tot (51.89 KB, patch) 2010-11-09 20:12 PST, Hayato Ito	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.