38879 – new-run-webkit-tests: implement "--reset-results"

RESOLVED FIXED 38879

new-run-webkit-tests: implement "--reset-results"

https://bugs.webkit.org/show_bug.cgi?id=38879

Summary new-run-webkit-tests: implement "--reset-results"

Dirk Pranke

Reported 2010-05-10 18:06:19 PDT

new-run-webkit-tests: add an '--update-baseline' switch

Attachments
Patch (20.28 KB, patch) 2010-05-10 18:07 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
cleaned-up patch, still no tests (9.00 KB, patch) 2010-05-11 14:22 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Patch (14.94 KB, patch) 2010-05-11 16:03 PDT, Dirk Pranke	ojan: review-	Details Formatted Diff Diff
update w/ feedback from ojan, eseidel, abarth (20.26 KB, patch) 2010-05-17 16:18 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
change chromium expectation from WONTFIX SKIP = FAIL to WONTFIX = MISSING (20.26 KB, patch) 2010-05-17 16:43 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
add test for platform-specific baseline, refactor into two unit tests, one for each switch (20.11 KB, patch) 2010-05-18 15:37 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests (34.71 KB, patch) 2010-05-18 17:11 PDT, Dirk Pranke	ojan: review+	Details Formatted Diff Diff
lint (35.03 KB, patch) 2010-05-18 17:13 PDT, Dirk Pranke	ojan: review+	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Dirk Pranke

Comment 1 2010-05-10 18:07:19 PDT

Created attachment 55631 [details] Patch

Dirk Pranke

Comment 2 2010-05-10 18:08:21 PDT

Note that the patch I just uploaded does not have any unit tests yet and so it isn't ready to land, but the change itself is complete.

Dirk Pranke

Comment 3 2010-05-11 14:22:08 PDT

Created attachment 55755 [details] cleaned-up patch, still no tests

Dirk Pranke

Comment 4 2010-05-11 16:03:29 PDT

Created attachment 55772 [details] Patch

Dirk Pranke

Comment 5 2010-05-11 16:04:15 PDT

Okay, the latest patch has unit tests and is ready for review.

Ojan Vafai

Comment 6 2010-05-13 16:20:20 PDT

Comment on attachment 55772 [details] Patch The code logic looks correct and I agree that this is necessary functionality. r- mostly just due to naming nits. --------------------------------- http://wkrietveld.appspot.com/38879/diff/1/5 File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (left): http://wkrietveld.appspot.com/38879/diff/1/5#oldcode141 WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:141: did you mean to delete these two blank lines? http://wkrietveld.appspot.com/38879/diff/1/5 File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (right): http://wkrietveld.appspot.com/38879/diff/1/5#newcode31 WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:31: from __future__ import with_statement Is this just to satisfy the linter? __future__ is new to me. http://wkrietveld.appspot.com/38879/diff/1/6 File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py (right): http://wkrietveld.appspot.com/38879/diff/1/6#newcode1527 WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527: "generated results"), This description makes it sound like it will update every expected result for this test, when, in practice, it will update only the result that happens to be used on this platform. --new-baseline and --update-baseline are too similar of names. We shouldn't need the help text to remember which is which. How about: --new-baseline --> --update-platform-baseline --update-baseline --> --update-used-baseline It's a bit more wordy, but at least it's clear which is which. We could add short versions for easier typing (e.g. -b and -u?). http://wkrietveld.appspot.com/38879/diff/1/7 File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py (right): http://wkrietveld.appspot.com/38879/diff/1/7#newcode106 WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:106: return original_open(name, mode, encoding) Should we make sure the mode here is 'r' to avoid ever writing to disk? http://wkrietveld.appspot.com/38879/diff/1/7#newcode124 WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) Should we also test a case where --update-baseline would update the results in platform/mac? http://wkrietveld.appspot.com/38879/diff/1/9 File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): http://wkrietveld.appspot.com/38879/diff/1/9#newcode97 WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): docstring? I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline.

Adam Barth

Comment 7 2010-05-13 17:01:51 PDT

How does this differ from --reset-results?

Dirk Pranke

Comment 8 2010-05-13 18:07:35 PDT

Okay, I attempted to use rietveld, but how do you publish the comments? Repeating them here in line ... (In reply to comment #6) > (From update of attachment 55772 [details]) > The code logic looks correct and I agree that this is necessary functionality. r- mostly just due to naming nits. > --------------------------------- > http://wkrietveld.appspot.com/38879/diff/1/5 > File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (left): > > http://wkrietveld.appspot.com/38879/diff/1/5#oldcode141 > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:141: > did you mean to delete these two blank lines? No. good catch. > > http://wkrietveld.appspot.com/38879/diff/1/5 > File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (right): > > http://wkrietveld.appspot.com/38879/diff/1/5#newcode31 > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:31: from __future__ import with_statement > Is this just to satisfy the linter? __future__ is new to me. > You need "from __future__" to be able to use the "with" statement in Python 2.5. > http://wkrietveld.appspot.com/38879/diff/1/6 > File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py (right): > > http://wkrietveld.appspot.com/38879/diff/1/6#newcode1527 > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527: "generated results"), > This description makes it sound like it will update every expected result for this test, when, in practice, it will update only the result that happens to be used on this platform. > > --new-baseline and --update-baseline are too similar of names. We shouldn't need the help text to remember which is which. How about: > > --new-baseline --> --update-platform-baseline > --update-baseline --> --update-used-baseline > > It's a bit more wordy, but at least it's clear which is which. We could add short versions for easier typing (e.g. -b and -u?). I will take another spin at the names and the help text. I don't think these features are used enough to justify short versions. > > http://wkrietveld.appspot.com/38879/diff/1/7 > File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py (right): > > http://wkrietveld.appspot.com/38879/diff/1/7#newcode106 > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:106: return original_open(name, mode, encoding) > Should we make sure the mode here is 'r' to avoid ever writing to disk? No, you need to be able to write some files (e.g., the images so they can be diffed, or the results files). > > http://wkrietveld.appspot.com/38879/diff/1/7#newcode124 > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > Should we also test a case where --update-baseline would update the results in platform/mac? I don't see how this tests something interestingly different? > > http://wkrietveld.appspot.com/38879/diff/1/9 > File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > > http://wkrietveld.appspot.com/38879/diff/1/9#newcode97 > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > docstring? > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline. how about "use_platform_dir"?

Dirk Pranke

Comment 9 2010-05-13 18:15:11 PDT

(In reply to comment #7) > How does this differ from --reset-results? Hmm. I was not familiar with --reset-results in old-run-webkit-tests (new-run-webkit-tests doesn't have such a flag). The intended behavior of this new flag is to update the existing baseline wherever it may be found, rather than just writing a new baseline into the platform dir. Is that what reset-results is supposed to do? If so, I'd be happy to use that flag name instead of creating new ones. It looks like ORWT also has --add-platform-exceptions and --new-test-results, which look like they're similar to --new-baseline in NRWT; I'm not sure quite what the difference is between the two flags, though.

Dirk Pranke

Comment 10 2010-05-17 16:18:39 PDT

Created attachment 56287 [details] update w/ feedback from ojan, eseidel, abarth okay, this changes the flag name from --update-baseline to --reset-results (for consistency with old-run-webkit-tests), and adds tests for what happens when the baseline is missing because the file is "new".

Dirk Pranke

Comment 11 2010-05-17 16:20:00 PDT

change bug subject line from "add an --update-baseline switch" to "implement '--reset-results'"

Eric Seidel (no email)

Comment 12 2010-05-17 16:25:47 PDT

Although fast/harness has been used for testing DumpRenderTree bugs in the past. It seems rather strange to add this test here since you had to make so many Skipped edits as well as hack the path into run-webkit-tests itself! Seems this test would make more sense as a python unit test.

Dirk Pranke

Comment 13 2010-05-17 16:40:58 PDT

(In reply to comment #12) > Although fast/harness has been used for testing DumpRenderTree bugs in the past. It seems rather strange to add this test here since you had to make so many Skipped edits as well as hack the path into run-webkit-tests itself! Seems this test would make more sense as a python unit test. I don't think it's actually possible to do this as a python unit test, at least not any more cleanly than this (you'd end up duplicating logic to mock expectations all over the place). I would eventually like to have a set of tests that are supposed to provoke expected behavior in DRT (one that crashes, one that produces an IMAGE diff, etc) that are actually run. Unfortunately, there's no way with the Skipped files to indicate that. Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. As far as hacking the file into test.py itself, that had to be done because there's no way for the test code to know whether there should be an image file or not if the expectation doesn't exist. A more general solution would be a way to mock expected results into test.py, but I didn't want to clutter this particular CL with that change.

Dirk Pranke

Comment 14 2010-05-17 16:43:04 PDT

Created attachment 56289 [details] change chromium expectation from WONTFIX SKIP = FAIL to WONTFIX = MISSING

Ojan Vafai

Comment 15 2010-05-17 16:59:14 PDT

(In reply to comment #13) > Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 + if uri.find('fast/harness/missing-expectation') != -1: No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 + "generated results"), If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. (In reply to comment #8) > Okay, I attempted to use rietveld, but how do you publish the comments? 1. Click on the rietveld review link. 2. Leave comments. 3. Click submit. It's a bit confusing because you go to the review page in order to respond to review feedback. Not sure how to make it better. > > http://wkrietveld.appspot.com/38879/diff/1/7#newcode124 > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > > Should we also test a case where --update-baseline would update the results in platform/mac? > > I don't see how this tests something interestingly different? It tests that we don't always just update the results next to the test when using --reset-results. > > http://wkrietveld.appspot.com/38879/diff/1/9 > > File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > > > > http://wkrietveld.appspot.com/38879/diff/1/9#newcode97 > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > > docstring? > > > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline. > > how about "use_platform_dir"? Sure.

Dirk Pranke

Comment 16 2010-05-17 17:05:44 PDT

(In reply to comment #15) > (In reply to comment #13) > > Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. > > I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? > > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 > + if uri.find('fast/harness/missing-expectation') != -1: > No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. > I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better.

Dirk Pranke

Comment 17 2010-05-17 17:09:14 PDT

(In reply to comment #15) > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > + "generated results"), > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. > It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests. > > > http://wkrietveld.appspot.com/38879/diff/1/7#newcode124 > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > > > Should we also test a case where --update-baseline would update the results in platform/mac? > > > > I don't see how this tests something interestingly different? > > It tests that we don't always just update the results next to the test when using --reset-results. I gotcha. Good idea; I'll add that. > > > > http://wkrietveld.appspot.com/38879/diff/1/9 > > > File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > > > > > > http://wkrietveld.appspot.com/38879/diff/1/9#newcode97 > > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > > > docstring? > > > > > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > > > > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline. > > > > how about "use_platform_dir"? > > Sure. I ended up with generate_new_baseline, which was close to your save_new_baseline suggestion.

Ojan Vafai

Comment 18 2010-05-17 17:13:44 PDT

(In reply to comment #16) > (In reply to comment #15) > > (In reply to comment #13) > > > Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. > > > > I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? > > > > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 > > + if uri.find('fast/harness/missing-expectation') != -1: > > No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. > > > > I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better. Oh. NM. I didn't realize the directory already existed. Anyways, Eric and I talked about this in person. Instead of using codecs, could you create a python unittest by doing the following? 1. Create a temp directory and populate it with a few dummy tests that test the cases you care about. 2. Have the port look at the temp directory instead of LayoutTests. Then you could actually just have unittests. They write to the real file system, but they don't need to do anything involving the real checkout. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > > + "generated results"), > > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. > > It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests. I think --new-test-results is whether to generate new results if you run a test and it's missing expectations. old-run-webkit-tests does this by default. It's like --reset-results, but only for tests that are currently missing expectations.

Dirk Pranke

Comment 19 2010-05-17 17:34:26 PDT

(In reply to comment #18) > > I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better. > > Oh. NM. I didn't realize the directory already existed. > > Anyways, Eric and I talked about this in person. Instead of using codecs, could you create a python unittest by doing the following? > > 1. Create a temp directory and populate it with a few dummy tests that test the cases you care about. > 2. Have the port look at the temp directory instead of LayoutTests. > > Then you could actually just have unittests. They write to the real file system, but they don't need to do anything involving the real checkout. > I could do this, but (a) this would involve writing more code in the unit tests module to do setup and cleanup, and (b) it would be slower - you'd actually be creating, writing, and deleting files. Mocking out codecs.open() seems pretty straightforward, so I'm not seeing much upside to your idea. Am I missing something? > > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > > > + "generated results"), > > > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. > > > > It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests. > > I think --new-test-results is whether to generate new results if you run a test and it's missing expectations. old-run-webkit-tests does this by default. It's like --reset-results, but only for tests that are currently missing expectations. Okay. We can certainly rename --new-baseline to --add-platform-exceptions, although I think this name is not an improvement in clarity (nor is --reset-results). And, in this case, there may be Chromium people used to --new-baseline so there's a minor compatibility question. How about you and Eric reach a consensus and I'll do whatever you two say?

Dirk Pranke

Comment 20 2010-05-17 17:52:43 PDT

Thinking about this a bit more ... If the primary objection is having to put test expectations into the real tree for this, we could create a separate tree outside of WebKit/LayoutTests, and point to that instead of WebKit/LayoutTests/* (i.e., test.py would override layout_tests_dir()). The files would still be checked in. Also, you'd probably still want to mock out the file write() calls since overwriting existing files could be fragile.

Dirk Pranke

Comment 21 2010-05-18 15:37:14 PDT

Created attachment 56416 [details] add test for platform-specific baseline, refactor into two unit tests, one for each switch

Dirk Pranke

Comment 22 2010-05-18 17:11:26 PDT

Created attachment 56428 [details] okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests

Dirk Pranke

Comment 23 2010-05-18 17:13:13 PDT

Created attachment 56430 [details] lint

Ojan Vafai

Comment 24 2010-05-18 17:23:58 PDT

Comment on attachment 56428 [details] okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:115 + with codecs.open(expectations_path, "r", "utf-8") as file: can we just use regular open here? WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py:84 + generate_new_baseline: is this a new baseline or an updated one? This description is wrong. Makes it sound like this is actually a baseline, not a boolean. The description you have below in test_type_base.py would be fine. Although, I still prefer these descriptions to be statements, not questions, e.g. "whether to generate a new baseline or update an existing one".

Ojan Vafai

Comment 25 2010-05-18 17:24:54 PDT

Comment on attachment 56430 [details] lint r=me, but please address the two comments for the previous patch.

Dirk Pranke

Comment 26 2010-05-18 17:31:17 PDT

(In reply to comment #24) > (From update of attachment 56428 [details]) > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:115 > + with codecs.open(expectations_path, "r", "utf-8") as file: > can we just use regular open here? > We probably could, but the other files use codecs.open() and I thought eric wanted us to consistently use codecs.open() all over the place. I'll leave it like this when I commit the change; if eric indicates that it's okay to change this to a regular file open I'll do a separate change for that. > WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py:84 > + generate_new_baseline: is this a new baseline or an updated one? > This description is wrong. Makes it sound like this is actually a baseline, not a boolean. The description you have below in test_type_base.py would be fine. Although, I still prefer these descriptions to be statements, not questions, e.g. "whether to generate a new baseline or update an existing one". I will update the comment.

Dirk Pranke

Comment 27 2010-05-18 17:52:32 PDT

Committed r59725: <http://trac.webkit.org/changeset/59725>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Dirk Pranke

Reported

2010-05-10 18:06 PDT

Modified

2010-05-18 17:52 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks