WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
38879
new-run-webkit-tests: implement "--reset-results"
https://bugs.webkit.org/show_bug.cgi?id=38879
Summary
new-run-webkit-tests: implement "--reset-results"
Dirk Pranke
Reported
2010-05-10 18:06:19 PDT
new-run-webkit-tests: add an '--update-baseline' switch
Attachments
Patch
(20.28 KB, patch)
2010-05-10 18:07 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
cleaned-up patch, still no tests
(9.00 KB, patch)
2010-05-11 14:22 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
Patch
(14.94 KB, patch)
2010-05-11 16:03 PDT
,
Dirk Pranke
ojan
: review-
Details
Formatted Diff
Diff
update w/ feedback from ojan, eseidel, abarth
(20.26 KB, patch)
2010-05-17 16:18 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
change chromium expectation from WONTFIX SKIP = FAIL to WONTFIX = MISSING
(20.26 KB, patch)
2010-05-17 16:43 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
add test for platform-specific baseline, refactor into two unit tests, one for each switch
(20.11 KB, patch)
2010-05-18 15:37 PDT
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests
(34.71 KB, patch)
2010-05-18 17:11 PDT
,
Dirk Pranke
ojan
: review+
Details
Formatted Diff
Diff
lint
(35.03 KB, patch)
2010-05-18 17:13 PDT
,
Dirk Pranke
ojan
: review+
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
Dirk Pranke
Comment 1
2010-05-10 18:07:19 PDT
Created
attachment 55631
[details]
Patch
Dirk Pranke
Comment 2
2010-05-10 18:08:21 PDT
Note that the patch I just uploaded does not have any unit tests yet and so it isn't ready to land, but the change itself is complete.
Dirk Pranke
Comment 3
2010-05-11 14:22:08 PDT
Created
attachment 55755
[details]
cleaned-up patch, still no tests
Dirk Pranke
Comment 4
2010-05-11 16:03:29 PDT
Created
attachment 55772
[details]
Patch
Dirk Pranke
Comment 5
2010-05-11 16:04:15 PDT
Okay, the latest patch has unit tests and is ready for review.
Ojan Vafai
Comment 6
2010-05-13 16:20:20 PDT
Comment on
attachment 55772
[details]
Patch The code logic looks correct and I agree that this is necessary functionality. r- mostly just due to naming nits. ---------------------------------
http://wkrietveld.appspot.com/38879/diff/1/5
File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (left):
http://wkrietveld.appspot.com/38879/diff/1/5#oldcode141
WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:141: did you mean to delete these two blank lines?
http://wkrietveld.appspot.com/38879/diff/1/5
File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (right):
http://wkrietveld.appspot.com/38879/diff/1/5#newcode31
WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:31: from __future__ import with_statement Is this just to satisfy the linter? __future__ is new to me.
http://wkrietveld.appspot.com/38879/diff/1/6
File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py (right):
http://wkrietveld.appspot.com/38879/diff/1/6#newcode1527
WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527: "generated results"), This description makes it sound like it will update every expected result for this test, when, in practice, it will update only the result that happens to be used on this platform. --new-baseline and --update-baseline are too similar of names. We shouldn't need the help text to remember which is which. How about: --new-baseline --> --update-platform-baseline --update-baseline --> --update-used-baseline It's a bit more wordy, but at least it's clear which is which. We could add short versions for easier typing (e.g. -b and -u?).
http://wkrietveld.appspot.com/38879/diff/1/7
File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py (right):
http://wkrietveld.appspot.com/38879/diff/1/7#newcode106
WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:106: return original_open(name, mode, encoding) Should we make sure the mode here is 'r' to avoid ever writing to disk?
http://wkrietveld.appspot.com/38879/diff/1/7#newcode124
WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) Should we also test a case where --update-baseline would update the results in platform/mac?
http://wkrietveld.appspot.com/38879/diff/1/9
File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right):
http://wkrietveld.appspot.com/38879/diff/1/9#newcode97
WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): docstring? I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline.
Adam Barth
Comment 7
2010-05-13 17:01:51 PDT
How does this differ from --reset-results?
Dirk Pranke
Comment 8
2010-05-13 18:07:35 PDT
Okay, I attempted to use rietveld, but how do you publish the comments? Repeating them here in line ... (In reply to
comment #6
)
> (From update of
attachment 55772
[details]
) > The code logic looks correct and I agree that this is necessary functionality. r- mostly just due to naming nits. > --------------------------------- >
http://wkrietveld.appspot.com/38879/diff/1/5
> File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (left): > >
http://wkrietveld.appspot.com/38879/diff/1/5#oldcode141
> WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:141: > did you mean to delete these two blank lines?
No. good catch.
> >
http://wkrietveld.appspot.com/38879/diff/1/5
> File WebKitTools/Scripts/webkitpy/layout_tests/port/test.py (right): > >
http://wkrietveld.appspot.com/38879/diff/1/5#newcode31
> WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:31: from __future__ import with_statement > Is this just to satisfy the linter? __future__ is new to me. >
You need "from __future__" to be able to use the "with" statement in Python 2.5.
>
http://wkrietveld.appspot.com/38879/diff/1/6
> File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py (right): > >
http://wkrietveld.appspot.com/38879/diff/1/6#newcode1527
> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527: "generated results"), > This description makes it sound like it will update every expected result for this test, when, in practice, it will update only the result that happens to be used on this platform. > > --new-baseline and --update-baseline are too similar of names. We shouldn't need the help text to remember which is which. How about: > > --new-baseline --> --update-platform-baseline > --update-baseline --> --update-used-baseline > > It's a bit more wordy, but at least it's clear which is which. We could add short versions for easier typing (e.g. -b and -u?).
I will take another spin at the names and the help text. I don't think these features are used enough to justify short versions.
> >
http://wkrietveld.appspot.com/38879/diff/1/7
> File WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py (right): > >
http://wkrietveld.appspot.com/38879/diff/1/7#newcode106
> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:106: return original_open(name, mode, encoding) > Should we make sure the mode here is 'r' to avoid ever writing to disk?
No, you need to be able to write some files (e.g., the images so they can be diffed, or the results files).
> >
http://wkrietveld.appspot.com/38879/diff/1/7#newcode124
> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > Should we also test a case where --update-baseline would update the results in platform/mac?
I don't see how this tests something interestingly different?
> >
http://wkrietveld.appspot.com/38879/diff/1/9
> File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > >
http://wkrietveld.appspot.com/38879/diff/1/9#newcode97
> WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > docstring? > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline.
how about "use_platform_dir"?
Dirk Pranke
Comment 9
2010-05-13 18:15:11 PDT
(In reply to
comment #7
)
> How does this differ from --reset-results?
Hmm. I was not familiar with --reset-results in old-run-webkit-tests (new-run-webkit-tests doesn't have such a flag). The intended behavior of this new flag is to update the existing baseline wherever it may be found, rather than just writing a new baseline into the platform dir. Is that what reset-results is supposed to do? If so, I'd be happy to use that flag name instead of creating new ones. It looks like ORWT also has --add-platform-exceptions and --new-test-results, which look like they're similar to --new-baseline in NRWT; I'm not sure quite what the difference is between the two flags, though.
Dirk Pranke
Comment 10
2010-05-17 16:18:39 PDT
Created
attachment 56287
[details]
update w/ feedback from ojan, eseidel, abarth okay, this changes the flag name from --update-baseline to --reset-results (for consistency with old-run-webkit-tests), and adds tests for what happens when the baseline is missing because the file is "new".
Dirk Pranke
Comment 11
2010-05-17 16:20:00 PDT
change bug subject line from "add an --update-baseline switch" to "implement '--reset-results'"
Eric Seidel (no email)
Comment 12
2010-05-17 16:25:47 PDT
Although fast/harness has been used for testing DumpRenderTree bugs in the past. It seems rather strange to add this test here since you had to make so many Skipped edits as well as hack the path into run-webkit-tests itself! Seems this test would make more sense as a python unit test.
Dirk Pranke
Comment 13
2010-05-17 16:40:58 PDT
(In reply to
comment #12
)
> Although fast/harness has been used for testing DumpRenderTree bugs in the past. It seems rather strange to add this test here since you had to make so many Skipped edits as well as hack the path into run-webkit-tests itself! Seems this test would make more sense as a python unit test.
I don't think it's actually possible to do this as a python unit test, at least not any more cleanly than this (you'd end up duplicating logic to mock expectations all over the place). I would eventually like to have a set of tests that are supposed to provoke expected behavior in DRT (one that crashes, one that produces an IMAGE diff, etc) that are actually run. Unfortunately, there's no way with the Skipped files to indicate that. Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. As far as hacking the file into test.py itself, that had to be done because there's no way for the test code to know whether there should be an image file or not if the expectation doesn't exist. A more general solution would be a way to mock expected results into test.py, but I didn't want to clutter this particular CL with that change.
Dirk Pranke
Comment 14
2010-05-17 16:43:04 PDT
Created
attachment 56289
[details]
change chromium expectation from WONTFIX SKIP = FAIL to WONTFIX = MISSING
Ojan Vafai
Comment 15
2010-05-17 16:59:14 PDT
(In reply to
comment #13
)
> Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING.
I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 + if uri.find('fast/harness/missing-expectation') != -1: No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 + "generated results"), If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. (In reply to
comment #8
)
> Okay, I attempted to use rietveld, but how do you publish the comments?
1. Click on the rietveld review link. 2. Leave comments. 3. Click submit. It's a bit confusing because you go to the review page in order to respond to review feedback. Not sure how to make it better.
> >
http://wkrietveld.appspot.com/38879/diff/1/7#newcode124
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > > Should we also test a case where --update-baseline would update the results in platform/mac? > > I don't see how this tests something interestingly different?
It tests that we don't always just update the results next to the test when using --reset-results.
> >
http://wkrietveld.appspot.com/38879/diff/1/9
> > File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > > > >
http://wkrietveld.appspot.com/38879/diff/1/9#newcode97
> > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > > docstring? > > > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline. > > how about "use_platform_dir"?
Sure.
Dirk Pranke
Comment 16
2010-05-17 17:05:44 PDT
(In reply to
comment #15
)
> (In reply to
comment #13
) > > Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. > > I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? > > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 > + if uri.find('fast/harness/missing-expectation') != -1: > No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. >
I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better.
Dirk Pranke
Comment 17
2010-05-17 17:09:14 PDT
(In reply to
comment #15
)
> WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > + "generated results"), > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions.
> It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests.
> > >
http://wkrietveld.appspot.com/38879/diff/1/7#newcode124
> > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:124: 'fast/canvas/canvas-zoom.html']) > > > Should we also test a case where --update-baseline would update the results in platform/mac? > > > > I don't see how this tests something interestingly different? > > It tests that we don't always just update the results next to the test when using --reset-results.
I gotcha. Good idea; I'll add that.
> > > >
http://wkrietveld.appspot.com/38879/diff/1/9
> > > File WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py (right): > > > > > >
http://wkrietveld.appspot.com/38879/diff/1/9#newcode97
> > > WebKitTools/Scripts/webkitpy/layout_tests/test_types/test_type_base.py:97: new_baseline=True): > > > docstring? > > > > > > I don't like this argument name. It reads like this is the actual new baseline, not like a boolean. How about save_new_baseline? > > > > > > Also, if we end up changing the name of the command-line flag, I think this would be clearer. This could be update_platform_baseline. > > > > how about "use_platform_dir"? > > Sure.
I ended up with generate_new_baseline, which was close to your save_new_baseline suggestion.
Ojan Vafai
Comment 18
2010-05-17 17:13:44 PDT
(In reply to
comment #16
)
> (In reply to
comment #15
) > > (In reply to
comment #13
) > > > Thinking about it, though, I need to change the expectation in the chromium test_expectations from WONTFIX SKIP = FAIL to WONTFIX = MISSING. > > > > I'm confused, shouldn't we be skipping anything in the harness directory? These are only for the python tests, right? > > > > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:146 > > + if uri.find('fast/harness/missing-expectation') != -1: > > No need to put this under fast. I think just a top-level harness directory makes sense if we do need this. > > > > I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better.
Oh. NM. I didn't realize the directory already existed. Anyways, Eric and I talked about this in person. Instead of using codecs, could you create a python unittest by doing the following? 1. Create a temp directory and populate it with a few dummy tests that test the cases you care about. 2. Have the port look at the temp directory instead of LayoutTests. Then you could actually just have unittests. They write to the real file system, but they don't need to do anything involving the real checkout.
> > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > > + "generated results"), > > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. > > It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests.
I think --new-test-results is whether to generate new results if you run a test and it's missing expectations. old-run-webkit-tests does this by default. It's like --reset-results, but only for tests that are currently missing expectations.
Dirk Pranke
Comment 19
2010-05-17 17:34:26 PDT
(In reply to
comment #18
)
> > I'm reusing the existing fast/harness directory, which is used for some DRT-related tests as well (per comments from eseidel). I can certainly create a separate directory if that would be better. > > Oh. NM. I didn't realize the directory already existed. > > Anyways, Eric and I talked about this in person. Instead of using codecs, could you create a python unittest by doing the following? > > 1. Create a temp directory and populate it with a few dummy tests that test the cases you care about. > 2. Have the port look at the temp directory instead of LayoutTests. > > Then you could actually just have unittests. They write to the real file system, but they don't need to do anything involving the real checkout.
> I could do this, but (a) this would involve writing more code in the unit tests module to do setup and cleanup, and (b) it would be slower - you'd actually be creating, writing, and deleting files. Mocking out codecs.open() seems pretty straightforward, so I'm not seeing much upside to your idea. Am I missing something?
> > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1527 > > > + "generated results"), > > > If we're going to go the route of matching old-run-webkit-tests, then you should change --new-baseline to --add-platform-exceptions. > > > > It looked like --add-platform-exceptions did something slightly different; I'm not sure what the difference between --add-platform-exceptions and --new-test-results is in old-run-webkit-tests. > > I think --new-test-results is whether to generate new results if you run a test and it's missing expectations. old-run-webkit-tests does this by default. It's like --reset-results, but only for tests that are currently missing expectations.
Okay. We can certainly rename --new-baseline to --add-platform-exceptions, although I think this name is not an improvement in clarity (nor is --reset-results). And, in this case, there may be Chromium people used to --new-baseline so there's a minor compatibility question. How about you and Eric reach a consensus and I'll do whatever you two say?
Dirk Pranke
Comment 20
2010-05-17 17:52:43 PDT
Thinking about this a bit more ... If the primary objection is having to put test expectations into the real tree for this, we could create a separate tree outside of WebKit/LayoutTests, and point to that instead of WebKit/LayoutTests/* (i.e., test.py would override layout_tests_dir()). The files would still be checked in. Also, you'd probably still want to mock out the file write() calls since overwriting existing files could be fragile.
Dirk Pranke
Comment 21
2010-05-18 15:37:14 PDT
Created
attachment 56416
[details]
add test for platform-specific baseline, refactor into two unit tests, one for each switch
Dirk Pranke
Comment 22
2010-05-18 17:11:26 PDT
Created
attachment 56428
[details]
okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests
Dirk Pranke
Comment 23
2010-05-18 17:13:13 PDT
Created
attachment 56430
[details]
lint
Ojan Vafai
Comment 24
2010-05-18 17:23:58 PDT
Comment on
attachment 56428
[details]
okay, use a separate layout_tests_dir under webkitpy/layout_tests/data to isolate unit test data from actual layout tests WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:115 + with codecs.open(expectations_path, "r", "utf-8") as file: can we just use regular open here? WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py:84 + generate_new_baseline: is this a new baseline or an updated one? This description is wrong. Makes it sound like this is actually a baseline, not a boolean. The description you have below in test_type_base.py would be fine. Although, I still prefer these descriptions to be statements, not questions, e.g. "whether to generate a new baseline or update an existing one".
Ojan Vafai
Comment 25
2010-05-18 17:24:54 PDT
Comment on
attachment 56430
[details]
lint r=me, but please address the two comments for the previous patch.
Dirk Pranke
Comment 26
2010-05-18 17:31:17 PDT
(In reply to
comment #24
)
> (From update of
attachment 56428
[details]
) > WebKitTools/Scripts/webkitpy/layout_tests/port/test.py:115 > + with codecs.open(expectations_path, "r", "utf-8") as file: > can we just use regular open here? >
We probably could, but the other files use codecs.open() and I thought eric wanted us to consistently use codecs.open() all over the place. I'll leave it like this when I commit the change; if eric indicates that it's okay to change this to a regular file open I'll do a separate change for that.
> WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py:84 > + generate_new_baseline: is this a new baseline or an updated one? > This description is wrong. Makes it sound like this is actually a baseline, not a boolean. The description you have below in test_type_base.py would be fine. Although, I still prefer these descriptions to be statements, not questions, e.g. "whether to generate a new baseline or update an existing one".
I will update the comment.
Dirk Pranke
Comment 27
2010-05-18 17:52:32 PDT
Committed
r59725
: <
http://trac.webkit.org/changeset/59725
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug