51802 – Move LayoutTestResults over to new-run-webkit-tests TestResult architecture

Eric Seidel (no email)

Reported 2011-01-02 12:01:54 PST

Move LayoutTestResults over to new-run-webkit-tests TestResult architecture

Eric Seidel (no email)

Comment 1 2011-01-02 12:16:15 PST

Created attachment 77778 [details] Patch

Adam Barth

Comment 2 2011-01-02 12:30:47 PST

Comment on attachment 77778 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=77778&action=review > Tools/Scripts/webkitpy/common/net/buildbot/buildbot_unittest.py:34 > +from webkitpy.layout_tests.layout_package import test_results > +from webkitpy.layout_tests.layout_package import test_failures We should move these classes to common. Common shouldn't depend on layout_tests. > Tools/Scripts/webkitpy/tool/commands/queries.py:275 > - chosen_name = User.prompt_with_list("Which builder to diagnose:", builder_choices) > + chosen_name = self._tool.user.prompt_with_list("Which builder to diagnose:", builder_choices) These changes seem separate, but ok. > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:38 > +from webkitpy.layout_tests.layout_package import test_failures > from webkitpy.layout_tests.port import factory Another bad dependency.... :(

Adam Barth

Comment 3 2011-01-02 12:33:16 PST

Comment on attachment 77778 [details] Patch You might want to fix the dependencies in a follow up patch, but please don't let them languish.

Eric Seidel (no email)

Comment 4 2011-01-03 12:26:59 PST

It's not clear that any of this layout test stuff belongs in common. So maybe I should move LayoutTestResults over net to the rest of the stuff? Except then we have BuildBot depending on stuff outside of common. :)

Eric Seidel (no email)

Comment 5 2011-01-03 12:27:45 PST

Comment on attachment 77778 [details] Patch I would like to clean up the dependencies, yes. I think that's in the next patch. I'd also like to chat with you some in person about how we should lay this all out.

WebKit Commit Bot

Comment 6 2011-01-03 12:59:25 PST

Comment on attachment 77778 [details] Patch Clearing flags on attachment: 77778 Committed r74927: <http://trac.webkit.org/changeset/74927>

WebKit Commit Bot

Comment 7 2011-01-03 12:59:32 PST

All reviewed patches have been landed. Closing bug.

Adam Barth

Comment 8 2011-01-03 13:27:45 PST

(In reply to comment #4) > It's not clear that any of this layout test stuff belongs in common. Yeah, it's possible we need three layers instead of two. > So maybe I should move LayoutTestResults over net to the rest of the stuff? Except then we have BuildBot depending on stuff outside of common. :) LayoutTestResults isn't really net-specific in the sense that you can care about LayoutTestResults without a network. The goal of having different layers is the constraint the effects of changes. We'd like to be able to work on new-run-webkit-tests without worrying about breaking webkit-patch. maybe the thing to do is to pull the new-run-webkit-tests-specific parts out of layout_tests? We could then move the remaining parts of layout_tests into common?

Eric Seidel (no email)

Comment 9 2011-01-03 14:39:11 PST

(In reply to comment #8) > (In reply to comment #4) > > It's not clear that any of this layout test stuff belongs in common. > > Yeah, it's possible we need three layers instead of two. I'm not sure what that would look like. I see at least these pieces of webkitpy: - common - stuff which is shared, but not necessarily to webkit. - config - things specific to webkit which is used by common - tool - webkit-patch specific code - layout_tests (which probably should be renamed), is NRWT specific code. - thirdparty - used by various places, is third-party code. > > So maybe I should move LayoutTestResults over net to the rest of the stuff? Except then we have BuildBot depending on stuff outside of common. :) > > LayoutTestResults isn't really net-specific in the sense that you can care about LayoutTestResults without a network. Yes. It's ORWT specific. Buildbot needs something that's not ORWT specific eventually. Ideally (at least in my mind) it needs some sort of generalized "results" abstraction which knows how to vend some object (like LayoutTestResults) which knows how to handle results from whatever run-webkit-tests we're using. Ideally LayoutTestResults, etc, would be outside of common, possibly in Config, or at least the buildbot results class would be in config, or provided by some webkit-specific config. (I still have this -- likely overengineered -- pipedream that much of webkitpy need not be specific to webkit, and be re-usable by things like chromium.org or andriod or Apple (for Radar, etc.). > The goal of having different layers is the constraint the effects of changes. We'd like to be able to work on new-run-webkit-tests without worrying about breaking webkit-patch. maybe the thing to do is to pull the new-run-webkit-tests-specific parts out of layout_tests? Yes, I think that's a good goal. Some of that is tricky right now due to the fact that webkit-patch deals only in ORWT concepts and NRWT deals only with NRWT concepts. :) > We could then move the remaining parts of layout_tests into common? Maybe.

Adam Barth

Comment 10 2011-01-03 14:48:32 PST

Right, the three layers could be: 1) application-specific (webkit-patch, new-run-webkit-tests, etc) 2) shared, but webkit-specific 3) shared, and non-webkit specific Model classes about test results seem to go in layer 2. Classes for talking to buildbot (generally) seem to go in layer 3. Controller classes for executing DumpRenderTree and creating models of the results seems to go in layer 1.

Dirk Pranke

Comment 11 2011-01-04 15:52:07 PST

Comment on attachment 77778 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=77778&action=review > Tools/Scripts/webkitpy/common/net/layouttestresults.py:98 > + return test_results.TestResult(test_name, failures, test_run_time=None, total_time_for_all_diffs=None, time_for_diffs=None) Technically, test names are port-specific, and you should be getting the name of the test from the port object. In practice, all of the real ports use the same names, and they should not include the "LayoutTests" prefix. > Tools/Scripts/webkitpy/common/net/layouttestresults.py:138 > + return self.tests_matching_failure_types(failure_types) Do you want to include any tests that had missing or incorrect expectations files? Also, you shouldn't be looking at test_failures values at all. You should look at the "type" field on the TestResult object and compare it to one of the test_expectations constants (e.g., TEXT, IMAGE, CRASH, etc.). I'm not sure what you're intended definition of "failing" is, but we should probably push the definition into the TestResult object itself and add a test_failed() method or something on it instead of you having the logic here. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:40 > +# collects them all from the failure list and returns the worst one. I'm not sure what you're trying to say here. The mapping from test_expectation type to TestFailure isn't 1:1, and I would argue that it shouldn't be, given the current design. So I wouldn't add this FIXME at all. But perhaps I'm misunderstanding you? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_results.py:65 > + See the comment above. I think we want a test_failed() or failed() method here, rather than something taking a list of failure types. I.e., this class should own the definition of "failing". > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:92 > + failing_tests = build.layout_test_results().tests_matching_failure_types([test_failures.FailureTextMismatch]) I don't know the context of this routine but (a) it seems like "FailureTextMismatch" is probably the wrong value to be looking for and (b) you should be checking against result.type == test_expectations.TEXT instead of looking at the failure types directly (as per above).

Dirk Pranke

Comment 12 2011-01-04 16:01:02 PST

(In reply to comment #10) > Right, the three layers could be: > > 1) application-specific (webkit-patch, new-run-webkit-tests, etc) > 2) shared, but webkit-specific > 3) shared, and non-webkit specific > > Model classes about test results seem to go in layer 2. Classes for talking to buildbot (generally) seem to go in layer 3. Controller classes for executing DumpRenderTree and creating models of the results seems to go in layer 1. This way of looking at things is a little weird to me. I would rather see stuff grouped by functionality rather than by visibility. i.e., layout_tests/public/foo rather than public/layout_tests/foo. Although either of these things seems like a bit of a hack, and we should just be using python's normal visibility mechanisms to control things. Also, I'm not sure what the current (not intended) definition of "common" is - whether it is (1), (2), (3), or some mixture of all of them? I would be inclined to make the buildbot stuff its own top-level package rather than putting it under "common", as I probably would with some of the other things in common.net.

Dirk Pranke

Comment 13 2011-01-04 16:03:02 PST

(In reply to comment #6) > (From update of attachment 77778 [details]) > Clearing flags on attachment: 77778 > > Committed r74927: <http://trac.webkit.org/changeset/74927> Note that if it wasn't clear from my comments in #11, I would have R-'ed this patch. Assuming my comments make sense and you agree with them, can you please file another bug to fix/undo some of the changes? (And if they don't make sense, or you don't agree, let me know ;).

Eric Seidel (no email)

Comment 14 2011-01-04 16:18:37 PST

(In reply to comment #11) > (From update of attachment 77778 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=77778&action=review > > > Tools/Scripts/webkitpy/common/net/layouttestresults.py:98 > > + return test_results.TestResult(test_name, failures, test_run_time=None, total_time_for_all_diffs=None, time_for_diffs=None) > > Technically, test names are port-specific, and you should be getting the name of the test from the port object. In practice, all of the real ports use the same names, and they should not include the "LayoutTests" prefix. I think we should add an assert() into the TestResult constructor if names are expected not to be passed with LayoutTests/. (They're currently not in this patch, but that would make the class easier to use. > > Tools/Scripts/webkitpy/common/net/layouttestresults.py:138 > > + return self.tests_matching_failure_types(failure_types) > > Do you want to include any tests that had missing or incorrect expectations files? No. The previous failing_tests() logic didn't, and I didn't want to change it. > Also, you shouldn't be looking at test_failures values at all. You should look at the "type" field on the TestResult object and compare it to one of the test_expectations constants (e.g., TEXT, IMAGE, CRASH, etc.). That's much less specific, but that could be easier. > I'm not sure what you're intended definition of "failing" is, but we should probably push the definition into the TestResult object itself and add a test_failed() method or something on it instead of you having the logic here. The definition of "failing" here is something specific to LayoutTestResults and is a historical hold-over. I tried to be more specific in new code. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:40 > > +# collects them all from the failure list and returns the worst one. > > I'm not sure what you're trying to say here. The mapping from test_expectation type to TestFailure isn't 1:1, and I would argue that it shouldn't be, given the current design. So I wouldn't add this FIXME at all. But perhaps I'm misunderstanding you? This may be a mis-understanding on my part. Maybe I wouldn't have felt the need for this if I had tried comparing .type to a test_expectations enum. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_results.py:65 > > + > > See the comment above. I think we want a test_failed() or failed() method here, rather than something taking a list of failure types. I.e., this class should own the definition of "failing". Again, just a historical hold-over. But I wasn't ready to convert all the places to be more specific. > > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:92 > > + failing_tests = build.layout_test_results().tests_matching_failure_types([test_failures.FailureTextMismatch]) > > I don't know the context of this routine but (a) it seems like "FailureTextMismatch" is probably the wrong value to be looking for and (b) you should be checking against result.type == test_expectations.TEXT instead of looking at the failure types directly (as per above). I can try that in a later patch. But I don't think that will work as you expect. .type is only one value. What happens when a test fails both image and text? .type only reflects one of those.

Eric Seidel (no email)

Comment 15 2011-01-04 16:20:32 PST

(In reply to comment #12) > (In reply to comment #10) > > Right, the three layers could be: > > > > 1) application-specific (webkit-patch, new-run-webkit-tests, etc) > > 2) shared, but webkit-specific > > 3) shared, and non-webkit specific > > > > Model classes about test results seem to go in layer 2. Classes for talking to buildbot (generally) seem to go in layer 3. Controller classes for executing DumpRenderTree and creating models of the results seems to go in layer 1. > > This way of looking at things is a little weird to me. I would rather see stuff grouped by functionality rather than by visibility. i.e., layout_tests/public/foo rather than public/layout_tests/foo. Although either of these things seems like a bit of a hack, and we should just be using python's normal visibility mechanisms to control things. I'm not sure what you mean. > Also, I'm not sure what the current (not intended) definition of "common" is - whether it is (1), (2), (3), or some mixture of all of them? common was stuff that was pulled out of the original bugzilla-tool to be shared by other python code. bugzilla-tool became webkit-patch and thus the tool/ directory. :) I think common is a mixture of a lot of things these days. > I would be inclined to make the buildbot stuff its own top-level package rather than putting it under "common", as I probably would with some of the other things in common.net. buildbot depends on stuff in common.net (and is already a package, although not a very useful one).

Eric Seidel (no email)

Comment 16 2011-01-04 16:24:00 PST

(In reply to comment #13) > (In reply to comment #6) > > (From update of attachment 77778 [details] [details]) > > Clearing flags on attachment: 77778 > > > > Committed r74927: <http://trac.webkit.org/changeset/74927> > > Note that if it wasn't clear from my comments in #11, I would have R-'ed this patch. Assuming my comments make sense and you agree with them, can you please file another bug to fix/undo some of the changes? (And if they don't make sense, or you don't agree, let me know ;). This stuff definitely needs more iteration. I find the TestResults stuff very cumbersome to use. The fact that I have to import 3 different packages from layout_tests just to use it is one problem. I think LayoutTestResults needs to move to being in the same package as the rest of this stuff and then it will make more sense. I think .type is not a complete solution for replacing teh type(FailureClass) comparisons I had to do in this code (I did this based on other examples in the codebase). I like the idea of passing around Failure classes, since they can contain more than just a name/number. however I would like to shorten the names of the various failure types so they're easier to use. (Similar to how we use the steps module). I also think that it should be possible to do a one-to-many (or many-to-one?) mapping from test expectations enums to failure classes.

Dirk Pranke

Comment 17 2011-01-04 17:21:01 PST

(In reply to comment #14) > > Technically, test names are port-specific, and you should be getting the name of the test from the port object. In practice, all of the real ports use the same names, and they should not include the "LayoutTests" prefix. > > I think we should add an assert() into the TestResult constructor if names are expected not to be passed with LayoutTests/. (They're currently not in this patch, but that would make the class easier to use. > The design is the way it is intentionally, to allow for situations like the one Chrome used to have where you might have multiple directories of test files. Apart from the "test" port using a directory that doesn't have "LayoutTests" in the name, this feature is currently unused AFAIK, but I'm a bit reluctant to remove it and lose the functionality since there are clear uses for it. > > > Tools/Scripts/webkitpy/common/net/layouttestresults.py:138 > > > + return self.tests_matching_failure_types(failure_types) > > > > Do you want to include any tests that had missing or incorrect expectations files? > > No. The previous failing_tests() logic didn't, and I didn't want to change it. > Okay. If you are trying to map some sort of legacy notion of failing, I can see how a generic failed() method on TestResult might not be helpful. > > Also, you shouldn't be looking at test_failures values at all. You should look at the "type" field on the TestResult object and compare it to one of the test_expectations constants (e.g., TEXT, IMAGE, CRASH, etc.). > > That's much less specific, but that could be easier. > It depends on what you're trying to do. It matches arguably the most common case, which is "how should this failure be suppressed" (or classified). > > > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:92 > > > + failing_tests = build.layout_test_results().tests_matching_failure_types([test_failures.FailureTextMismatch]) > > > > I don't know the context of this routine but (a) it seems like "FailureTextMismatch" is probably the wrong value to be looking for and (b) you should be checking against result.type == test_expectations.TEXT instead of looking at the failure types directly (as per above). > > I can try that in a later patch. But I don't think that will work as you expect. .type is only one value. What happens when a test fails both image and text? .type only reflects one of those. No, type is classified based on all of the failures that happened. So, if it failed both image and text diffs, it gets classified as IMAGE_PLUS_TEXT. If it failed the text diff and the image expectations were missing, it gets classified as MISSING (since who knows if the text result is even meaningful)? Similarly, TIMEOUT and CRASH trump everything else.

Dirk Pranke

Comment 18 2011-01-04 17:23:31 PST

(In reply to comment #15) > (In reply to comment #12) > > (In reply to comment #10) > > > Right, the three layers could be: > > > > > > 1) application-specific (webkit-patch, new-run-webkit-tests, etc) > > > 2) shared, but webkit-specific > > > 3) shared, and non-webkit specific > > > > > > Model classes about test results seem to go in layer 2. Classes for talking to buildbot (generally) seem to go in layer 3. Controller classes for executing DumpRenderTree and creating models of the results seems to go in layer 1. > > > > This way of looking at things is a little weird to me. I would rather see stuff grouped by functionality rather than by visibility. i.e., layout_tests/public/foo rather than public/layout_tests/foo. Although either of these things seems like a bit of a hack, and we should just be using python's normal visibility mechanisms to control things. > > I'm not sure what you mean. Meaning you use the __init__.py files to control what symbols are exported outside of a directory, and you use naming conventions like single and double underscores otherwise to indicate visibility. > > > Also, I'm not sure what the current (not intended) definition of "common" is - whether it is (1), (2), (3), or some mixture of all of them? > > common was stuff that was pulled out of the original bugzilla-tool to be shared by other python code. bugzilla-tool became webkit-patch and thus the tool/ directory. :) > > I think common is a mixture of a lot of things these days. > > > I would be inclined to make the buildbot stuff its own top-level package rather than putting it under "common", as I probably would with some of the other things in common.net. > > buildbot depends on stuff in common.net (and is already a package, although not a very useful one). I think that's fine, since webkitpy/buildbot should have no trouble importing webkitpy/common/net .

Eric Seidel (no email)

Comment 19 2011-01-04 17:27:16 PST

(In reply to comment #17) > > I can try that in a later patch. But I don't think that will work as you expect. .type is only one value. What happens when a test fails both image and text? .type only reflects one of those. > > No, type is classified based on all of the failures that happened. So, if it failed both image and text diffs, it gets classified as IMAGE_PLUS_TEXT. If it failed the text diff and the image expectations were missing, it gets classified as MISSING (since who knows if the text result is even meaningful)? Similarly, TIMEOUT and CRASH trump everything else. We'd still have to write some sort fo comparison function which knew how to take TEXT and compare it with IMAGE_PLUS_TEXT. Perhaps such already exists, but the code I was dealing with wanted to be able to check for various types of failures. passing TEXT would be fine, but having to compare type against TEXT and IMAGE_PLUS_TEXT seems error-prone.

Eric Seidel (no email)

Comment 20 2011-01-04 17:28:28 PST

(In reply to comment #18) > > > This way of looking at things is a little weird to me. I would rather see stuff grouped by functionality rather than by visibility. i.e., layout_tests/public/foo rather than public/layout_tests/foo. Although either of these things seems like a bit of a hack, and we should just be using python's normal visibility mechanisms to control things. > > > > I'm not sure what you mean. > > Meaning you use the __init__.py files to control what symbols are exported outside of a directory, and you use naming conventions like single and double underscores otherwise to indicate visibility. Yes, I support such. Although we've had some confusion with such, see the recent removal of import api from Checkout/__init__.py.

Dirk Pranke

Comment 21 2011-01-04 17:38:09 PST

(In reply to comment #16) > > This stuff definitely needs more iteration. > > I find the TestResults stuff very cumbersome to use. The fact that I have to import 3 different packages from layout_tests just to use it is one problem. > I agree completely. It's not designed to be used outside of the way it was used in NRWT. The troubles you're running into are small compared to the issues James ran into with the rebaselining work. It's not yet clear what a better solution would be, though. (There are other bugs open for this). > I think LayoutTestResults needs to move to being in the same package as the rest of this stuff and then it will make more sense. > Probably. It's probably not a good thing that the code parsing the HTML is separate from the code generating the HTML. > I think .type is not a complete solution for replacing teh type(FailureClass) comparisons I had to do in this code (I did this based on other examples in the codebase). > Perhaps. I haven't really looked at the pre-NRWT code yet, so I can't say how much of this is just remapping old concepts versus trying to handle multiple different needs. The rebaselining tools may need some additional information, but even then that information might be best provided by methods on the TestResult rather than by exposing the underlying failure instances. > I like the idea of passing around Failure classes, since they can contain more than just a name/number. They can contain more than a name/number, but I have yet to see a compelling example for how that additional data would be programmatically used (as opposed to just displayed to the user). > however I would like to shorten the names of the various failure types so they're easier to use. I would be happy to do this as a part of restructuring this code and figuring out what things should be public/shared. > I also think that it should be possible to do a one-to-many (or many-to-one?) mapping from test expectations enums to failure classes. The current mapping, depending on how you look at it, is many-to-many. In order to make it 1:many for enum->failure classes you have to make some assumptions. For example, you should ignore the text diff if the image expectation was missing. I don't think there are any alternatives that would get to you to 1:many in that sense. In a different sense, the relation of (set of failures) -> enum (the overall relation, if you will), is currently many:1, in that a given set of failures will only ever produce one unique result. In all of the uses I've seen so far, this has been fine and simplifies the client code greatly, since the caller almost always only cares about the overall classification of the test, rather than the specifics of how each check ran. I am certainly open to other designs, however.

Dirk Pranke

Comment 22 2011-01-04 17:39:45 PST

(In reply to comment #19) > We'd still have to write some sort fo comparison function which knew how to take TEXT and compare it with IMAGE_PLUS_TEXT. Perhaps such already exists, but the code I was dealing with wanted to be able to check for various types of failures. passing TEXT would be fine, but having to compare type against TEXT and IMAGE_PLUS_TEXT seems error-prone. Why are you trying to compare TEXT to IMAGE_PLUS_TEXT ? What is the larger problem you're trying to solve?

Dirk Pranke

Comment 23 2011-01-04 17:41:05 PST

(In reply to comment #20) > (In reply to comment #18) > > > > This way of looking at things is a little weird to me. I would rather see stuff grouped by functionality rather than by visibility. i.e., layout_tests/public/foo rather than public/layout_tests/foo. Although either of these things seems like a bit of a hack, and we should just be using python's normal visibility mechanisms to control things. > > > > > > I'm not sure what you mean. > > > > Meaning you use the __init__.py files to control what symbols are exported outside of a directory, and you use naming conventions like single and double underscores otherwise to indicate visibility. > > Yes, I support such. Although we've had some confusion with such, see the recent removal of import api from Checkout/__init__.py. Yes. The problem there (as far as I understand it) was that we were mixing both models. You shouldn't both export stuff in __init__.py and allow people to import modules in the package directly.

Eric Seidel (no email)

Comment 24 2011-01-05 13:35:01 PST

Committed r75099: <http://trac.webkit.org/changeset/75099>

Eric Seidel (no email)

Comment 25 2011-01-05 15:57:17 PST

Committed r75113: <http://trac.webkit.org/changeset/75113>

Eric Seidel (no email)

Comment 26 2011-01-05 18:15:20 PST

Committed r75125: <http://trac.webkit.org/changeset/75125>

Attachments
Patch (17.83 KB, patch) 2011-01-02 12:16 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.