Bug 60957 - Pass flags from run_webkit_tests to ImageDiff, read results from stdout and store in unexpected_results.json
Summary: Pass flags from run_webkit_tests to ImageDiff, read results from stdout and s...
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: NRWT
Depends on:
Blocks:
 
Reported: 2011-05-17 07:00 PDT by Tom Hudson
Modified: 2012-06-19 14:46 PDT (History)
2 users (show)

See Also:


Attachments
Patch (11.19 KB, patch)
2011-05-17 07:01 PDT, Tom Hudson
dpranke: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Hudson 2011-05-17 07:00:05 PDT
Pass flags from run_webkit_tests to ImageDiff, read results from stdout and store in unexpected_results.json
Comment 1 Tom Hudson 2011-05-17 07:01:45 PDT
Created attachment 93762 [details]
Patch
Comment 2 Tom Hudson 2011-05-17 07:04:51 PDT
Draft solution to (1) cache data from ImageDiff on Chromium port object in worker processes and (2) use message-passing and message-handling stubs on port object to get the data back to the manager process, where it is written into JSON file.
Comment 3 Tom Hudson 2011-05-25 12:10:23 PDT
Comment on attachment 93762 [details]
Patch

Takes input from bug 60569, which is now r+ tony@chromium.org; output is to be read by bug 60964.
Comment 4 Dirk Pranke 2011-05-26 16:21:15 PDT
Comment on attachment 93762 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93762&action=review

So this patch isn't what I'd like to see as the long-term way to do this, but it's close for an incremental step. Ideally we should do something like modify layout_package/single_test_runner._compare_image and test_result_writer.diff_image so that we can compute your metrics in a single step, and then pass a dict of the metrics through a TestFailure object. However, changing the code to do this is a bit invasive to do all at once and may be more than you're comfortable with (as we've discussed previously). I'll upload a patch that demonstrates the protocol-level changes I have in mind and we can eventually merge that patch with this line of development.

However, this batch is basically fine and doesn't need to be completely redone to make those changes. I'm R-'ing it for a bunch of specific things we can tweak somewhat to simplify things, in particular so that we don't need to generate two "finished_test" messages per test.

Let me know if you have any questions on my comments. I look forward to getting this feature working and will be glad to help now that I'm not either on vacation or in workshops :)

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:263
> +            default="",

I will usually try to omit the default (which is equivalent to default=None) if I want something to be empty by default.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:161
> +        port_obj.add_port_test_data(result_type, test, test_dict)

Can you add a FIXME here to extract this data from the failure_types list directly instead of having to jump through the handle_port_finished_test / add_port_test_data hoop instead?

If it's not clear what I have in mind, I can upload a modified version of this patch that does what I'm talking about.

> Tools/Scripts/webkitpy/layout_tests/layout_package/worker.py:143
>          self._worker_connection.post_message('finished_test', result, elapsed_time)

Can we change this so that instead of sending a port_finished_test message we call into a port routine that returns a dict and send the dict along in finished_test? I don't really want to double the number of messages sent in a test run, and I have no concerns over changing the message format.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:686
> +    def add_port_test_data(self, result_type, test, out_test_dict):

If the intent is for these to be default implementations that don't do anything, can we please add a comment to that effect and a 'pass' statement? On the other hand, if you want these to be virtual methods that must be implemented in subclasses, please add a raise NotImplementedError (as we do in other routines).

Also, please add tests for these routines in port/port_testcase.py.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:162
> +        return diff_filename[prefix_len:-suffix_len] + '.html'

Ugh. There should be a method on the base class to do something like this, and we shouldn't introduce a dependency on test_result_writer into this file. Can you add a FIXME to fix this in a later patch?

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:196
> +            if self.get_option('write_image_diff_metrics'):

Is there a reason we shouldn't just always pass the flag and write the metrics?

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:346
> +    def _test_name_from_filename(self, diff_filename):

There is a port.relative_test_filename() in base.py that does this already.
Comment 5 Tom Hudson 2011-05-31 14:32:26 PDT
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:161
> > +        port_obj.add_port_test_data(result_type, test, test_dict)
> 
> Can you add a FIXME here to extract this data from the failure_types list directly instead of having to jump through the handle_port_finished_test / add_port_test_data hoop instead?
> 
> If it's not clear what I have in mind, I can upload a modified version of this patch that does what I'm talking about.
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/worker.py:143
> >          self._worker_connection.post_message('finished_test', result, elapsed_time)
> 
> Can we change this so that instead of sending a port_finished_test message we call into a port routine that returns a dict and send the dict along in finished_test? I don't really want to double the number of messages sent in a test run, and I have no concerns over changing the message format.

Part 1 is not 100% clear to me, unless part 2 is tied into it - I should be able to upload what I think you mean tomorrow. It'll be:
 - new field on TestResult object
 - new routine on Port called from worker thread to add arbitrary data to TestResult
 - Port.add_port_test_data replaced by a generic routine that checks for above arbitrary data on the TestResult object and, if present, adds to test_dict
 - Get rid of the port_finished_test send & receive code.
 - Get rid of the dependency on test_writer.

> > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:196
> > +            if self.get_option('write_image_diff_metrics'):
> 
> Is there a reason we shouldn't just always pass the flag and write the metrics?

Tony Chang & I talked for a while when he first reviewed 60569, and it was important to him not to slow down NRWT on the bots; especially once we start computing multiple metrics over the images (qv discussion of 60964), that's multiple passes over every pixel of every difference image, and a lot of extra data being written to unexpected_results.json and archived.

As long as the only consumer of this data is RebaselineServer, it seemed reasonable to make it argument-driven and only happen on interactive local runs, although now thinking about it you might at times want it from a bot if there's a bug you can't reproduce locally?

Tom
Comment 6 Dirk Pranke 2011-05-31 14:39:17 PDT
(In reply to comment #5)
> > 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:161
> > > +        port_obj.add_port_test_data(result_type, test, test_dict)
> > 
> > Can you add a FIXME here to extract this data from the failure_types list directly instead of having to jump through the handle_port_finished_test / add_port_test_data hoop instead?
> > 
> > If it's not clear what I have in mind, I can upload a modified version of this patch that does what I'm talking about.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/worker.py:143
> > >          self._worker_connection.post_message('finished_test', result, elapsed_time)
> > 
> > Can we change this so that instead of sending a port_finished_test message we call into a port routine that returns a dict and send the dict along in finished_test? I don't really want to double the number of messages sent in a test run, and I have no concerns over changing the message format.
> 
> Part 1 is not 100% clear to me, unless part 2 is tied into it - I should be able to upload what I think you mean tomorrow. It'll be:
>  - new field on TestResult object
>  - new routine on Port called from worker thread to add arbitrary data to TestResult
>  - Port.add_port_test_data replaced by a generic routine that checks for above arbitrary data on the TestResult object and, if present, adds to test_dict
>  - Get rid of the port_finished_test send & receive code.
>  - Get rid of the dependency on test_writer.
> 

Sounds pretty close :)

> > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:196
> > > +            if self.get_option('write_image_diff_metrics'):
> > 
> > Is there a reason we shouldn't just always pass the flag and write the metrics?
> 
> Tony Chang & I talked for a while when he first reviewed 60569, and it was important to him not to slow down NRWT on the bots; especially once we start computing multiple metrics over the images (qv discussion of 60964), that's multiple passes over every pixel of every difference image, and a lot of extra data being written to unexpected_results.json and archived.
> 

Well, given that files that fail the image check are a very small percentage of the total number of tests we run, I'd be surprised if it made much difference.

> As long as the only consumer of this data is RebaselineServer, it seemed reasonable to make it argument-driven and only happen on interactive local runs, although now thinking about it you might at times want it from a bot if there's a bug you can't reproduce locally?

That, or eventually we'll build tools where the gardeners (or, any webkit committer looking at failures in the tree) can also pull this data to help them determine how to triage failures.

That said, I'm happy to start with a flag and we can get rid of it later.
Comment 7 Dirk Pranke 2012-06-19 14:46:29 PDT
tom, I'm closing this since AFAIK we have no plans to do anything like this any time soon. Hopefully that's okay; if not, please feel free to reopen and we can discuss what to do with it.