34826 – new-run-webkit-tests --platform=mac-leopard does bogus image diffing

Bug 34826 - new-run-webkit-tests --platform=mac-leopard does bogus image diffing

Summary: new-run-webkit-tests --platform=mac-leopard does bogus image diffing

Status:	RESOLVED FIXED

Alias:	None

Product:	WebKit
Classification:	Unclassified
Component:	Tools / Tests (show other bugs)
Version:	528+ (Nightly build)
Hardware:	PC OS X 10.5

Importance:	P2 Normal
Assignee:	Dirk Pranke

URL:
Keywords:

Depends on:
Blocks:	34984
	Show dependency tree / graph

Reported:	2010-02-10 18:13 PST by Eric Seidel (no email)
Modified:	2010-04-19 13:59 PDT (History)
CC List:	5 users (show)

See Also:

Attachments
first draft of patch - still buggy (10.77 KB, patch) 2010-02-26 19:36 PST, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
updated patch - this should work correctly and safely (28.22 KB, patch) 2010-03-24 19:44 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
split server_process.read() into two calls for clearer usage (29.41 KB, patch) 2010-03-25 14:13 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
fix ChangeLog comments (29.21 KB, patch) 2010-03-25 14:36 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
back out PEP-8 reformatting to minimize confusion (24.56 KB, patch) 2010-03-25 18:15 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
revise w/ review feedback from eric and ojan (26.30 KB, patch) 2010-03-26 15:10 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
revise w/ more feedback from ojan (27.37 KB, patch) 2010-03-26 16:05 PDT, Dirk Pranke	no flags	Details \| Formatted Diff \| Diff
Show Obsolete (6) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Eric Seidel (no email) 2010-02-10 18:13:26 PST

run-chromium-webkit-tests --platform=mac-leopard does bogus image diffing

    def _path_to_image_diff(self):
        return self._build_path('image_diff') # FIXME: This is wrong and should be "ImageDiff", but having the correct path causes other parts of the script to hang.

turns out that "image_diff" expects command line arguments and is not long-running.

ImageDiff is long running.

We'll have to teach RCWT how to deal with a long-running ImageDiff (similar to how we deal with a long-running DumpRenderTree) via the "Driver" class.

Comment 1 Dirk Pranke 2010-02-26 19:36:14 PST

Created attachment 49671 [details]
first draft of patch - still buggy

This isn't ready to land yet, but it's close. The test script is hanging occasionally, and I think it's due to a bug in the diff_image() implementation but I haven't tracked it down yet. I'm posting the patch so others know it's in progress.

Comment 2 Dirk Pranke 2010-03-24 19:44:20 PDT

Created attachment 51582 [details]
updated patch - this should work correctly and safely

Comment 3 Dirk Pranke 2010-03-25 14:13:19 PDT

Created attachment 51675 [details]
split server_process.read() into two calls for clearer usage

Comment 4 Eric Seidel (no email) 2010-03-25 14:21:36 PDT

Drive-by nit: ChangeLog looks not quite right.

Comment 5 Dirk Pranke 2010-03-25 14:36:07 PDT

Created attachment 51676 [details]
fix ChangeLog comments

Comment 6 Ojan Vafai 2010-03-25 17:56:51 PDT

Can you split the style changes off into a separate page? That would make reviewing the code changes considerably easier.

Comment 7 Dirk Pranke 2010-03-25 18:15:35 PDT

Created attachment 51706 [details]
back out PEP-8 reformatting to minimize confusion

Comment 8 Dirk Pranke 2010-03-25 18:15:47 PDT

(In reply to comment #6)
> Can you split the style changes off into a separate page? That would make
> reviewing the code changes considerably easier.

Done.

Comment 9 Ojan Vafai 2010-03-26 14:25:18 PDT

Comment on attachment 51706 [details]
back out PEP-8 reformatting to minimize confusion

> +    def diff_image(self, expected_filename, actual_filename,
> +                   diff_filename=None):
> +        """Compare two image files and produce a delta image file.
> +
> +        Return 1 if the two files are different, 0 if they are the same.

Why not return a boolean?

> +        actual_file = open(actual_filename).read()
> +        input = 'Content-Length: %d\n' % actual_length
> +        input += actual_file
> +        input += 'Content-Length: %d\n' % expected_length
> +        input += expected_file

How about using a single format string here?

> +        cmd = [self._path_to_image_diff(), '--tolerance', '0.1']

Doesn't this need to be controlled via the commandline? Also, I'm pretty sure tiger, leopard and snow-leopard have different default tolerances.

>              cmd += self._options.wrapper.split()
> -        # FIXME: Using arch here masks any possible file-not-found errors from a non-existant driver executable.
> -        cmd += ['arch', '-i386', port._path_to_driver(), '-']

Is this FIXME no longer true?

> +++ b/WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
> +class ServerProcess:

I like the concept of this class, but not so much the name. There's not really a server involved. ReaderProcess? ReadLineProcess? ReadWriteProcess?

> +    """This class provides a wrapper around a subprocess that
> +    implements a simple read/write usage model. The primary benefit
> +    is that reading responses takes a timeout, so that we don't ever block
> +    indefinitely. The class also handles transparently restarting processes
> +    as necessary to keep issuing commands."""


> +    def read_line(self, deadline):
> +        """Read a single line from the subprocess, waiting until the deadline.
> +        If the deadline passes, the call times out. Note that even if the
> +        subprocess has crashed or the deadline has passed, if there is output
> +        pending, it will be returned.
> +
> +        Args:
> +            deadline: timestamp that the read must complete before

Passing in a timestamp is weird. Can you have it be a timeout and then have this code deal with calculating what the deadline should be?

> +        Returns:
> +            output: data returned, if any. If no data is available and the
> +                call times out or crashes, an empty string is returned. Note
> +                that the returned string includes the newline ('\n')."""
> +        return self._read(deadline, 0)
> +
> +    def read(self, deadline, size):
> +        """Attempts to read size characters from the subprocess, waiting until
> +        the deadline passes. If the deadline passes, any available data will be
> +        returned. Note that even if the deadline has passed or if the
> +        subprocess has crashed, any available data will still be returned.
> +
> +        Args:
> +            deadline: timestamp that the read must complete before
> +            size: amount of data to read.
> +        Returns:
> +            output: data returned, if any. If no data is available, an empty
> +                string is returned.
> +        """
> +        return self._read(deadline, size)

Why does this function exist? Why not just make _read public?

> +    def _read(self, deadline, size):
> +        """Internal routine that actually does the read."""
> +        idx = -1

Can you give this a more descriptive name?

> diff --git a/WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py b/WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py
> index 42928ba..3a3d8a6 100644
> --- a/WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py
> +++ b/WebKitTools/Scripts/webkitpy/layout_tests/test_types/image_diff.py
> @@ -90,6 +90,7 @@ class ImageDiff(test_type_base.TestTypeBase):
>          Args:
>            filename: the name of the test
>            target: Debug or Release
> +        Returns 1 if the files are different, 0 if they match
>          """

I know you didn't make it this way, but can we make this return a boolean? Then we can make the above code do the same?

Comment 10 Dirk Pranke 2010-03-26 14:38:31 PDT

(In reply to comment #9)
> (From update of attachment 51706 [details])
> > +    def diff_image(self, expected_filename, actual_filename,
> > +                   diff_filename=None):
> > +        """Compare two image files and produce a delta image file.
> > +
> > +        Return 1 if the two files are different, 0 if they are the same.
> 
> Why not return a boolean?

You saw below why not; I'll change the two files to return booleans instead.

> 
> > +        actual_file = open(actual_filename).read()
> > +        input = 'Content-Length: %d\n' % actual_length
> > +        input += actual_file
> > +        input += 'Content-Length: %d\n' % expected_length
> > +        input += expected_file
> 
> How about using a single format string here?
>

Will do.
 
> > +        cmd = [self._path_to_image_diff(), '--tolerance', '0.1']
> 
> Doesn't this need to be controlled via the commandline? Also, I'm pretty sure
> tiger, leopard and snow-leopard have different default tolerances.

Theoretically. I think the whole "tolerance" concept needs to go away, and we should only allow exact pixel matches.
 
> >              cmd += self._options.wrapper.split()
> > -        # FIXME: Using arch here masks any possible file-not-found errors from a non-existant driver executable.
> > -        cmd += ['arch', '-i386', port._path_to_driver(), '-']
> 
> Is this FIXME no longer true?
> 

That is correct; we now detect and report if the driver is missing properly.

> > +++ b/WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
> > +class ServerProcess:
> 
> I like the concept of this class, but not so much the name. There's not really
> a server involved. ReaderProcess? ReadLineProcess? ReadWriteProcess?

I agree with you that ServerProcess isn't great, but I don't like any of your suggestions any better. RpcProcess?

> Passing in a timestamp is weird. Can you have it be a timeout and then have
> this code deal with calculating what the deadline should be?

I agree that this is a little weird (compared to select, etc), but I think it's actually easier to get right on the caller side than the normal API design pattern. I can change it if you think following established (IMO, bad) API conventions is better here.

> > +        """
> > +        return self._read(deadline, size)
> 
> Why does this function exist? Why not just make _read public?
>

There are two public use cases supported, <read one line of unknown length> and <read an exact number of bytes>). Combining those into a single function means you have to use some sort of a magic value to indicate which you want. I thought having two public functions was better. However, as you saw below, implementing them separately would be silly, so I combined them into a single internal routine.
  
> > +    def _read(self, deadline, size):
> > +        """Internal routine that actually does the read."""
> > +        idx = -1
> 
> Can you give this a more descriptive name?

Suggestions?

> I know you didn't make it this way, but can we make this return a boolean? Then
> we can make the above code do the same?

will do.

Comment 11 Ojan Vafai 2010-03-26 15:02:13 PDT

(In reply to comment #10)
> (In reply to comment #9)
> > > +        cmd = [self._path_to_image_diff(), '--tolerance', '0.1']
> > 
> > Doesn't this need to be controlled via the commandline? Also, I'm pretty sure
> > tiger, leopard and snow-leopard have different default tolerances.
> 
> Theoretically. I think the whole "tolerance" concept needs to go away, and we
> should only allow exact pixel matches.

I'm inclined to agree, but for now this should at least have a FIXME.

> > > +++ b/WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
> > > +class ServerProcess:
> > 
> > I like the concept of this class, but not so much the name. There's not really
> > a server involved. ReaderProcess? ReadLineProcess? ReadWriteProcess?
> 
> I agree with you that ServerProcess isn't great, but I don't like any of your
> suggestions any better. RpcProcess?

How about IpcProcess? RPC is about actually executing code on the remote process. This is more about communicating with the remote process.

> > Passing in a timestamp is weird. Can you have it be a timeout and then have
> > this code deal with calculating what the deadline should be?
> 
> I agree that this is a little weird (compared to select, etc), but I think it's
> actually easier to get right on the caller side than the normal API design
> pattern. I can change it if you think following established (IMO, bad) API
> conventions is better here.

I think the confusion of this violating established API conventions is more likely to cause bugs. I'd rather it just be a timeout.

> > > +        """
> > > +        return self._read(deadline, size)
> > 
> > Why does this function exist? Why not just make _read public?
> 
> There are two public use cases supported, <read one line of unknown length> and
> <read an exact number of bytes>). Combining those into a single function means
> you have to use some sort of a magic value to indicate which you want. I
> thought having two public functions was better. However, as you saw below,
> implementing them separately would be silly, so I combined them into a single
> internal routine.

Sorry, I phrased poorly. I'm suggesting read_line just call the public read function. No need for a private one.

> > > +    def _read(self, deadline, size):
> > > +        """Internal routine that actually does the read."""
> > > +        idx = -1
> > 
> > Can you give this a more descriptive name?
> 
> Suggestions?

Even just "index" would be fine.

Comment 12 Dirk Pranke 2010-03-26 15:10:18 PDT

Created attachment 51785 [details]
revise w/ review feedback from eric and ojan

Comment 13 Dirk Pranke 2010-03-26 16:05:09 PDT

(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > > +        cmd = [self._path_to_image_diff(), '--tolerance', '0.1']
> > > 
> > > Doesn't this need to be controlled via the commandline? Also, I'm pretty sure
> > > tiger, leopard and snow-leopard have different default tolerances.
> > 
> > Theoretically. I think the whole "tolerance" concept needs to go away, and we
> > should only allow exact pixel matches.
> 
> I'm inclined to agree, but for now this should at least have a FIXME.
> 

FIXME added.

> > > > +++ b/WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
> > > > +class ServerProcess:
> > > 
> > > I like the concept of this class, but not so much the name. There's not really
> > > a server involved. ReaderProcess? ReadLineProcess? ReadWriteProcess?
> > 
> > I agree with you that ServerProcess isn't great, but I don't like any of your
> > suggestions any better. RpcProcess?
> 
> How about IpcProcess? RPC is about actually executing code on the remote
> process. This is more about communicating with the remote process.
>

The intent of the api is that there is a request followed by a reply. Unfortunately due to the way ImageDiff and DRT produce output, you can't easily encapsulate this into a single reply() method, and so I was left with the more generic read() and write() routines. However, IPC is too vague to capture the intent. Per face-to-face discussion w/ Ojan, we agreed to leave it as ServerProcess until we could think of something better.
 
> > > Passing in a timestamp is weird. Can you have it be a timeout and then have
> > > this code deal with calculating what the deadline should be?
> > 
> > I agree that this is a little weird (compared to select, etc), but I think it's
> > actually easier to get right on the caller side than the normal API design
> > pattern. I can change it if you think following established (IMO, bad) API
> > conventions is better here.
> 
> I think the confusion of this violating established API conventions is more
> likely to cause bugs. I'd rather it just be a timeout.

Okay, I've changed it to be a timeout. Judging by the fact that I introduced half a dozen bugs while change this, and by the fact that the code got longer on both sides, I think I was right. 

For a relevant discussion of the merits of API design in select() and read APIs, see http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext , although he repeats the error of not being able to distinguish between "don't wait", "wait for X", and "wait forever" in a delta timestamp. Using a deadline is clearer still.

> 
> > > > +        """
> > > > +        return self._read(deadline, size)
> > > 
> > > Why does this function exist? Why not just make _read public?
> > 
> > There are two public use cases supported, <read one line of unknown length> and
> > <read an exact number of bytes>). Combining those into a single function means
> > you have to use some sort of a magic value to indicate which you want. I
> > thought having two public functions was better. However, as you saw below,
> > implementing them separately would be silly, so I combined them into a single
> > internal routine.
> 
> Sorry, I phrased poorly. I'm suggesting read_line just call the public read
> function. No need for a private one.

As we discussed face-to-face, read() probably should've raised an exception if called with a non-positive number for size. I have added that, and you were okay with the change.

> 
> > > > +    def _read(self, deadline, size):
> > > > +        """Internal routine that actually does the read."""
> > > > +        idx = -1
> > > 
> > > Can you give this a more descriptive name?
> > 
> > Suggestions?
> 
> Even just "index" would be fine.

Ah, sorry, I thought you were complaining about "_read". I think "idx" is pretty obvious and shorter, but I've changed it to "index".

Comment 14 Dirk Pranke 2010-03-26 16:05:34 PDT

Created attachment 51791 [details]
revise w/ more feedback from ojan

Comment 15 Dirk Pranke 2010-03-26 16:07:52 PDT

(In reply to comment #13)
> > > > > +++ b/WebKitTools/Scripts/webkitpy/layout_tests/port/server_process.py
> > > > > +class ServerProcess:
> > > > 
> > > > I like the concept of this class, but not so much the name. There's not really
> > > > a server involved. ReaderProcess? ReadLineProcess? ReadWriteProcess?
> > > 
> > > I agree with you that ServerProcess isn't great, but I don't like any of your
> > > suggestions any better. RpcProcess?
> > 
> > How about IpcProcess? RPC is about actually executing code on the remote
> > process. This is more about communicating with the remote process.
> >
> 
> The intent of the api is that there is a request followed by a reply.
> Unfortunately due to the way ImageDiff and DRT produce output, you can't easily
> encapsulate this into a single reply() method, and so I was left with the more
> generic read() and write() routines. However, IPC is too vague to capture the
> intent. Per face-to-face discussion w/ Ojan, we agreed to leave it as
> ServerProcess until we could think of something better.
>

Originally when I wrote this my idea was to have ImageDiff and DumpRenderTree derive from ServerProcess, and then the complexity in parsing the responses would be hidden. Unfortunately, since they return such weird sets of arguments, even trying to fit them both into a single reply() method was awkward, and so I abandoned that idea.

Comment 16 Eric Seidel (no email) 2010-03-26 16:15:51 PDT

Comment on attachment 51791 [details]
revise w/ more feedback from ojan

I'm told that you and Ojan reached a happy place.  This looks OK to me.  I think this new architecture is much cleaner.

Two letter variable names make my brain kink.  But I don't think it's worth another round of changes to fix them.  In general I'd rather write out full english names and phrases to describe variables.  Then again, I come from a non-80 column limited world. :)

Thanks for all your hard work!

Comment 17 Dirk Pranke 2010-03-26 16:38:45 PDT

Comment on attachment 51791 [details]
revise w/ more feedback from ojan

Clearing flags on attachment: 51791

Committed r56647: <http://trac.webkit.org/changeset/56647>

Comment 18 Dirk Pranke 2010-03-26 16:38:50 PDT

All reviewed patches have been landed.  Closing bug.

Comment 19 James Robinson 2010-04-19 12:21:35 PDT

This killed the chromium windows canary:
http://build.chromium.org/buildbot/waterfall.fyi/builders/Webkit%20(webkit.org)/builds/25766/steps/webkit_tests/logs/stdio

The failure is:
Traceback (most recent call last):
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\new-run-webkit-tests", line 37, in <module>
    sys.exit(run_webkit_tests.main(options, args))
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\run_webkit_tests.py", line 1401, in main
    port_obj = port.get(options.platform, options)
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\port\factory.py", line 84, in get
    import chromium_win
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\port\chromium_win.py", line 39, in <module>
    import chromium
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\port\chromium.py", line 43, in <module>
    import webkit
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\port\webkit.py", line 46, in <module>
    import webkitpy.layout_tests.port.server_process as server_process
  File "C:\b\slave\webkit-rel-webkit-org\build\src\third_party\WebKit\WebKitTools\Scripts\webkitpy\layout_tests\port\server_process.py", line 32, in <module>
    import fcntl
ImportError: No module named fcntl
program finished with exit code 1

I think the 'import fcntl' line needs to be guarded.  Can you take a look at this Dirk?  If you don't have time I can try to fix it later today.  Thanks.

Comment 20 Dirk Pranke 2010-04-19 13:59:45 PDT

(In reply to comment #19)
> This killed the chromium windows canary:
> http://build.chromium.org/buildbot/waterfall.fyi/builders/Webkit%20(webkit.org)/builds/25766/steps/webkit_tests/logs/stdio
> 

This whole module won't work on Windows, because the non-blocking I/O model is totally different. We need to do much more invasive surgery than that. I've checked in a fix to webkit.py to not pull the file in if we're on windows, which fixes this for now.