Bug 37630

Summary: delete redundant test outputs
Product: WebKit Reporter: Evan Martin <evan>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, bulach, commit-queue, dpranke, eric, ojan, tony, victorw, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
abarth: review-
Patch
none
Patch
none
Patch none

Description Evan Martin 2010-04-14 18:49:22 PDT
delete redundant test outputs
Comment 1 Evan Martin 2010-04-14 18:51:03 PDT
Created attachment 53398 [details]
Patch
Comment 2 Alexey Proskuryakov 2010-04-16 11:37:33 PDT
Comment on attachment 53398 [details]
Patch

It might be slightly clearer if the script was named deduplicate-webkit-tests.
Comment 3 Evan Martin 2010-04-16 12:54:12 PDT
I can rename it.  BTW, this found 1300 redundant files in the current tree.
Comment 4 Eric Seidel (no email) 2010-04-16 13:26:18 PDT
Comment on attachment 53398 [details]
Patch

Tricky stuff.

No clue what this does:
+        _, _, hash = attrs.split(' ')

Does that ignore the first two split values?
Comment 5 Evan Martin 2010-04-16 13:31:04 PDT
(In reply to comment #4)
> (From update of attachment 53398 [details])
> Tricky stuff.
> 
> No clue what this does:
> +        _, _, hash = attrs.split(' ')
> 
> Does that ignore the first two split values?

Yes.  Two features you might not be aware of:

1) foo, bar = x.split(' ')
unpacks into two variables.
2) _ = blah()
"_" is the name of the variable meaning "I don't want this variable".

That style is common for self-documenting the following:
 - throw an exception if the thing doesn't split into three components exactly
 - I don't care about the first two
It probably could be aided by including in a comment what kind of input it's parsing.  (The git output is designed to be consumed by scripts in this way.)
Comment 6 Adam Barth 2010-04-22 15:54:03 PDT
Comment on attachment 53398 [details]
Patch

I'm sorry, but we're not accepting any new python code that doesn't have unit tests.
Comment 7 Marcus Bulach 2010-08-09 05:01:10 PDT
Hi Evan,

I was working on something pretty similar https://bugs.webkit.org/show_bug.cgi?id=40792

Are you still working on this? I think you can either get some of the tests I wrote in the bug above and go ahead here, or I could take your get_port_fallbacks pieces.. :)
Either way is fine by me, which one would you prefer?
Comment 8 Marcus Bulach 2010-08-09 06:55:34 PDT
(In reply to comment #7)
> Hi Evan,
> 
> I was working on something pretty similar https://bugs.webkit.org/show_bug.cgi?id=40792
> 
> Are you still working on this? I think you can either get some of the tests I wrote in the bug above and go ahead here, or I could take your get_port_fallbacks pieces.. :)
> Either way is fine by me, which one would you prefer?

on a second read: I think your patch using git ls-tree is much simpler than mine (which used python's hashlib.sha1()), so if you're ok, I'd be happy to write some tests to get this patch (rather than mine) in!
Comment 9 Evan Martin 2010-08-09 11:31:41 PDT
I would love it if you finished either this patch or another one!  Please be my guest and use whichever code seems appropriate to you.
Comment 10 Marcus Bulach 2010-08-09 11:38:45 PDT
(In reply to comment #9)
> I would love it if you finished either this patch or another one!  Please be my guest and use whichever code seems appropriate to you.

great, thanks Evan! I'll reuse your code and upload a new patch with tests here soon.
Comment 11 Marcus Bulach 2010-08-10 02:22:39 PDT
Created attachment 63990 [details]
Patch
Comment 12 Marcus Bulach 2010-08-10 02:27:52 PDT
Thanks Evan! I took over your patch, did some minor changes (using log, optparse, added a glob for filtering file names, etc..), and added tests.

I'm also cc:ing everybody that was on https://bugs.webkit.org/show_bug.cgi?id=40792, can you all please take a look?
Comment 13 Marcus Bulach 2010-08-10 02:30:03 PDT
*** Bug 40792 has been marked as a duplicate of this bug. ***
Comment 14 Tony Chang 2010-08-10 09:58:03 PDT
Comment on attachment 63990 [details]
Patch

> diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog
> +        * Scripts/deduplicate_tests.py: Added.
> +        * Scripts/deduplicate_tests_unittest.py: Added.

There are no other unittest.py files or files with _ in the filename in Scripts and there's only one file that ends in .py.  Maybe it would be better to put these files in Scripts/webkitpy somewhere and have a wrapper file (e.g., duplicate-tests) in Scripts that just imports the files from webkitpy.  I think the unittest needs to be in webkitpy anyway for Scripts/test-webkitpy to find it.


> diff --git a/WebKitTools/Scripts/deduplicate_tests.py b/WebKitTools/Scripts/deduplicate_tests.py
> +# 2.  Redistributions in binary form must reproduce the above copyright
> +#     notice, this list of conditions and the following disclaimer in the
> +#     documentation and/or other materials provided with the distribution.

The third clause of the license seems to be missing.  There seems to be some inconsistency among the existing files in Scripts, but that's a separate issue.  I would use something like what's in webkit-tools-completion.sh.

> +    # Fill in the map.
> +    cmd = ['git', 'ls-tree', '-r', 'HEAD', 'LayoutTests']

Nit: () instead of [] since cmd doesn't change.

The unittests look great.
Comment 15 Marcus Bulach 2010-08-10 10:32:30 PDT
Created attachment 64026 [details]
Patch
Comment 16 Marcus Bulach 2010-08-10 10:35:59 PDT
thanks Tony! replies inline, new patch uploaded:

(In reply to comment #14)
> (From update of attachment 63990 [details])
> > diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog
> > +        * Scripts/deduplicate_tests.py: Added.
> > +        * Scripts/deduplicate_tests_unittest.py: Added.
> 
> There are no other unittest.py files or files with _ in the filename in Scripts and there's only one file that ends in .py.  Maybe it would be better to put these files in Scripts/webkitpy somewhere and have a wrapper file (e.g., duplicate-tests) in Scripts that just imports the files from webkitpy.  I think the unittest needs to be in webkitpy anyway for Scripts/test-webkitpy to find it.

ahn, right! I was running the tests manually, didn't know about test-webkitpy.
it's now much clearer with the wrapper in Scripts and the module / unittest in webkitpy, thanks!

> 
> 
> > diff --git a/WebKitTools/Scripts/deduplicate_tests.py b/WebKitTools/Scripts/deduplicate_tests.py
> > +# 2.  Redistributions in binary form must reproduce the above copyright
> > +#     notice, this list of conditions and the following disclaimer in the
> > +#     documentation and/or other materials provided with the distribution.
> 
> The third clause of the license seems to be missing.  There seems to be some inconsistency among the existing files in Scripts, but that's a separate issue.  I would use something like what's in webkit-tools-completion.sh.

hehe, I'm confused. in this bug I've been told to use the two clause version.. ;)
https://bugs.webkit.org/show_bug.cgi?id=40250#c5

do you know what's the real latest? :)


> 
> > +    # Fill in the map.
> > +    cmd = ['git', 'ls-tree', '-r', 'HEAD', 'LayoutTests']
> 
> Nit: () instead of [] since cmd doesn't change.

done.

> 
> The unittests look great.

thanks! another look, please?
Comment 17 Tony Chang 2010-08-10 10:41:20 PDT
Comment on attachment 64026 [details]
Patch

Jeremy probably knows the license situation better than me.
Comment 18 Eric Seidel (no email) 2010-08-10 11:09:35 PDT
Comment on attachment 64026 [details]
Patch

WebKitTools/ChangeLog:1
 +  2010-08-10  Marcus Bulach  <bulach@chromium.org>
Should be a new line between the author line and the reviewed by line.

WebKitTools/Scripts/deduplicate-tests:65
 +      if options.pretty_print:
Is this some sort of verbose mode?

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:47
 +  def get_port_fallbacks():
I dont' think we usually use get_* in python code.  Check PEP8.  We certainly don't use get* in our c++ webkit code.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:58
 +              _log.error("'%s' lacks baseline_search_path()" % port_name)
Do we ever hit this?  Wouldn't that be a bug in the port implementation?  I think it's OK to be robust here, but we should be more explicit that the port needs fixing.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:59
 +              port_fallbacks[port_name] = ['base']
base?  Is that a magic string here?

Seems we might want a constant for that string instead of repeating it.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:79
 +      cmd = ('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests')
I feel like the running of ls tree and returning of the split lines should be a separate funciton.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:98
 +          _, _, hash = attrs.split(' ')
This is a cool pattern, I wish we had understood it better when writing the rest of webkitpy.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:89
 +      for line in git.split('\n'):
This loop feels like it wants to call some helper function to proces each line.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:113
 +          if len(cluster) < 2:
This per-test block feels like a function.  :)  Mostly because I was initially confused as to how you were avoiding deduping the empty file.  But it now seems more clear to me that your'e not looping on hashes, but rather test paths (which is important).

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:116
 +          # Compute the list of platforms we have this particular hash for.
Isn't this a function?  platforms_with_test_matching_hash()?

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:139
 +      for test, platform, fallback, path in find_dups(hashes, port_fallbacks):
If this was a function, then deduplicate just becomes a list comprehension.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:32
 +  class MockExecutive(object):
Doesn't Mock.py already do this for you?  We include it in webkitpy/third_party iirc.  It's not great, but it might be easier than rolling your own.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:57
 +              '100644 blob 5053240b3353f6eb39f7cb00259785f16d121df2\tLayoutTests/mac/foo-expected.txt\n'
Would be nice to have a parsing function split out from the logic so we could test that independently.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:64
 +          self.assertEquals(('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'), MockExecutive.last_run_command[-1])
I see.  Yeah, we don't have quite so slick an interface for this in our existing mocks.  If you look at mocktool.py you'll find a MockExecutive I think.  But the way we've been asserting is using OutputCapture.  Your solution is more pythony, I think.

We may wish to move some variant of your MockExecutive into a shared file at some point.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:113
 +  if __name__ == '__main__':
I don't think we need this.  Did you verify that test-webkitpy will automagically find and run your unit tests?  I would expect it to.

Overall I think this ia a FANTASTIC patch.  I really can't thank you and evan enough for taking this on.

I would like to see one more round of this.  Don't feel obliged to answer each of my nits above, but you should at least thumb through them.  I think this patch is OK to land as is, but it would be nice to at least fix the ChangeLog and consider the nits listed.
Comment 19 Eric Seidel (no email) 2010-08-10 11:10:36 PDT
Oh, and my understanding is that Google uses the 2-clause BSD for all new webkit contributions.  I'm not sure what Apple's standard template looks like.
Comment 20 Marcus Bulach 2010-08-10 12:43:55 PDT
Created attachment 64040 [details]
Patch
Comment 21 Marcus Bulach 2010-08-10 12:51:27 PDT
thanks Eric! I think I addressed most of your comments, would you mind taking another look please?

(In reply to comment #18)
> (From update of attachment 64026 [details])
> WebKitTools/ChangeLog:1
>  +  2010-08-10  Marcus Bulach  <bulach@chromium.org>
> Should be a new line between the author line and the reviewed by line.
Done

> 
> WebKitTools/Scripts/deduplicate-tests:65
>  +      if options.pretty_print:
> Is this some sort of verbose mode?
yep, changed to verbose.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:47
>  +  def get_port_fallbacks():
> I dont' think we usually use get_* in python code.  Check PEP8.  We certainly don't use get* in our c++ webkit code.
done.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:58
>  +              _log.error("'%s' lacks baseline_search_path()" % port_name)
> Do we ever hit this?  Wouldn't that be a bug in the port implementation?  I think it's OK to be robust here, but we should be more explicit that the port needs fixing.
Clarified the message.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:59
>  +              port_fallbacks[port_name] = ['base']
> base?  Is that a magic string here?
> 
> Seems we might want a constant for that string instead of repeating it.
Added as constant.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:79
>  +      cmd = ('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests')
> I feel like the running of ls tree and returning of the split lines should be a separate funciton.
agreed! split.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:98
>  +          _, _, hash = attrs.split(' ')
> This is a cool pattern, I wish we had understood it better when writing the rest of webkitpy.
:)


> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:89
>  +      for line in git.split('\n'):
> This loop feels like it wants to call some helper function to proces each line.
done.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:113
>  +          if len(cluster) < 2:
> This per-test block feels like a function.  :)  Mostly because I was initially confused as to how you were avoiding deduping the empty file.  But it now seems more clear to me that your'e not looping on hashes, but rather test paths (which is important).

I split this up, let me know if it's any clearer.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:116
>  +          # Compute the list of platforms we have this particular hash for.
> Isn't this a function?  platforms_with_test_matching_hash()?
I named "extract_platforms", hope it's ok?

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:139
>  +      for test, platform, fallback, path in find_dups(hashes, port_fallbacks):
> If this was a function, then deduplicate just becomes a list comprehension.
yay, more python-foo! :)

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:32
>  +  class MockExecutive(object):
> Doesn't Mock.py already do this for you?  We include it in webkitpy/third_party iirc.  It's not great, but it might be easier than rolling your own.
well, I'm not doing anything particularly complicated, so I think it's simpler to keep it?

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:57
>  +              '100644 blob 5053240b3353f6eb39f7cb00259785f16d121df2\tLayoutTests/mac/foo-expected.txt\n'
> Would be nice to have a parsing function split out from the logic so we could test that independently.
done.

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:64
>  +          self.assertEquals(('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'), MockExecutive.last_run_command[-1])
> I see.  Yeah, we don't have quite so slick an interface for this in our existing mocks.  If you look at mocktool.py you'll find a MockExecutive I think.  But the way we've been asserting is using OutputCapture.  Your solution is more pythony, I think.
> 
> We may wish to move some variant of your MockExecutive into a shared file at some point.

I'm happy to move out at any point..

> 
> WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:113
>  +  if __name__ == '__main__':
> I don't think we need this.  Did you verify that test-webkitpy will automagically find and run your unit tests?  I would expect it to.
right, it does find. removed it.

> 
> Overall I think this ia a FANTASTIC patch.  I really can't thank you and evan enough for taking this on.

all credits to Evan, my changes were mostly cosmetic. :)

> 
> I would like to see one more round of this.  Don't feel obliged to answer each of my nits above, but you should at least thumb through them.  I think this patch is OK to land as is, but it would be nice to at least fix the ChangeLog and consider the nits listed.

I think I addressed all of them. another look please?
Comment 22 Eric Seidel (no email) 2010-08-11 09:43:05 PDT
Comment on attachment 64040 [details]
Patch

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:53
 +          back on.  All platforms fall back on 'base'.
Seems part of this comment belongs next to _BASE_PLATFORM, but ok.

WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:68
 +  def parse_git_output(git_output, glob_pattern):
I would have named this parse_git_ls_tree_output

Looks great though.  Thanks for taking this on.

We probably should have done more of this as a class, rather than free functions.  Eventually it will probably get moved deeper into webkitpy.  But I think this is a good starting point.
Comment 23 WebKit Commit Bot 2010-08-11 11:31:27 PDT
Comment on attachment 64040 [details]
Patch

Clearing flags on attachment: 64040

Committed r65166: <http://trac.webkit.org/changeset/65166>
Comment 24 WebKit Commit Bot 2010-08-11 11:31:34 PDT
All reviewed patches have been landed.  Closing bug.
Comment 25 WebKit Review Bot 2010-08-11 11:49:39 PDT
http://trac.webkit.org/changeset/65166 might have broken Chromium Win Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/65166
http://trac.webkit.org/changeset/65167
Comment 26 Dirk Pranke 2010-08-12 18:26:52 PDT
It looks like this code assumes you're running in a git checkout with the current working directory == the top of the tree, and it seems to gork test-webkitpy so that the rest of the test suite doesn't work right.

Can you please fix this?
Comment 27 Eric Seidel (no email) 2010-08-12 18:38:14 PDT
scm.py knows how to handle running in those cases.  Perhaps this should use more of webkitpy code to take care of that for you.

You can use detect_scm_system to find the checkout_root.

http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/common/checkout/scm.py#L41

We've long talked about making it easier to write non-webkit-patch tools which use webkitpy.  We just have a couple bugs to fix to make that possible.  (Mostly abstracting the data storage out of WebKitPatch into a separate Tool class that can be set as self.tool on Command instances instead of WebKitPatch.)
Comment 28 Dirk Pranke 2010-08-12 20:08:09 PDT
(In reply to comment #27)
> scm.py knows how to handle running in those cases.  Perhaps this should use more of webkitpy code to take care of that for you.
> 
> You can use detect_scm_system to find the checkout_root.
> 
> http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/common/checkout/scm.py#L41
> 
> We've long talked about making it easier to write non-webkit-patch tools which use webkitpy.  We just have a couple bugs to fix to make that possible.  (Mostly abstracting the data storage out of WebKitPatch into a separate Tool class that can be set as self.tool on Command instances instead of WebKitPatch.)

detect_scm_system() takes a path as input, which doesn't help if you don't know what path you
want to use. 

It would be nice if it had a way to bootstrap itself better, like we do in webkitpy/layout_tests/port/base.py:

        if not self._webkit_base_dir:
            abspath = os.path.abspath(__file__)
            self._webkit_base_dir = abspath[0:abspath.find('WebKitTools')]
            _log.debug("Using WebKit root: %s" % self._webkit_base_dir)

You could do something like this if path was None.
Comment 29 Eric Seidel (no email) 2010-08-12 20:22:51 PDT
detect_scm_system knows how to crawl up until it finds the root.

I think you want something like this:
http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/tool/main.py#L80

Which is why that code should be shared on some object other than WebKitPatch's main class.
Comment 30 Dirk Pranke 2010-08-13 13:21:34 PDT
(In reply to comment #29)
> detect_scm_system knows how to crawl up until it finds the root.
> 
> I think you want something like this:
> http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/tool/main.py#L80
> 
> Which is why that code should be shared on some object other than WebKitPatch's main class.

Well, sort of.

That code finds the root given $CWD, which is appropriate in some cases. The code in port.base finds the root given the path to port/base.py, which means that it'll work even if you're not actually in the tree; it uses the root of the tree containing the code you're running.

I'm not sure if that would be appropriate for scm-related stuff, but it seems fine for test-related stuff.
Comment 31 Eric Seidel (no email) 2010-08-13 14:09:17 PDT
The code I linked to tries to use the CWD, if its not a webkit checkout, then it falls back to being __file__ relative.
Comment 32 Dirk Pranke 2010-08-13 14:18:57 PDT
(In reply to comment #31)
> The code I linked to tries to use the CWD, if its not a webkit checkout, then it falls back to being __file__ relative.

Oh, i see. Interesting; I didn't realize that sys.path[0] was defined to be the script directory. That's useful.

So, in this case, if I have a webkit-patch from one checkout in my path, and I'm in another checkout, it will use the checkout I'm in, not the checkout the code is running from. Probably the right thing to do in the scm case; possibly the wrong thing to do in the test case. Perhaps in either case you might want to prompt/warn that you're mixing checkouts. But this is a corner condition to be sure.
Comment 33 Dirk Pranke 2010-08-13 17:48:48 PDT
Another issue is that deduplicate_tests.py:44 appears to change the default configuration of the logger, which causes a whole bunch of [ERROR] messages to start showing up when you run test-webkitpy (although the tests succeed).

I'll post a separate patch to address these two issues and we can continue the discussion of what to do about this there.
Comment 34 Dirk Pranke 2010-08-13 19:10:03 PDT
(In reply to comment #33)
> I'll post a separate patch to address these two issues and we can continue the discussion of what to do about this there.

okay, follow up in bug 44001.