delete redundant test outputs
Created attachment 53398 [details] Patch
Comment on attachment 53398 [details] Patch It might be slightly clearer if the script was named deduplicate-webkit-tests.
I can rename it. BTW, this found 1300 redundant files in the current tree.
Comment on attachment 53398 [details] Patch Tricky stuff. No clue what this does: + _, _, hash = attrs.split(' ') Does that ignore the first two split values?
(In reply to comment #4) > (From update of attachment 53398 [details]) > Tricky stuff. > > No clue what this does: > + _, _, hash = attrs.split(' ') > > Does that ignore the first two split values? Yes. Two features you might not be aware of: 1) foo, bar = x.split(' ') unpacks into two variables. 2) _ = blah() "_" is the name of the variable meaning "I don't want this variable". That style is common for self-documenting the following: - throw an exception if the thing doesn't split into three components exactly - I don't care about the first two It probably could be aided by including in a comment what kind of input it's parsing. (The git output is designed to be consumed by scripts in this way.)
Comment on attachment 53398 [details] Patch I'm sorry, but we're not accepting any new python code that doesn't have unit tests.
Hi Evan, I was working on something pretty similar https://bugs.webkit.org/show_bug.cgi?id=40792 Are you still working on this? I think you can either get some of the tests I wrote in the bug above and go ahead here, or I could take your get_port_fallbacks pieces.. :) Either way is fine by me, which one would you prefer?
(In reply to comment #7) > Hi Evan, > > I was working on something pretty similar https://bugs.webkit.org/show_bug.cgi?id=40792 > > Are you still working on this? I think you can either get some of the tests I wrote in the bug above and go ahead here, or I could take your get_port_fallbacks pieces.. :) > Either way is fine by me, which one would you prefer? on a second read: I think your patch using git ls-tree is much simpler than mine (which used python's hashlib.sha1()), so if you're ok, I'd be happy to write some tests to get this patch (rather than mine) in!
I would love it if you finished either this patch or another one! Please be my guest and use whichever code seems appropriate to you.
(In reply to comment #9) > I would love it if you finished either this patch or another one! Please be my guest and use whichever code seems appropriate to you. great, thanks Evan! I'll reuse your code and upload a new patch with tests here soon.
Created attachment 63990 [details] Patch
Thanks Evan! I took over your patch, did some minor changes (using log, optparse, added a glob for filtering file names, etc..), and added tests. I'm also cc:ing everybody that was on https://bugs.webkit.org/show_bug.cgi?id=40792, can you all please take a look?
*** Bug 40792 has been marked as a duplicate of this bug. ***
Comment on attachment 63990 [details] Patch > diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog > + * Scripts/deduplicate_tests.py: Added. > + * Scripts/deduplicate_tests_unittest.py: Added. There are no other unittest.py files or files with _ in the filename in Scripts and there's only one file that ends in .py. Maybe it would be better to put these files in Scripts/webkitpy somewhere and have a wrapper file (e.g., duplicate-tests) in Scripts that just imports the files from webkitpy. I think the unittest needs to be in webkitpy anyway for Scripts/test-webkitpy to find it. > diff --git a/WebKitTools/Scripts/deduplicate_tests.py b/WebKitTools/Scripts/deduplicate_tests.py > +# 2. Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in the > +# documentation and/or other materials provided with the distribution. The third clause of the license seems to be missing. There seems to be some inconsistency among the existing files in Scripts, but that's a separate issue. I would use something like what's in webkit-tools-completion.sh. > + # Fill in the map. > + cmd = ['git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'] Nit: () instead of [] since cmd doesn't change. The unittests look great.
Created attachment 64026 [details] Patch
thanks Tony! replies inline, new patch uploaded: (In reply to comment #14) > (From update of attachment 63990 [details]) > > diff --git a/WebKitTools/ChangeLog b/WebKitTools/ChangeLog > > + * Scripts/deduplicate_tests.py: Added. > > + * Scripts/deduplicate_tests_unittest.py: Added. > > There are no other unittest.py files or files with _ in the filename in Scripts and there's only one file that ends in .py. Maybe it would be better to put these files in Scripts/webkitpy somewhere and have a wrapper file (e.g., duplicate-tests) in Scripts that just imports the files from webkitpy. I think the unittest needs to be in webkitpy anyway for Scripts/test-webkitpy to find it. ahn, right! I was running the tests manually, didn't know about test-webkitpy. it's now much clearer with the wrapper in Scripts and the module / unittest in webkitpy, thanks! > > > > diff --git a/WebKitTools/Scripts/deduplicate_tests.py b/WebKitTools/Scripts/deduplicate_tests.py > > +# 2. Redistributions in binary form must reproduce the above copyright > > +# notice, this list of conditions and the following disclaimer in the > > +# documentation and/or other materials provided with the distribution. > > The third clause of the license seems to be missing. There seems to be some inconsistency among the existing files in Scripts, but that's a separate issue. I would use something like what's in webkit-tools-completion.sh. hehe, I'm confused. in this bug I've been told to use the two clause version.. ;) https://bugs.webkit.org/show_bug.cgi?id=40250#c5 do you know what's the real latest? :) > > > + # Fill in the map. > > + cmd = ['git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'] > > Nit: () instead of [] since cmd doesn't change. done. > > The unittests look great. thanks! another look, please?
Comment on attachment 64026 [details] Patch Jeremy probably knows the license situation better than me.
Comment on attachment 64026 [details] Patch WebKitTools/ChangeLog:1 + 2010-08-10 Marcus Bulach <bulach@chromium.org> Should be a new line between the author line and the reviewed by line. WebKitTools/Scripts/deduplicate-tests:65 + if options.pretty_print: Is this some sort of verbose mode? WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:47 + def get_port_fallbacks(): I dont' think we usually use get_* in python code. Check PEP8. We certainly don't use get* in our c++ webkit code. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:58 + _log.error("'%s' lacks baseline_search_path()" % port_name) Do we ever hit this? Wouldn't that be a bug in the port implementation? I think it's OK to be robust here, but we should be more explicit that the port needs fixing. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:59 + port_fallbacks[port_name] = ['base'] base? Is that a magic string here? Seems we might want a constant for that string instead of repeating it. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:79 + cmd = ('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests') I feel like the running of ls tree and returning of the split lines should be a separate funciton. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:98 + _, _, hash = attrs.split(' ') This is a cool pattern, I wish we had understood it better when writing the rest of webkitpy. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:89 + for line in git.split('\n'): This loop feels like it wants to call some helper function to proces each line. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:113 + if len(cluster) < 2: This per-test block feels like a function. :) Mostly because I was initially confused as to how you were avoiding deduping the empty file. But it now seems more clear to me that your'e not looping on hashes, but rather test paths (which is important). WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:116 + # Compute the list of platforms we have this particular hash for. Isn't this a function? platforms_with_test_matching_hash()? WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:139 + for test, platform, fallback, path in find_dups(hashes, port_fallbacks): If this was a function, then deduplicate just becomes a list comprehension. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:32 + class MockExecutive(object): Doesn't Mock.py already do this for you? We include it in webkitpy/third_party iirc. It's not great, but it might be easier than rolling your own. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:57 + '100644 blob 5053240b3353f6eb39f7cb00259785f16d121df2\tLayoutTests/mac/foo-expected.txt\n' Would be nice to have a parsing function split out from the logic so we could test that independently. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:64 + self.assertEquals(('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'), MockExecutive.last_run_command[-1]) I see. Yeah, we don't have quite so slick an interface for this in our existing mocks. If you look at mocktool.py you'll find a MockExecutive I think. But the way we've been asserting is using OutputCapture. Your solution is more pythony, I think. We may wish to move some variant of your MockExecutive into a shared file at some point. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:113 + if __name__ == '__main__': I don't think we need this. Did you verify that test-webkitpy will automagically find and run your unit tests? I would expect it to. Overall I think this ia a FANTASTIC patch. I really can't thank you and evan enough for taking this on. I would like to see one more round of this. Don't feel obliged to answer each of my nits above, but you should at least thumb through them. I think this patch is OK to land as is, but it would be nice to at least fix the ChangeLog and consider the nits listed.
Oh, and my understanding is that Google uses the 2-clause BSD for all new webkit contributions. I'm not sure what Apple's standard template looks like.
Created attachment 64040 [details] Patch
thanks Eric! I think I addressed most of your comments, would you mind taking another look please? (In reply to comment #18) > (From update of attachment 64026 [details]) > WebKitTools/ChangeLog:1 > + 2010-08-10 Marcus Bulach <bulach@chromium.org> > Should be a new line between the author line and the reviewed by line. Done > > WebKitTools/Scripts/deduplicate-tests:65 > + if options.pretty_print: > Is this some sort of verbose mode? yep, changed to verbose. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:47 > + def get_port_fallbacks(): > I dont' think we usually use get_* in python code. Check PEP8. We certainly don't use get* in our c++ webkit code. done. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:58 > + _log.error("'%s' lacks baseline_search_path()" % port_name) > Do we ever hit this? Wouldn't that be a bug in the port implementation? I think it's OK to be robust here, but we should be more explicit that the port needs fixing. Clarified the message. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:59 > + port_fallbacks[port_name] = ['base'] > base? Is that a magic string here? > > Seems we might want a constant for that string instead of repeating it. Added as constant. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:79 > + cmd = ('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests') > I feel like the running of ls tree and returning of the split lines should be a separate funciton. agreed! split. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:98 > + _, _, hash = attrs.split(' ') > This is a cool pattern, I wish we had understood it better when writing the rest of webkitpy. :) > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:89 > + for line in git.split('\n'): > This loop feels like it wants to call some helper function to proces each line. done. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:113 > + if len(cluster) < 2: > This per-test block feels like a function. :) Mostly because I was initially confused as to how you were avoiding deduping the empty file. But it now seems more clear to me that your'e not looping on hashes, but rather test paths (which is important). I split this up, let me know if it's any clearer. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:116 > + # Compute the list of platforms we have this particular hash for. > Isn't this a function? platforms_with_test_matching_hash()? I named "extract_platforms", hope it's ok? > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:139 > + for test, platform, fallback, path in find_dups(hashes, port_fallbacks): > If this was a function, then deduplicate just becomes a list comprehension. yay, more python-foo! :) > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:32 > + class MockExecutive(object): > Doesn't Mock.py already do this for you? We include it in webkitpy/third_party iirc. It's not great, but it might be easier than rolling your own. well, I'm not doing anything particularly complicated, so I think it's simpler to keep it? > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:57 > + '100644 blob 5053240b3353f6eb39f7cb00259785f16d121df2\tLayoutTests/mac/foo-expected.txt\n' > Would be nice to have a parsing function split out from the logic so we could test that independently. done. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:64 > + self.assertEquals(('git', 'ls-tree', '-r', 'HEAD', 'LayoutTests'), MockExecutive.last_run_command[-1]) > I see. Yeah, we don't have quite so slick an interface for this in our existing mocks. If you look at mocktool.py you'll find a MockExecutive I think. But the way we've been asserting is using OutputCapture. Your solution is more pythony, I think. > > We may wish to move some variant of your MockExecutive into a shared file at some point. I'm happy to move out at any point.. > > WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests_unittest.py:113 > + if __name__ == '__main__': > I don't think we need this. Did you verify that test-webkitpy will automagically find and run your unit tests? I would expect it to. right, it does find. removed it. > > Overall I think this ia a FANTASTIC patch. I really can't thank you and evan enough for taking this on. all credits to Evan, my changes were mostly cosmetic. :) > > I would like to see one more round of this. Don't feel obliged to answer each of my nits above, but you should at least thumb through them. I think this patch is OK to land as is, but it would be nice to at least fix the ChangeLog and consider the nits listed. I think I addressed all of them. another look please?
Comment on attachment 64040 [details] Patch WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:53 + back on. All platforms fall back on 'base'. Seems part of this comment belongs next to _BASE_PLATFORM, but ok. WebKitTools/Scripts/webkitpy/layout_tests/deduplicate_tests.py:68 + def parse_git_output(git_output, glob_pattern): I would have named this parse_git_ls_tree_output Looks great though. Thanks for taking this on. We probably should have done more of this as a class, rather than free functions. Eventually it will probably get moved deeper into webkitpy. But I think this is a good starting point.
Comment on attachment 64040 [details] Patch Clearing flags on attachment: 64040 Committed r65166: <http://trac.webkit.org/changeset/65166>
All reviewed patches have been landed. Closing bug.
http://trac.webkit.org/changeset/65166 might have broken Chromium Win Release The following changes are on the blame list: http://trac.webkit.org/changeset/65166 http://trac.webkit.org/changeset/65167
It looks like this code assumes you're running in a git checkout with the current working directory == the top of the tree, and it seems to gork test-webkitpy so that the rest of the test suite doesn't work right. Can you please fix this?
scm.py knows how to handle running in those cases. Perhaps this should use more of webkitpy code to take care of that for you. You can use detect_scm_system to find the checkout_root. http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/common/checkout/scm.py#L41 We've long talked about making it easier to write non-webkit-patch tools which use webkitpy. We just have a couple bugs to fix to make that possible. (Mostly abstracting the data storage out of WebKitPatch into a separate Tool class that can be set as self.tool on Command instances instead of WebKitPatch.)
(In reply to comment #27) > scm.py knows how to handle running in those cases. Perhaps this should use more of webkitpy code to take care of that for you. > > You can use detect_scm_system to find the checkout_root. > > http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/common/checkout/scm.py#L41 > > We've long talked about making it easier to write non-webkit-patch tools which use webkitpy. We just have a couple bugs to fix to make that possible. (Mostly abstracting the data storage out of WebKitPatch into a separate Tool class that can be set as self.tool on Command instances instead of WebKitPatch.) detect_scm_system() takes a path as input, which doesn't help if you don't know what path you want to use. It would be nice if it had a way to bootstrap itself better, like we do in webkitpy/layout_tests/port/base.py: if not self._webkit_base_dir: abspath = os.path.abspath(__file__) self._webkit_base_dir = abspath[0:abspath.find('WebKitTools')] _log.debug("Using WebKit root: %s" % self._webkit_base_dir) You could do something like this if path was None.
detect_scm_system knows how to crawl up until it finds the root. I think you want something like this: http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/tool/main.py#L80 Which is why that code should be shared on some object other than WebKitPatch's main class.
(In reply to comment #29) > detect_scm_system knows how to crawl up until it finds the root. > > I think you want something like this: > http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/webkitpy/tool/main.py#L80 > > Which is why that code should be shared on some object other than WebKitPatch's main class. Well, sort of. That code finds the root given $CWD, which is appropriate in some cases. The code in port.base finds the root given the path to port/base.py, which means that it'll work even if you're not actually in the tree; it uses the root of the tree containing the code you're running. I'm not sure if that would be appropriate for scm-related stuff, but it seems fine for test-related stuff.
The code I linked to tries to use the CWD, if its not a webkit checkout, then it falls back to being __file__ relative.
(In reply to comment #31) > The code I linked to tries to use the CWD, if its not a webkit checkout, then it falls back to being __file__ relative. Oh, i see. Interesting; I didn't realize that sys.path[0] was defined to be the script directory. That's useful. So, in this case, if I have a webkit-patch from one checkout in my path, and I'm in another checkout, it will use the checkout I'm in, not the checkout the code is running from. Probably the right thing to do in the scm case; possibly the wrong thing to do in the test case. Perhaps in either case you might want to prompt/warn that you're mixing checkouts. But this is a corner condition to be sure.
Another issue is that deduplicate_tests.py:44 appears to change the default configuration of the logger, which causes a whole bunch of [ERROR] messages to start showing up when you run test-webkitpy (although the tests succeed). I'll post a separate patch to address these two issues and we can continue the discussion of what to do about this there.
(In reply to comment #33) > I'll post a separate patch to address these two issues and we can continue the discussion of what to do about this there. okay, follow up in bug 44001.