51222 – new-run-webkit-tests : need to support more modifiers (gpu, x86-64, etc.)

Dirk Pranke

Reported 2010-12-16 16:54:12 PST

new-run-webkit-tests : need to support more modifiers (gpu, x86-64, etc.)

Dirk Pranke

Comment 1 2010-12-16 16:57:56 PST

Created attachment 76829 [details] Patch

Mihai Parparita

Comment 2 2010-12-20 12:49:30 PST

Comment on attachment 76829 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=76829&action=review > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:939 > + """Initialize a ModifierMatcher argument with the dict of values that Can you elaborate on what the keys/values in modifiers_to_match_against are? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:947 > + self.shortcuts = self.DEFAULT_SHORTCUTS Should you validate that the shortcuts expand to valid modifiers? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:950 > + self._all_modifiers = [] Make into a set()? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:953 > + # self.allowed_values. I think you meant to say that this inverts self.allowed_modifiers. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:964 > + # Verify that the modifiers_to_check_against are legal. We And here modifiers_to_check_against should be modifiers_to_match_against (you define modifiers_to_check_against in _count_matches, perhaps that should be modifiers_to_check_against too, for the sake of consistency). > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:971 > + if category not in self.allowed_modifiers.keys(): You don't need the .keys() call, checking for "in" in a dictionary checks for presence of a key. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:989 > + The results of the most recent match are available in the 'options', Is there much benefit to having ModifierMatcher be mutable like this? Returning all this in a result object seems cleaner. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1038 > + more_mods, more_errors = self._expand_shortcut(option, options) _update_regex_matching_state takes errors as a parameter and adds things to it, here you return a list of errors and use errors.extend. It seems like consistency is best. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1051 > + def _matches_a_regex(self, option): Maybe call this _matches_an_ignore_regex to make its intention clearer? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1060 > + errors.append("More than one option matching '%s'" % regex_str) You may want to make this into a warning, we currently have lines with multiple BUG modifiers: https://www.google.com/codesearch?hl=en&vert=chromium&lr=&q=bug.*bug.*\d%2B.*:+file:test_expectations.txt&sbtn=Search > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1093 > + pairs = [(m, self._category(m)) for m in modifiers] Extra space after in. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1094 > + return _group_by(pairs, 1) _group_by might be overkill in this case, you can also say: modifiers_by_category = {} for m in modifiers: modifiers_by_category.setdefault(self._category(m), []).append(m) return modifiers_by_category

Dirk Pranke

Comment 3 2010-12-20 19:34:45 PST

Created attachment 77072 [details] Patch

Dirk Pranke

Comment 4 2010-12-20 19:39:10 PST

Thanks for the feedback. It turned out that I kinda needed to do the second patch at the same time in order to be able to pass all of tests properly and without too many hacks. Let me know if this patch is too big and I can try to split it up into smaller chunks. (In reply to comment #2) > (From update of attachment 76829 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=76829&action=review > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:939 > > + """Initialize a ModifierMatcher argument with the dict of values that > > Can you elaborate on what the keys/values in modifiers_to_match_against are? > This argument goes away, so, problem solved ;) > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:947 > > + self.shortcuts = self.DEFAULT_SHORTCUTS > > Should you validate that the shortcuts expand to valid modifiers? > Now done. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:950 > > + self._all_modifiers = [] > > Make into a set()? > Done. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:953 > > + # self.allowed_values. > > I think you meant to say that this inverts self.allowed_modifiers. > Yes, but the logic is now slightly different. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:964 > > + # Verify that the modifiers_to_check_against are legal. We > > And here modifiers_to_check_against should be modifiers_to_match_against (you define modifiers_to_check_against in _count_matches, perhaps that should be modifiers_to_check_against too, for the sake of consistency). > Fixed. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:971 > > + if category not in self.allowed_modifiers.keys(): > > You don't need the .keys() call, checking for "in" in a dictionary checks for presence of a key. > Done. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:989 > > + The results of the most recent match are available in the 'options', > > Is there much benefit to having ModifierMatcher be mutable like this? Returning all this in a result object seems cleaner. > There isn't a huge benefit. Modified as per your suggestion. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1038 > > + more_mods, more_errors = self._expand_shortcut(option, options) > > _update_regex_matching_state takes errors as a parameter and adds things to it, here you return a list of errors and use errors.extend. It seems like consistency is best. > Things should be consistent now. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1051 > > + def _matches_a_regex(self, option): > > Maybe call this _matches_an_ignore_regex to make its intention clearer? > The logic is slightly different now. Let me know if you think the new code is fine as is or needs a similar name change. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1060 > > + errors.append("More than one option matching '%s'" % regex_str) > > You may want to make this into a warning, we currently have lines with multiple BUG modifiers: > Yeah, that was a bug. It shouldn't even be a warning. I had to add a whitelist of regexes that were okay to be repeated. > https://www.google.com/codesearch?hl=en&vert=chromium&lr=&q=bug.*bug.*\d%2B.*:+file:test_expectations.txt&sbtn=Search > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1093 > > + pairs = [(m, self._category(m)) for m in modifiers] > > Extra space after in. > Fixed. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:1094 > > + return _group_by(pairs, 1) > > _group_by might be overkill in this case, you can also say: > modifiers_by_category = {} > for m in modifiers: > modifiers_by_category.setdefault(self._category(m), []).append(m) > return modifiers_by_category Good idea. Done.

Dirk Pranke

Comment 5 2010-12-22 19:14:30 PST

Created attachment 77297 [details] rebase to head post webkittools -> tools rename

Eric Seidel (no email)

Comment 6 2010-12-23 12:01:52 PST

Comment on attachment 77297 [details] rebase to head post webkittools -> tools rename View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review I like the concept. I'm as of yet not clear on the execution. I'll read through this a second time in a bit. You probably want Ojan's view on this onion, given that I think he wrote a bunch of the original code here. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:201 > def __repr__(self): > - return 'ParseError(fatal=%s, errors=%s)' % (fatal, errors) > + return 'ParseError(fatal=%s, errors=%s)' % (self.fatal, self.errors) I guess this didn't work before. Do we have a test for it now? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:532 > + #if line.find(":") is -1: > + # test_and_expectation = line.split("=") > + #else: > + # parts = line.split(":") > + # options = self._get_options_list(parts[0]) > + # test_and_expectation = parts[1].split('=') I don't think you meant to leave this in. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:626 > + if not self._full_test_list: > + tests = [test_list_path] > + else: > + tests = self._expand_tests(test_list_path) Should this just be a helper accessor? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:628 > + modifiers = [o for o in options if o in self.MODIFIERS] This seems like set.intersection. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:671 > + has_bug = True Looks like you're setting has_bug twice? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:675 > + self._log_non_fatal_error(lineno, 'Test lacks BUG modifier.', > + test_list_path) Strange indent. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:680 > + self._add_error(lineno, > + 'REBASELINE should only be used for running rebaseline.py. ' > + 'Cannot be checked in.', test_list_path) Again here. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:695 > + # WebKit's way of skipping tests is to add a -disabled suffix. > + # So we should consider the path existing if the path or the Disabled is actually different from skipping. We disable tests when the need to be skipped on all platforms. I think using -disabled pre-dates this Skipped list. :) > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:697 > + if (not self._port.path_exists(full_path) Why use port here and not filesystem? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:835 > + if len(prev_base_path) > len(base_path): > + # The previous path matched more of the test. > + return True > + elif len(prev_base_path) < len(base_path): > + # This path matches more of the test. > + return False Seems we could just write this: if len(prev_base_path) != len(base_path): return len(prev_base_path) > len(base_path) with explanitory comment if necessary. I'm not quite sure what that check is for yet though. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:938 > + TestExpectationsFile.MODIFIERS.keys()[:-1]) is -1 tryign to avoid "none"? Seems fragile at best. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:939 > + DUPLICATE_REGEXES_ALLOWED = ['bug\w+'] I think you want r'foo' when making regexp strings, no?

Mihai Parparita

Comment 7 2010-12-23 12:02:37 PST

Comment on attachment 77297 [details] rebase to head post webkittools -> tools rename View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:97 > + test_config: specific values to check against when Can you mention that this is a TestConfiguration instance ("specific values" made me think this was a list or a set). > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:527 > + #if line.find(":") is -1: Did you mean to leave in this commented-out code? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:534 > + parts = line.split(':') Though not a regression from your patch, should this be line.split(':', 1) since you only look at parts[0] and [1]? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:536 > + test_and_expectation = parts[1].split('=') Same here. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:652 > + def _check_syntax(self, matcher, options, lineno, test_list_path): Call this _check_options_syntax to make it clearer what it's checking? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:660 > + def _check_semantics(self, options, lineno, test_list_path): Same here, call this _check_options_semantics. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:674 > + self._log_non_fatal_error(lineno, 'Test lacks BUG modifier.', The error message should perhaps say "Test with WONTFIX lacks BUG modifier" to make the error state cleaner? > Tools/Scripts/webkitpy/layout_tests/port/base.py:890 > + self.os = os or port.test_platform_name().replace(port_version, '') What if port is not passed in?

Dirk Pranke

Comment 8 2010-12-23 12:36:01 PST

(In reply to comment #6) > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:201 > > def __repr__(self): > > - return 'ParseError(fatal=%s, errors=%s)' % (fatal, errors) > > + return 'ParseError(fatal=%s, errors=%s)' % (self.fatal, self.errors) > > I guess this didn't work before. Do we have a test for it now? > Yes, but not an explicit one. I will add one. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:532 > > + #if line.find(":") is -1: > > + # test_and_expectation = line.split("=") > > + #else: > > + # parts = line.split(":") > > + # options = self._get_options_list(parts[0]) > > + # test_and_expectation = parts[1].split('=') > > I don't think you meant to leave this in. > Correct. I will remove it. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:626 > > + if not self._full_test_list: > > + tests = [test_list_path] > > + else: > > + tests = self._expand_tests(test_list_path) > > Should this just be a helper accessor? I think I'll move it inside _expand_tests(); I don't think having another routine helps much. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:628 > > + modifiers = [o for o in options if o in self.MODIFIERS] > > This seems like set.intersection. > It is, but options and self.MODIFERS are lists, and modifiers needs to end up being a list. I didn't think converting back and forth would help clarity or speed. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:671 > > + has_bug = True > > Looks like you're setting has_bug twice? > Whoops. Will fix. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:675 > > + self._log_non_fatal_error(lineno, 'Test lacks BUG modifier.', > > + test_list_path) > > Strange indent. Fixed. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:680 > > + self._add_error(lineno, > > + 'REBASELINE should only be used for running rebaseline.py. ' > > + 'Cannot be checked in.', test_list_path) > > Again here. > Fixed. Wonder where those came from. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:695 > > + # WebKit's way of skipping tests is to add a -disabled suffix. > > + # So we should consider the path existing if the path or the > > Disabled is actually different from skipping. We disable tests when the need to be skipped on all platforms. I think using -disabled pre-dates this Skipped list. :) > Will updated the comment accordingly. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:697 > > + if (not self._port.path_exists(full_path) > > Why use port here and not filesystem? > I think this routine predates the filesystem concept. It will be updated when I remove all of the other direct references to the os module and route everything through the filesystem. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:835 > > + if len(prev_base_path) > len(base_path): > > + # The previous path matched more of the test. > > + return True > > + elif len(prev_base_path) < len(base_path): > > + # This path matches more of the test. > > + return False > > Seems we could just write this: > > if len(prev_base_path) != len(base_path): > return len(prev_base_path) > len(base_path) > > with explanitory comment if necessary. > We could, but I'm not sure that that is clearer. The question is whether we have two expectations of the form: BUGX : fast = PASS BUGX: fast/html = FAIL or of the form: BUGX : fast/html = FAIL BUGX : fast = PASS In the first case, the second line is the one that matters. In the second case, it is ignored. > I'm not quite sure what that check is for yet though. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:938 > > + TestExpectationsFile.MODIFIERS.keys()[:-1]) > > is -1 tryign to avoid "none"? Seems fragile at best. > Yes, and I agree. I'm open to a better way of trying to assign that to what should be a constant. Somehow breaking it into two statements feels wrong. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:939 > > + DUPLICATE_REGEXES_ALLOWED = ['bug\w+'] > > I think you want r'foo' when making regexp strings, no? You don't have to use them. My understanding is that r'' means "raw" string, not "regexp" string, and all it does is disable the backslash escaping that normally happens. However, they're certainly most common as regexps, so I'll add it to make that a bit clearer (I wish python had followed perl and ruby and allowed for a separate literal syntax for regexps instead).

Mihai Parparita

Comment 9 2011-01-10 11:24:44 PST

Comment on attachment 77297 [details] rebase to head post webkittools -> tools rename View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:527 >> + #if line.find(":") is -1: > > Did you mean to leave in this commented-out code? Still seeing this commented out block. >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:671 >> + has_bug = True > > Looks like you're setting has_bug twice? has_bug is still being set twice.

Ojan Vafai

Comment 10 2011-01-10 12:46:24 PST

Comment on attachment 77297 [details] rebase to head post webkittools -> tools rename View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review r- for my nits + mihaip and eric's unaddressed comments. Overall, this is much cleaner. >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:534 >> + parts = line.split(':') > > Though not a regression from your patch, should this be line.split(':', 1) since you only look at parts[0] and [1]? Here and below, it would be an error in the test_expectations.txt file if there were a line with more than one : or =. Maybe as a separate patch we should modify the error check above to check that there is exactly 1 : and 1 =? In either case, seems better for a separate patch. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:599 > + def _read_line(self, line, lineno, matcher, overrides_allowed): Nit: This does a lot more than just read the line. Maybe call this _process_line or something like that? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:827 > + prev_entry = self._test_list_paths[test] > + prev_base_path, prev_num_matches, prev_lineno = prev_entry Nit: this can just be 1 line since you don't use prev_entry elsewhere. Remember, we don't have an 80 col requirement. :) > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:839 > + if (not overrides_allowed or test in self._overridding_tests): This would be easier to read if you inverted the if-statement as most of this wouldn't need to be so indented. It would be: if (...): # We have seen this path ... # ... # ... return False ... > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:890 > +class ModifierMatcher(object): This is a fairly involved class. Perhaps it should go in it's own file? ModifierMatchResult should probably go in that file as well? > Tools/Scripts/webkitpy/layout_tests/port/base.py:879 > +class TestConfiguration(object): This is also a pretty involved class. Put it in it's own file?

Dirk Pranke

Comment 11 2011-01-10 13:10:22 PST

(In reply to comment #9) > (From update of attachment 77297 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review > > >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:527 > >> + #if line.find(":") is -1: > > > > Did you mean to leave in this commented-out code? > > Still seeing this commented out block. > > >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:671 > >> + has_bug = True > > > > Looks like you're setting has_bug twice? > > has_bug is still being set twice. Ah, yeah, I got interrupted with something on 12/22 and forgot to upload the last few changes. I will roll them in to the next patch. Sorry!

Ojan Vafai

Comment 12 2011-01-10 13:51:54 PST

Comment on attachment 77297 [details] rebase to head post webkittools -> tools rename View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:922 > + BUG1 RELEASE : foo.html = FAIL > + BUG1 WIN RELEASE : foo.html = PASS > + BUG2 WIN : bar.html = FAIL > + BUG2 DEBUG : bar.html = PASS > + > + the expectation for foo.html on a Win XP Release bot would be PASS since > + the scores for the first two lines are 1 and 2, respectively. However, > + lines three and four would produce a duplicate expectation on a Win > + Debug bot since both the 'win' and the 'debug' expectations would apply. I'm confused. Shouldn't all of these be duplicates except lines 1 and 4? I thought we had agreed that more modifiers doesn't override fewer modifiers, they still conflict.

Dirk Pranke

Comment 13 2011-01-10 13:57:28 PST

Created attachment 78447 [details] Update w/ more review feedback.

Dirk Pranke

Comment 14 2011-01-10 13:58:21 PST

(In reply to comment #10) > (From update of attachment 77297 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review > > r- for my nits + mihaip and eric's unaddressed comments. Overall, this is much cleaner. > > >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:534 > >> + parts = line.split(':') > > > > Though not a regression from your patch, should this be line.split(':', 1) since you only look at parts[0] and [1]? > > Here and below, it would be an error in the test_expectations.txt file if there were a line with more than one : or =. Maybe as a separate patch we should modify the error check above to check that there is exactly 1 : and 1 =? > > In either case, seems better for a separate patch. > Will add better tests in a separate patch. Note that multiple '=' is caught now, although with a confusing error message. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:599 > > + def _read_line(self, line, lineno, matcher, overrides_allowed): > > Nit: This does a lot more than just read the line. Maybe call this _process_line or something like that? > Done. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:827 > > + prev_entry = self._test_list_paths[test] > > + prev_base_path, prev_num_matches, prev_lineno = prev_entry > > Nit: this can just be 1 line since you don't use prev_entry elsewhere. Remember, we don't have an 80 col requirement. :) > Done. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:839 > > + if (not overrides_allowed or test in self._overridding_tests): > > This would be easier to read if you inverted the if-statement as most of this wouldn't need to be so indented. It would be: > > if (...): > # We have seen this path ... > # ... > # ... > return False > > ... > Done. > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:890 > > +class ModifierMatcher(object): > > This is a fairly involved class. Perhaps it should go in it's own file? ModifierMatchResult should probably go in that file as well? > I don't really want to do this, because ModifierMatcher is only used by the test expectations classes and is logically a part of that module. Make sense? > > Tools/Scripts/webkitpy/layout_tests/port/base.py:879 > > +class TestConfiguration(object): > > This is also a pretty involved class. Put it in it's own file? I put this in the same file as the Port and Driver classes because ports may want to override one or more of these classes, but never need to override any of the other classes in other files. So, I think they logically belong together. Does that make sense?

Dirk Pranke

Comment 15 2011-01-10 14:35:47 PST

(In reply to comment #12) > (From update of attachment 77297 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=77297&action=review > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:922 > > + BUG1 RELEASE : foo.html = FAIL > > + BUG1 WIN RELEASE : foo.html = PASS > > + BUG2 WIN : bar.html = FAIL > > + BUG2 DEBUG : bar.html = PASS > > + > > + the expectation for foo.html on a Win XP Release bot would be PASS since > > + the scores for the first two lines are 1 and 2, respectively. However, > > + lines three and four would produce a duplicate expectation on a Win > > + Debug bot since both the 'win' and the 'debug' expectations would apply. > > I'm confused. Shouldn't all of these be duplicates except lines 1 and 4? I thought we had agreed that more modifiers doesn't override fewer modifiers, they still conflict. No, I don't think we agreed to that. If we had, then you would have to update old expectations every time we added new expectations for one particular configuration, which could be really annoying. E.g.: BUG1 : foo.html = FAIL Now foo starts to pass on WIN-XP. BUG1 : foo.html = FAIL BUG2 WIN XP GPU : foo.html = PASS If more modifiers doesn't override fewer modifiers, we'd have to change line 1 to something like: BUG1 MAC LINUX : foo.html = FAIL BUG1 WIN VISTA WIN7 : foo.html = FAIL BUG1 WIN XP CPU: foo.html = FAIL BUG1 WIN XP GPU: foo.html = PASS That seems less good to me, especially when you consider the original motivation. In particular, the GPU tests skip a whole bunch of directories. If more modifiers didn't override fewer, we would have to explicitly whitelist those directories in the CPU case, which is essentially adding noise to the file.

Dirk Pranke

Comment 16 2011-01-10 14:38:40 PST

thinking about it a bit more, it seems like if we allow more modifiers to override fewer modifiers, then it is possible for a good citizen to produce a smaller, cleaner file than otherwise. It is also possible to allow a bad citizen to just add cruft (though not errors) to the file. I'm not generally in favor of languages that protect you from yourself without offering other benefits, so I think I'm still on the side of the code as written (although more modifiers probably shouldn't even produce a warning). I will generate a couple of merged expectations files either way and maybe seeing what the resulting files will look like will be helpful.

Tony Chang

Comment 17 2011-01-10 14:58:41 PST

I am concerned about the following case: BUG1 : foo.html = FAIL BUG2 WIN : foo.html = PASS Turning into: BUG1 : foo.html = FAIL BUG2 WIN : foo.html = PASS ... BUG3 LINUX MAC : foo.html = PASS because someone didn't see the BUG1 somewhere else in the file. Also, in your example, instead of: BUG1 MAC LINUX : foo.html = FAIL BUG1 WIN VISTA WIN7 : foo.html = FAIL BUG1 WIN XP CPU: foo.html = FAIL BUG1 WIN XP GPU: foo.html = PASS couldn't I just write: BUG1 MAC LINUX WIN VISTA WIN7 : foo.html = FAIL BUG1 WIN XP CPU: foo.html = FAIL In general, I'm skeptical of times where use put PASS in the test_expectations.txt unless it's for flakey results or slow tests. How many tests are passing with gpu enabled that are failing without gpu? It's not really that many, is it?

Dirk Pranke

Comment 18 2011-01-10 16:23:08 PST

(In reply to comment #17) > I am concerned about the following case: > > BUG1 : foo.html = FAIL > BUG2 WIN : foo.html = PASS > > Turning into: > > BUG1 : foo.html = FAIL > BUG2 WIN : foo.html = PASS > ... > BUG3 LINUX MAC : foo.html = PASS > > because someone didn't see the BUG1 somewhere else in the file. > Yes. That is a risk. The risk could be offset, as I discussed with Ojan over IM, by building a "deduplicating" script that identified opportunities for simplification. Alternatively, the lint warnings would be sufficient here (although as I mentioned to Ojan, if there's no good way to distinguish between a valid warning and a warning that should be suppressed, the lint warning may not be useful). > Also, in your example, instead of: > > BUG1 MAC LINUX : foo.html = FAIL > BUG1 WIN VISTA WIN7 : foo.html = FAIL > BUG1 WIN XP CPU: foo.html = FAIL > BUG1 WIN XP GPU: foo.html = PASS > > couldn't I just write: > > BUG1 MAC LINUX WIN VISTA WIN7 : foo.html = FAIL > BUG1 WIN XP CPU: foo.html = FAIL > No. MAC VISTA is not a valid combination and would keep *any* Mac or Linux configuration from matching. > In general, I'm skeptical of times where use put PASS in the test_expectations.txt unless it's for flakey results or slow tests. > I agree. We don't want to specify PASSes. My worry is that we end up having to add them to distinguish the axes that pass from the axes that fail. It is perhaps an unfounded worry. One problem that we're looking at here is that there's no good way to say "not X" for a given configuration X. In the past, "not X" usually meant a single "Y". With the increase in axes of configuration, it's more cumbersome to get that, which is why I ended up with the "most matches win" algorithm instead. > How many tests are passing with gpu enabled that are failing without gpu? It's not really that many, is it? That is a good question. I don't know, but posting the merged test expectations files should tell us. Of course, the cases could also be that they fail differently in the two configs.

Stephen White

Comment 19 2011-01-10 16:55:40 PST

> How many tests are passing with gpu enabled that are failing without gpu? It's not really that many, is it? In canvas, it's about 3 (and about 5 the reverse). Compositing and WebGL, I'm not sure. Probably low double digits at worst.

Dirk Pranke

Comment 20 2011-01-10 17:27:10 PST

(In reply to comment #19) > > How many tests are passing with gpu enabled that are failing without gpu? It's not really that many, is it? > > In canvas, it's about 3 (and about 5 the reverse). Compositing and WebGL, I'm not sure. Probably low double digits at worst. There's 158 tests in compositing and 78 in fast/canvas/webgl. All of the compositing tests fail, but that's one line. There's about 10 webgl tests that fail using Mesa.

Dirk Pranke

Comment 21 2011-01-10 17:43:36 PST

Created attachment 78479 [details] sample merged test_expectations file where conflicting modifiers are not allowed

Dirk Pranke

Comment 22 2011-01-10 17:58:20 PST

Created attachment 78483 [details] sample merged test_expectations file where conflicting modifiers are allowed if they're unambiguous.

Dirk Pranke

Comment 23 2011-01-10 18:04:42 PST

Created attachment 78484 [details] diff of the two test expectations files (and the code)

Dirk Pranke

Comment 24 2011-01-10 18:09:28 PST

Okay, I've posted samples of what the files would look like if (a) we didn't allow conflicting sets of expectations, (b) we did allow them and didn't report warnings about the overrides, and (c) the diff between the two. It's pretty much a wash. The overrides-allowed version is slightly terser, and the merge can be done w/o modifying the existing main file, but the number of mods needed is minimal in either case. Given that, I don't feel strongly about what we do here. I think we all agree that there are plusses and minuses to either design. If y'all would prefer to raise conflicts in any case, which is arguably safer to start off with, I'm okay with that. A third option would be to allow overrides but to generate warnings or errors unless the override was explicitly noted via something like an OVERRIDE modifier. This would probably add a couple lines of code, but I doubt it's worth the effort and added complexity.

Stephen White

Comment 25 2011-01-10 18:15:28 PST

I'd prefer the former (w/CPU modifier added). It's only a few extra lines in text_expectations, and if nothing else, it's less education required for the WebKit sheriffs (since it behaves more like other modifiers do currently).

Dirk Pranke

Comment 26 2011-01-11 15:25:58 PST

Created attachment 78609 [details] Patch

Dirk Pranke

Comment 27 2011-01-11 15:27:44 PST

I'll take the lack of any further comments as having reached consensus that if there are multiple lines in the file that match the given config, we should raise errors instead of allowing a more specific line to override a less specific line. I've updated the code and tests accordingly and re-uploaded the patch. Can someone please review?

Ojan Vafai

Comment 28 2011-01-11 16:23:04 PST

Comment on attachment 78609 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=78609&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:927 > + BUG1 RELEASE : foo.html = FAIL > + BUG1 WIN RELEASE : foo.html = PASS > + BUG2 WIN : bar.html = FAIL > + BUG2 DEBUG : bar.html = PASS > + > + the expectation for foo.html on a Win XP Release bot would be PASS since > + the scores for the first two lines are 1 and 2, respectively. However, > + lines three and four would produce a duplicate expectation on a Win > + Debug bot since both the 'win' and the 'debug' expectations would apply. This comment is no longer accurate, right?

Dirk Pranke

Comment 29 2011-01-11 16:41:46 PST

(In reply to comment #28) > (From update of attachment 78609 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=78609&action=review > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:927 > > + BUG1 RELEASE : foo.html = FAIL > > + BUG1 WIN RELEASE : foo.html = PASS > > + BUG2 WIN : bar.html = FAIL > > + BUG2 DEBUG : bar.html = PASS > > + > > + the expectation for foo.html on a Win XP Release bot would be PASS since > > + the scores for the first two lines are 1 and 2, respectively. However, > > + lines three and four would produce a duplicate expectation on a Win > > + Debug bot since both the 'win' and the 'debug' expectations would apply. > > This comment is no longer accurate, right? Duh. Good catch. I'll update the comments on this as I land the patch.

Dirk Pranke

Comment 30 2011-01-11 17:15:57 PST

Committed r75576: <http://trac.webkit.org/changeset/75576>

Nico Weber

Comment 31 2011-01-12 09:04:49 PST

Hi, this broke rebaseline-chromium-webkit-tests. Repro: 1. add "BUGTHAKIS REBASELINE WIN : svg/custom/js-late-gradient-and-object-creation.svg = IMAGE" to the test_expectations.txt file and run r-c-w-t -w. It should rebaseline that one test, instead it does nothing. 2. add "BUGTHAKIS REBASELINE : svg/custom/js-late-gradient-and-object-creation.svg = IMAGE" (i.e. no "WIN") to test_expectations.txt and run r-c-w-t -w. This correctly tries to rebaseline on mac and then dies with 110112 08:58:58 test_expectations.py:367 ERROR Line:3090 Can't specify both modifier 'win' and macro 'win-xp' svg/custom/js-late-gradient-and-object-creation.svg If I revert this patch locally, both these problems disappear.

Nico Weber

Comment 32 2011-01-12 09:05:12 PST

(see previous comment)

Dirk Pranke

Comment 33 2011-01-13 15:39:54 PST

Ugh. It turns out there are (at least?) two bugs that this patch introduces to the rebaselining tool, which of course is very poorly covered by unit tests. The first is a simple typo; when I change the constructor for the test expectations object, I didn't update the r-c-w-t code correctly and passed in the "target" port instead of the "rebaselining" port. That explains why your first repro case wasn't finding any files. The second bug (and the second repro case) is much worse, and has to do with the feature that allows us to rebaseline "every" port. Prior to this change, there was a 1:1 mapping between the values returned by port.test_platform_names() and the list of identifiers that would uniquely describe every port that might need to be rebaselined. The way the rebaselining code works is that if you have an expectation that is marked for rebaselining that doesn't contain any test platforms (like in your second example), we first rebaseline the file for the rebaselining port, and then we replace that line with a line that contains every platform name except for the name of the rebaselining port (*). There are two problems introduced by the patch. The first is that we can no longer capture "every port but one" in a single test expectation line. This is the conflict that we're tripping over in your second repro case, and it's easy to fix (just generate N lines instead of 1). The second is that it's much less clear what the N need to be. test_platform_names() as a whole probably needs to be revamped, because we don't have a set of identifiers that uniquely describes every variant (e.g., there's no GPU-WIN-XP). I think we can get around this by having test_platform_name() return a string containing a list of modifiers that will uniquely identify the port variant. We also need to replace the code that figures out the list of platforms to check for rebaselines against. I will be working on what that patch will look like now. (*) - We actually also have a hard-coded whitelist of ports that shouldn't be rebaselined automatically ('win-vista', 'win-7') because the bots on those ports aren't reliable enough to trust. We may need to support this in some way as well. Mihai suggests that the whole logic of "rebaseline every port" is somewhat questionable to begin with. Given that the way we rebaseline things is to push a change to a set of bots and run the change and then pull down files, perhaps the logic should be "rebaseline every failing port" instead, and determine the list of directories to update from the bots themselves, rather than a priori assuming that every bot should fail.

Dirk Pranke

Comment 34 2011-01-24 17:38:58 PST

Created attachment 79996 [details] update w/ more unit tests

Dirk Pranke

Comment 35 2011-01-24 17:44:26 PST

Right ... okay, after a two-week detour to shave some yaks, we're ready with this again. Changes to note from the previous version: 1) we've added a bunch more logic to port/test.py to better simulate the cases that come up with chromium (there are now 'mac', 'win', and 'win-xp' versions of the test port) 2) we've added unit tests to rebaseline-chromium-webkit-tests that catch the features that the patch broke before 3) In test_expectations.TestExpectationsFile.remove_platform_from_expectations(), instead of cramming all of the platform modifiers onto one line, we create one new line with each modifier. This is a change in lines 491-506. The new way of handling platform configurations doesn't support the old behavior. This shouldn't break anyone in practice, since I doubt anyone was really rebaselining multiple version-specific sets of files at once using the rebaselining tool.

Dirk Pranke

Comment 36 2011-01-31 13:11:42 PST

Created attachment 80673 [details] Patch

Dirk Pranke

Comment 37 2011-01-31 13:55:20 PST

Created attachment 80676 [details] This is the diff between what was checked in w/ r75576, and the new version (ignoring the ChangeLog deltas, which got confused) Things to note in the new version of this patch: 1) The actual bug that broke the rebaselining tool is fixed in test_expectations.py, in the patch near line 498. Instead of putting all of the ports that need rebaselining onto one line, we create multiple lines containing one port each (the new configuration matching logic can't won't accept conflicting sets of modifiers on the same line). 2) I added a 'test-win-xp' version to test that we handle the version conflicts while rebaselining correctly. 3) I changed all of the "mac snowleopard" references in test_expectations_unittest.py to "win xp" because there is no test-mac-snowleopard configuration (this just makes things somewhat more consistent).

Dirk Pranke

Comment 38 2011-01-31 13:56:40 PST

Created attachment 80677 [details] rebase to r73176, make work with rebaselining changes

Mihai Parparita

Comment 39 2011-01-31 14:35:09 PST

Comment on attachment 80677 [details] rebase to r73176, make work with rebaselining changes View in context: https://bugs.webkit.org/attachment.cgi?id=80677&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:668 > + test_list_path) The wrapped line should be indented more. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:672 > + 'REBASELINE should only be used for running rebaseline.py. ' Same here. > Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:679 > + 'A test can not be both slow and timeout. If it times out ' Capitalize both SLOW and TIMEOUT in the message.

Dirk Pranke

Comment 40 2011-01-31 14:59:23 PST

Committed r77163: <http://trac.webkit.org/changeset/77163>

Attachments
Patch (16.64 KB, patch) 2010-12-16 16:57 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
Patch (60.71 KB, patch) 2010-12-20 19:34 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
rebase to head post webkittools -> tools rename (67.42 KB, patch) 2010-12-22 19:14 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
Update w/ more review feedback. (68.32 KB, patch) 2011-01-10 13:57 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
sample merged test_expectations file where conflicting modifiers are not allowed (16.33 KB, text/plain) 2011-01-10 17:43 PST, Dirk Pranke	no flags	Details
sample merged test_expectations file where conflicting modifiers are allowed if they're unambiguous. (13.37 KB, text/plain) 2011-01-10 17:58 PST, Dirk Pranke	no flags	Details
diff of the two test expectations files (and the code) (5.65 KB, text/plain) 2011-01-10 18:04 PST, Dirk Pranke	no flags	Details
Patch (69.00 KB, patch) 2011-01-11 15:25 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
update w/ more unit tests (79.80 KB, patch) 2011-01-24 17:38 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
Patch (81.86 KB, patch) 2011-01-31 13:11 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
This is the diff between what was checked in w/ r75576, and the new version (ignoring the ChangeLog deltas, which got confused) (23.38 KB, patch) 2011-01-31 13:55 PST, Dirk Pranke	no flags	Details Formatted Diff Diff
rebase to r73176, make work with rebaselining changes (81.14 KB, patch) 2011-01-31 13:56 PST, Dirk Pranke	mihaip: review+	Details Formatted Diff Diff
Show Obsolete (8) View All Add attachment proposed patch, testcase, etc.