Add selector checker for :lang pseudo class in Selectors level 4
Created attachment 240790 [details] Patch
WIP
Created attachment 240792 [details] Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-09 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Created attachment 240793 [details] Archive of layout-test-results from webkit-ews-08 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: webkit-ews-08 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Created attachment 241244 [details] Patch
Comment on attachment 241244 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=241244&action=review The spec says "case-insensitively within the ASCII range". I think equalIgnoringCase() does case insensitive comparison for any character range. I'll need to check that. We should make test for this in any case. For example: "È" and "è" should not match. We may have to do that in a separate patch though. > Source/WebCore/css/SelectorCheckerTestFunctions.h:145 > + if (value.startsWith(filter, false) && value.impl()->at(filter.length()) == '-') I am not certain this is still correct with the new definition of :lang(). For example, if you match: :lang(en-) with <div lang="en--">, should it match? > Source/WebCore/css/SelectorCheckerTestFunctions.h:156 > + if ((offset = value.find(subTag, offset, false)) == notFound) > + return false; > + offset += subTag.length(); I don't think this is correct. The subTag in value might not be surrounded by dashes. For example: :lang("fr-BE") on <div lang="fr-paddingBEpadding"> should not match. > LayoutTests/ChangeLog:11 > + * fast/selectors/lang-multiple-expected.txt: Added. > + * fast/selectors/lang-multiple.html: Added. I would also test the implementation with: -Various cases of dashes, try to break the implementation. -Case sensitivity outside the ASCII range.
(In reply to comment #6) > Comment on attachment 241244 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241244&action=review > > The spec says "case-insensitively within the ASCII range". I think > equalIgnoringCase() does case insensitive comparison for any character range. > I'll need to check that. We should make test for this in any case. For > example: "È" and "è" should not match. We may have to do that in a separate > patch though. > > > Source/WebCore/css/SelectorCheckerTestFunctions.h:145 > > + if (value.startsWith(filter, false) && value.impl()->at(filter.length()) == '-') > > I am not certain this is still correct with the new definition of :lang(). > > For example, if you match: :lang(en-) with <div lang="en--">, should it > match? > Should we invalidate any filters ends with dashes? > > Source/WebCore/css/SelectorCheckerTestFunctions.h:156 > > + if ((offset = value.find(subTag, offset, false)) == notFound) > > + return false; > > + offset += subTag.length(); > > I don't think this is correct. The subTag in value might not be surrounded > by dashes. > > For example: :lang("fr-BE") on <div lang="fr-paddingBEpadding"> should not > match. > > > LayoutTests/ChangeLog:11 > > + * fast/selectors/lang-multiple-expected.txt: Added. > > + * fast/selectors/lang-multiple.html: Added. > > I would also test the implementation with: > -Various cases of dashes, try to break the implementation. > -Case sensitivity outside the ASCII range.
(In reply to comment #7) > Should we invalidate any filters ends with dashes? I am not 100% sure. I have emailed that question to the standard working group: http://lists.w3.org/Archives/Public/www-style/2014Nov/0136.html
Created attachment 241590 [details] Patch
Comment on attachment 241590 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=241590&action=review I know it is not marked for review but I made some quick comments anyway :) I like the new matchesLangPseudoClassWithCommaSeparatedArguments(). It is really easy to follow your code along the spec and verify each step. > Source/WebCore/css/SelectorCheckerTestFunctions.h:143 > + Vector<String> subVals; I prefer not abbreviating names: subValues Maybe it would be better to use the same names as the spec? -rangeSubtags -languageSubtags > Source/WebCore/css/SelectorCheckerTestFunctions.h:148 > + Extra blank line. > Source/WebCore/css/SelectorCheckerTestFunctions.h:154 > + if (subTags.size() > subVals.size()) Interesting, this is a good idea. > Source/WebCore/css/SelectorCheckerTestFunctions.h:160 > + size_t currentIndex = 1; lastMatchedLanguageTagIndex? > Source/WebCore/css/SelectorCheckerTestFunctions.h:163 > + for (size_t i = 1; i < subTags.size(); i++) { WebKit style: i++ -> ++i For nested loop, I prefer to give indices full names. This could be rangeSubtagIndex. > Source/WebCore/css/SelectorCheckerTestFunctions.h:164 > + bool notFound = true; I would reverse the boolean for clarity. bool matchedRangeSubtag = false; That makes conditions easier to read. Otherwise code like if (!notFound) gets tricky. > Source/WebCore/css/SelectorCheckerTestFunctions.h:166 > + for (size_t j = currentIndex; j < subVals.size(); j++) { WebKit style: j++ -> ++j "j" could be "languageSubtagIndex > Source/WebCore/css/SelectorCheckerTestFunctions.h:167 > + if (equalIgnoringCase(subTags.at(i), subVals.at(j)) && subVals.at(j) != "x") { I think the condition: subVals.at(j) != "x" is not enough. The specs says any singleton fails. > LayoutTests/fast/selectors/lang-multiple.html:29 > + <div lang="en"></div> > + <div lang="en-"></div> > + <div lang="en--"></div> Let's also cover the cases with "foo--bar". > LayoutTests/fast/selectors/lang-multiple.html:32 > + <div lang="fr-x"></div> > + <div lang="fr-xenomorph"></div> Let's have full extensions too: "fr-x-xenomorph". And other cases of singletons: "foo-a-bar" "foo-3-bar"
Thanks Benjamin! I'll update the patch as soon as possible. I also created a playground (in .js) to confirm the filtering process http://codepen.io/anon/pen/wBvmzE hopefully could grab the ideas outlined in the spec.
I'm sorry, the pen is on http://codepen.io/diorahman/pen/EaxEEN
Created attachment 241662 [details] Patch
Created attachment 241663 [details] Patch
Comment on attachment 241663 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=241663&action=review Round1: only minor stuff really. The extended filtering is an important new part and it looks correct to me. The big thing will be using extended filtering when matching a single range. Extend the testing to those cases. > Source/WebCore/ChangeLog:14 > + Currently, the implementation of checking functionality of matchesLangPseudoClass and > + matchesLangPseudoClassWithCommaSeparatedArguments is similar. It should be merged. After the latest improvements, this is no longer true. The function matchesLangPseudoClassWithCommaSeparatedArguments() implements the Extended Filtering from RFC4647. The function matchesLangPseudoClass() only does the dumb matching for CSS Level 3. I think we should just rename matchesLangPseudoClass() to matchesLangPseudoClassDeprecated. The function matchesLangPseudoClassWithCommaSeparatedArguments() can then become matchesLangPseudoClass(). > Source/WebCore/css/SelectorChecker.cpp:868 > + if (selector->argumentList()->size() > 1) > + return matchesLangPseudoClassWithCommaSeparatedArguments(element, *selector->argumentList()); Given the improved algorithm, we should always use matchesLangPseudoClassWithCommaSeparatedArguments() even when size == 1. The code of matchesLangPseudoClassWithCommaSeparatedArguments() is more advanced due to extended filtering, it should become the new default for matching :lang(). > Source/WebCore/css/SelectorCheckerTestFunctions.h:141 > + It would be worth putting a link to the RFC in a comment here to explain what is implemented in the following code. > Source/WebCore/css/SelectorCheckerTestFunctions.h:145 > + String(language).split('-', true, languageSubtags); It is better to do language.string().split() and String(language).split(). The reason is that String(language) will construct a new String from the String reference of AtomicString. This in turn will cause ref() to be called to increase the ref-count. After that split is called, the destructor of String is called, which call deref(). On the other hand, language.string().split() does not cause a ref-deref of the original string. > Source/WebCore/css/SelectorCheckerTestFunctions.h:151 > + String(range).split('-', true, rangeSubtags); Ditto: range.string().split() > Source/WebCore/css/SelectorCheckerTestFunctions.h:159 > + size_t lastMatchedLanguageTagIndex = 1; lastMatchedLanguageTagIndex -> lastMatchedLanguageSubtagIndex for consistency. > Source/WebCore/css/SelectorCheckerTestFunctions.h:164 > + for (size_t languageSubtagIndex = lastMatchedLanguageTagIndex; languageSubtagIndex < languageSubtags.size(); ++languageSubtagIndex) { To clarify the code, maybe the loop should have its own function with a name describing its purpose? For example: bool findLanguageSubtagMatchingRange(const Vector<String>& languageSubtags, const String& range, size_t& position) This would let you use early return statement instead of break+boolean. > Source/WebCore/css/SelectorCheckerTestFunctions.h:165 > + String currentLanguageSubtag = languageSubtags.at(languageSubtagIndex); String currentLanguageSubtag = languageSubtags.at(languageSubtagIndex); Reference and bracket access: const String& currentLanguageSubtag = languageSubtage[i]; The bracket accessor is just a convention in WebKit. Using the reference avoids a ref-deref. It is okay to keep a reference here because we know that "languageSubtags" will outlive this scope. > Source/WebCore/css/SelectorCheckerTestFunctions.h:166 > + if (!currentLanguageSubtag.isEmpty() && currentLanguageSubtag.length() == 1) You should not need "currentLanguageSubtag.isEmpty()". The condition "currentLanguageSubtag.length() == 1" implies the language subtag cannot be empty. > Source/WebCore/css/SelectorCheckerTestFunctions.h:169 > + if (equalIgnoringCase(rangeSubtags.at(rangeSubtagIndex), currentLanguageSubtag)) { at() -> [] I would put rangeSubtags.at(rangeSubtagIndex) in a temporary outside the loop to avoid accessing the pointer for each language subtag. > Source/WebCore/cssjit/SelectorCompiler.cpp:732 > ASSERT(selector.argumentList() && !selector.argumentList()->isEmpty()); > + if (selector.argumentList()->size() > 1) > + return FunctionType::CannotCompile; > + Because of the extended matching, we should no longer matchesLangPseudoClass() even if there is a single range. For now we should just always return CannotCompile. > LayoutTests/ChangeLog:11 > + * fast/selectors/lang-multiple-expected.txt: Added. > + * fast/selectors/lang-multiple.html: Added. You have lots of interesting cases of extended filtering that are certainly not covered by any other tests. I think you should have a "lang-extended-filtering.html" with a copy of all the cases of extended filtering from lang-multiple.html
Created attachment 241764 [details] Patch
Comment on attachment 241764 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=241764&action=review Quick review. I think you made a mistake when extracting "containslanguageSubtagMatchingRange()" ? > Source/WebCore/ChangeLog:20 > + * css/SelectorCheckerTestFunctions.h: Feel free to add your copyright in the header of this file if you want. > Source/WebCore/css/SelectorCheckerTestFunctions.h:127 > +inline bool containslanguageSubtagMatchingRange(const Vector<String>& languageSubtags, const String& range, const size_t& position) case: containsLangu... No need to pass the size_t by const reference, the size fits into a register. But shouldn't you pass position by reference and update it? (and name it lastMatchedLanguageSubtagIndex). > Source/WebCore/css/SelectorCheckerTestFunctions.h:154 > + // Implement basic and extended filterings of given language tags as specified in www.ietf.org/rfc/rfc4647.txt Missing period at the end of the sentence. > Source/WebCore/css/SelectorCheckerTestFunctions.h:179 > + lastMatchedLanguageSubtagIndex = rangeSubtagIndex + 1; This can't be right. This is mixing the language and range indices. There must be a way to cover this in tests. Maybe?: language: "foo-bar-bar-abc" range: "foo-abc-bar" "foo" matches. Then "abc" matches the fourth sub tag of the language. But the lastMatchedLanguageSubtagIndex is set to the third subtag. Next matching "bar" would match. I am just making this up, but there must be a way to test this case... > LayoutTests/fast/selectors/lang-extended-filtering.html:53 > + shouldBe('document.querySelectorAll(":lang(de-de)").length', '8'); > + shouldBe('document.querySelectorAll(":lang(de-DE)").length', '8'); > + shouldBe('document.querySelectorAll(":lang(DE-de)").length', '8'); > + shouldBe('document.querySelectorAll(":lang(dE-dE)").length', '8'); > + shouldBe('document.querySelectorAll(":lang(De-De)").length', '8'); > + shouldBe('document.querySelectorAll(":lang(DE-DE)").length', '8'); Nice. > LayoutTests/fast/selectors/lang-extended-filtering.html:71 > + debug('i = ' + i); > + shouldBeEqualToString('getComputedStyle(document.querySelectorAll(":lang(de-DE, DE-de)")[i]).color', 'rgb(1, 2, 3)'); Instead of using i as a global and using debug(), you can do this: shouldBeEqualToString('getComputedStyle(document.querySelectorAll(":lang(de-DE, DE-de)")[' + i + ']).color' That way i becomes a literal in the string.
Created attachment 241771 [details] Patch
Created attachment 241772 [details] Patch
Created attachment 241777 [details] Patch
Did you add a test case for the lastMatchedLanguageSubtagIndex thingy?
Comment on attachment 241777 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=241777&action=review Everything looks good to me. A minor comment and a question about testing in the previous comment. > Source/WebCore/css/SelectorCheckerTestFunctions.h:186 > + if (!matchedRange) > + continue; > + return true; I would just inverse this for clarity. if (matchedRange) return true;
(In reply to comment #21) > Did you add a test case for the lastMatchedLanguageSubtagIndex thingy? Yes, I have this <div lang="tic-tac-tac-toe"></div> And try to match with: 1. tic-tac-toe, it has to be matched 2. tic-toe-tac, it has to be failed. - `tic` will match then lastMatchedLanguageSubtagIndex=0, the next scanning process will start from 1 - `toe` will match then lastMatchedLanguageSubtagIndex=3, the next scanning process will start from 4. It causes no match for the next `tac` of the range. Is that enough? I'll think another case if it hasn't properly test the case.
Created attachment 241847 [details] Patch
Comment on attachment 241847 [details] Patch Looks great! Let's land Once we have full support, I think we'll want to have some kind of test were we pass a list of ["language", "list of language range", expected=true/false]. That way we'll be able to write a huge list of inputs to test, maybe even every standardized language... ...that's for later though :)
Comment on attachment 241847 [details] Patch Clearing flags on attachment: 241847 Committed r176313: <http://trac.webkit.org/changeset/176313>
All reviewed patches have been landed. Closing bug.