Bug 138281 - Add selector checker for :lang pseudo class in Selectors level 4
Summary: Add selector checker for :lang pseudo class in Selectors level 4
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dhi Aurrahman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-01 11:24 PDT by Dhi Aurrahman
Modified: 2014-11-19 00:57 PST (History)
2 users (show)

See Also:


Attachments
Patch (12.45 KB, patch)
2014-11-01 11:33 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 (736.51 KB, application/zip)
2014-11-01 12:41 PDT, Build Bot
no flags Details
Archive of layout-test-results from webkit-ews-08 for mac-mountainlion (744.74 KB, application/zip)
2014-11-01 13:03 PDT, Build Bot
no flags Details
Patch (10.56 KB, patch)
2014-11-08 18:46 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (13.95 KB, patch)
2014-11-14 08:22 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (16.64 KB, patch)
2014-11-15 04:46 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (16.69 KB, patch)
2014-11-15 05:28 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (29.90 KB, patch)
2014-11-17 22:08 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (30.51 KB, patch)
2014-11-18 02:02 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (30.55 KB, patch)
2014-11-18 02:17 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (30.39 KB, patch)
2014-11-18 03:18 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (30.76 KB, patch)
2014-11-18 23:01 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dhi Aurrahman 2014-11-01 11:24:42 PDT
Add selector checker for :lang pseudo class in Selectors level 4
Comment 1 Dhi Aurrahman 2014-11-01 11:33:10 PDT
Created attachment 240790 [details]
Patch
Comment 2 Dhi Aurrahman 2014-11-01 11:34:46 PDT
WIP
Comment 3 Build Bot 2014-11-01 12:41:31 PDT
Created attachment 240792 [details]
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-09  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 4 Build Bot 2014-11-01 13:03:52 PDT
Created attachment 240793 [details]
Archive of layout-test-results from webkit-ews-08 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: webkit-ews-08  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 5 Dhi Aurrahman 2014-11-08 18:46:30 PST
Created attachment 241244 [details]
Patch
Comment 6 Benjamin Poulain 2014-11-10 11:15:48 PST
Comment on attachment 241244 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=241244&action=review

The spec says "case-insensitively within the ASCII range". I think equalIgnoringCase() does case insensitive comparison for any character range.
I'll need to check that. We should make test for this in any case. For example: "È" and "è" should not match. We may have to do that in a separate patch though.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:145
> +        if (value.startsWith(filter, false) && value.impl()->at(filter.length()) == '-')

I am not certain this is still correct with the new definition of :lang().

For example, if you match: :lang(en-) with <div lang="en--">, should it match?

> Source/WebCore/css/SelectorCheckerTestFunctions.h:156
> +                if ((offset = value.find(subTag, offset, false)) == notFound)
> +                    return false;
> +                offset += subTag.length();

I don't think this is correct. The subTag in value might not be surrounded by dashes.

For example: :lang("fr-BE") on <div lang="fr-paddingBEpadding"> should not match.

> LayoutTests/ChangeLog:11
> +        * fast/selectors/lang-multiple-expected.txt: Added.
> +        * fast/selectors/lang-multiple.html: Added.

I would also test the implementation with:
-Various cases of dashes, try to break the implementation.
-Case sensitivity outside the ASCII range.
Comment 7 Dhi Aurrahman 2014-11-10 16:47:15 PST
(In reply to comment #6)
> Comment on attachment 241244 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=241244&action=review
> 
> The spec says "case-insensitively within the ASCII range". I think
> equalIgnoringCase() does case insensitive comparison for any character range.
> I'll need to check that. We should make test for this in any case. For
> example: "È" and "è" should not match. We may have to do that in a separate
> patch though.
> 
> > Source/WebCore/css/SelectorCheckerTestFunctions.h:145
> > +        if (value.startsWith(filter, false) && value.impl()->at(filter.length()) == '-')
> 
> I am not certain this is still correct with the new definition of :lang().
> 
> For example, if you match: :lang(en-) with <div lang="en--">, should it
> match?
> 

Should we invalidate any filters ends with dashes?

> > Source/WebCore/css/SelectorCheckerTestFunctions.h:156
> > +                if ((offset = value.find(subTag, offset, false)) == notFound)
> > +                    return false;
> > +                offset += subTag.length();
> 
> I don't think this is correct. The subTag in value might not be surrounded
> by dashes.
> 
> For example: :lang("fr-BE") on <div lang="fr-paddingBEpadding"> should not
> match.
> 
> > LayoutTests/ChangeLog:11
> > +        * fast/selectors/lang-multiple-expected.txt: Added.
> > +        * fast/selectors/lang-multiple.html: Added.
> 
> I would also test the implementation with:
> -Various cases of dashes, try to break the implementation.
> -Case sensitivity outside the ASCII range.
Comment 8 Benjamin Poulain 2014-11-10 21:19:46 PST
(In reply to comment #7) 
> Should we invalidate any filters ends with dashes?

I am not 100% sure. I have emailed that question to the standard working group: http://lists.w3.org/Archives/Public/www-style/2014Nov/0136.html
Comment 9 Dhi Aurrahman 2014-11-14 08:22:01 PST
Created attachment 241590 [details]
Patch
Comment 10 Benjamin Poulain 2014-11-14 16:15:50 PST
Comment on attachment 241590 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=241590&action=review

I know it is not marked for review but I made some quick comments anyway :)

I like the new matchesLangPseudoClassWithCommaSeparatedArguments(). It is really easy to follow your code along the spec and verify each step.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:143
> +    Vector<String> subVals;

I prefer not abbreviating names: subValues

Maybe it would be better to use the same names as the spec?
-rangeSubtags
-languageSubtags

> Source/WebCore/css/SelectorCheckerTestFunctions.h:148
> +        

Extra blank line.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:154
> +        if (subTags.size() > subVals.size()) 

Interesting, this is a good idea.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:160
> +        size_t currentIndex = 1;

lastMatchedLanguageTagIndex?

> Source/WebCore/css/SelectorCheckerTestFunctions.h:163
> +        for (size_t i = 1; i < subTags.size(); i++) {

WebKit style: i++ -> ++i

For nested loop, I prefer to give indices full names. This could be rangeSubtagIndex.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:164
> +            bool notFound = true;

I would reverse the boolean for clarity.
    bool matchedRangeSubtag = false;
That makes conditions easier to read.

Otherwise code like if (!notFound) gets tricky.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:166
> +            for (size_t j = currentIndex; j < subVals.size(); j++) {

WebKit style: j++ -> ++j

"j" could be "languageSubtagIndex

> Source/WebCore/css/SelectorCheckerTestFunctions.h:167
> +                if (equalIgnoringCase(subTags.at(i), subVals.at(j)) && subVals.at(j) != "x") {

I think the condition:
    subVals.at(j) != "x"
is not enough. The specs says any singleton fails.

> LayoutTests/fast/selectors/lang-multiple.html:29
> +    <div lang="en"></div>
> +    <div lang="en-"></div>
> +    <div lang="en--"></div>

Let's also cover the cases with "foo--bar".

> LayoutTests/fast/selectors/lang-multiple.html:32
> +    <div lang="fr-x"></div>
> +    <div lang="fr-xenomorph"></div>

Let's have full extensions too: "fr-x-xenomorph".

And other cases of singletons:
"foo-a-bar"
"foo-3-bar"
Comment 11 Dhi Aurrahman 2014-11-14 18:24:53 PST
Thanks Benjamin!

I'll update the patch as soon as possible. I also created a playground (in .js) to confirm the filtering process http://codepen.io/anon/pen/wBvmzE hopefully could grab the ideas outlined in the spec.
Comment 12 Dhi Aurrahman 2014-11-14 18:29:23 PST
I'm sorry, the pen is on http://codepen.io/diorahman/pen/EaxEEN
Comment 13 Dhi Aurrahman 2014-11-15 04:46:50 PST
Created attachment 241662 [details]
Patch
Comment 14 Dhi Aurrahman 2014-11-15 05:28:44 PST
Created attachment 241663 [details]
Patch
Comment 15 Benjamin Poulain 2014-11-17 15:01:32 PST
Comment on attachment 241663 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=241663&action=review

Round1: only minor stuff really.

The extended filtering is an important new part and it looks correct to me.

The big thing will be using extended filtering when matching a single range. Extend the testing to those cases.

> Source/WebCore/ChangeLog:14
> +        Currently, the implementation of checking functionality of matchesLangPseudoClass and
> +        matchesLangPseudoClassWithCommaSeparatedArguments is similar. It should be merged.

After the latest improvements, this is no longer true. 

The function matchesLangPseudoClassWithCommaSeparatedArguments() implements the Extended Filtering from RFC4647.
The function matchesLangPseudoClass() only does the dumb matching for CSS Level 3.

I think we should just rename matchesLangPseudoClass() to matchesLangPseudoClassDeprecated.
The function matchesLangPseudoClassWithCommaSeparatedArguments() can then become matchesLangPseudoClass().

> Source/WebCore/css/SelectorChecker.cpp:868
> +                if (selector->argumentList()->size() > 1)
> +                    return matchesLangPseudoClassWithCommaSeparatedArguments(element, *selector->argumentList());

Given the improved algorithm, we should always use matchesLangPseudoClassWithCommaSeparatedArguments() even when size == 1.

The code of matchesLangPseudoClassWithCommaSeparatedArguments() is more advanced due to extended filtering, it should become the new default for matching :lang().

> Source/WebCore/css/SelectorCheckerTestFunctions.h:141
> +

It would be worth putting a link to the RFC in a comment here to explain what is implemented in the following code.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:145
> +    String(language).split('-', true, languageSubtags);

It is better to do language.string().split() and String(language).split().

The reason is that String(language) will construct a new String from the String reference of AtomicString. This in turn will cause ref() to be called to increase the ref-count. After that split is called, the destructor of String is called, which call deref().

On the other hand, language.string().split() does not cause a ref-deref of the original string.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:151
> +        String(range).split('-', true, rangeSubtags);

Ditto: range.string().split()

> Source/WebCore/css/SelectorCheckerTestFunctions.h:159
> +        size_t lastMatchedLanguageTagIndex = 1;

lastMatchedLanguageTagIndex -> lastMatchedLanguageSubtagIndex for consistency.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:164
> +            for (size_t languageSubtagIndex = lastMatchedLanguageTagIndex; languageSubtagIndex < languageSubtags.size(); ++languageSubtagIndex) {

To clarify the code, maybe the loop should have its own function with a name describing its purpose?

For example:
     bool findLanguageSubtagMatchingRange(const Vector<String>& languageSubtags, const String& range, size_t& position)

This would let you use early return statement instead of break+boolean.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:165
> +                String currentLanguageSubtag = languageSubtags.at(languageSubtagIndex);

String currentLanguageSubtag = languageSubtags.at(languageSubtagIndex);

Reference and bracket access:
    const String& currentLanguageSubtag = languageSubtage[i];

The bracket accessor is just a convention in WebKit. Using the reference avoids a ref-deref. It is okay to keep a reference here because we know that "languageSubtags" will outlive this scope.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:166
> +                if (!currentLanguageSubtag.isEmpty() && currentLanguageSubtag.length() == 1)

You should not need "currentLanguageSubtag.isEmpty()". The condition "currentLanguageSubtag.length() == 1" implies the language subtag cannot be empty.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:169
> +                if (equalIgnoringCase(rangeSubtags.at(rangeSubtagIndex), currentLanguageSubtag)) {

at() -> []

I would put rangeSubtags.at(rangeSubtagIndex) in a temporary outside the loop to avoid accessing the pointer for each language subtag.

> Source/WebCore/cssjit/SelectorCompiler.cpp:732
>              ASSERT(selector.argumentList() && !selector.argumentList()->isEmpty());
> +            if (selector.argumentList()->size() > 1)
> +                return FunctionType::CannotCompile;
> +

Because of the extended matching, we should no longer matchesLangPseudoClass() even if there is a single range.

For now we should just always return CannotCompile.

> LayoutTests/ChangeLog:11
> +        * fast/selectors/lang-multiple-expected.txt: Added.
> +        * fast/selectors/lang-multiple.html: Added.

You have lots of interesting cases of extended filtering that are certainly not covered by any other tests.

I think you should have a "lang-extended-filtering.html" with a copy of all the cases of extended filtering from lang-multiple.html
Comment 16 Dhi Aurrahman 2014-11-17 22:08:20 PST
Created attachment 241764 [details]
Patch
Comment 17 Benjamin Poulain 2014-11-18 01:08:16 PST
Comment on attachment 241764 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=241764&action=review

Quick review.

I think you made a mistake when extracting "containslanguageSubtagMatchingRange()" ?

> Source/WebCore/ChangeLog:20
> +        * css/SelectorCheckerTestFunctions.h:

Feel free to add your copyright in the header of this file if you want.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:127
> +inline bool containslanguageSubtagMatchingRange(const Vector<String>& languageSubtags, const String& range, const size_t& position)

case: containsLangu...

No need to pass the size_t by const reference, the size fits into a register.

But shouldn't you pass position by reference and update it? (and name it lastMatchedLanguageSubtagIndex).

> Source/WebCore/css/SelectorCheckerTestFunctions.h:154
> +    // Implement basic and extended filterings of given language tags as specified in www.ietf.org/rfc/rfc4647.txt

Missing period at the end of the sentence.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:179
> +            lastMatchedLanguageSubtagIndex = rangeSubtagIndex + 1;

This can't be right. This is mixing the language and range indices.

There must be a way to cover this in tests.
Maybe?:
language: "foo-bar-bar-abc"
range: "foo-abc-bar"

"foo" matches.
Then "abc" matches the fourth sub tag of the language. But the lastMatchedLanguageSubtagIndex is set to the third subtag. Next matching "bar" would match.

I am just making this up, but there must be a way to test this case...

> LayoutTests/fast/selectors/lang-extended-filtering.html:53
> +    shouldBe('document.querySelectorAll(":lang(de-de)").length', '8');
> +    shouldBe('document.querySelectorAll(":lang(de-DE)").length', '8');
> +    shouldBe('document.querySelectorAll(":lang(DE-de)").length', '8');
> +    shouldBe('document.querySelectorAll(":lang(dE-dE)").length', '8');
> +    shouldBe('document.querySelectorAll(":lang(De-De)").length', '8');
> +    shouldBe('document.querySelectorAll(":lang(DE-DE)").length', '8');

Nice.

> LayoutTests/fast/selectors/lang-extended-filtering.html:71
> +        debug('i = ' + i);
> +        shouldBeEqualToString('getComputedStyle(document.querySelectorAll(":lang(de-DE, DE-de)")[i]).color', 'rgb(1, 2, 3)');

Instead of using i as a global and using debug(), you can do this:
    shouldBeEqualToString('getComputedStyle(document.querySelectorAll(":lang(de-DE, DE-de)")[' + i + ']).color'

That way i becomes a literal in the string.
Comment 18 Dhi Aurrahman 2014-11-18 02:02:11 PST
Created attachment 241771 [details]
Patch
Comment 19 Dhi Aurrahman 2014-11-18 02:17:03 PST
Created attachment 241772 [details]
Patch
Comment 20 Dhi Aurrahman 2014-11-18 03:18:22 PST
Created attachment 241777 [details]
Patch
Comment 21 Benjamin Poulain 2014-11-18 20:39:01 PST
Did you add a test case for the lastMatchedLanguageSubtagIndex thingy?
Comment 22 Benjamin Poulain 2014-11-18 20:55:31 PST
Comment on attachment 241777 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=241777&action=review

Everything looks good to me. A minor comment and a question about testing in the previous comment.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:186
> +        if (!matchedRange)
> +            continue;
> +        return true;

I would just inverse this for clarity.
if (matchedRange)
    return true;
Comment 23 Dhi Aurrahman 2014-11-18 22:23:44 PST
(In reply to comment #21)
> Did you add a test case for the lastMatchedLanguageSubtagIndex thingy?

Yes, I have this

<div lang="tic-tac-tac-toe"></div>

And try to match with:

1. tic-tac-toe, it has to be matched
2. tic-toe-tac, it has to be failed. 
   - `tic` will match then lastMatchedLanguageSubtagIndex=0, the next scanning process will start from 1
   - `toe` will match then lastMatchedLanguageSubtagIndex=3, the next scanning process will start from 4. It causes no match for the next `tac` of the range.

Is that enough? I'll think another case if it hasn't properly test the case.
Comment 24 Dhi Aurrahman 2014-11-18 23:01:11 PST
Created attachment 241847 [details]
Patch
Comment 25 Benjamin Poulain 2014-11-19 00:18:53 PST
Comment on attachment 241847 [details]
Patch

Looks great! Let's land

Once we have full support, I think we'll want to have some kind of test were we pass a list of ["language", "list of language range", expected=true/false]. That way we'll be able to write a huge list of inputs to test, maybe even every standardized language...

...that's for later though :)
Comment 26 WebKit Commit Bot 2014-11-19 00:57:05 PST
Comment on attachment 241847 [details]
Patch

Clearing flags on attachment: 241847

Committed r176313: <http://trac.webkit.org/changeset/176313>
Comment 27 WebKit Commit Bot 2014-11-19 00:57:10 PST
All reviewed patches have been landed.  Closing bug.