Bug 138170 - Add initial parsing functionality of :lang pseudo class in Selectors Level 4.
Summary: Add initial parsing functionality of :lang pseudo class in Selectors Level 4.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dhi Aurrahman
URL:
Keywords:
Depends on:
Blocks: 138421
  Show dependency treegraph
 
Reported: 2014-10-29 06:24 PDT by Dhi Aurrahman
Modified: 2014-11-05 08:54 PST (History)
4 users (show)

See Also:


Attachments
Patch (11.72 KB, patch)
2014-10-29 06:26 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (14.83 KB, patch)
2014-10-29 09:59 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (14.28 KB, patch)
2014-10-29 23:06 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (14.28 KB, patch)
2014-10-29 23:12 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (19.28 KB, patch)
2014-10-30 00:59 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (19.38 KB, patch)
2014-10-30 02:12 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (19.38 KB, patch)
2014-10-30 16:03 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (27.48 KB, patch)
2014-10-30 23:33 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (28.81 KB, patch)
2014-10-31 01:31 PDT, Dhi Aurrahman
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dhi Aurrahman 2014-10-29 06:24:48 PDT
Extend the :lang() pseudo-class to accept comma-separated list arguments
Comment 1 Dhi Aurrahman 2014-10-29 06:26:21 PDT
Created attachment 240598 [details]
Patch
Comment 2 Dhi Aurrahman 2014-10-29 09:59:23 PDT
Created attachment 240608 [details]
Patch
Comment 3 Benjamin Poulain 2014-10-29 18:19:32 PDT
Comment on attachment 240608 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=240608&action=review

First round of review really quick.

I did not look into the crash yet. Can you make an update from the comments above? I'll try it locally.

> Source/WebCore/ChangeLog:27
> +        * css/SelectorChecker.cpp:

Let's do matching in a separate patch. There is enough complexity in parsing alone to justify a patch.

> Source/WebCore/css/CSSGrammar.y.in:1555
> +comma_separated_identifiers:

There is one more case I did not mention in my email: errors.

Let's say we have ":lang(foo, bar, 25)" 

The grammar would work like this on "foo, bar, 25"):
1) Find that "foo" is an IDENT.
2) Get the "," with look ahead.
3) Resolve the IDENT to "comma_separated_identifiers". Create the new Vector<CSSParserString>.
4) (Resolve the maybe_space)
5) Find that bar is an IDENT.
6) Get the "," with look ahead.
7) Resolve "comma_separated_identifiers ',' maybe_space IDENT" to comma_separated_identifiers", append the new IDENT.
8) Find that "25" is not an IDENT
9) Fails to parse this rule.

Now the problem is the Vector allocated in [3] was never freed by the parser. To solve that, you will need one more case handling the error and deleting the vector.

You can look at the rule for "selector_list" as an example.

> Source/WebCore/css/CSSParserValues.cpp:255
> +void CSSParserSelector::adoptStringVector(Vector<CSSParserString>& argumentVector) 

I think I would prefer setArgumentList() here instead of adoptStringVector(). The function does not really adopt the values or the vector, it creates a new vector based on the identifiers.

> Source/WebCore/css/CSSParserValues.cpp:256
> +{

At the function entry, it would be good to test for valid input:
    ASSERT_WITH_MESSAGE(!argumentVector.isEmpty(), "No CSS Selector accept an empty argument list.");

> Source/WebCore/css/CSSParserValues.cpp:257
> +    auto argumentList = std::make_unique<Vector<AtomicString>>();

After creating the vector, you can initialize its memory:
    argumentList->reserveInitialCapacity(argumentVector->size());

That way the following loop does not need to grow the vector dynamically, the vector will already be of the right size.

> Source/WebCore/css/CSSParserValues.cpp:258
> +    for (const auto& argument : argumentVector)

I would prefer the type CSSParserString instead of "auto" here to avoid any confusion between CSSParserString and AtomicString when reading the code.

> Source/WebCore/css/CSSParserValues.cpp:259
> +        argumentList->append(static_cast<String>(argument));

Wouldn't this code work without the static_cast through implicit conversion?

If not, the cast should be to AtomicString instead of String.

> Source/WebCore/css/CSSParserValues.h:203
> +#if ENABLE(CSS_SELECTORS_LEVEL4)    
> +    void adoptStringVector(Vector<CSSParserString>& stringVector);
> +#endif

Let's move this after the definition of adoptSelectorVector().

> Source/WebCore/css/CSSSelector.cpp:257
> +static void appendArgumentList(StringBuilder& str, const Vector<AtomicString>* argumentList)

Let's pass argumenList by reference instead of pointer.

> Source/WebCore/css/CSSSelector.cpp:260
> +    if (!argumentList)
> +        return;

Let's remove this.

> Source/WebCore/css/CSSSelector.cpp:261
> +    for (auto& argument : *argumentList) {

auto& -> const AtomicString&

This is a personal preference, other reviewers would not necessary say the same. I prefer using "auto" only when the type is obvious from an operation on the same line.

> Source/WebCore/css/CSSSelector.cpp:262
> +        if (argument != argumentList->last()) 

It is better to evaluate "argumentList->last()" out of the loop and keep the reference in a temporary variable.

The reason is that the compiler will have to reevaluate this expression from memory every time. The implementation of last() is:
    T& last() { return at(size() - 1); }
To do that, the code will have to read the size from memory, subtract 1 from the size, load the AtomicString from memory, and compare the string to argument.

Why I don't think the compiler will optimize this out of the loop by itself?
The reason is StringBuilder::appendLiteral() and StringBuilder::append() will eventually do one out-of-line function call. The compiler does not know what those function does to the memory. Because of that unknown, it cannot ensure that the input vector was not modified, and it cannot prove that argumentList->last() will return the same value. Thus it needs to evaluate that code every time.

> Source/WebCore/css/CSSSelector.cpp:413
> +                appendArgumentList(str, cs->argumentList());
> +                str.append(')');

Your grammar make it incorrect to have an empty list.

Here you could do:
    ASSERT_WITH_MESSAGE(cs->argumentList() && !cs->argumentList().isEmpty(), "An empty :lang() is invalid and should never be generated by the parser.");

> Source/WebCore/css/CSSSelector.h:318
>              AtomicString m_argument; // Used for :contains, :lang and :nth-*

Let's remove :lang from this comment.

> Source/WebCore/css/CSSSelector.h:320
> +            std::unique_ptr<Vector<AtomicString>> m_argumentList; // Used for :lang

You can remove the comment, the code is clear enough IMHO.

> Source/WebCore/css/SelectorChecker.cpp:866
> +                for (const auto& argument : *argumentList) {
> +                    if (matchesLangPseudoClass(element, argument.impl()))

We will need to be more efficient than that, but we'll see that in the next patch.

> Source/WebCore/cssjit/SelectorCompiler.cpp:734
> +            if (argumentList->isEmpty() || argumentList->size() > 1)

You could assert that argumentList->isEmpty() here instead of checking at runtime.

> Source/WebCore/cssjit/SelectorCompiler.cpp:737
> +            const AtomicString& argument = argumentList->at(0);

at(0) = first()

> LayoutTests/ChangeLog:4
> +        Extend the :lang() pseudo-class to accept comma-separated list arguments
> +        https://bugs.webkit.org/show_bug.cgi?id=138170

The way you can test this without implementing SelectorChecker is through CSSOM.

For example, you can see the tests added in:
-http://trac.webkit.org/changeset/173698
-http://trac.webkit.org/changeset/174259
Comment 4 Dhi Aurrahman 2014-10-29 23:06:47 PDT
Created attachment 240660 [details]
Patch
Comment 5 Dhi Aurrahman 2014-10-29 23:12:43 PDT
Created attachment 240661 [details]
Patch
Comment 6 Benjamin Poulain 2014-10-30 00:14:32 PDT
Sorry, I was not clear.

You don't need to implement the matching of multiple languages now but you need to keep the current functionality working.

In SelectorChecker.cpp, you can do something like:

#if ENABLE(CSS_SELECTORS_LEVEL4)
const AtomicString& argument = selector->argumentList()->first();
#else
const AtomicString& argument = selector->argument();
#endif

And something similar for the compiler.

Did you find what was causing the crash?
Comment 7 Benjamin Poulain 2014-10-30 00:17:37 PDT
Comment on attachment 240661 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=240661&action=review

Quick look before going to bed:

> Source/WebCore/css/CSSSelector.cpp:263
> +            str.append(',');

This should be ", " for CSSOM.

> LayoutTests/ChangeLog:12
> +        * fast/css/css-selector-text-expected.txt: Updated
> +        * fast/css/css-selector-text.html: Updated

Let's also have tests make sure that:
-empty :lang().
-invalid language
cause parsing errors.
Comment 8 Dhi Aurrahman 2014-10-30 00:59:33 PDT
Created attachment 240662 [details]
Patch
Comment 9 Dhi Aurrahman 2014-10-30 01:01:14 PDT
(In reply to comment #6)
> Sorry, I was not clear.
> 
> You don't need to implement the matching of multiple languages now but you
> need to keep the current functionality working.
> 
> In SelectorChecker.cpp, you can do something like:
> 
> #if ENABLE(CSS_SELECTORS_LEVEL4)
> const AtomicString& argument = selector->argumentList()->first();
> #else
> const AtomicString& argument = selector->argument();
> #endif

Yes, I'm sorry I forgot to put --no-review, I was just wanted to know which tests are affected.

> 
> And something similar for the compiler.
> 
> Did you find what was causing the crash?

I cleanup the build and hey, it works.
Comment 10 Dhi Aurrahman 2014-10-30 02:12:43 PDT
Created attachment 240663 [details]
Patch
Comment 11 Dhi Aurrahman 2014-10-30 16:03:22 PDT
Created attachment 240711 [details]
Patch
Comment 12 Benjamin Poulain 2014-10-30 17:51:28 PDT
Comment on attachment 240711 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=240711&action=review

The patch looks correct. cq- because I had some ideas for test coverage but everything looks good.

> Source/WebCore/css/CSSParserValues.cpp:263
> +    ASSERT_WITH_MESSAGE(!stringVector.isEmpty(), "No CSS Selector accept an empty argument list.");

accept->accepts/takes

> LayoutTests/fast/css/css-selector-text.html:86
> +testSelectorRoundTrip(":lang(fr, fr-ca, fr-be)");

Can you please add a test case with:
"de-DE, de-DE-1996, de-Latn-DE, de-Latf-DE, de-Latn-DE-1996, de-CH, it-CH, fr-CH, rm-CH"?

They are the examples in the spec, it would be a good idea to have them in our parsing tests.

Let's also have :lang(lang) to make sure the internal lang is recognized properly.

> LayoutTests/fast/css/css-selector-text.html:325
> +shouldBe("parseThenSerializeRule(':lang(a   ) { }')", "':lang(a) { }'");
> +shouldBe("parseThenSerializeRule(':lang(a, b, c   ) { }')", "':lang(a, b, c) { }'");
> +shouldBe("parseThenSerializeRule(':lang(a,                 b, c   ) { }')", "':lang(a, b, c) { }'");
> +shouldBe("parseThenSerializeRule(':lang(a,b, c,d,e   ) { }')", "':lang(a, b, c, d, e) { }'");

Looks like we need one ore two more cases of normalization: when there is one ore more spaces after the opening parenthesis. E.g.:
:lang(   a)
:lang(   a    )
:lang( a , b , c )

> LayoutTests/fast/selectors/lang-invalid.html:28
> +description('Verify invalid :lang() selectors.');

Awesome.

> LayoutTests/fast/selectors/lang-invalid.html:30
> +shouldThrow('document.querySelectorAll(":lang()").length');

Let's add cases for various valid tokens and keyword:
-:lang(:lang(en))
-:lang())
-:lang({})
-:lang({)
-:lang(}})
-:lang([])
-:lang([)
-:lang(])
-:lang(@media screen {})

...that's all I can think of, feel free to add other cases that could be break parsing.
Comment 13 Dhi Aurrahman 2014-10-30 23:33:59 PDT
Created attachment 240726 [details]
Patch
Comment 14 Benjamin Poulain 2014-10-30 23:39:38 PDT
Comment on attachment 240726 [details]
Patch

Lots of good ideas in the new tests. Let's r+ already. I'll wait for the bots to cq+.
Comment 15 Dhi Aurrahman 2014-10-31 01:31:40 PDT
Created attachment 240729 [details]
Patch
Comment 16 Dhi Aurrahman 2014-10-31 01:38:11 PDT

(In reply to comment #15)
> Created attachment 240729 [details]
> Patch

1. It removes the 'warning: unset value: $$'
2. Add some tests on multiple arguments, but some of it are invalid. 
e.g. 

shouldThrow("parseThenSerializeRule(':lang(12, b, c) { }')");
shouldThrow("parseThenSerializeRule(':lang(a, 12, c) { }')");
shouldThrow("parseThenSerializeRule(':lang(a, b, 12) { }')");
shouldThrow("parseThenSerializeRule(':lang(a, 12, 12) { }')");
shouldThrow("parseThenSerializeRule(':lang(12, b, 12) { }')");
Comment 17 Benjamin Poulain 2014-10-31 19:29:15 PDT
Comment on attachment 240729 [details]
Patch

Great job. Let's land it.
Comment 18 WebKit Commit Bot 2014-10-31 20:07:38 PDT
Comment on attachment 240729 [details]
Patch

Clearing flags on attachment: 240729

Committed r175446: <http://trac.webkit.org/changeset/175446>
Comment 19 WebKit Commit Bot 2014-10-31 20:07:43 PDT
All reviewed patches have been landed.  Closing bug.