Bug 139678 - Extend :lang() with string arguments
Summary: Extend :lang() with string arguments
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dhi Aurrahman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-16 06:25 PST by Dhi Aurrahman
Modified: 2015-01-09 14:25 PST (History)
2 users (show)

See Also:


Attachments
Patch (10.47 KB, patch)
2014-12-16 06:34 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (4.63 KB, patch)
2014-12-18 05:09 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (7.98 KB, patch)
2014-12-19 18:36 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (8.32 KB, patch)
2014-12-19 18:57 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (8.44 KB, patch)
2014-12-19 18:58 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (8.60 KB, patch)
2014-12-19 19:02 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dhi Aurrahman 2014-12-16 06:25:43 PST
Extend :lang() with string arguments
Comment 1 Dhi Aurrahman 2014-12-16 06:34:25 PST
Created attachment 243358 [details]
Patch
Comment 2 Dhi Aurrahman 2014-12-16 06:47:46 PST
Comment on attachment 243358 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243358&action=review

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:13
> +PASS parseThenSerializeRule(':lang("*-1997", "*-1998") { }') is ':lang(*-1997, *-1998) { }'

Per David Baron's comment http://lists.w3.org/Archives/Public/www-style/2014Dec/0169.html it can be done correctly.

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:14
> +PASS parseThenSerializeRule(':lang("   *-1997   ", "   *-1998   ") { }') is ':lang(*-1997, *-1998) { }'

This is my speculation, yet my own assumption for striping the white spaces inside the "". I'm not sure if this is the expected behavior.

Or, is the "" specially dedicated for escaping non-css valid values and has the same behavior with \\? Hence, :lang("<space>*-1997<space>") will give :lang("<space>*-1997<space>") instead of :lang(*-1997)?

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:17
> +PASS parseThenSerializeRule(':lang(" ") { }') threw exception TypeError: undefined is not an object (evaluating 'styleElement.sheet.cssRules[0].cssText').

Yet another assumption, since the argument only provides white spaces.

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:19
> +PASS parseThenSerializeRule(':lang("* - 1997") { }') threw exception TypeError: undefined is not an object (evaluating 'styleElement.sheet.cssRules[0].cssText').

This one is an attempt to be consistent with previous assumptions. After left and right stripping, if the value inside "" (the string)  still has white spaces, then it is invalid.
Comment 3 Benjamin Poulain 2014-12-16 21:40:38 PST
Comment on attachment 243358 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243358&action=review

> Source/WebCore/css/CSSGrammar.y.in:1109
> +    $1.stripWhiteSpace();

I am not sure we want to do that. I would just use the literal as given.

> Source/WebCore/css/CSSGrammar.y.in:1118
> +        if ($1.length() > 0) {

I am not sure we should have an exception for empty strings.

> Source/WebCore/css/CSSGrammar.y.in:1125
> +        if ($$ && $5.length() > 0)

ditto

> Source/WebCore/css/CSSGrammar.y.in:1406
> +        if ($4 && $4->size() > 0) {

ditto

>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:13
>> +PASS parseThenSerializeRule(':lang("*-1997", "*-1998") { }') is ':lang(*-1997, *-1998) { }'
> 
> Per David Baron's comment http://lists.w3.org/Archives/Public/www-style/2014Dec/0169.html it can be done correctly.

Yup, that's awesome

>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:14
>> +PASS parseThenSerializeRule(':lang("   *-1997   ", "   *-1998   ") { }') is ':lang(*-1997, *-1998) { }'
> 
> This is my speculation, yet my own assumption for striping the white spaces inside the "". I'm not sure if this is the expected behavior.
> 
> Or, is the "" specially dedicated for escaping non-css valid values and has the same behavior with \\? Hence, :lang("<space>*-1997<space>") will give :lang("<space>*-1997<space>") instead of :lang(*-1997)?

I would not try to do any additional rules over the string. Anything passed as a string would be interpreted as it is.

Using the string as-is gives a lot more power for developers, they can literally use anything for language, including space characters.

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:16
> +PASS parseThenSerializeRule(':lang("") { }') threw exception TypeError: undefined is not an object (evaluating 'styleElement.sheet.cssRules[0].cssText').

This case is a bit annoying. I am torn between an invalid selector or a selector that never match anything.
Comment 4 Dhi Aurrahman 2014-12-16 22:47:42 PST
Comment on attachment 243358 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243358&action=review

>>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:14
>>> +PASS parseThenSerializeRule(':lang("   *-1997   ", "   *-1998   ") { }') is ':lang(*-1997, *-1998) { }'
>> 
>> This is my speculation, yet my own assumption for striping the white spaces inside the "". I'm not sure if this is the expected behavior.
>> 
>> Or, is the "" specially dedicated for escaping non-css valid values and has the same behavior with \\? Hence, :lang("<space>*-1997<space>") will give :lang("<space>*-1997<space>") instead of :lang(*-1997)?
> 
> I would not try to do any additional rules over the string. Anything passed as a string would be interpreted as it is.
> 
> Using the string as-is gives a lot more power for developers, they can literally use anything for language, including space characters.

Understood. In that case, we can just add:

lang_range: LANGRANGE | IDENT | STRING

to the grammar. Nice!

>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:16
>> +PASS parseThenSerializeRule(':lang("") { }') threw exception TypeError: undefined is not an object (evaluating 'styleElement.sheet.cssRules[0].cssText').
> 
> This case is a bit annoying. I am torn between an invalid selector or a selector that never match anything.

Yes. If we consider this as a valid selector, it will give us :lang() which is invalid in "normal" setup
Comment 5 Dhi Aurrahman 2014-12-18 05:09:33 PST
Created attachment 243492 [details]
Patch
Comment 6 Dhi Aurrahman 2014-12-19 18:36:17 PST
Created attachment 243596 [details]
Patch
Comment 7 Dhi Aurrahman 2014-12-19 18:57:27 PST
Created attachment 243598 [details]
Patch
Comment 8 Dhi Aurrahman 2014-12-19 18:58:58 PST
Created attachment 243599 [details]
Patch
Comment 9 Dhi Aurrahman 2014-12-19 19:02:43 PST
Created attachment 243600 [details]
Patch
Comment 10 Dhi Aurrahman 2014-12-19 19:07:25 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

> LayoutTests/fast/selectors/lang-extended-filtering-with-string-arguments-expected.txt:10
> +PASS document.querySelectorAll(':lang("")').length is 0

test on :lang("") to give never matching.

> LayoutTests/fast/selectors/lang-extended-filtering-with-string-arguments-expected.txt:12
> +PASS document.querySelectorAll(':lang("", "*-1997")').length is 1

given: en, "", de-1997. 

:lang("en", "*-1997") gives two, while :lang("", *-1997) gives only one.
Comment 11 Benjamin Poulain 2014-12-20 17:01:05 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:6
> +PASS parseThenSerializeRule(':lang("a") { }') is ':lang(a) { }'

I think ideally the canonicalization would keep the same form as the original token.

The canonical version of :lang(   a    ) would be :lang(a). The canonical version of :lang(    "a"      ) would be :lang("a").

WebKit has other canonicalization bugs and I doubt this one is critical so I am okay with keeping the internal representation for simplicity. But I think it may be better to always canonicalize :lang() to string instead of identifier. For example,
:lang(foo, \*-bar) would be canonicalized to :lang("foo", "*-bar").

Do you have an opinion on this?
If we do that, we'll need a massive test result rebaseline... :(
Comment 12 Dhi Aurrahman 2014-12-20 17:19:08 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:6
>> +PASS parseThenSerializeRule(':lang("a") { }') is ':lang(a) { }'
> 
> I think ideally the canonicalization would keep the same form as the original token.
> 
> The canonical version of :lang(   a    ) would be :lang(a). The canonical version of :lang(    "a"      ) would be :lang("a").
> 
> WebKit has other canonicalization bugs and I doubt this one is critical so I am okay with keeping the internal representation for simplicity. But I think it may be better to always canonicalize :lang() to string instead of identifier. For example,
> :lang(foo, \*-bar) would be canonicalized to :lang("foo", "*-bar").
> 
> Do you have an opinion on this?
> If we do that, we'll need a massive test result rebaseline... :(

I'm not sure I'm familiar with this canonicalization in css selector context and its impact to selector checker. 

In order to always canonicalize :lang() to string, does that mean the result of parseThenSerializeRule() should be in string (with quotes)? Should we change the appendArgumentList() in CSSSelector too? Does this specifically only give impact to :lang()?

Anyway, I'm ok to do test result rebaseline.

> WebKit has other canonicalization bugs
Can I get more info about this? :)
Comment 13 Darin Adler 2014-12-20 20:02:01 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

> Source/WebCore/css/SelectorCheckerTestFunctions.h:161
> -    if (language.isNull())
> +    if (language.isNull() || language.isEmpty())

All null strings are empty, so we don’t need to check both. Should change isNull to isEmpty rather than adding a new check.

> Source/WebCore/css/SelectorCheckerTestFunctions.h:173
> +        if (range.isEmpty())
> +            continue;

Why exactly is this needed? Which of the tests fails without this line of code? What kind of failure do we see?
Comment 14 Dhi Aurrahman 2014-12-20 20:23:12 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

>> Source/WebCore/css/SelectorCheckerTestFunctions.h:161
>> +    if (language.isNull() || language.isEmpty())
> 
> All null strings are empty, so we don’t need to check both. Should change isNull to isEmpty rather than adding a new check.

OK, I'll update it. Thanks.

>> Source/WebCore/css/SelectorCheckerTestFunctions.h:173
>> +            continue;
> 
> Why exactly is this needed? Which of the tests fails without this line of code? What kind of failure do we see?

From my understanding, the "" (empty string) argument for :lang() is always treated as never matching, e.g. if we have :lang("", "*-1997", de-DE), I believe it is better to jump right away to the next range (*-1997) as soon as the current range (e.g. the first range "") is detected as an empty string.
Comment 15 Benjamin Poulain 2014-12-22 19:36:40 PST
(In reply to comment #14)
> > Why exactly is this needed? Which of the tests fails without this line of code? What kind of failure do we see?
> 
> From my understanding, the "" (empty string) argument for :lang() is always
> treated as never matching, e.g. if we have :lang("", "*-1997", de-DE), I
> believe it is better to jump right away to the next range (*-1997) as soon
> as the current range (e.g. the first range "") is detected as an empty
> string.

Darin's argument is that the empty string would not match anyway in the later code.

But I am not sure we can remove the isEmpty() check. I suspect we will crash at rangeSubtags.first() since rangeSubtags would be empty.
Comment 16 Benjamin Poulain 2014-12-22 19:50:41 PST
(In reply to comment #12)
> Comment on attachment 243600 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=243600&action=review
> 
> >> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:6
> >> +PASS parseThenSerializeRule(':lang("a") { }') is ':lang(a) { }'
> > 
> > I think ideally the canonicalization would keep the same form as the original token.
> > 
> > The canonical version of :lang(   a    ) would be :lang(a). The canonical version of :lang(    "a"      ) would be :lang("a").
> > 
> > WebKit has other canonicalization bugs and I doubt this one is critical so I am okay with keeping the internal representation for simplicity. But I think it may be better to always canonicalize :lang() to string instead of identifier. For example,
> > :lang(foo, \*-bar) would be canonicalized to :lang("foo", "*-bar").
> > 
> > Do you have an opinion on this?
> > If we do that, we'll need a massive test result rebaseline... :(
> 
> I'm not sure I'm familiar with this canonicalization in css selector context
> and its impact to selector checker. 

The canonical form is defined by CSSOM: http://dev.w3.org/csswg/cssom/#serializing-selectors

I *think* the only relavant entry points in WebKit are through CSSSelector::selectorText() and CSSSelectorList::selectorText().

> In order to always canonicalize :lang() to string, does that mean the result
> of parseThenSerializeRule() should be in string (with quotes)? 

Yep.

> Should we
> change the appendArgumentList() in CSSSelector too? Does this specifically
> only give impact to :lang()?

Since appendArgumentList() is only used by :lang(), we can just update its definition.

Something like appendLanguageArgumentList/appendLangArgumentList/appendStringArgumentList?

> Anyway, I'm ok to do test result rebaseline.
> 
> > WebKit has other canonicalization bugs
> Can I get more info about this? :)

The canonical form of Selectors is supposed to escape identifiers characters and escape some characters of strings.

Take http://codepen.io/anon/pen/vEXOYx for example. Firefox serializes it nicely, WebKit just returns the parsed identifier.

-

Note that we have at least one more thing to do for :lang() before we can qualify it: evaluate performance. We should make sure the matching is efficient, WebKit should be faster than the other engines.
Comment 17 Dhi Aurrahman 2014-12-23 22:21:55 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

>>>> LayoutTests/fast/css/css-lang-selector-with-string-arguments-text-expected.txt:6
>>>> +PASS parseThenSerializeRule(':lang("a") { }') is ':lang(a) { }'
>>> 
>>> I think ideally the canonicalization would keep the same form as the original token.
>>> 
>>> The canonical version of :lang(   a    ) would be :lang(a). The canonical version of :lang(    "a"      ) would be :lang("a").
>>> 
>>> WebKit has other canonicalization bugs and I doubt this one is critical so I am okay with keeping the internal representation for simplicity. But I think it may be better to always canonicalize :lang() to string instead of identifier. For example,
>>> :lang(foo, \*-bar) would be canonicalized to :lang("foo", "*-bar").
>>> 
>>> Do you have an opinion on this?
>>> If we do that, we'll need a massive test result rebaseline... :(
>> 
>> I'm not sure I'm familiar with this canonicalization in css selector context and its impact to selector checker. 
>> 
>> In order to always canonicalize :lang() to string, does that mean the result of parseThenSerializeRule() should be in string (with quotes)? Should we change the appendArgumentList() in CSSSelector too? Does this specifically only give impact to :lang()?
>> 
>> Anyway, I'm ok to do test result rebaseline.
> 
> The canonical form is defined by CSSOM: http://dev.w3.org/csswg/cssom/#serializing-selectors
> 
> I *think* the only relavant entry points in WebKit are through CSSSelector::selectorText() and CSSSelectorList::selectorText().

Hi Benjamin, I put the ideas to always canonicalize :lang() arguments as strings in following patch: https://bugs.webkit.org/show_bug.cgi?id=139928
Comment 18 Benjamin Poulain 2014-12-25 22:53:10 PST
Comment on attachment 243600 [details]
Patch

LGTM
Comment 19 WebKit Commit Bot 2014-12-25 23:34:43 PST
Comment on attachment 243600 [details]
Patch

Clearing flags on attachment: 243600

Committed r177745: <http://trac.webkit.org/changeset/177745>
Comment 20 WebKit Commit Bot 2014-12-25 23:34:47 PST
All reviewed patches have been landed.  Closing bug.
Comment 21 Darin Adler 2014-12-27 13:26:41 PST
Comment on attachment 243600 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243600&action=review

>>> Source/WebCore/css/SelectorCheckerTestFunctions.h:161
>>> +    if (language.isNull() || language.isEmpty())
>> 
>> All null strings are empty, so we don’t need to check both. Should change isNull to isEmpty rather than adding a new check.
> 
> OK, I'll update it. Thanks.

Looks like the patch was landed without the change I requested here.
Comment 22 Benjamin Poulain 2015-01-09 14:25:30 PST
(In reply to comment #21)
> Comment on attachment 243600 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=243600&action=review
> 
> >>> Source/WebCore/css/SelectorCheckerTestFunctions.h:161
> >>> +    if (language.isNull() || language.isEmpty())
> >> 
> >> All null strings are empty, so we don’t need to check both. Should change isNull to isEmpty rather than adding a new check.
> > 
> > OK, I'll update it. Thanks.
> 
> Looks like the patch was landed without the change I requested here.

My bad, fixed here: http://trac.webkit.org/changeset/178198