Bug 139928 - Canonicalization of :lang() should preserve the :lang()'s arguments representations
Summary: Canonicalization of :lang() should preserve the :lang()'s arguments represent...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dhi Aurrahman
URL:
Keywords:
Depends on: 140548
Blocks:
  Show dependency treegraph
 
Reported: 2014-12-23 17:12 PST by Dhi Aurrahman
Modified: 2015-01-19 15:09 PST (History)
4 users (show)

See Also:


Attachments
Patch (37.74 KB, patch)
2014-12-23 17:31 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (37.86 KB, patch)
2014-12-23 17:55 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-mountainlion (546.75 KB, application/zip)
2014-12-23 19:24 PST, Build Bot
no flags Details
Archive of layout-test-results from ews106 for mac-mountainlion-wk2 (548.31 KB, application/zip)
2014-12-23 19:39 PST, Build Bot
no flags Details
Patch (70.30 KB, patch)
2014-12-23 19:56 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (14.71 KB, patch)
2015-01-10 14:35 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-mountainlion (516.57 KB, application/zip)
2015-01-10 15:23 PST, Build Bot
no flags Details
Archive of layout-test-results from ews106 for mac-mountainlion-wk2 (520.71 KB, application/zip)
2015-01-10 15:27 PST, Build Bot
no flags Details
Patch (15.43 KB, patch)
2015-01-10 17:52 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (16.83 KB, patch)
2015-01-10 18:54 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (17.62 KB, patch)
2015-01-13 05:23 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (17.75 KB, patch)
2015-01-13 06:44 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (17.89 KB, patch)
2015-01-13 15:33 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (17.25 KB, patch)
2015-01-14 15:18 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (17.26 KB, patch)
2015-01-16 18:09 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff
Patch (18.20 KB, patch)
2015-01-18 15:12 PST, Dhi Aurrahman
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dhi Aurrahman 2014-12-23 17:12:08 PST
Always canonicalize :lang() to string instead of identifier
Comment 1 Dhi Aurrahman 2014-12-23 17:31:45 PST
Created attachment 243716 [details]
Patch
Comment 2 Dhi Aurrahman 2014-12-23 17:55:00 PST
Created attachment 243717 [details]
Patch
Comment 3 Build Bot 2014-12-23 19:24:21 PST
Comment on attachment 243717 [details]
Patch

Attachment 243717 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/5597974700752896

New failing tests:
fast/css/parsing-css-lang.html
fast/css/css-set-selector-text.html
fast/dom/css-selectorText.html
Comment 4 Build Bot 2014-12-23 19:24:23 PST
Created attachment 243721 [details]
Archive of layout-test-results from ews100 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 5 Build Bot 2014-12-23 19:39:38 PST
Comment on attachment 243717 [details]
Patch

Attachment 243717 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/6658914972073984

New failing tests:
fast/css/parsing-css-lang.html
fast/css/css-set-selector-text.html
fast/dom/css-selectorText.html
Comment 6 Build Bot 2014-12-23 19:39:40 PST
Created attachment 243722 [details]
Archive of layout-test-results from ews106 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 7 Dhi Aurrahman 2014-12-23 19:56:34 PST
Created attachment 243723 [details]
Patch
Comment 8 Benjamin Poulain 2014-12-25 22:57:02 PST
Comment on attachment 243723 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243723&action=review

> Source/WebCore/ChangeLog:8
> +        The canonicalization of :lang()'s arguments is expected to be always represented as string.

Before moving forward with this, I'll have a look at the parser and CSSSelector.

If it is not too messy, it would be nice to serialize identifier and strings as their original types.

> Source/WebCore/ChangeLog:9
> +        For example, :lang(foo, \*-bar) will be canonicalized as :lang("foo", "bar").

"bar" -> "*-bar"
Comment 9 Dhi Aurrahman 2014-12-26 15:23:15 PST
Comment on attachment 243723 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243723&action=review

>> Source/WebCore/ChangeLog:8
>> +        The canonicalization of :lang()'s arguments is expected to be always represented as string.
> 
> Before moving forward with this, I'll have a look at the parser and CSSSelector.
> 
> If it is not too messy, it would be nice to serialize identifier and strings as their original types.

I'm a bit confused. By preserving original types, does that mean IDENTs stay IDENTs and STRINGs stay STRINGs? Or are you talking about anything other than :lang()? :)

>> Source/WebCore/ChangeLog:9
>> +        For example, :lang(foo, \*-bar) will be canonicalized as :lang("foo", "bar").
> 
> "bar" -> "*-bar"

Thank you
Comment 10 Benjamin Poulain 2015-01-09 14:44:06 PST
(In reply to comment #9)
> Comment on attachment 243723 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=243723&action=review
> 
> >> Source/WebCore/ChangeLog:8
> >> +        The canonicalization of :lang()'s arguments is expected to be always represented as string.
> > 
> > Before moving forward with this, I'll have a look at the parser and CSSSelector.
> > 
> > If it is not too messy, it would be nice to serialize identifier and strings as their original types.
> 
> I'm a bit confused. By preserving original types, does that mean IDENTs stay
> IDENTs and STRINGs stay STRINGs? Or are you talking about anything other
> than :lang()? :)

I am very sorry for the delay. I had tons of stuff on my mind and I kept pushing this back.

I think we should try to preserve the representation of IDENT and STRING when serializing the string. It would be cleaner.
Ideally, a selector like:
    :lang(foo,"bar"   ,baz)
would serialize as
    :lang(foo, "bar", baz)

This creates some challenges on our side to do that. We will have to record the type when parsing and extend CSSSelector to keep the string and its type. But overall I believe it would be better.

What do you think?
Comment 11 Dhi Aurrahman 2015-01-10 14:35:54 PST
Created attachment 244416 [details]
Patch
Comment 12 Build Bot 2015-01-10 15:23:20 PST
Comment on attachment 244416 [details]
Patch

Attachment 244416 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/5543898512359424

New failing tests:
fast/css/css-lang-selector-with-string-arguments-text.html
Comment 13 Build Bot 2015-01-10 15:23:22 PST
Created attachment 244417 [details]
Archive of layout-test-results from ews101 for mac-mountainlion

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-mountainlion  Platform: Mac OS X 10.8.5
Comment 14 Build Bot 2015-01-10 15:27:10 PST
Comment on attachment 244416 [details]
Patch

Attachment 244416 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5911193780748288

New failing tests:
fast/css/css-lang-selector-with-string-arguments-text.html
Comment 15 Build Bot 2015-01-10 15:27:12 PST
Created attachment 244418 [details]
Archive of layout-test-results from ews106 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.5
Comment 16 Dhi Aurrahman 2015-01-10 17:52:52 PST
Created attachment 244420 [details]
Patch
Comment 17 Dhi Aurrahman 2015-01-10 18:01:39 PST
Comment on attachment 243723 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=243723&action=review

>>>> Source/WebCore/ChangeLog:8
>>>> +        The canonicalization of :lang()'s arguments is expected to be always represented as string.
>>> 
>>> Before moving forward with this, I'll have a look at the parser and CSSSelector.
>>> 
>>> If it is not too messy, it would be nice to serialize identifier and strings as their original types.
>> 
>> I'm a bit confused. By preserving original types, does that mean IDENTs stay IDENTs and STRINGs stay STRINGs? Or are you talking about anything other than :lang()? :)
> 
> I am very sorry for the delay. I had tons of stuff on my mind and I kept pushing this back.
> 
> I think we should try to preserve the representation of IDENT and STRING when serializing the string. It would be cleaner.
> Ideally, a selector like:
>     :lang(foo,"bar"   ,baz)
> would serialize as
>     :lang(foo, "bar", baz)
> 
> This creates some challenges on our side to do that. We will have to record the type when parsing and extend CSSSelector to keep the string and its type. But overall I believe it would be better.
> 
> What do you think?

Sounds great. I'll update the patch.
Comment 18 Dhi Aurrahman 2015-01-10 18:54:43 PST
Created attachment 244421 [details]
Patch
Comment 19 Dhi Aurrahman 2015-01-10 19:50:50 PST
Comment on attachment 244421 [details]
Patch

Updated to be consistent on renaming some related methods.
Comment 20 Benjamin Poulain 2015-01-12 18:29:05 PST
Comment on attachment 244421 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=244421&action=review

I was expecting this fix to be much bigger for some reason...this is pretty cool

> Source/WebCore/css/CSSSelector.h:226
> -        const Vector<AtomicString>* argumentList() const { return m_hasRareData ? m_data.m_rareData->m_argumentList.get() : nullptr; }
> +        const Vector<AtomicString>* langArgumentList() const { return m_hasRareData ? m_data.m_rareData->m_langArgumentList.get() : nullptr; }
> +        const Vector<bool>* langArgumentParsedFromStringFlagList() const { return langArgumentList() ? m_data.m_rareData->m_langArgumentParsedFromStringFlagList.get() : nullptr; }

I think it would be cleaner to store the string and its type together.

something like:

enum class TokenType {
    AtomicString
    Identifier
}
struct LanguageArgument {
    AtomicString languageRange;
    TokenType type;
}

...

std::unique_ptr<Vector< LanguageArgument>> m_langArgumentList;
Comment 21 Dhi Aurrahman 2015-01-13 05:23:17 PST
Created attachment 244508 [details]
Patch
Comment 22 Dhi Aurrahman 2015-01-13 06:44:29 PST
Created attachment 244511 [details]
Patch
Comment 23 Dhi Aurrahman 2015-01-13 15:33:42 PST
Created attachment 244556 [details]
Patch
Comment 24 Benjamin Poulain 2015-01-14 11:02:33 PST
Comment on attachment 244556 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=244556&action=review

Some style nitpick below. Everything looks correct otherwise.

> Source/WebCore/css/CSSParserValues.cpp:265
> +void CSSParserSelector::setLangArgumentList(Vector<CSSParserString>& stringVector)

Could the "stringVector" be const?

> Source/WebCore/css/CSSParserValues.cpp:270
> +    argumentList->reserveInitialCapacity(stringVectorSize);

Having "auto argumentList = std::make_unique<Vector<LanguageArgument>>();" between "size_t stringVectorSize = stringVector.size();" and "argumentList->reserveInitialCapacity(stringVectorSize);" will add register pressure.

First, "size_t stringVectorSize = stringVector.size();" will put "stringVectorSize" into a register. Then we have "auto argumentList = std::make_unique<Vector<LanguageArgument>>();", that will likely cause a function call, which will spill all registers to the stack.

By moving "size_t stringVectorSize = stringVector.size();" one line down, you would avoid this.

> Source/WebCore/css/CSSParserValues.cpp:272
> +    for (size_t stringVectorIndex = 0; stringVectorIndex < stringVectorSize; ++stringVectorIndex) {
> +        const CSSParserString& string = stringVector[stringVectorIndex];

You can use a C++ 11 range based for here.

> Source/WebCore/css/CSSParserValues.h:39
> +enum class TokenType {
> +    IdentifierToken = 0,
> +    AtomicStringToken
> +};

I would define this in CSSSelector.h to reverse the dependencies.

> Source/WebCore/css/CSSSelector.cpp:30
> +#include "CSSParserValues.h"

Having the CSSSelector implementation relying on the parser is unfortunate.

But with TokenType on CSSSelector.h, we would not need this.
Comment 25 Dhi Aurrahman 2015-01-14 15:18:14 PST
Created attachment 244649 [details]
Patch
Comment 26 Benjamin Poulain 2015-01-15 14:10:27 PST
Comment on attachment 244649 [details]
Patch

Great work. Let's land!
Comment 27 WebKit Commit Bot 2015-01-15 14:52:29 PST
Comment on attachment 244649 [details]
Patch

Clearing flags on attachment: 244649

Committed r178532: <http://trac.webkit.org/changeset/178532>
Comment 28 WebKit Commit Bot 2015-01-15 14:52:36 PST
All reviewed patches have been landed.  Closing bug.
Comment 29 Alexey Proskuryakov 2015-01-16 10:36:02 PST
This broke fast/css/parsing-css-lang.html on Mac Yosemite Debug:

https://build.webkit.org/results/Apple%20Yosemite%20Debug%20WK2%20(Tests)/r178579%20(1470)/fast/css/parsing-css-lang-diff.txt

-PASS document.getElementById('style-container').sheet.cssRules[0].selectorText is ":lang(fr-CH)"
+FAIL document.getElementById('style-container').sheet.cssRules[0].selectorText should be :lang(fr-CH). Was :lang("fr-CH").
...
-PASS document.getElementById('style-container').sheet.cssRules[0].selectorText is ":lang(rm-CH)"
+FAIL document.getElementById('style-container').sheet.cssRules[0].selectorText should be :lang(rm-CH). Was :lang("rm-CH").
...

Not yet sure how this is possible; will roll out if I cannot figure it out quickly.
Comment 30 Alexey Proskuryakov 2015-01-16 10:44:12 PST
This test has the same expectations for all platforms, and doesn't have TestExpectations, other than for Gtk. 

Smells like memory corruption. It may or may not be new for this patch (cf. bug 139353), but it seems right to investigate it before re-landing the patch that exposes it.
Comment 31 WebKit Commit Bot 2015-01-16 10:46:36 PST
Re-opened since this is blocked by bug 140548
Comment 32 Dhi Aurrahman 2015-01-16 18:09:41 PST
Created attachment 244827 [details]
Patch
Comment 33 Dhi Aurrahman 2015-01-16 18:17:04 PST
Comment on attachment 244827 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=244827&action=review

> Source/WebCore/css/CSSGrammar.y.in:1108
> +lang_range: 

I found that the token type of the parsed CSSParserString should be explicitly defined after it's initialized. Do we have preference of this tokenType? 

We can also put e.g. 

tokenType = TokenType::IdentifierToken; inside each init() in CSSParserString.
Comment 34 Benjamin Poulain 2015-01-18 13:33:10 PST
(In reply to comment #33)
> Comment on attachment 244827 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=244827&action=review
> 
> > Source/WebCore/css/CSSGrammar.y.in:1108
> > +lang_range: 
> 
> I found that the token type of the parsed CSSParserString should be
> explicitly defined after it's initialized. Do we have preference of this
> tokenType? 
> 
> We can also put e.g. 
> 
> tokenType = TokenType::IdentifierToken; inside each init() in
> CSSParserString.

Arg. It is unfortunate that EWS did not catch the mistake.

IMHO, you should have the initialization in each init() and clear() to ensure CSSParserString is always in a valid state.
Comment 35 Dhi Aurrahman 2015-01-18 15:12:12 PST
Created attachment 244867 [details]
Patch
Comment 36 Dhi Aurrahman 2015-01-18 18:31:17 PST
Comment on attachment 244867 [details]
Patch

I'm not sure what's wrong with the red-label on windows EWS build bot status. I hope it is just a false alarm.
Comment 37 Benjamin Poulain 2015-01-19 14:27:32 PST
Comment on attachment 244867 [details]
Patch

Let's try again.
Comment 38 WebKit Commit Bot 2015-01-19 15:09:34 PST
Comment on attachment 244867 [details]
Patch

Clearing flags on attachment: 244867

Committed r178675: <http://trac.webkit.org/changeset/178675>
Comment 39 WebKit Commit Bot 2015-01-19 15:09:43 PST
All reviewed patches have been landed.  Closing bug.