WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
139014
Implement parser for :lang pseudo class selector arguments that contain wildcard '*' subtags
https://bugs.webkit.org/show_bug.cgi?id=139014
Summary
Implement parser for :lang pseudo class selector arguments that contain wildc...
Dhi Aurrahman
Reported
2014-11-23 03:53:17 PST
Implement parser for :lang pseudo class selector arguments with wildcard '*' subtags
Attachments
Patch
(22.93 KB, patch)
2014-11-23 04:11 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(22.86 KB, patch)
2014-11-23 20:43 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(26.22 KB, patch)
2014-11-25 15:47 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(11.18 KB, patch)
2014-12-01 21:58 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(40.17 KB, patch)
2014-12-03 01:39 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(40.17 KB, patch)
2014-12-03 02:02 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Patch
(50.77 KB, patch)
2014-12-03 03:28 PST
,
Dhi Aurrahman
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Dhi Aurrahman
Comment 1
2014-11-23 04:11:56 PST
Created
attachment 242133
[details]
Patch
Dhi Aurrahman
Comment 2
2014-11-23 20:43:04 PST
Created
attachment 242141
[details]
Patch
Dhi Aurrahman
Comment 3
2014-11-25 15:47:42 PST
Created
attachment 242214
[details]
Patch
Benjamin Poulain
Comment 4
2014-11-30 18:31:58 PST
Comment on
attachment 242214
[details]
Patch I'll try to review this tomorrow, I am not doing too well. Some comments form a quick reading: -I don't think '*' is valid language range for CSS. The specification says that '*' can be the first character only when followed by an hyphen: "consist of an asterisk (* U+002A) immediately followed by an identifier beginning with an ASCII hyphen (U+002D)". -The terminals LANGRANGE and IDENT alone have no action, that is suspicious but that may be correct. -The production for the rule "lang_range" is more complex than I'd like. The more complex a production is, the harder it is to verify its correctness. I'll need to look at the parser/lexer side of things to see if lang_range can be simplified. Hopefully I'll do that tomorrow.
Dhi Aurrahman
Comment 5
2014-11-30 19:19:47 PST
(In reply to
comment #4
)
> Comment on
attachment 242214
[details]
> Patch > > I'll try to review this tomorrow, I am not doing too well. >
Ah, get well soon :).
> Some comments form a quick reading: > -I don't think '*' is valid language range for CSS. The specification says > that '*' can be the first character only when followed by an hyphen: > "consist of an asterisk (* U+002A) immediately followed by an identifier > beginning with an ASCII hyphen (U+002D)".
Is valid language range for CSS != language range defined in
http://www.ietf.org/rfc/rfc3066.txt
(*) section 2.5? (*) In that rfc3066, the language range is denoted as: The following definition of language-range is derived from HTTP/1.1 [RFC 2616]. language-range = language-tag / "*" That is, a language-range has the same syntax as a language-tag, or is the single character "*".
> -The terminals LANGRANGE and IDENT alone have no action, that is suspicious > but that may be correct. > -The production for the rule "lang_range" is more complex than I'd like. The > more complex a production is, the harder it is to verify its correctness.
Yes it is, I tempted to use the `goto` statement, but I'm not sure if that path will lead us to simpler production.
> > I'll need to look at the parser/lexer side of things to see if lang_range > can be simplified. Hopefully I'll do that tomorrow.
Thanks!
Benjamin Poulain
Comment 6
2014-12-01 13:05:45 PST
(In reply to
comment #5
)
> > Some comments form a quick reading: > > -I don't think '*' is valid language range for CSS. The specification says > > that '*' can be the first character only when followed by an hyphen: > > "consist of an asterisk (* U+002A) immediately followed by an identifier > > beginning with an ASCII hyphen (U+002D)". > > Is valid language range for CSS != language range defined in >
http://www.ietf.org/rfc/rfc3066.txt
(*) section 2.5? > > (*) In that rfc3066, the language range is denoted as: > The following definition of language-range is derived from HTTP/1.1 [RFC > 2616]. > > language-range = language-tag / "*" > > That is, a language-range has the same syntax as a language-tag, or is the > single character "*".
It looks like the specification of CSS is more restrictive than the definition of range in rfc3066 (
http://dev.w3.org/csswg/selectors4/#the-lang-pseudo
). As far as I know, the CSS working group has given no rationale for this.
Benjamin Poulain
Comment 7
2014-12-01 13:48:20 PST
Comment on
attachment 242214
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=242214&action=review
Looking at the parser, the problem is that you are interpreting the range as defined by BCP47. Here we are in a much more restricted context. Since we are parsing a stylesheet, we are expected to parse using the rules of CSS, then try to follow BCP 47 as closely as possible for interpreting the range. CSS Level 4 defines the valid input of :lang() as (
http://dev.w3.org/csswg/selectors4/#the-lang-pseudo
): "Each language range in :lang() must be a valid CSS identifier [CSS21] or consist of an asterisk (* U+002A) immediately followed by an identifier beginning with an ASCII hyphen (U+002D) for the selector to be valid.". This implies we should use all the rules of CSS Identifiers for interpreting the input. For example, with CSS, you can escape characters with \. A range with a star would be encoded in CSS as :lang(foo-\*-bar). You can even include parenthesis in the range by escaping them, something like :lang(foo-\(bar\)) probably already work today. I think CharacterAsterisk + "currentCharacter<SrcCharacterType>() == '-'" is a good start to get the new token. But then I would try to take advantage of parseIdentifier() to get the remaining characters.
> Source/WebCore/css/CSSParser.cpp:11531 > + String range;
Using String::append() does a new memory allocation for every call. When creating String from a series of parts, we use StringBuilder to reduce the allocation impact.
Benjamin Poulain
Comment 8
2014-12-01 13:48:51 PST
Sorry it took me so long to do the first round of review.
Dhi Aurrahman
Comment 9
2014-12-01 21:58:54 PST
Created
attachment 242384
[details]
Patch
Benjamin Poulain
Comment 10
2014-12-02 10:51:33 PST
Comment on
attachment 242384
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=242384&action=review
You removed a bunch of tests between the last two iterations. You had many great tests in the previous iteration, it is sad to lose them. What I would do is: -Create a new test parsing-css-lang.html -Add all the tests you had previously, just change the expected results to your new parser. -Duplicate some of the tests you had previously and use escape characters in there. You can check the tests parsing-matches-X.html, parsing-not-X.html, parsing-nth-child-X.html. Those are extensive parsing tests. Appart from that, the new grammar looks much simpler, I like it.
> Source/WebCore/css/CSSParser.cpp:11501 > + result++;
I would do result = currentCharacter<SrcCharacterType>(); ...it seems clearer.
> Source/WebCore/css/CSSParser.cpp:11502 > + parseIdentifier(result, yylval->string, hasEscape);
I find it confusing to use yylval->string here, I would use a temporary variable, then yylval->string.init(parsedLangRange.toString()); later initialize the yylval. I think you will hit an assertion if you use this code with "*--".
Dhi Aurrahman
Comment 11
2014-12-02 16:47:00 PST
I have a question, related to the spec and escaped '*'. For example we have: :lang(*-foo-\*), after parsing, it can be then serialized as :lang(*-foo-*), which has '*' that is not at the beginning of the language range. Should it be treated as invalid range? Since in the spec
http://dev.w3.org/csswg/selectors4/#the-lang-pseudo
EXAMPLE 31, there is a note: "Note that asterisks are not allowed anywhere else in :lang()'s language range syntax: they only have meaning, and are therefore only allowed, at the beginning." Thanks!
Benjamin Poulain
Comment 12
2014-12-02 17:36:47 PST
(In reply to
comment #11
)
> I have a question, related to the spec and escaped '*'. > > For example we have: > > :lang(*-foo-\*), after parsing, it can be then serialized as :lang(*-foo-*), > which has '*' that is not at the beginning of the language range. Should it > be treated as invalid range? Since in the spec >
http://dev.w3.org/csswg/selectors4/#the-lang-pseudo
EXAMPLE 31, there is a > note: > > "Note that asterisks are not allowed anywhere else in :lang()'s language > range syntax: they only have meaning, and are therefore only allowed, at the > beginning."
That's a very good question. That line in Example 31 is ambiguous. I think that sentence is wrong, or was referring to the raw input. In any case, I have asked the CSS Working Group to clarify:
http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html
Dhi Aurrahman
Comment 13
2014-12-03 01:39:19 PST
Created
attachment 242479
[details]
Patch
Dhi Aurrahman
Comment 14
2014-12-03 01:43:04 PST
(In reply to
comment #12
)
> (In reply to
comment #11
) > That's a very good question. That line in Example 31 is ambiguous. > > I think that sentence is wrong, or was referring to the raw input. In any > case, I have asked the CSS Working Group to clarify: >
http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html
Thanks for posting the question to the www-style group. I know we are still waiting for the clarification of
http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html
, but I guess with this #4 patch I'm trying to grab your idea, especially for the tests.
Dhi Aurrahman
Comment 15
2014-12-03 02:02:19 PST
Created
attachment 242482
[details]
Patch
Dhi Aurrahman
Comment 16
2014-12-03 03:28:46 PST
Created
attachment 242487
[details]
Patch
Benjamin Poulain
Comment 17
2014-12-03 18:15:44 PST
Comment on
attachment 242487
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=242487&action=review
I like the patch and the test. If we do not get feedback from the CSS WG tomorrow, let's move forward and review it.
> LayoutTests/fast/css/css-selector-text-expected.txt:323 > +PASS parseThenSerializeRule(':lang(*-foo-\\0072 aisin) { }') is ':lang(*-foo-raisin) { }' > +PASS parseThenSerializeRule(':lang(*-foo-\\0062 \\0061 r) { }') is ':lang(*-foo-bar) { }'
Interesting.
Benjamin Poulain
Comment 18
2014-12-05 20:28:39 PST
Comment on
attachment 242487
[details]
Patch The patch looks great. Very thorough testing, I like it. Let's land it
WebKit Commit Bot
Comment 19
2014-12-05 21:08:42 PST
Comment on
attachment 242487
[details]
Patch Clearing flags on attachment: 242487 Committed
r176902
: <
http://trac.webkit.org/changeset/176902
>
WebKit Commit Bot
Comment 20
2014-12-05 21:08:46 PST
All reviewed patches have been landed. Closing bug.
Mark Lam
Comment 21
2014-12-06 11:37:14 PST
This cause 2 crashes in the following tests due to assertion failures on debug builds: fast/css/parsing-css-lang.html fast/css/css-selector-text.html See:
https://build.webkit.org/builders/Apple%20Yosemite%20Debug%20WK2%20%28Tests%29/builds/806
https://build.webkit.org/results/Apple%20Yosemite%20Debug%20WK2%20(Tests)/r176902%20(806)/results.html
The tests have been skipped in
r176908
: <
http://trac.webkit.org/r176908
>. Please un-skip them when the issue has been fixed. Thanks.
Alexey Proskuryakov
Comment 22
2014-12-07 11:00:14 PST
Wouldn't it be more appropriate to roll out? It doesn't seem good to have these failures in the tree for any extended period of time (and it's already been more than a day).
Benjamin Poulain
Comment 23
2014-12-07 13:12:06 PST
(In reply to
comment #22
)
> Wouldn't it be more appropriate to roll out? It doesn't seem good to have > these failures in the tree for any extended period of time (and it's already > been more than a day).
I'll have a look. We can also revert without any impact, this is an experimental feature (CSS_SELECTORS_LEVEL4).
Benjamin Poulain
Comment 24
2014-12-07 22:32:37 PST
Patch for review:
https://bugs.webkit.org/show_bug.cgi?id=139379
Ahmad Saleem
Comment 25
2022-07-20 17:19:35 PDT
It seems that it was reopened due to failure, which were fixed in separate bug as per
Comment 24
rather than rolling back this patch (at least as per discussion). If there is nothing much left here, can this be marked as "RESOLVED FIXED" or accordingly? Thanks!
Radar WebKit Bug Importer
Comment 26
2022-07-21 17:20:22 PDT
<
rdar://problem/97407945
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug