RESOLVED FIXED139014
Implement parser for :lang pseudo class selector arguments that contain wildcard '*' subtags
https://bugs.webkit.org/show_bug.cgi?id=139014
Summary Implement parser for :lang pseudo class selector arguments that contain wildc...
Dhi Aurrahman
Reported 2014-11-23 03:53:17 PST
Implement parser for :lang pseudo class selector arguments with wildcard '*' subtags
Attachments
Patch (22.93 KB, patch)
2014-11-23 04:11 PST, Dhi Aurrahman
no flags
Patch (22.86 KB, patch)
2014-11-23 20:43 PST, Dhi Aurrahman
no flags
Patch (26.22 KB, patch)
2014-11-25 15:47 PST, Dhi Aurrahman
no flags
Patch (11.18 KB, patch)
2014-12-01 21:58 PST, Dhi Aurrahman
no flags
Patch (40.17 KB, patch)
2014-12-03 01:39 PST, Dhi Aurrahman
no flags
Patch (40.17 KB, patch)
2014-12-03 02:02 PST, Dhi Aurrahman
no flags
Patch (50.77 KB, patch)
2014-12-03 03:28 PST, Dhi Aurrahman
no flags
Dhi Aurrahman
Comment 1 2014-11-23 04:11:56 PST
Dhi Aurrahman
Comment 2 2014-11-23 20:43:04 PST
Dhi Aurrahman
Comment 3 2014-11-25 15:47:42 PST
Benjamin Poulain
Comment 4 2014-11-30 18:31:58 PST
Comment on attachment 242214 [details] Patch I'll try to review this tomorrow, I am not doing too well. Some comments form a quick reading: -I don't think '*' is valid language range for CSS. The specification says that '*' can be the first character only when followed by an hyphen: "consist of an asterisk (* U+002A) immediately followed by an identifier beginning with an ASCII hyphen (U+002D)". -The terminals LANGRANGE and IDENT alone have no action, that is suspicious but that may be correct. -The production for the rule "lang_range" is more complex than I'd like. The more complex a production is, the harder it is to verify its correctness. I'll need to look at the parser/lexer side of things to see if lang_range can be simplified. Hopefully I'll do that tomorrow.
Dhi Aurrahman
Comment 5 2014-11-30 19:19:47 PST
(In reply to comment #4) > Comment on attachment 242214 [details] > Patch > > I'll try to review this tomorrow, I am not doing too well. > Ah, get well soon :). > Some comments form a quick reading: > -I don't think '*' is valid language range for CSS. The specification says > that '*' can be the first character only when followed by an hyphen: > "consist of an asterisk (* U+002A) immediately followed by an identifier > beginning with an ASCII hyphen (U+002D)". Is valid language range for CSS != language range defined in http://www.ietf.org/rfc/rfc3066.txt (*) section 2.5? (*) In that rfc3066, the language range is denoted as: The following definition of language-range is derived from HTTP/1.1 [RFC 2616]. language-range = language-tag / "*" That is, a language-range has the same syntax as a language-tag, or is the single character "*". > -The terminals LANGRANGE and IDENT alone have no action, that is suspicious > but that may be correct. > -The production for the rule "lang_range" is more complex than I'd like. The > more complex a production is, the harder it is to verify its correctness. Yes it is, I tempted to use the `goto` statement, but I'm not sure if that path will lead us to simpler production. > > I'll need to look at the parser/lexer side of things to see if lang_range > can be simplified. Hopefully I'll do that tomorrow. Thanks!
Benjamin Poulain
Comment 6 2014-12-01 13:05:45 PST
(In reply to comment #5) > > Some comments form a quick reading: > > -I don't think '*' is valid language range for CSS. The specification says > > that '*' can be the first character only when followed by an hyphen: > > "consist of an asterisk (* U+002A) immediately followed by an identifier > > beginning with an ASCII hyphen (U+002D)". > > Is valid language range for CSS != language range defined in > http://www.ietf.org/rfc/rfc3066.txt (*) section 2.5? > > (*) In that rfc3066, the language range is denoted as: > The following definition of language-range is derived from HTTP/1.1 [RFC > 2616]. > > language-range = language-tag / "*" > > That is, a language-range has the same syntax as a language-tag, or is the > single character "*". It looks like the specification of CSS is more restrictive than the definition of range in rfc3066 (http://dev.w3.org/csswg/selectors4/#the-lang-pseudo). As far as I know, the CSS working group has given no rationale for this.
Benjamin Poulain
Comment 7 2014-12-01 13:48:20 PST
Comment on attachment 242214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242214&action=review Looking at the parser, the problem is that you are interpreting the range as defined by BCP47. Here we are in a much more restricted context. Since we are parsing a stylesheet, we are expected to parse using the rules of CSS, then try to follow BCP 47 as closely as possible for interpreting the range. CSS Level 4 defines the valid input of :lang() as (http://dev.w3.org/csswg/selectors4/#the-lang-pseudo): "Each language range in :lang() must be a valid CSS identifier [CSS21] or consist of an asterisk (* U+002A) immediately followed by an identifier beginning with an ASCII hyphen (U+002D) for the selector to be valid.". This implies we should use all the rules of CSS Identifiers for interpreting the input. For example, with CSS, you can escape characters with \. A range with a star would be encoded in CSS as :lang(foo-\*-bar). You can even include parenthesis in the range by escaping them, something like :lang(foo-\(bar\)) probably already work today. I think CharacterAsterisk + "currentCharacter<SrcCharacterType>() == '-'" is a good start to get the new token. But then I would try to take advantage of parseIdentifier() to get the remaining characters. > Source/WebCore/css/CSSParser.cpp:11531 > + String range; Using String::append() does a new memory allocation for every call. When creating String from a series of parts, we use StringBuilder to reduce the allocation impact.
Benjamin Poulain
Comment 8 2014-12-01 13:48:51 PST
Sorry it took me so long to do the first round of review.
Dhi Aurrahman
Comment 9 2014-12-01 21:58:54 PST
Benjamin Poulain
Comment 10 2014-12-02 10:51:33 PST
Comment on attachment 242384 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242384&action=review You removed a bunch of tests between the last two iterations. You had many great tests in the previous iteration, it is sad to lose them. What I would do is: -Create a new test parsing-css-lang.html -Add all the tests you had previously, just change the expected results to your new parser. -Duplicate some of the tests you had previously and use escape characters in there. You can check the tests parsing-matches-X.html, parsing-not-X.html, parsing-nth-child-X.html. Those are extensive parsing tests. Appart from that, the new grammar looks much simpler, I like it. > Source/WebCore/css/CSSParser.cpp:11501 > + result++; I would do result = currentCharacter<SrcCharacterType>(); ...it seems clearer. > Source/WebCore/css/CSSParser.cpp:11502 > + parseIdentifier(result, yylval->string, hasEscape); I find it confusing to use yylval->string here, I would use a temporary variable, then yylval->string.init(parsedLangRange.toString()); later initialize the yylval. I think you will hit an assertion if you use this code with "*--".
Dhi Aurrahman
Comment 11 2014-12-02 16:47:00 PST
I have a question, related to the spec and escaped '*'. For example we have: :lang(*-foo-\*), after parsing, it can be then serialized as :lang(*-foo-*), which has '*' that is not at the beginning of the language range. Should it be treated as invalid range? Since in the spec http://dev.w3.org/csswg/selectors4/#the-lang-pseudo EXAMPLE 31, there is a note: "Note that asterisks are not allowed anywhere else in :lang()'s language range syntax: they only have meaning, and are therefore only allowed, at the beginning." Thanks!
Benjamin Poulain
Comment 12 2014-12-02 17:36:47 PST
(In reply to comment #11) > I have a question, related to the spec and escaped '*'. > > For example we have: > > :lang(*-foo-\*), after parsing, it can be then serialized as :lang(*-foo-*), > which has '*' that is not at the beginning of the language range. Should it > be treated as invalid range? Since in the spec > http://dev.w3.org/csswg/selectors4/#the-lang-pseudo EXAMPLE 31, there is a > note: > > "Note that asterisks are not allowed anywhere else in :lang()'s language > range syntax: they only have meaning, and are therefore only allowed, at the > beginning." That's a very good question. That line in Example 31 is ambiguous. I think that sentence is wrong, or was referring to the raw input. In any case, I have asked the CSS Working Group to clarify: http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html
Dhi Aurrahman
Comment 13 2014-12-03 01:39:19 PST
Dhi Aurrahman
Comment 14 2014-12-03 01:43:04 PST
(In reply to comment #12) > (In reply to comment #11) > That's a very good question. That line in Example 31 is ambiguous. > > I think that sentence is wrong, or was referring to the raw input. In any > case, I have asked the CSS Working Group to clarify: > http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html Thanks for posting the question to the www-style group. I know we are still waiting for the clarification of http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html, but I guess with this #4 patch I'm trying to grab your idea, especially for the tests.
Dhi Aurrahman
Comment 15 2014-12-03 02:02:19 PST
Dhi Aurrahman
Comment 16 2014-12-03 03:28:46 PST
Benjamin Poulain
Comment 17 2014-12-03 18:15:44 PST
Comment on attachment 242487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=242487&action=review I like the patch and the test. If we do not get feedback from the CSS WG tomorrow, let's move forward and review it. > LayoutTests/fast/css/css-selector-text-expected.txt:323 > +PASS parseThenSerializeRule(':lang(*-foo-\\0072 aisin) { }') is ':lang(*-foo-raisin) { }' > +PASS parseThenSerializeRule(':lang(*-foo-\\0062 \\0061 r) { }') is ':lang(*-foo-bar) { }' Interesting.
Benjamin Poulain
Comment 18 2014-12-05 20:28:39 PST
Comment on attachment 242487 [details] Patch The patch looks great. Very thorough testing, I like it. Let's land it
WebKit Commit Bot
Comment 19 2014-12-05 21:08:42 PST
Comment on attachment 242487 [details] Patch Clearing flags on attachment: 242487 Committed r176902: <http://trac.webkit.org/changeset/176902>
WebKit Commit Bot
Comment 20 2014-12-05 21:08:46 PST
All reviewed patches have been landed. Closing bug.
Mark Lam
Comment 21 2014-12-06 11:37:14 PST
This cause 2 crashes in the following tests due to assertion failures on debug builds: fast/css/parsing-css-lang.html fast/css/css-selector-text.html See: https://build.webkit.org/builders/Apple%20Yosemite%20Debug%20WK2%20%28Tests%29/builds/806 https://build.webkit.org/results/Apple%20Yosemite%20Debug%20WK2%20(Tests)/r176902%20(806)/results.html The tests have been skipped in r176908: <http://trac.webkit.org/r176908>. Please un-skip them when the issue has been fixed. Thanks.
Alexey Proskuryakov
Comment 22 2014-12-07 11:00:14 PST
Wouldn't it be more appropriate to roll out? It doesn't seem good to have these failures in the tree for any extended period of time (and it's already been more than a day).
Benjamin Poulain
Comment 23 2014-12-07 13:12:06 PST
(In reply to comment #22) > Wouldn't it be more appropriate to roll out? It doesn't seem good to have > these failures in the tree for any extended period of time (and it's already > been more than a day). I'll have a look. We can also revert without any impact, this is an experimental feature (CSS_SELECTORS_LEVEL4).
Benjamin Poulain
Comment 24 2014-12-07 22:32:37 PST
Ahmad Saleem
Comment 25 2022-07-20 17:19:35 PDT
It seems that it was reopened due to failure, which were fixed in separate bug as per Comment 24 rather than rolling back this patch (at least as per discussion). If there is nothing much left here, can this be marked as "RESOLVED FIXED" or accordingly? Thanks!
Radar WebKit Bug Importer
Comment 26 2022-07-21 17:20:22 PDT
Note You need to log in before you can comment on or make changes to this bug.