21472 – escaped characters cause parse errors in idents (for example in @namespace rules)

RESOLVED CONFIGURATION CHANGED21472

escaped characters cause parse errors in idents (for example in @namespace rules)

https://bugs.webkit.org/show_bug.cgi?id=21472

Summary escaped characters cause parse errors in idents (for example in @namespace ru...

Anne van Kesteren

Reported 2008-10-08 06:14:30 PDT

@import is not ignored despite it coming after @namespace. http://dev.w3.org/csswg/css3-namespace/#syntax

Attachments
proposed fix, tested (14.37 KB, patch) 2010-05-06 03:24 PDT, Daniel Glazman	hyatt: review-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Simon Hollingshead

Comment 1 2010-02-11 17:01:35 PST

Proof of issue shown at w3 testsuite case http://www.w3.org/Style/CSS/Test/CSS3/Namespace/20090210/syntax-006.xml

Alexey Proskuryakov

Comment 2 2010-05-05 13:57:21 PDT

*** Bug 38587 has been marked as a duplicate of this bug. ***

Daniel Glazman

Comment 3 2010-05-06 01:33:45 PDT

Ok I found the root of the problem and it's NOT related to @import, it's related to "url(" !!! The test http://www.w3.org/Style/CSS/Test/CSS3/Namespace/20090210/syntax-006.xml contains the following stylesheet: @namespace x u\00072l("test"); @import url("support/fail.css"); @namespace url("test2"); x|test { background:lime } test { background:red } and the first line fails because of u\00072l("test") !!! So the second line, the @import rule, is applied when it should not. Fundamentally, this bug should have its title changed. @import and @namespace and even url() all are implemented correctly. This bug exists because escaped characters FAIL in most idents. As an example, @i\mport "foo.css"; will always fail and never import the foo.css stylesheet...

Daniel Glazman

Comment 4 2010-05-06 03:24:00 PDT

Created attachment 55223 [details] proposed fix, tested

Cedric Vivier

Comment 5 2010-05-06 04:48:48 PDT

(In reply to comment #4) > Created an attachment (id=55223) [details] > proposed fix, tested Shouldn't the tokenizer lazily unescape escaped characters so that it does not potentially complexify/slow down the processing of all stylesheets to handle this (very?) rare use case? Something like: ESCAPED_CHAR \\(0{0,4}[0-9][0-9](\r\n|[ \t\r\n\f])?) | [a-z]) { //unput is a flex function to set the next character to be scanned. //unescape is a function to implement (or likely rather reuse from somewhere) that returns an unescaped character. /*insert unescaped character */ unput(unescape(yytext)); /*eat up escaped sequence*/ }

Darin Adler

Comment 6 2010-05-06 07:48:47 PDT

Comment on attachment 55223 [details] proposed fix, tested This patch has tabs in it. Please make a new one without tabs. How did you test the effect of this change on CSS parsing performance? What was the effect?

Darin Adler

Comment 7 2010-05-06 07:49:21 PDT

(In reply to comment #5) > (In reply to comment #4) > > Created an attachment (id=55223) [details] [details] > > proposed fix, tested > > Shouldn't the tokenizer lazily unescape escaped characters so that it does not > potentially complexify/slow down the processing of all stylesheets to handle > this (very?) rare use case? > > Something like: > > ESCAPED_CHAR \\(0{0,4}[0-9][0-9](\r\n|[ \t\r\n\f])?) | [a-z]) > { > //unput is a flex function to set the next character to be scanned. > //unescape is a function to implement (or likely rather reuse from > somewhere) that returns an unescaped character. > > /*insert unescaped character */ > unput(unescape(yytext)); > > /*eat up escaped sequence*/ > } That will probably work better, yes.

Alexey Proskuryakov

Comment 8 2010-05-06 08:34:33 PDT

See also: bug 32045.

Daniel Glazman

Comment 9 2010-05-06 08:45:20 PDT

(In reply to comment #6) > (From update of attachment 55223 [details]) > This patch has tabs in it. Please make a new one without tabs. > > How did you test the effect of this change on CSS parsing performance? What was > the effect? I did not. This is my very first patch in the WebKit world and I'm not yet familiar with the environment (10 years of mozillianism is barely an excuse;-) I agree the proposed fix could be a wrong thing to do on a perf basis. But the diagnosis of the bug is at least the right one. There are two issues here: - escaped chars in tokens - WebCore::CSSParser::text assuming a fixed length for functional notations

Darin Adler

Comment 10 2010-05-06 09:32:57 PDT

(In reply to comment #9) > But the diagnosis of the bug is at least the right one. Yes, agreed. Good work on that! Lets find an efficient fix. I think Cedric Vivier’s suggestion may be the best way to approach the tokenizer half of the problem.

Dave Hyatt

Comment 11 2010-05-06 18:53:17 PDT

Yeah, I think the current patch uglifies the tokenizer too much, and as a result will be minusing it. I think what Cedric is suggesting will keep the tokenizer side much cleaner.

Daniel Glazman

Comment 12 2010-05-10 01:19:09 PDT

(In reply to comment #11) > Yeah, I think the current patch uglifies the tokenizer too much, and as a result will be minusing it. I think what Cedric is suggesting will keep the tokenizer side much cleaner. Aaaaah :-) I missed hyatt's r- ;-) BTW, this bug is the last outstanding bug on the CSS 3 Namespaces radar blocking the move to PR. See http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/

Alexey Proskuryakov

Comment 13 2010-05-10 09:56:36 PDT

> http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/ What do the green "fail" boxes in Safari column mean? I can't find it explained on this page.

Daniel Glazman

Comment 14 2010-05-10 10:43:36 PDT

(In reply to comment #13) > > http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/ > > What do the green "fail" boxes in Safari column mean? I can't find it explained on this page. D'oh. Thanks for the heads up, I updated the classes on the table cells but forgot the prose. Fixed now.

Alexey Proskuryakov

Comment 15 2011-02-21 15:51:21 PST