Bug 21472

Summary: escaped characters cause parse errors in idents (for example in @namespace rules)
Product: WebKit Reporter: Anne van Kesteren <annevk>
Component: CSSAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: ap, bdakin, cedricv, daniel, darin, hyatt, mitz, mjs, rik, simon.fraser, simon.hollingshead, tonikitoo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
URL: http://tc.labs.opera.com/css/namespaces/syntax-006.xml
Attachments:
Description Flags
proposed fix, tested hyatt: review-

Description Anne van Kesteren 2008-10-08 06:14:30 PDT
@import is not ignored despite it coming after @namespace.

http://dev.w3.org/csswg/css3-namespace/#syntax
Comment 1 Simon Hollingshead 2010-02-11 17:01:35 PST
Proof of issue shown at w3 testsuite case http://www.w3.org/Style/CSS/Test/CSS3/Namespace/20090210/syntax-006.xml
Comment 2 Alexey Proskuryakov 2010-05-05 13:57:21 PDT
*** Bug 38587 has been marked as a duplicate of this bug. ***
Comment 3 Daniel Glazman 2010-05-06 01:33:45 PDT
Ok I found the root of the problem and it's NOT related to @import, it's related to "url(" !!!

The test http://www.w3.org/Style/CSS/Test/CSS3/Namespace/20090210/syntax-006.xml
contains the following stylesheet:

   @namespace x u\00072l("test");
   @import url("support/fail.css");
   @namespace url("test2");
   x|test { background:lime }
   test { background:red }

and the first line fails because of u\00072l("test") !!!
So the second line, the @import rule, is applied when it should not.

Fundamentally, this bug should have its title changed. @import and @namespace
and even url() all are implemented correctly. This bug exists because escaped
characters FAIL in most idents. As an example,

  @i\mport "foo.css";

will always fail and never import the foo.css stylesheet...
Comment 4 Daniel Glazman 2010-05-06 03:24:00 PDT
Created attachment 55223 [details]
proposed fix, tested
Comment 5 Cedric Vivier 2010-05-06 04:48:48 PDT
(In reply to comment #4)
> Created an attachment (id=55223) [details]
> proposed fix, tested

Shouldn't the tokenizer lazily unescape escaped characters so that it does not potentially complexify/slow down the processing of all stylesheets to handle this (very?) rare use case?

Something like:

ESCAPED_CHAR    \\(0{0,4}[0-9][0-9](\r\n|[ \t\r\n\f])?) | [a-z])
{
    //unput is a flex function to set the next character to be scanned.
    //unescape is a function to implement (or likely rather reuse from somewhere) that returns an unescaped character.

    /*insert unescaped character */
    unput(unescape(yytext)); 

    /*eat up escaped sequence*/
}
Comment 6 Darin Adler 2010-05-06 07:48:47 PDT
Comment on attachment 55223 [details]
proposed fix, tested

This patch has tabs in it. Please make a new one without tabs.

How did you test the effect of this change on CSS parsing performance? What was the effect?
Comment 7 Darin Adler 2010-05-06 07:49:21 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > Created an attachment (id=55223) [details] [details]
> > proposed fix, tested
> 
> Shouldn't the tokenizer lazily unescape escaped characters so that it does not
> potentially complexify/slow down the processing of all stylesheets to handle
> this (very?) rare use case?
> 
> Something like:
> 
> ESCAPED_CHAR    \\(0{0,4}[0-9][0-9](\r\n|[ \t\r\n\f])?) | [a-z])
> {
>     //unput is a flex function to set the next character to be scanned.
>     //unescape is a function to implement (or likely rather reuse from
> somewhere) that returns an unescaped character.
> 
>     /*insert unescaped character */
>     unput(unescape(yytext)); 
> 
>     /*eat up escaped sequence*/
> }

That will probably work better, yes.
Comment 8 Alexey Proskuryakov 2010-05-06 08:34:33 PDT
See also: bug 32045.
Comment 9 Daniel Glazman 2010-05-06 08:45:20 PDT
(In reply to comment #6)

> (From update of attachment 55223 [details])
> This patch has tabs in it. Please make a new one without tabs.
> 
> How did you test the effect of this change on CSS parsing performance? What was
> the effect?

I did not. This is my very first patch in the WebKit world and I'm not yet
familiar with the environment (10 years of mozillianism is barely an excuse;-)

I agree the proposed fix could be a wrong thing to do on a perf basis. But
the diagnosis of the bug is at least the right one. There are two issues here:

  - escaped chars in tokens
  - WebCore::CSSParser::text assuming a fixed length for functional notations
Comment 10 Darin Adler 2010-05-06 09:32:57 PDT
(In reply to comment #9)
> But the diagnosis of the bug is at least the right one.

Yes, agreed. Good work on that! Lets find an efficient fix. I think Cedric Vivier’s suggestion may be the best way to approach the tokenizer half of the problem.
Comment 11 Dave Hyatt 2010-05-06 18:53:17 PDT
Yeah, I think the current patch uglifies the tokenizer too much, and as a result will be minusing it.  I think what Cedric is suggesting will keep the tokenizer side much cleaner.
Comment 12 Daniel Glazman 2010-05-10 01:19:09 PDT
(In reply to comment #11)

> Yeah, I think the current patch uglifies the tokenizer too much, and as a result will be minusing it.  I think what Cedric is suggesting will keep the tokenizer side much cleaner.

Aaaaah :-) I missed hyatt's r- ;-)

BTW, this bug is the last outstanding bug on the CSS 3 Namespaces radar blocking the move to PR. See

  http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/
Comment 13 Alexey Proskuryakov 2010-05-10 09:56:36 PDT
>   http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/

What do the green "fail" boxes in Safari column mean? I can't find it explained on this page.
Comment 14 Daniel Glazman 2010-05-10 10:43:36 PDT
(In reply to comment #13)
> >   http://disruptive-innovations.com/zoo/csswg/NAMESPACES_IMP_REPORT/
> 
> What do the green "fail" boxes in Safari column mean? I can't find it explained on this page.

D'oh. Thanks for the heads up, I updated the classes on the table cells
but forgot the prose. Fixed now.
Comment 15 Alexey Proskuryakov 2011-02-21 15:51:21 PST
See also: bug 54838.