Bug 76152 - Incorrect handling of CSS escape sequences that encode surrogates
: Incorrect handling of CSS escape sequences that encode surrogates
Status: RESOLVED FIXED
: WebKit
CSS
: 528+ (Nightly build)
: All All
: P2 Normal
Assigned To:
: http://jsfiddle.net/mathias/jY7ra/
:
:
: 69083
  Show dependency treegraph
 
Reported: 2012-01-12 00:05 PST by
Modified: 2012-04-23 00:48 PST (History)


Attachments
CSS unicode patch (1.79 KB, patch)
2012-02-22 09:50 PST, Szilard Ledan
no flags Review Patch | Details | Formatted Diff | Diff
patch (6.95 KB, patch)
2012-02-29 09:25 PST, Szilard Ledan
webkit-ews: commit‑queue-
Review Patch | Details | Formatted Diff | Diff
patch (6.95 KB, patch)
2012-03-13 07:04 PST, Szilard Ledan
tkent: review+
tkent: commit‑queue-
Review Patch | Details | Formatted Diff | Diff
patch (5.69 KB, patch)
2012-04-20 07:29 PST, Szilard Ledan
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2012-01-12 00:05:48 PST
WebKit is the only engine that doesn’t support CSS escape sequences as per http://www.w3.org/TR/CSS21/syndata.html#characters. E.g.  `\1d306 ` or `\01d306` are supposed to be escape sequences for the “tetragram for centre” symbol (U+1D306). It would be better for interoperability if WebKit supported these.

OTOH, escape sequences of the form `\d834\df06 ` (broken up in UTF-16 code units) do work in WebKit (although I cannot find any mention of these in the spec).
------- Comment #1 From 2012-01-12 00:17:05 PST -------
Side note: Gecko is the only engine that doesn’t support escape sequences of the form `\d834\df06 `. See https://bugzilla.mozilla.org/show_bug.cgi?id=717529.
------- Comment #2 From 2012-01-12 08:03:44 PST -------
Related issue: WebKit discards the entire declaration when a surrogate pair in the wrong order is used. http://jsfiddle.net/mathias/wvPdr/ Gecko, Opera and IE treat a surrogate pair in the wrong order as invalid and replace it with two U+FFFD characters. This seems like the most sensible thing to do.

Also see: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536
------- Comment #3 From 2012-01-12 09:10:41 PST -------
The main issue reported here is already tracked as bug 74815. Let's use this bug for surrogates issues only.
------- Comment #4 From 2012-02-01 10:23:05 PST -------
As per http://lists.w3.org/Archives/Public/www-style/2012Feb/0006.html, the surrogate pair syntax for CSS escape sequences should NOT be supported. It should be removed from WebKit as soon as bug 74815 is fixed.
------- Comment #5 From 2012-02-22 09:50:38 PST -------
Created an attachment (id=128232) [details]
CSS unicode patch

I have a draft solution of the problem which uses surrogate pairs.

What do you think about it?
------- Comment #6 From 2012-02-22 10:15:47 PST -------
Please don't use custom written arithmetics for UTF-16. There are macros for this in ICU, and we have versions of these for platforms that don't have ICU in wtf/unicode/UnicodeMacrosFromICU.h.

The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?
------- Comment #7 From 2012-02-22 14:20:23 PST -------
(In reply to comment #6)
> The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?

See this thread: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536 The consensus was that UAs that don’t follow the spec in this regard are buggy and should be fixed.
------- Comment #8 From 2012-02-22 14:29:56 PST -------
This e-mail thread is a great source of ideas for tests.
------- Comment #9 From 2012-02-29 09:25:17 PST -------
Created an attachment (id=129463) [details]
patch
------- Comment #10 From 2012-02-29 11:21:56 PST -------
(From update of attachment 129463 [details])
Attachment 129463 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11732282
------- Comment #11 From 2012-03-13 07:04:09 PST -------
Created an attachment (id=131605) [details]
patch

I have made some corrections on my patch; now it builds on Qt 4.8.
------- Comment #12 From 2012-04-03 03:53:45 PST -------
Since a couple of weeks passed since I uploaded my latest patch, it is going to be obsolete soon. Before that, I would like to know if it's correct or not. Could you guys take a look at it please? Thanks for advance: Szilárd.
------- Comment #13 From 2012-04-03 08:54:43 PST -------
The patch looks reasonable to me at cursory reading.
------- Comment #14 From 2012-04-19 16:53:06 PST -------
(From update of attachment 131605 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=131605&action=review

> Source/WebCore/css/CSSParser.cpp:7759
> +            // Lead/High surrogate character

This comment looks useless.

> Source/WebCore/css/CSSParser.cpp:7761
> +            *result = U16_LEAD(unicode);
> +            ++result;

We can write this as: *result++ = U16_LEAD(unicode);

> Source/WebCore/css/CSSParser.cpp:7762
> +            // Trail/Low surrogate character

This comment looks useless.
------- Comment #15 From 2012-04-20 07:29:21 PST -------
Created an attachment (id=138091) [details]
patch

Thank you for your suggestions and your ideas, and I fixed the patch also.
------- Comment #16 From 2012-04-23 00:47:55 PST -------
(From update of attachment 138091 [details])
Clearing flags on attachment: 138091

Committed r114876: <http://trac.webkit.org/changeset/114876>
------- Comment #17 From 2012-04-23 00:48:02 PST -------
All reviewed patches have been landed.  Closing bug.