Bug 76152 - Incorrect handling of CSS escape sequences that encode surrogates
Summary: Incorrect handling of CSS escape sequences that encode surrogates
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL: http://jsfiddle.net/mathias/jY7ra/
Keywords:
: 74815 (view as bug list)
Depends on:
Blocks: 69083
  Show dependency treegraph
 
Reported: 2012-01-12 00:05 PST by Mathias Bynens
Modified: 2015-04-30 09:22 PDT (History)
10 users (show)

See Also:


Attachments
CSS unicode patch (1.79 KB, patch)
2012-02-22 09:50 PST, Szilard Ledan
no flags Details | Formatted Diff | Diff
patch (6.95 KB, patch)
2012-02-29 09:25 PST, Szilard Ledan
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
patch (6.95 KB, patch)
2012-03-13 07:04 PDT, Szilard Ledan
tkent: review+
tkent: commit-queue-
Details | Formatted Diff | Diff
patch (5.69 KB, patch)
2012-04-20 07:29 PDT, Szilard Ledan
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mathias Bynens 2012-01-12 00:05:48 PST
WebKit is the only engine that doesn’t support CSS escape sequences as per http://www.w3.org/TR/CSS21/syndata.html#characters. E.g.  `\1d306 ` or `\01d306` are supposed to be escape sequences for the “tetragram for centre” symbol (U+1D306). It would be better for interoperability if WebKit supported these.

OTOH, escape sequences of the form `\d834\df06 ` (broken up in UTF-16 code units) do work in WebKit (although I cannot find any mention of these in the spec).
Comment 1 Mathias Bynens 2012-01-12 00:17:05 PST
Side note: Gecko is the only engine that doesn’t support escape sequences of the form `\d834\df06 `. See https://bugzilla.mozilla.org/show_bug.cgi?id=717529.
Comment 2 Mathias Bynens 2012-01-12 08:03:44 PST
Related issue: WebKit discards the entire declaration when a surrogate pair in the wrong order is used. http://jsfiddle.net/mathias/wvPdr/ Gecko, Opera and IE treat a surrogate pair in the wrong order as invalid and replace it with two U+FFFD characters. This seems like the most sensible thing to do.

Also see: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536
Comment 3 Alexey Proskuryakov 2012-01-12 09:10:41 PST
The main issue reported here is already tracked as bug 74815. Let's use this bug for surrogates issues only.
Comment 4 Mathias Bynens 2012-02-01 10:23:05 PST
As per http://lists.w3.org/Archives/Public/www-style/2012Feb/0006.html, the surrogate pair syntax for CSS escape sequences should NOT be supported. It should be removed from WebKit as soon as bug 74815 is fixed.
Comment 5 Szilard Ledan 2012-02-22 09:50:38 PST
Created attachment 128232 [details]
CSS unicode patch

I have a draft solution of the problem which uses surrogate pairs.

What do you think about it?
Comment 6 Alexey Proskuryakov 2012-02-22 10:15:47 PST
Please don't use custom written arithmetics for UTF-16. There are macros for this in ICU, and we have versions of these for platforms that don't have ICU in wtf/unicode/UnicodeMacrosFromICU.h.

The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?
Comment 7 Mathias Bynens 2012-02-22 14:20:23 PST
(In reply to comment #6)
> The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?

See this thread: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536 The consensus was that UAs that don’t follow the spec in this regard are buggy and should be fixed.
Comment 8 Alexey Proskuryakov 2012-02-22 14:29:56 PST
This e-mail thread is a great source of ideas for tests.
Comment 9 Szilard Ledan 2012-02-29 09:25:17 PST
Created attachment 129463 [details]
patch
Comment 10 Early Warning System Bot 2012-02-29 11:21:56 PST
Comment on attachment 129463 [details]
patch

Attachment 129463 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11732282
Comment 11 Szilard Ledan 2012-03-13 07:04:09 PDT
Created attachment 131605 [details]
patch

I have made some corrections on my patch; now it builds on Qt 4.8.
Comment 12 Szilard Ledan 2012-04-03 03:53:45 PDT
Since a couple of weeks passed since I uploaded my latest patch, it is going to be obsolete soon. Before that, I would like to know if it's correct or not. Could you guys take a look at it please? Thanks for advance: Szilárd.
Comment 13 Alexey Proskuryakov 2012-04-03 08:54:43 PDT
The patch looks reasonable to me at cursory reading.
Comment 14 Kent Tamura 2012-04-19 16:53:06 PDT
Comment on attachment 131605 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=131605&action=review

> Source/WebCore/css/CSSParser.cpp:7759
> +            // Lead/High surrogate character

This comment looks useless.

> Source/WebCore/css/CSSParser.cpp:7761
> +            *result = U16_LEAD(unicode);
> +            ++result;

We can write this as: *result++ = U16_LEAD(unicode);

> Source/WebCore/css/CSSParser.cpp:7762
> +            // Trail/Low surrogate character

This comment looks useless.
Comment 15 Szilard Ledan 2012-04-20 07:29:21 PDT
Created attachment 138091 [details]
patch

Thank you for your suggestions and your ideas, and I fixed the patch also.
Comment 16 WebKit Review Bot 2012-04-23 00:47:55 PDT
Comment on attachment 138091 [details]
patch

Clearing flags on attachment: 138091

Committed r114876: <http://trac.webkit.org/changeset/114876>
Comment 17 WebKit Review Bot 2012-04-23 00:48:02 PDT
All reviewed patches have been landed.  Closing bug.
Comment 18 Alexey Proskuryakov 2015-04-30 09:22:12 PDT
*** Bug 74815 has been marked as a duplicate of this bug. ***