Bug 76152

Summary: Incorrect handling of CSS escape sequences that encode surrogates
Product: WebKit Reporter: Mathias Bynens <mathias>
Component: CSSAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, eoconnor, macpherson, mathias, menard, onorpj, peter, szledan, webkit.review.bot, zherczeg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
URL: http://jsfiddle.net/mathias/jY7ra/
Bug Depends on:    
Bug Blocks: 69083    
Attachments:
Description Flags
CSS unicode patch
none
patch
webkit-ews: commit-queue-
patch
tkent: review+, tkent: commit-queue-
patch none

Mathias Bynens
Reported 2012-01-12 00:05:48 PST
WebKit is the only engine that doesn’t support CSS escape sequences as per http://www.w3.org/TR/CSS21/syndata.html#characters. E.g. `\1d306 ` or `\01d306` are supposed to be escape sequences for the “tetragram for centre” symbol (U+1D306). It would be better for interoperability if WebKit supported these. OTOH, escape sequences of the form `\d834\df06 ` (broken up in UTF-16 code units) do work in WebKit (although I cannot find any mention of these in the spec).
Attachments
CSS unicode patch (1.79 KB, patch)
2012-02-22 09:50 PST, Szilard Ledan
no flags
patch (6.95 KB, patch)
2012-02-29 09:25 PST, Szilard Ledan
webkit-ews: commit-queue-
patch (6.95 KB, patch)
2012-03-13 07:04 PDT, Szilard Ledan
tkent: review+
tkent: commit-queue-
patch (5.69 KB, patch)
2012-04-20 07:29 PDT, Szilard Ledan
no flags
Mathias Bynens
Comment 1 2012-01-12 00:17:05 PST
Side note: Gecko is the only engine that doesn’t support escape sequences of the form `\d834\df06 `. See https://bugzilla.mozilla.org/show_bug.cgi?id=717529.
Mathias Bynens
Comment 2 2012-01-12 08:03:44 PST
Related issue: WebKit discards the entire declaration when a surrogate pair in the wrong order is used. http://jsfiddle.net/mathias/wvPdr/ Gecko, Opera and IE treat a surrogate pair in the wrong order as invalid and replace it with two U+FFFD characters. This seems like the most sensible thing to do. Also see: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536
Alexey Proskuryakov
Comment 3 2012-01-12 09:10:41 PST
The main issue reported here is already tracked as bug 74815. Let's use this bug for surrogates issues only.
Mathias Bynens
Comment 4 2012-02-01 10:23:05 PST
As per http://lists.w3.org/Archives/Public/www-style/2012Feb/0006.html, the surrogate pair syntax for CSS escape sequences should NOT be supported. It should be removed from WebKit as soon as bug 74815 is fixed.
Szilard Ledan
Comment 5 2012-02-22 09:50:38 PST
Created attachment 128232 [details] CSS unicode patch I have a draft solution of the problem which uses surrogate pairs. What do you think about it?
Alexey Proskuryakov
Comment 6 2012-02-22 10:15:47 PST
Please don't use custom written arithmetics for UTF-16. There are macros for this in ICU, and we have versions of these for platforms that don't have ICU in wtf/unicode/UnicodeMacrosFromICU.h. The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?
Mathias Bynens
Comment 7 2012-02-22 14:20:23 PST
(In reply to comment #6) > The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)? See this thread: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536 The consensus was that UAs that don’t follow the spec in this regard are buggy and should be fixed.
Alexey Proskuryakov
Comment 8 2012-02-22 14:29:56 PST
This e-mail thread is a great source of ideas for tests.
Szilard Ledan
Comment 9 2012-02-29 09:25:17 PST
Early Warning System Bot
Comment 10 2012-02-29 11:21:56 PST
Szilard Ledan
Comment 11 2012-03-13 07:04:09 PDT
Created attachment 131605 [details] patch I have made some corrections on my patch; now it builds on Qt 4.8.
Szilard Ledan
Comment 12 2012-04-03 03:53:45 PDT
Since a couple of weeks passed since I uploaded my latest patch, it is going to be obsolete soon. Before that, I would like to know if it's correct or not. Could you guys take a look at it please? Thanks for advance: Szilárd.
Alexey Proskuryakov
Comment 13 2012-04-03 08:54:43 PDT
The patch looks reasonable to me at cursory reading.
Kent Tamura
Comment 14 2012-04-19 16:53:06 PDT
Comment on attachment 131605 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=131605&action=review > Source/WebCore/css/CSSParser.cpp:7759 > + // Lead/High surrogate character This comment looks useless. > Source/WebCore/css/CSSParser.cpp:7761 > + *result = U16_LEAD(unicode); > + ++result; We can write this as: *result++ = U16_LEAD(unicode); > Source/WebCore/css/CSSParser.cpp:7762 > + // Trail/Low surrogate character This comment looks useless.
Szilard Ledan
Comment 15 2012-04-20 07:29:21 PDT
Created attachment 138091 [details] patch Thank you for your suggestions and your ideas, and I fixed the patch also.
WebKit Review Bot
Comment 16 2012-04-23 00:47:55 PDT
Comment on attachment 138091 [details] patch Clearing flags on attachment: 138091 Committed r114876: <http://trac.webkit.org/changeset/114876>
WebKit Review Bot
Comment 17 2012-04-23 00:48:02 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 18 2015-04-30 09:22:12 PDT
*** Bug 74815 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.