WebKit is the only engine that doesn’t support CSS escape sequences as per http://www.w3.org/TR/CSS21/syndata.html#characters. E.g. `\1d306 ` or `\01d306` are supposed to be escape sequences for the “tetragram for centre” symbol (U+1D306). It would be better for interoperability if WebKit supported these.
OTOH, escape sequences of the form `\d834\df06 ` (broken up in UTF-16 code units) do work in WebKit (although I cannot find any mention of these in the spec).
Side note: Gecko is the only engine that doesn’t support escape sequences of the form `\d834\df06 `. See https://bugzilla.mozilla.org/show_bug.cgi?id=717529.
Related issue: WebKit discards the entire declaration when a surrogate pair in the wrong order is used. http://jsfiddle.net/mathias/wvPdr/ Gecko, Opera and IE treat a surrogate pair in the wrong order as invalid and replace it with two U+FFFD characters. This seems like the most sensible thing to do.
Also see: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536
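For reference, the relationship between the two spellings above (`\1d306 ` versus `\d834\df06 `) can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the arithmetic is a local stand-in for ICU's `U16_IS_LEAD`, `U16_IS_TRAIL` and `U16_GET_SUPPLEMENTARY` macros, not code from any engine.

```cpp
#include <cassert>

// Hypothetical stand-ins for ICU's U16_IS_LEAD / U16_IS_TRAIL
// (<unicode/utf16.h>); real code would use the ICU macros directly.
constexpr bool isLead(char32_t c) { return (c & 0xFFFFFC00) == 0xD800; }
constexpr bool isTrail(char32_t c) { return (c & 0xFFFFFC00) == 0xDC00; }

// Combines a lead/trail pair into the supplementary code point it encodes.
// The pair must be in the right order; a swapped pair is not a valid pair.
inline char32_t combineSurrogates(char32_t lead, char32_t trail)
{
    assert(isLead(lead) && isTrail(trail));
    return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
}
```

With these definitions, `combineSurrogates(0xD834, 0xDF06)` yields U+1D306, while the swapped order (`0xDF06`, `0xD834`) fails the `isLead`/`isTrail` check, which is the "wrong order" case described above.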
The main issue reported here is already tracked as bug 74815. Let's use this bug for surrogate issues only.
As per http://lists.w3.org/Archives/Public/www-style/2012Feb/0006.html, the surrogate pair syntax for CSS escape sequences should NOT be supported. It should be removed from WebKit as soon as bug 74815 is fixed.
Created attachment 128232 [details]
CSS unicode patch
I have a draft solution to the problem which uses surrogate pairs.
What do you think about it?
Please don't use custom-written arithmetic for UTF-16. There are macros for this in ICU, and we have versions of these for platforms that don't have ICU in wtf/unicode/UnicodeMacrosFromICU.h.
The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?
(In reply to comment #6)
> The most important part of this fix will be adding test cases, and making sure that all the changes agree with what other browsers do. For example, how should "\110000" be handled (the spec says "MAY", which is unhelpful)?
See this thread: http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536 The consensus was that UAs that don’t follow the spec in this regard are buggy and should be fixed.
This e-mail thread is a great source of ideas for tests.
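As a starting point for such tests, here is a minimal sketch of one possible policy for decoding the hex digits of an escape. It assumes that out-of-range values such as "\110000" and lone surrogate values are replaced with U+FFFD; that is an assumption for illustration (CSS 2.1 only says "may"), and `decodeHexEscape` is a hypothetical name, not an existing WebKit function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

constexpr char32_t replacementCharacter = 0xFFFD;

// Hypothetical helper: decodes the hex digits that follow the backslash
// (at most six, per CSS 2.1) and maps unusable values to U+FFFD.
// The replacement policy is an assumption, not confirmed engine behavior.
inline char32_t decodeHexEscape(const std::string& digits)
{
    assert(!digits.empty() && digits.size() <= 6);
    char32_t value = 0;
    for (char ch : digits) {
        const unsigned char uch = static_cast<unsigned char>(ch);
        assert(std::isxdigit(uch));
        const int digit = std::isdigit(uch) ? uch - '0'
                                            : std::tolower(uch) - 'a' + 10;
        value = value * 16 + digit;
    }
    const bool isSurrogate = value >= 0xD800 && value <= 0xDFFF;
    const bool isOutOfRange = value > 0x10FFFF;
    return (isSurrogate || isOutOfRange) ? replacementCharacter : value;
}
```

Under this policy `1d306` and `01D306` both decode to U+1D306, while `110000` and `d834` decode to U+FFFD.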
Created attachment 129463 [details]
Comment on attachment 129463 [details]
Attachment 129463 [details] did not pass qt-ews (qt):
Created attachment 131605 [details]
I have made some corrections to my patch; now it builds on Qt 4.8.
Since a couple of weeks have passed since I uploaded my latest patch, it is going to be obsolete soon. Before that happens, I would like to know whether it is correct. Could you take a look at it, please? Thanks in advance, Szilárd.
The patch looks reasonable to me at cursory reading.
Comment on attachment 131605 [details]
View in context: https://bugs.webkit.org/attachment.cgi?id=131605&action=review
> + // Lead/High surrogate character
This comment looks useless.
> + *result = U16_LEAD(unicode);
> + ++result;
We can write this as: *result++ = U16_LEAD(unicode);
> + // Trail/Low surrogate character
This comment looks useless.
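Put together, the reviewed code could take roughly the following shape. This is a sketch rather than the patch itself: `appendCodePoint` is a hypothetical name, and `leadSurrogate`/`trailSurrogate` are local stand-ins with the same arithmetic as ICU's `U16_LEAD` and `U16_TRAIL`, which the real code should use instead.

```cpp
#include <cassert>

// Stand-ins mirroring ICU's U16_LEAD / U16_TRAIL from <unicode/utf16.h>.
constexpr char16_t leadSurrogate(char32_t c)
{
    return static_cast<char16_t>((c >> 10) + 0xD7C0);
}
constexpr char16_t trailSurrogate(char32_t c)
{
    return static_cast<char16_t>((c & 0x3FF) | 0xDC00);
}

// Hypothetical helper: writes codePoint as UTF-16 at result and returns the
// position just past the last code unit written. The caller must provide
// room for two code units.
inline char16_t* appendCodePoint(char16_t* result, char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);
    if (codePoint <= 0xFFFF) {
        *result++ = static_cast<char16_t>(codePoint);
        return result;
    }
    *result++ = leadSurrogate(codePoint);
    *result++ = trailSurrogate(codePoint);
    return result;
}
```

For U+1D306 this writes the two code units 0xD834 and 0xDF06; a BMP character such as U+0041 is written as a single code unit.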
Created attachment 138091 [details]
Thank you for your suggestions and ideas. I have updated the patch accordingly.
Comment on attachment 138091 [details]
Clearing flags on attachment: 138091
Committed r114876: <http://trac.webkit.org/changeset/114876>
All reviewed patches have been landed. Closing bug.