WebKit Bugzilla
Bug 74815: Non-BMP Unicode character codes aren't properly unescaped in CSS
https://bugs.webkit.org/show_bug.cgi?id=74815
Status: RESOLVED DUPLICATE of bug 76152
Reported by P.J. Onori, 2011-12-18 10:42:31 PST
Glyphs for what I suspect is anything above the Basic Multilingual Plane (0x0000-0xffff) are rendered as a diamond with a question mark in the middle, even when a glyph exists at that code point.
Attachments
A zip file containing a demonstration of the bug (74.93 KB, application/zip), 2011-12-18 22:34 PST, P.J. Onori, no flags
demo of literal character working fine (152 bytes, text/html), 2011-12-20 17:27 PST, Alexey Proskuryakov, no flags
reduced test case (297 bytes, text/html), 2012-01-12 11:01 PST, Mathias Bynens, no flags
Comment 1 by Alexey Proskuryakov, 2011-12-18 21:51:56 PST
Could you please provide a test case? Given that you mention the diamond with a question mark, I suspect that you're seeing this problem on Mac, but then I'm confused because Mac WebKit certainly supports non-BMP characters. Are you seeing this in Safari on Mac, some other WebKit-based browser on Mac, or some entirely different platform?
Comment 2 by P.J. Onori, 2011-12-18 22:34:40 PST
Created attachment 119817: A zip file containing a demonstration of the bug
The zip file contains an HTML file that displays characters at specific Unicode values. The rightmost column of the table shows the corresponding Unicode value.
Comment 3 by P.J. Onori, 2011-12-18 22:36:02 PST
I've checked this in Safari v5.1, Chrome v16 and the latest WebKit build, all on the Mac. Let me know if there's anything else I can provide.
Comment 4 by Alexey Proskuryakov, 2011-12-20 17:26:59 PST
This is specifically an issue with parsing strings like '\01f3a4'. It will work if you just paste a Unicode character in your CSS (and add a @charset rule to make sure it's decoded correctly). I don't know if we're matching the spec or not here.
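To make the parsing problem concrete, here is a minimal Python sketch of how a hex escape such as '\01f3a4' is meant to decode. It is not WebKit's tokenizer code, and the name `unescape_css` is hypothetical. It assumes the CSS 2.1 rule that a backslash is followed by one to six hex digits plus an optional single whitespace terminator; the U+FFFD substitution for NUL, surrogate and out-of-range values follows CSS Syntax Level 3 and is an assumption here. The key point is that 0x1F3A4 must come out as a single code point, which a representation limited to 16-bit values cannot hold.

```python
import re

# One CSS escape: either a backslash, 1-6 hex digits and an optional single
# whitespace terminator ("\r\n" counts as one), or a backslash followed by any
# character that is not a hex digit or newline (that character stands for itself).
_CSS_ESCAPE = re.compile(
    r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?"
    r"|\\([^0-9a-fA-F\r\n\f])"
)


def unescape_css(text):
    """Decode CSS backslash escapes in `text` into code points (sketch)."""

    def decode(match):
        hex_digits, literal = match.groups()
        if hex_digits is None:
            return literal
        code_point = int(hex_digits, 16)
        # NUL, surrogates and values above U+10FFFF are not valid characters;
        # substitute U+FFFD as CSS Syntax Level 3 does.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\uFFFD"
        # chr() covers the full Unicode range, so a supplementary value such as
        # 0x1F3A4 yields one code point rather than two 16-bit halves.
        return chr(code_point)

    return _CSS_ESCAPE.sub(decode, text)


print(unescape_css(r"\01f3a4") == "\U0001F3A4")  # True
```

With this rule, `\1d306 ` (with its terminating space) and the six-digit `\01d306` both decode to U+1D306.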
Comment 5 by Alexey Proskuryakov, 2011-12-20 17:27:22 PST
Created attachment 120122: demo of literal character working fine
Comment 6 by P.J. Onori, 2011-12-20 20:24:45 PST
Thanks, Alexey. That works for me in Safari 5.1 and WebKit (Chrome doesn't seem to support it). I poked around as well and couldn't tell whether the current behavior is wrong. Pragmatically, it may prove frustrating for people viewing the CSS file in a typeface that doesn't contain those glyphs. But that's not technically your problem. ;)
Comment 7 by P.J. Onori, 2011-12-20 20:26:42 PST
My lord, apologies for such a poorly-written comment. It's been a long day...
Comment 8 by Mathias Bynens, 2012-01-12 10:46:55 PST
(In reply to comment #4)
> This is specifically an issue with parsing strings like '\01f3a4'. It will work if you just paste a Unicode character in your CSS (and add a @charset rule to make sure it's decoded correctly).
>
> I don't know if we're matching the spec or not here.
FWIW, the spec is here: http://www.w3.org/TR/CSS21/syndata.html#characters and http://www.w3.org/TR/css3-syntax/#characters
It doesn’t mention UTF-16 or surrogate pairs in escapes, so surrogate-pair escapes are non-standard (although they happen to be supported in WebKit); only Unicode / ISO 10646 code points are allowed in CSS escape sequences. Code-point escapes, however, don’t work in WebKit for characters outside the BMP, which is what this bug is about. For more info, see this mailing list discussion:
http://lists.w3.org/Archives/Public/www-style/2012Jan/thread.html#msg536
For example, `\1d306 ` or `\01d306` are supposed to be CSS escape sequences for the “tetragram for centre” symbol (U+1D306), but they currently don’t work in WebKit.
(In reply to comment #5)
> Created an attachment (id=120122) [details]
> reduced test case
I’m not sure how that test case helps, as it doesn’t contain a CSS escape sequence, just the literal character. Am I missing something? Here’s an appropriate test case:
http://jsfiddle.net/mathias/jY7ra/
The first escape sequence (used with `html:before`) is the standard one. WebKit is the only engine this fails in.
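For comparison, the Python sketch below (all function names are hypothetical) shows the arithmetic behind the two escape forms discussed here. The standard escape writes the code point U+1D306 directly as `\1d306 `, while a UTF-16 surrogate-pair escape, which the spec does not define, splits it into two 16-bit code units. The pair it computes, D834 and DF06, is what a JavaScript string literal would write as '\ud834\udf06'.

```python
def surrogate_pair(code_point):
    """Split a supplementary (non-BMP) code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000      # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def standard_css_escape(code_point):
    """Standard CSS escape: the code point itself in hex, plus a terminating space."""
    return "\\%x " % code_point


def surrogate_css_escape(code_point):
    """Non-standard escape: one hex escape per UTF-16 code unit."""
    high, low = surrogate_pair(code_point)
    return "\\%x \\%x " % (high, low)


tetragram = 0x1D306
print(standard_css_escape(tetragram))   # \1d306 (followed by a space)
print(surrogate_css_escape(tetragram))  # \d834 \df06 (followed by a space)
# Cross-check against Python's own UTF-16 encoder.
assert chr(tetragram).encode("utf-16-be") == bytes.fromhex("d834df06")
```

Only the first form is a standard CSS escape; the second merely mirrors how the character is stored in UTF-16.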
Comment 9 by Mathias Bynens, 2012-01-12 11:01:14 PST
Created attachment 122271: reduced test case
Comment 10 by Alexey Proskuryakov, 2012-01-12 11:46:53 PST
Comment on attachment 120122: demo of literal character working fine
> I’m not sure how that test case helps
It was meant as a demonstration that the issue is more limited in scope than originally reported. I chose a poor description for the attachment, sorry for the confusion.
Comment 11 by Zoltan Herczeg, 2012-01-12 21:25:01 PST
This is a tokenizer level issue (AP thanks for CC'ing me). Would not be much trouble to fix it in the custom written tokenizer after it is landed, just adding some extra parsing to the escape sequences.
Comment 12 by Mathias Bynens, 2012-01-24 08:15:10 PST
Note that this also affects `document.querySelector` and `document.querySelectorAll`. Failing test case: data:text/html;charset=utf-8,%3C!DOCTYPE%20html%3E%3Ctitle%3EMothereffing%20CSS%20escapes%20example%3C%2Ftitle%3E%3Cstyle%3Epre%7Bbackground%3A%23eee%3Bpadding%3A.5em%7Dp%7Bdisplay%3Anone%7D%23ab%5Ca9%20de%5C1d306%20fg%7Bdisplay%3Ablock%7D%3C%2Fstyle%3E%3Ch1%3E%3Ca%20href%3D%22http%3A%2F%2Fmothereff.in%2Fcss-escapes%23ab%25C2%25A9de%25F0%259D%258C%2586fg%22%3EMothereffing%20CSS%20escapes%3C%2Fa%3E%20example%3C%2Fh1%3E%3Cpre%3E%3Ccode%3Eab%C2%A9de%F0%9D%8C%86fg%3C%2Fcode%3E%3C%2Fpre%3E%3Cp%20id%3D%22ab%C2%A9de%F0%9D%8C%86fg%22%3EIf%20you%20can%20read%20this%2C%20the%20escaped%20CSS%20selector%20worked.%20%3C%2Fp%3E%3Cscript%3Edocument.getElementById('ab%C2%A9de%F0%9D%8C%86fg').innerHTML%20%2B%3D%20'%20%3Ccode%3Edocument.getElementById%3C%2Fcode%3E%20worked.'%3Bdocument.querySelector('%23ab%5C%5Ca9%20de%5C%5C1d306%20fg').innerHTML%2B%3D'%20%3Ccode%3Edocument.querySelector%3C%2Fcode%3E%20worked.'%3C%2Fscript%3E
(In reply to comment #11)
> This is a tokenizer level issue (AP thanks for CC'ing me). Would not be much trouble to fix it in the custom written tokenizer after it is landed, just adding some extra parsing to the escape sequences.
Out of curiosity, when will the custom-written tokenizer land (if it hasn’t already)? Any bug tickets I can subscribe to?
Comment 13 by Zoltan Herczeg, 2012-01-24 12:27:41 PST
> Out of curiosity, when will the custom-written tokenizer land (if it hasn’t already)? Any bug tickets I can subscribe to?
https://bugs.webkit.org/show_bug.cgi?id=70107
I just got an r+ to it, but I will land it tomorrow because I want to see the bots.
Comment 14 by Mathias Bynens, 2012-01-24 12:43:23 PST
FWIW, I’ve just deployed some changes to my CSS escaper tool to make it easier to create test cases for this bug. E.g., click the “example” link on this page: http://mothereff.in/css-escapes#1%F0%9D%8C%86
(In reply to comment #13)
> I just got an r+ to it, but I will land it tomorrow because I want to see the bots.
That’s awesome news!
Comment 15 by Mathias Bynens, 2012-01-30 00:32:32 PST
https://bugs.webkit.org/show_bug.cgi?id=70107 is now RESOLVED FIXED; it landed here: http://trac.webkit.org/changeset/106217
Comment 16 by Mathias Bynens, 2012-01-30 00:36:20 PST
Better test case that will show a lime/red background depending on success/failure: data:text/html;charset=utf-8,<!DOCTYPE%20html><title>Mothereffing%20CSS%20escapes%20example<%2Ftitle><style>pre%7Bbackground%3A%23eee%3Bpadding%3A.5em%7D.test%7Bdisplay%3Anone%7D%23b%5Ca9%20de%5C1d306%20fg%7Bdisplay%3Ablock%7D.pass%7Bbackground%3Alime%7D.fail%7Bbackground%3Ared%7D<%2Fstyle><h1><a%20href%3D"http%3A%2F%2Fmothereff.in%2Fcss-escapes%231b%25C2%25A9de%25F0%259D%258C%2586fg">Mothereffing%20CSS%20escapes<%2Fa>%20example<%2Fh1><pre><code>b%C2%A9de%F0%9D%8C%86fg<%2Fcode><%2Fpre><p%20id%3D"b%C2%A9de%F0%9D%8C%86fg"%20class%3Dtest>If%20you%20can%20read%20this%2C%20the%20escaped%20CSS%20selector%20worked.%20<%2Fp><p>Standard%20CSS%20character%20escape%20sequences%20for%20supplementary%20Unicode%20characters%20aren%E2%80%99t%20currently%20supported%20in%20WebKit.%20<strong>This%20test%20case%20will%20fail%20in%20those%20browsers.<%2Fstrong>%20It%E2%80%99s%20better%20to%20leave%20these%20characters%20unescaped.<%2Fp><script>var%20el%3Ddocument.getElementsByTagName('p')%5B0%5D%3Btry%7Bdocument.getElementById('b%5Cxa9de%5Cud834%5Cudf06fg').innerHTML%20%2B%3D%20'%20<code>document.getElementById<%2Fcode>%20worked.'%3Bdocument.querySelector('%23b%5C%5Ca9%20de%5C%5C1d306%20fg').innerHTML%2B%3D'%20<code>document.querySelector<%2Fcode>%20worked.'%3Bel.className%3D'pass'%7Dcatch(e)%7Bel.innerHTML%3D'FAIL'%3Bel.className%3D'fail'%7D<%2Fscript>
Short URL: http://mths.be/bel
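As a rough illustration of how a selector like the one in this test case can be derived from the element's id, here is a Python sketch with a hypothetical `escape_css_identifier` helper. It is not the code behind the CSS escaper tool mentioned earlier. It escapes every non-ASCII code point, any ASCII character other than letters, digits, "-" and "_", and a leading digit, as a standard hex escape followed by a space. Escaping all non-ASCII characters is a simplifying assumption: CSS allows most of them unescaped in identifiers, which, as the test case itself notes, is the safer choice while WebKit lacks support for supplementary-character escapes.

```python
def escape_css_identifier(value):
    """Escape `value` for use as a CSS identifier (simplified sketch)."""
    parts = []
    for index, ch in enumerate(value):  # iterates code points, not UTF-16 units
        safe_ascii = ord(ch) < 0x80 and (ch.isalnum() or ch in "-_")
        # An identifier may not start with a digit, so a leading digit is escaped.
        if safe_ascii and not (index == 0 and ch.isdigit()):
            parts.append(ch)
        else:
            # The trailing space ends the escape so that a following hex digit
            # (the "d" in "de", for instance) is not read as part of it.
            parts.append("\\%x " % ord(ch))
    return "".join(parts)


element_id = "b\u00a9de\U0001D306fg"
print("#" + escape_css_identifier(element_id))  # #b\a9 de\1d306 fg
```

For the id used above, the sketch produces the same `#b\a9 de\1d306 fg` selector that the test case passes to `document.querySelector`.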
Comment 17 by Mathias Bynens, 2015-04-30 00:26:09 PDT
This seems fixed. Feel free to mark this bug as RESOLVED FIXED.
Comment 18 by Alexey Proskuryakov, 2015-04-30 09:22:12 PDT
*** This bug has been marked as a duplicate of bug 76152 ***