WebKit Bugzilla
NEW
17964
WebKit has problem decoding characters in GBK but not in CP936
https://bugs.webkit.org/show_bug.cgi?id=17964
Jiang Jiang
Reported
2008-03-20 02:55:23 PDT
WebKit treats the GB2312 encoding as GBK, but it fails to decode the characters that are in GBK but not in CP936. Their GBK code points are 0xFE50 to 0xFE9F. The GBK <-> Unicode mapping of all these characters is:

0xFE50 0x2E81 #CJK UNIFIED IDEOGRAPH
0xFE51 0xE816 #CJK UNIFIED IDEOGRAPH
0xFE52 0xE817 #CJK UNIFIED IDEOGRAPH
0xFE53 0xE818 #CJK UNIFIED IDEOGRAPH
0xFE54 0x2E84 #CJK UNIFIED IDEOGRAPH
0xFE55 0x3473 #CJK UNIFIED IDEOGRAPH
0xFE56 0x3447 #CJK UNIFIED IDEOGRAPH
0xFE57 0x2E88 #CJK UNIFIED IDEOGRAPH
0xFE58 0x2E8B #CJK UNIFIED IDEOGRAPH
0xFE59 0xE81E #CJK UNIFIED IDEOGRAPH
0xFE5A 0x359E #CJK UNIFIED IDEOGRAPH
0xFE5B 0x361A #CJK UNIFIED IDEOGRAPH
0xFE5C 0x360E #CJK UNIFIED IDEOGRAPH
0xFE5D 0x2E8C #CJK UNIFIED IDEOGRAPH
0xFE5E 0x2E97 #CJK UNIFIED IDEOGRAPH
0xFE5F 0x396E #CJK UNIFIED IDEOGRAPH
0xFE60 0x3918 #CJK UNIFIED IDEOGRAPH
0xFE61 0xE826 #CJK UNIFIED IDEOGRAPH
0xFE62 0x39CF #CJK UNIFIED IDEOGRAPH
0xFE63 0x39DF #CJK UNIFIED IDEOGRAPH
0xFE64 0x3A73 #CJK UNIFIED IDEOGRAPH
0xFE65 0x39D0 #CJK UNIFIED IDEOGRAPH
0xFE66 0xE82B #CJK UNIFIED IDEOGRAPH
0xFE67 0xE82C #CJK UNIFIED IDEOGRAPH
0xFE68 0x3B4E #CJK UNIFIED IDEOGRAPH
0xFE69 0x3C6E #CJK UNIFIED IDEOGRAPH
0xFE6A 0x3CE0 #CJK UNIFIED IDEOGRAPH
0xFE6B 0x2EA7 #CJK UNIFIED IDEOGRAPH
0xFE6C 0xE831 #CJK UNIFIED IDEOGRAPH
0xFE6D 0xE832 #CJK UNIFIED IDEOGRAPH
0xFE6E 0x2EAA #CJK UNIFIED IDEOGRAPH
0xFE6F 0x4056 #CJK UNIFIED IDEOGRAPH
0xFE70 0x415F #CJK UNIFIED IDEOGRAPH
0xFE71 0x2EAE #CJK UNIFIED IDEOGRAPH
0xFE72 0x4337 #CJK UNIFIED IDEOGRAPH
0xFE73 0x2EB3 #CJK UNIFIED IDEOGRAPH
0xFE74 0x2EB6 #CJK UNIFIED IDEOGRAPH
0xFE75 0x2EB7 #CJK UNIFIED IDEOGRAPH
0xFE76 0xE83B #CJK UNIFIED IDEOGRAPH
0xFE77 0x43B1 #CJK UNIFIED IDEOGRAPH
0xFE78 0x43AC #CJK UNIFIED IDEOGRAPH
0xFE79 0x2EBB #CJK UNIFIED IDEOGRAPH
0xFE7A 0x43DD #CJK UNIFIED IDEOGRAPH
0xFE7B 0x44D6 #CJK UNIFIED IDEOGRAPH
0xFE7C 0x4661 #CJK UNIFIED IDEOGRAPH
0xFE7D 0x464C #CJK UNIFIED IDEOGRAPH
0xFE7E 0xE843 #CJK UNIFIED IDEOGRAPH
0xFE80 0x4723 #CJK UNIFIED IDEOGRAPH
0xFE81 0x4729 #CJK UNIFIED IDEOGRAPH
0xFE82 0x477C #CJK UNIFIED IDEOGRAPH
0xFE83 0x478D #CJK UNIFIED IDEOGRAPH
0xFE84 0x2ECA #CJK UNIFIED IDEOGRAPH
0xFE85 0x4947 #CJK UNIFIED IDEOGRAPH
0xFE86 0x497A #CJK UNIFIED IDEOGRAPH
0xFE87 0x497D #CJK UNIFIED IDEOGRAPH
0xFE88 0x4982 #CJK UNIFIED IDEOGRAPH
0xFE89 0x4983 #CJK UNIFIED IDEOGRAPH
0xFE8A 0x4985 #CJK UNIFIED IDEOGRAPH
0xFE8B 0x4986 #CJK UNIFIED IDEOGRAPH
0xFE8C 0x499F #CJK UNIFIED IDEOGRAPH
0xFE8D 0x499B #CJK UNIFIED IDEOGRAPH
0xFE8E 0x49B7 #CJK UNIFIED IDEOGRAPH
0xFE8F 0x49B6 #CJK UNIFIED IDEOGRAPH
0xFE90 0xE854 #CJK UNIFIED IDEOGRAPH
0xFE91 0xE855 #CJK UNIFIED IDEOGRAPH
0xFE92 0x4CA3 #CJK UNIFIED IDEOGRAPH
0xFE93 0x4C9F #CJK UNIFIED IDEOGRAPH
0xFE94 0x4CA0 #CJK UNIFIED IDEOGRAPH
0xFE95 0x4CA1 #CJK UNIFIED IDEOGRAPH
0xFE96 0x4C77 #CJK UNIFIED IDEOGRAPH
0xFE97 0x4CA2 #CJK UNIFIED IDEOGRAPH
0xFE98 0x4D13 #CJK UNIFIED IDEOGRAPH
0xFE99 0x4D14 #CJK UNIFIED IDEOGRAPH
0xFE9A 0x4D15 #CJK UNIFIED IDEOGRAPH
0xFE9B 0x4D16 #CJK UNIFIED IDEOGRAPH
0xFE9C 0x4D17 #CJK UNIFIED IDEOGRAPH
0xFE9D 0x4D18 #CJK UNIFIED IDEOGRAPH
0xFE9E 0x4D19 #CJK UNIFIED IDEOGRAPH
0xFE9F 0x4DAE #CJK UNIFIED IDEOGRAPH

See also:
http://en.wikipedia.org/wiki/GBK
for the differences between GBK and CP936.
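For anyone who wants to experiment with the mapping above, here is a minimal Python sketch of how rows in that format could be parsed into a lookup table. It uses only a few rows copied from the report, and `parse_mapping` and `decode_pair` are hypothetical helper names, not anything in WebKit or ICU.

```python
# A few rows copied from the mapping listed in this report (not the full table).
SAMPLE_ROWS = """\
0xFE50 0x2E81 #CJK UNIFIED IDEOGRAPH
0xFE51 0xE816 #CJK UNIFIED IDEOGRAPH
0xFE54 0x2E84 #CJK UNIFIED IDEOGRAPH
0xFE9F 0x4DAE #CJK UNIFIED IDEOGRAPH
"""


def parse_mapping(text):
    """Parse 'GBK Unicode #comment' rows into {gbk_code: unicode_code_point}."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        gbk_code, code_point = (int(f, 16) for f in fields)
        table[gbk_code] = code_point
    return table


def decode_pair(lead, trail, table):
    """Hypothetical helper: look up one two-byte sequence, or return U+FFFD."""
    return chr(table.get((lead << 8) | trail, 0xFFFD))


table = parse_mapping(SAMPLE_ROWS)
print(f"U+{ord(decode_pair(0xFE, 0x50, table)):04X}")  # prints U+2E81
```

A real decoder would consult a table like this only after its primary mapping fails; the sketch shows just the lookup step.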
Attachments
test case (907 bytes, text/html), 2008-03-20 03:11 PDT, Alexey Proskuryakov
state of the missing characters (32.28 KB, image/gif), 2008-03-20 06:02 PDT, Jiang Jiang
Alexey Proskuryakov
Comment 1
2008-03-20 03:11:17 PDT
Created attachment 19895: test case

Test case for these 79 characters. Wikipedia mentions 95, so we'll need to see what the remaining 16 are mapped to. IE seems to have a different mapping for FE94.
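As a quick sanity check on the figure of 79: the range 0xFE50 to 0xFE9F spans 80 trail-byte values, but 0x7F is never a GBK trail byte, which is why the mapping in the report jumps from 0xFE7E to 0xFE80. A small Python sketch (the helper name `gbk_trail_bytes` is made up for illustration):

```python
def gbk_trail_bytes(first, last):
    """Yield the trail bytes in [first, last], skipping 0x7F (not valid in GBK)."""
    for b in range(first, last + 1):
        if b != 0x7F:
            yield b


# All two-byte codes with lead byte 0xFE and trail byte 0x50..0x9F.
codes = [(0xFE << 8) | t for t in gbk_trail_bytes(0x50, 0x9F)]
print(len(codes))  # prints 79
```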
Alexey Proskuryakov
Comment 2
2008-03-20 03:21:53 PDT
Filed
bug 17965
for a display issue that we have with these characters.
Jiang Jiang
Comment 3
2008-03-20 06:02:14 PDT
Created attachment 19896: state of the missing characters

The figure shows the state of the 80 missing characters. It is indexed by GBK code and lists each character's PUA code point and its (later assigned) CJK-ExtA code point. U+9FA6 to U+9FBB were code points for CJK Compatibility Ideographs added in Unicode 5.0 (shown with a light yellow background in the figure). Characters with a light grey background are those that still have only a PUA code point; no other code points have been assigned to them yet.
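The same split can be approximated in code by checking which Unicode block each mapped code point falls into. The sketch below is only an illustration: the block ranges are the standard Unicode ones, the sample values are taken from the mapping in the original report, and `block_of` is a hypothetical helper.

```python
# Standard Unicode block ranges relevant to the targets in this bug.
BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0xE000, 0xF8FF, "Private Use Area"),
]


def block_of(code_point):
    """Return the name of the block containing code_point, or 'other'."""
    for low, high, name in BLOCKS:
        if low <= code_point <= high:
            return name
    return "other"


# Sample targets from the report: FE50, FE51, FE55 and FE9F respectively.
for cp in (0x2E81, 0xE816, 0x3473, 0x4DAE):
    print(f"U+{cp:04X}: {block_of(cp)}")
```

Entries that land in the Private Use Area are the ones that had no assigned code point in the mapping as reported.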
Jiang Jiang
Comment 4
2008-03-20 06:26:29 PDT
Well, I managed to find another 13 missing characters:

A989 U+303E 〾
A98A U+2FF0 ⿰
A98B U+2FF1 ⿱
A98C U+2FF2 ⿲
A98D U+2FF3 ⿳
A98E U+2FF4 ⿴
A98F U+2FF5 ⿵
A990 U+2FF6 ⿶
A991 U+2FF7 ⿷
A992 U+2FF8 ⿸
A993 U+2FF9 ⿹
A994 U+2FFA ⿺
A995 U+2FFB ⿻

BTW, I'm not sure if we should add this:

A2E3 U+20AC €
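To double-check which characters these code points are, the list can be rebuilt from the code points alone and annotated with Python's `unicodedata` module. This is just a sketch for inspection; the `found` dictionary is typed in from the list in this comment.

```python
import unicodedata

# GBK code -> Unicode code point, as listed in this comment.
found = {0xA989: 0x303E}
# A98A..A995 map to the twelve consecutive code points U+2FF0..U+2FFB.
found.update({0xA98A + i: 0x2FF0 + i for i in range(12)})

for gbk_code, cp in sorted(found.items()):
    print(f"{gbk_code:04X} U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")

# The candidate mentioned separately at A2E3.
print(f"A2E3 U+20AC {chr(0x20AC)} {unicodedata.name(chr(0x20AC))}")
```

The twelve consecutive entries are the Ideographic Description Characters, and U+303E is the Ideographic Variation Indicator.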