WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
4195
REGRESSION: KOI8-U encoding no longer supported
https://bugs.webkit.org/show_bug.cgi?id=4195
Summary
REGRESSION: KOI8-U encoding no longer supported
Alexey Proskuryakov
Reported
2005-07-29 04:55:39 PDT
Steps to reproduce: 1. Go to
http://www.oper.ru
. 2. Manually set the encoding to KOI8-U Results: empty page, empty "view source", errors in debug log: ================= ERROR: the ICU Converter won't convert from text encoding 0xA08, error 4 (/Users/ap/WebKit/WebCore/kwq/KWQTextCodec.mm:379 UErrorCode KWQTextDecoder::createICUConverter()) ================= Marking P1, since this is a regression.
Attachments
what is missing from ICU
(4.84 KB, text/plain)
2005-12-12 14:15 PST
,
Alexey Proskuryakov
no flags
Details
what is missing from ICU
(4.88 KB, text/plain)
2005-12-14 13:36 PST
,
Alexey Proskuryakov
no flags
Details
proposed fix
(82.28 KB, patch)
2006-07-14 13:03 PDT
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2005-07-29 05:00:16 PDT
Apparently caused by changes from 2005-07-12: "Switched over from TEC to ICU for unicode text conversion." It would be interesting to know why it was decided to use ICU directly, not CFString (which is open source, cross-platform, and well optimized for performance).
Alexey Proskuryakov
Comment 2
2005-08-15 11:17:47 PDT
Apparently, the reason is that ICU 3.2 (version shipped with 10.4) didn't support KOI8-U. This encoding has been added in ICU 3.4, so it won't be an issue in 10.5. Still, not using CFString looks strange to me... Apple strongly discourages external developers from directly using ICU - eat your own dog food! :)
Alexey Proskuryakov
Comment 3
2005-10-01 11:46:12 PDT
ICU usage has been discussed elsewhere, e.g. in
bug 4821
.
Darin Adler
Comment 4
2005-10-10 08:45:48 PDT
We're going to need to fix this one way or another. Someone should check with Deborah Goldsmith on why ICU is not handling this encoding. It could be as simple as fixing the string we pass in to ICU.
Alexey Proskuryakov
Comment 5
2005-10-10 09:56:00 PDT
Here is the ICU bug: <
http://dev.icu-project.org/cgi-bin/icu-bugs/closed
? id=1143;expression=koi;user=guest;searchclosed=1>. Support for KOI8-U just isn't present in the ICU version shipped with Tiger.
Adele Peterson
Comment 6
2005-10-10 10:00:32 PDT
we'll have to restore the old TEC codepath until ICU has support for all the codesets we need.
Alice Liu
Comment 7
2005-11-04 13:59:43 PST
related to <
rdar://problem/3546838
> some more detail about behavior with different builds: behavior observed in 416.12 1) load
http://www.oper.ru/
- page shows up but encoded wrong 2) change encoding to from default to KOI8-U - page shows up correctly in ukrainian (it doesn't turn blank) behavior observed in TOT: 1) load
http://www.oper.ru/
- page shows up correctly in ukrainian 2) change encoding from default to KOI8-U - page turns blank
Alexey Proskuryakov
Comment 8
2005-11-04 14:14:15 PST
(In reply to
comment #7
) ToT renders this site correctly because of a fix in
bug 3590
.
Daniel Udey
Comment 9
2005-11-09 06:32:53 PST
(In reply to
comment #8
)
> ToT renders this site correctly because of a fix in
bug 3590
.
This bug is confirmed fixed in the latest ToT, marking as such.
Alexey Proskuryakov
Comment 10
2005-11-09 09:50:07 PST
(In reply to
comment #9
)
> > ToT renders this site correctly because of a fix in
bug 3590
. > This bug is confirmed fixed in the latest ToT, marking as such.
Which has nothing to do with _this_ bug...
Darin Adler
Comment 11
2005-11-09 10:55:15 PST
Exactly. The site is fixed, but the bug is not, and it's still a P1.
Alexey Proskuryakov
Comment 12
2005-12-12 14:15:09 PST
Created
attachment 5054
[details]
what is missing from ICU Lists all aliases that are not known to Tiger version of ICU. As it turns out, there are quite a few (both aliases and encodings). I haven't analyzed which encodings have actually regressed (even if ICU knows an encoding by some aliases, it doesn't mean that WebCore gives it one of these). Of course, all encodings that are not supported by ICU no longer work, and many of these aren't even in the current ICU <
http://www-950.ibm.com/software/globalization/icu/demo/converters
>. Some possible bugs in WebCore's tables are also highlighted (one of these filed as
bug 4362
; I am not sure which is actually correct in other cases).
Alexey Proskuryakov
Comment 13
2005-12-14 13:36:45 PST
Created
attachment 5081
[details]
what is missing from ICU Added mac-cyrillic; corrected x-mac-ukrainian comments. For info on MacUkrainian, see <
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/UKRAINE.TXT
>.
Alice Liu
Comment 14
2006-01-10 10:05:57 PST
<
rdar://problem/4404312
>
Joost de Valk (AlthA)
Comment 15
2006-01-22 04:33:45 PST
Adding Regression keyword.
Alexey Proskuryakov
Comment 16
2006-07-13 12:07:51 PDT
Taking this.
Darin Adler
Comment 17
2006-07-14 06:06:03 PDT
FYI, what I'd really like to see is an implementation that uses ICU for a lot more, eliminating our own decoding table except for the few exceptions where ICU doesn't do the right thing in either looking up an encoding by name or in handling character sets. So while it doesn't have to be part of this bug fix, I'd like to see mac-encodings.txt and its friends done away with entirely.
Alexey Proskuryakov
Comment 18
2006-07-14 06:23:12 PDT
Yes, the fact that encoding names have to round-trip via CFStringEncoding, even when handled by ICU, seems unfortunate to me, too. In this patch, I'm taking a conservative route though, probably not even getting rid of DeprecatedStrings (to avoid the need to change all clients).
Alexey Proskuryakov
Comment 19
2006-07-14 13:03:08 PDT
Created
attachment 9453
[details]
proposed fix At the moment, StreamingTextDecoderMac is a complete implementation, and I have verified that all existing tests pass even with StreamingTextDecoderICU disabled. I haven't attempted to factor out any common code, as the TEC/CFString code path will be simplified in the future; also, Unity may have a largely different implementation with its Qt back-end. WebCoreTextDecoder was dead code (used for WebTextView before it was moved to WebCore).
Darin Adler
Comment 20
2006-07-14 18:45:29 PDT
Comment on
attachment 9453
[details]
proposed fix Looks great! r=me
Alexey Proskuryakov
Comment 21
2006-07-14 23:52:15 PDT
Committed revision 15449. Added one more test (http/tests/misc/BOM-override-script.html) with additional examples of BOM handling.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug