Bug 5932 - Wrong encoding used for http://www.miel.ru
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Normal
Assignee: Alexey Proskuryakov
URL: http://www.miel.ru
Reported: 2005-12-04 06:25 PST by Alexey Proskuryakov
Modified: 2005-12-18 14:36 PST

proposed fix (2.16 KB, patch)
2005-12-11 05:55 PST, Alexey Proskuryakov
proposed fix (2.40 KB, patch)
2005-12-11 06:40 PST, Alexey Proskuryakov
proposed fix (2.44 KB, patch)
2005-12-11 06:55 PST, Alexey Proskuryakov
darin: review+

Description Alexey Proskuryakov 2005-12-04 06:25:16 PST
This site specifies the encoding "almost" correctly:

$ curl -I http://www.miel.ru/
Content-Type: text/html; charset=cp1251

(cp1251 is an alias for windows-1251, but it is not recognized here)

<span />
<!-- 0 -->
<base href="http://www.miel.ru/" />

<title>Недвижимость Москвы и Подмосковья. Агентство недвижимости МИЭЛЬ</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

(The page has a meta tag, but khtml::Decoder doesn't see it because of the <span> at the beginning.)

Need to figure out which workaround would be more compatible...
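
One possible workaround for the missed meta tag, sketched in Python purely as an illustration: scan the first bytes of the response for a meta charset declaration no matter which tags precede it. This is not how khtml::Decoder works; the function name sniff_meta_charset, the regular expression, and the 1024-byte limit are all assumptions made for this sketch, and it deliberately ignores details such as a meta tag hidden inside a comment.

```python
import re

# Assumed pattern: a <meta ...> tag whose attributes contain "charset=<name>".
# It matches on raw bytes because the encoding is not known yet.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(head, limit=1024):
    """Return the charset named in a meta tag within the first `limit` bytes.

    Tags that come before the meta (a stray <span />, <base>, <title>)
    do not stop the scan. Returns None when no declaration is found.
    """
    match = META_CHARSET.search(head[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Fed the markup quoted above, such a scan would report windows-1251 even though a <span /> comes first. Whether that is more compatible than the alternatives is exactly the open question.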
Comment 1 Alexey Proskuryakov 2005-12-10 01:34:57 PST
Actually, ICU supports the "cp1251" alias, and it's WebCore that blocks its usage in 
Comment 2 Alexey Proskuryakov 2005-12-11 05:55:36 PST
Created attachment 5028
proposed fix

If a charset name is not known, try to normalize it using ICU.

Admittedly, this is a band-aid fix, and the way to go is probably to get rid of
CFStringEncoding-related functions throughout WebKit, so that
KWQCFStringEncodingFromIANACharsetName() wouldn't be needed at all.
Comment 3 Alexey Proskuryakov 2005-12-11 06:40:19 PST
Created attachment 5029
proposed fix

Oops, no need to do the lookup again if the first attempt was successful.
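
The lookup order described in the last two comments can be sketched as follows. This is a Python illustration, not the attached patch: KNOWN_CHARSETS is a hypothetical stand-in for WebCore's encoding table, resolve_charset is an invented name, and Python's codecs registry plays the role that ICU's alias normalization plays in the real fix.

```python
import codecs

# Hypothetical stand-in for the table of charset names known to the engine.
KNOWN_CHARSETS = ("windows-1251", "utf-8", "iso-8859-1")


def resolve_charset(name):
    """Map a declared charset name to a known one, or return None."""
    candidate = name.strip().lower()

    # First attempt: direct table lookup. If it succeeds, return right
    # away so that no second lookup is performed.
    if candidate in KNOWN_CHARSETS:
        return candidate

    # Fallback: normalize the unknown name through an alias registry
    # (here Python's codecs module, standing in for ICU).
    try:
        canonical = codecs.lookup(candidate).name
    except LookupError:
        return None

    # Second attempt: find the known name that normalizes to the same codec.
    for known in KNOWN_CHARSETS:
        if codecs.lookup(known).name == canonical:
            return known
    return None
```

With this arrangement a header such as charset=cp1251 resolves to the table's windows-1251 entry, while a name the registry has never heard of still yields None.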
Comment 4 Alexey Proskuryakov 2005-12-11 06:55:04 PST
Created attachment 5030
proposed fix

Fixed paths for non-existing files (see bug 5846).
Comment 5 Darin Adler 2005-12-11 16:48:52 PST
Comment on attachment 5030
proposed fix

If we're going to use the ICU aliases, then I would like to see all the
redundant entries in our encoding table removed.
Comment 6 Darin Adler 2005-12-11 17:06:54 PST
Comment on attachment 5030
proposed fix

Seems fine to make this change. Would have liked to have a comment explaining
why the code is doing what it's doing.
Comment 7 Alexey Proskuryakov 2005-12-11 22:16:13 PST
Filed bug 6046 about getting rid of CFStringEncoding and tables cleanup.