This site specifies the encoding "almost" correctly:

$ curl -I http://www.miel.ru/
<...>
Content-Type: text/html; charset=cp1251

(unknown alias for windows-1251)

<html>
<span />
<!-- 0 -->
<head>
<base href="http://www.miel.ru/" />
<title>Недвижимость Москвы и Подмосковья. Агентство недвижимости МИЭЛЬ</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

(has a meta, but khtml::Decoder doesn't see it because of a <span> in the beginning).

Need to figure out which workaround would be more compatible...
Actually, ICU supports the "cp1251" alias, and it's WebCore that blocks its usage in KWQCFStringEncodingFromIANACharsetName().
Created attachment 5028
proposed fix

If a charset name is not known, try to normalize it using ICU. Admittedly, this is a band-aid fix. The better long-term approach is probably to get rid of the CFStringEncoding-related functions throughout WebKit, so that KWQCFStringEncodingFromIANACharsetName() wouldn't be needed at all.
Created attachment 5029
proposed fix

Oops, no need to do the lookup again if the first attempt was successful.
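
For readers following along, here is a minimal sketch of the fallback idea: look the name up in the table first, and only if that fails normalize the alias and retry. It is not the attached patch. The table, the `EncodingID` values, `normalizeAlias()` and `encodingFromCharsetName()` are all hypothetical names made up for illustration, and `normalizeAlias()` is a tiny hard-coded stand-in for the ICU alias lookup so that the example compiles without ICU.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical encoding IDs; these are NOT real CFStringEncoding values.
enum EncodingID { InvalidEncoding = -1, Windows1251 = 1, UTF8 = 2, Latin1 = 3 };

// Hypothetical stand-in for the built-in encoding table (canonical names only).
static const std::unordered_map<std::string, EncodingID>& knownEncodings() {
    static const std::unordered_map<std::string, EncodingID> table = {
        {"windows-1251", Windows1251}, {"utf-8", UTF8}, {"iso-8859-1", Latin1}};
    return table;
}

// Hypothetical stand-in for an ICU alias lookup: maps an alias to a name
// the table knows, or returns an empty string if the alias is unknown.
static std::string normalizeAlias(const std::string& name) {
    static const std::unordered_map<std::string, std::string> aliases = {
        {"cp1251", "windows-1251"}, {"utf8", "utf-8"}, {"latin1", "iso-8859-1"}};
    auto it = aliases.find(name);
    return it == aliases.end() ? std::string() : it->second;
}

// Charset names are case-insensitive, so compare in lower case.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

EncodingID encodingFromCharsetName(const std::string& rawName) {
    const std::string name = toLower(rawName);
    const auto& table = knownEncodings();

    // First attempt: the name as given. If it succeeds, return right away
    // and skip normalization and the second lookup entirely.
    auto it = table.find(name);
    if (it != table.end())
        return it->second;

    // Fallback: normalize the alias, then look up the canonical name.
    const std::string canonical = normalizeAlias(name);
    if (canonical.empty())
        return InvalidEncoding;
    it = table.find(canonical);
    return it == table.end() ? InvalidEncoding : it->second;
}
```

With this sketch, `encodingFromCharsetName("cp1251")` and `encodingFromCharsetName("windows-1251")` resolve to the same ID, and an unknown name yields `InvalidEncoding`.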
Created attachment 5030
proposed fix

Fixed paths for non-existing files (see bug 5846).
Comment on attachment 5030
proposed fix

If we're going to use the ICU aliases, then I would like to see all the redundant entries in our encoding table removed.
Comment on attachment 5030
proposed fix

Seems fine to make this change. Would have liked to have a comment explaining why the code is doing what it's doing.
Filed bug 6046 about getting rid of CFStringEncoding and cleaning up the encoding tables.