NEW 78584
[Encoding] We should run text encoding detector for iframes and child frame.
https://bugs.webkit.org/show_bug.cgi?id=78584
Summary [Encoding] We should run text encoding detector for iframes and child frame.
yosin
Reported 2012-02-14 00:44:15 PST
There is garbled text in an iframe/child frame even when automatic text encoding detection is enabled.

Sample URIs for reproducing:
o http://www.tku.ac.jp/~z-jinnai/
  Main document, with charset declaration ISO-2022-JP.
o http://www.tku.ac.jp/~z-jinnai/06.09.13.htm
  IFrame document. No charset declaration. Encoding is Shift_JIS.

Observations:
o When loading the iframe document, the TextResourceDecoder state is
  - m_source = EncodingFromParentFrame
  - m_hintEncoding = NULL
o m_hintEncoding is NULL because TextResourceDecoder::setHintEncoding is called with
  - hintDecoder.m_source = EncodingFromMetaTag

A comment in setHintEncoding says the hint encoding should only come from auto detection. I'm not sure why it does so. If we set the hint encoding regardless of the encoding source, this page won't have garbled text.
Alexey Proskuryakov
Comment 1 2012-02-14 11:07:06 PST
Charset is normally inherited from main frame (if same origin). When a site has pages in subframes that don't match explicitly specified main frame encoding, it's just an authoring error. I don't think that any browser should go as far as "fix" such cases.
yosin
Comment 2 2012-02-14 17:36:14 PST
In this case, the authors are different, e.g. a teacher and students. Defaulting to the parent frame's charset is meaningful. However, it should not prevent the auto detector from running if the user has enabled it. My proposal is to use the parent's charset as a hint for the auto detector, and to run the auto detector if the document has no charset declaration.

There are two ways to fix this issue:

(1) Change ShouldAutoDetect:

    bool TextResourceDecoder::ShouldAutoDetect()
    {
        return m_usesEncodingDetector
            && (m_source == DefaultEncoding || m_source == EncodingFromParentFrame);
    }

(2) Change setHintEncoding:

    // We use parent document's encoding information for hint of child document encoding.
    void TextResourceDecoder::setHintEncoding(const TextResourceDecoder* hintDecoder)
    {
        if (hintDecoder) {
            m_hintEncoding = hintDecoder->encoding().name();
        }
    }
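To make the difference between the two options easier to see, here is a minimal standalone sketch. It is not WebKit code: DecoderModel, the free functions, and their names are hypothetical stand-ins for TextResourceDecoder. The "current" setHintEncoding follows the behavior described in the report (the hint is only taken from an auto-detected parent encoding). The "current" shouldAutoDetect is an assumption: it is modeled as requiring a hint when the encoding comes from the parent frame, which matches the observation that the iframe is not sniffed while m_hintEncoding is NULL.

```cpp
#include <string>

// Hypothetical model of the decoder state discussed in this bug.
enum class EncodingSource {
    DefaultEncoding,
    AutoDetectedEncoding,
    EncodingFromMetaTag,
    EncodingFromParentFrame,
};

struct DecoderModel {
    EncodingSource source = EncodingSource::DefaultEncoding;
    std::string encodingName;          // encoding currently in use
    std::string hintEncoding;          // empty string stands in for NULL
    bool usesEncodingDetector = false; // user setting for auto detection
};

// Behavior described in the report: the hint is accepted only when the
// parent's encoding itself came from auto detection.
void setHintEncodingCurrent(DecoderModel& child, const DecoderModel* hintDecoder)
{
    if (hintDecoder && hintDecoder->source == EncodingSource::AutoDetectedEncoding)
        child.hintEncoding = hintDecoder->encodingName;
}

// Option (2): accept the parent's encoding as a hint regardless of its source.
void setHintEncodingProposed(DecoderModel& child, const DecoderModel* hintDecoder)
{
    if (hintDecoder)
        child.hintEncoding = hintDecoder->encodingName;
}

// ASSUMPTION: models the existing check as requiring a hint when the
// encoding was inherited from the parent frame.
bool shouldAutoDetectAssumedCurrent(const DecoderModel& decoder)
{
    return decoder.usesEncodingDetector
        && (decoder.source == EncodingSource::DefaultEncoding
            || (decoder.source == EncodingSource::EncodingFromParentFrame
                && !decoder.hintEncoding.empty()));
}

// Option (1): run the detector for inherited encodings even without a hint.
bool shouldAutoDetectProposed(const DecoderModel& decoder)
{
    return decoder.usesEncodingDetector
        && (decoder.source == EncodingSource::DefaultEncoding
            || decoder.source == EncodingSource::EncodingFromParentFrame);
}
```

Under these assumptions, for a parent whose charset comes from a meta tag (ISO-2022-JP) and a child that inherits it, the current pair leaves the hint empty and skips detection, while either proposed change alone enables detection for the child.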
Alexey Proskuryakov
Comment 3 2012-02-14 21:18:05 PST
> However, it should not prevent to run auto detector, if users enable auto detector.

This is something I'll take issue with. Proliferation of encoding detection in one browser essentially randomizes what users and authors see. It's barely acceptable to sniff when there is no encoding indication at all, but not when there is an established behavior already. More encoding detection is bad for the Open Web, not good.
yosin
Comment 4 2012-02-14 22:55:29 PST
I agree that we should not implement a smarter encoding sniffer in WebKit. For this case, my experiment shows:

FF10: Same as WebKit
IE9: Displays correctly
OP11: Displays correctly

It seems that using the parent frame's charset as the default charset is not an established behavior. I can't say whether IE9 and OP11 sniff the encoding instead of using the parent's charset. However, once we do sniffing, WK gets the same results as IE9/OP11.
yosin
Comment 5 2012-02-14 23:02:31 PST
Correction. (Sorry, my Firefox was just upgraded to FF10 by automatic update.) FF10 w/AutoDetect: Displays correctly. So, WK behaves differently.
Alexey Proskuryakov
Comment 6 2012-02-14 23:14:10 PST
> OP11: Display correctly

This doesn't match what I'm seeing in Opera 11 (unless you meant that it correctly displays garbage in subframe).

> FF10 w/AutoDect Display correctly.

Autodetect is a non-default setting in Firefox. I don't have IE here to verify what it does.

Racing for "best" encoding detection is harmful. It's non-standard, unpredictable, and not how the Web should (and can!) work. Pages where it's needed are a rare exception.
yosin
Comment 7 2012-02-14 23:25:06 PST
I'm not trying to create the best encoding sniffer. Rather, I would like to have clearly defined behavior. It seems TextResourceDecoder (and the associated HTMLMetaCharsetParser) has some ad hoc behavior. It seems we should propose to WHATWG how a user agent should handle the parent's charset when handling child resources. What do you think?
Alexey Proskuryakov
Comment 8 2012-02-14 23:54:07 PST
HTML5 has an uncharacteristically vague algorithm (see <http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding>). It lets UA use arbitrary "other algorithms" for encoding detection, and it also mandates an extremely error-prone and unnecessary algorithm for changing encoding on the fly <http://www.whatwg.org/specs/web-apps/current-work/#change-the-encoding>. I think that we should strive for simplifying this, but so far, even Safari implementation experience with a drastically simpler approach hasn't convinced the spec editor.
Ian 'Hixie' Hickson
Comment 9 2012-02-15 11:30:25 PST
It hasn't convinced me because other vendors have said they need it to get more compat than you have. :-)