Bug 73519

Summary: [Qt] QtWebKit does not apply correct encoding on some pages with CJK characters
Product: WebKit
Reporter: Dawit A. <adawit>
Component: WebKit Qt
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED INVALID
Severity: Normal
CC: ap, moriramar, sfcheng
Priority: P3
Keywords: Qt
Version: 528+ (Nightly build)
Hardware: PC
OS: Linux
Attachments:
test page (flags: none)

Dawit A.
Reported 2011-11-30 21:19:18 PST
The following bug was reported downstream against kwebkitpart, but was validated to be an upstream issue using QtTestBrowser:

https://bugs.kde.org/show_bug.cgi?id=287690

Summary: KWebkitPart does not apply correct locale encoding settings on some pages with CJK characters.
Product: kwebkitpart
Version: unspecified
Platform: Gentoo Packages
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: general
AssignedTo: webkit-devel@kde.org
ReportedBy: moriramar@gmail.com

Version: unspecified (using KDE 4.7.2)
OS: Linux

When I open some pages containing both simplified and traditional Chinese characters, some characters are not displayed correctly. Pages containing both Chinese and Japanese characters might cause this problem as well.

Personal guess: these pages might be encoded in zh_CN.GBK or zh_CN.GB18030 (which covers more characters), while KWebkitPart might apply zh_CN.GB2312 (which is generally considered a subset of GBK).

Reproducible: Always

Steps to Reproduce:
1. Install a font covering CJK characters. Bitstream Cyberbit, WenQuanYi Zen Hei, WenQuanYi Micro Hei or Droid is OK.
2. Make sure the zh_CN.GBK, zh_CN.GB2312, zh_CN.GB18030 and zh_CN.UTF-8 locales are available on the system.
3. Open Konqueror 4.7.2 and enable WebKit mode.
4. Go to http://www.acfun.tv/v/ac265957/ , which might be a little slow.

Actual Results:
In the bold title line at the top of the page content, a black box with a white question mark appears. In the next line, there are two black boxes separated by a "W" character, followed by an "o" character. Choosing any of the GB* encodings under "View >> Encoding >> Simplified Chinese >>" does not solve the problem. Opening this kind of page can sometimes crash Konqueror.

Expected Results:
No black boxes and no stray "W" or "o" characters in these two lines. For reference, KHTML displays this page correctly when the encoding is set to "Simplified Chinese >> GBK" or "Simplified Chinese >> GB18030".
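The reporter's guess can be illustrated with Python's standard codecs. This is a minimal sketch, assuming Python 3 and its built-in `gb2312`, `gbk` and `gb18030` codecs as stand-ins for whatever decoder QtWebKit actually used; the helpers `can_encode` and `misdecode` are hypothetical. A traditional character such as 體 is representable in GBK but not in GB2312, so decoding its GBK bytes as GB2312 produces U+FFFD, which fonts typically render as a black box with a question mark. Python's `gb2312` decoder rejects only the lead byte and then emits a trail byte below 0x80 as plain ASCII, which is one plausible (unconfirmed) explanation for the stray "W" and "o" in the report.

```python
# Sketch: compare how the same text fares under GB2312, GBK and GB18030
# using Python's standard codecs (not QtWebKit's own decoder).

SAMPLE = "體"  # traditional form of 体; present in GBK, absent from GB2312


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` is representable in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def misdecode(text: str, real: str = "gbk", assumed: str = "gb2312") -> str:
    """Encode `text` with the page's real encoding, then decode the bytes
    with the encoding the browser assumed, replacing undecodable bytes."""
    return text.encode(real).decode(assumed, errors="replace")


if __name__ == "__main__":
    for codec in ("gb2312", "gbk", "gb18030"):
        print(f"{codec:8} can encode {SAMPLE!r}: {can_encode(SAMPLE, codec)}")

    raw = SAMPLE.encode("gbk")
    print("GBK bytes:", raw.hex())
    # U+FFFD is usually drawn as a black box with a question mark; a trail
    # byte below 0x80 can survive as a stray ASCII letter.
    print("Decoded as GB2312:", repr(misdecode(SAMPLE)))

    # Characters outside the BMP are covered by GB18030 but not by GBK.
    astral = "\U00020000"
    print("GBK can encode U+20000:", can_encode(astral, "gbk"))
    print("GB18030 can encode U+20000:", can_encode(astral, "gb18030"))
```

Plain simplified text such as "中文" round-trips unchanged through `misdecode`, because GBK keeps GB2312's byte values; only characters outside GB2312 are damaged, which matches the report that only some characters break.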
Attachments
test page (576 bytes, text/html)
2012-01-08 18:52 PST, Stephen
no flags
moriramar
Comment 1 2011-11-30 22:28:38 PST
*** Bug 73447 has been marked as a duplicate of this bug. ***
Stephen
Comment 2 2012-01-08 18:52:49 PST
Created attachment 121609 [details] test page
Stephen
Comment 3 2012-01-08 18:56:29 PST
(In reply to comment #2)
> Created an attachment (id=121609) [details]
> test page

I can reproduce the same bug as well. The attachment above is a test page which contains both simplified and traditional Chinese characters. The traditional Chinese characters show up as junk in QtWebKit. The same page is displayed correctly in IE and WebKit.
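One way to avoid this class of problem, later codified in the WHATWG Encoding Standard, is to treat the label `gb2312` as an alias for GBK and decode the whole family with the GB18030 decoder. The sketch below models that label resolution in Python, assuming the standard `gb18030` codec stands in for a browser's decoder; `python_codec_for`, `decode_page` and `GBK_FAMILY_LABELS` are hypothetical names, and the label set is only a subset of the standard's. It is not how QtWebKit handled encodings at the time of this report.

```python
# Sketch (hypothetical helpers): resolve a declared charset label the way the
# WHATWG Encoding Standard does for the GBK family, then decode with
# Python's gb18030 codec, a superset of both GB2312 and GBK.

# Subset of WHATWG labels that map to the "gbk" encoding, plus "gb18030".
GBK_FAMILY_LABELS = {
    "gb2312", "gb_2312", "gb_2312-80", "gbk", "x-gbk",
    "chinese", "csgb2312", "iso-ir-58", "gb18030",
}


def python_codec_for(label: str) -> str:
    """Map a declared charset label to the Python codec used to decode it."""
    normalized = label.strip().lower()
    if normalized in GBK_FAMILY_LABELS:
        # Use the widest decoder for the whole GB family.
        return "gb18030"
    return normalized


def decode_page(body: bytes, declared_charset: str) -> str:
    """Decode an HTML body with the resolved codec, replacing bad bytes."""
    return body.decode(python_codec_for(declared_charset), errors="replace")


if __name__ == "__main__":
    # Mixed simplified and traditional text, similar in spirit to the test page.
    text = "简体 與 繁體"
    body = text.encode("gbk")
    print(repr(decode_page(body, "GB2312")))
    print(repr(body.decode("gb2312", errors="replace")))
```

Decoding the GBK bytes of mixed text with a page labeled `GB2312` then returns the original string, whereas a strict GB2312 decode of the same bytes yields U+FFFD replacement characters where the traditional characters were.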
Jocelyn Turcotte
Comment 4 2014-02-03 03:19:19 PST
=== Bulk closing of Qt bugs ===

If you believe that this bug report is still relevant for a non-Qt port of webkit.org, please re-open it and remove [Qt] from the summary.

If you believe that this is still an important QtWebKit bug, please file a new report at https://bugreports.qt-project.org and add a link to this issue. See http://qt-project.org/wiki/ReportingBugsInQt for additional guidelines.