Encoding detector doesn't work on a specific euc-kr case.
https://bugs.webkit.org/show_bug.cgi?id=97054
Kangil Han
Reported 2012-09-18 18:10:56 PDT
This case, an HTML file without a charset declaration that contains fewer than 10 characters of EUC-KR-encoded text, fails in the encoding detector because the ICU library always returns a confidence value of 10. I have uploaded a patch to ICU at 'http://bugs.icu-project.org/trac/ticket/9585' and am waiting for review. To cover this case with a layout (regression) test, I tried manipulating it from JavaScript, but realized that wouldn't be easy because the encoding detector runs at the level of reading the input stream. Therefore, I will ask webkit-dev for advice and opinions on how to resolve this.
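The failure mode described above (short input yields a fixed confidence of 10) can be illustrated with a small C++ sketch. It is a simplified, hypothetical model of a multibyte charset recognizer, not ICU's actual code: the struct and function names, the threshold of 10 double-byte characters, and the scoring formula are assumptions chosen to mirror the report. The byte ranges follow the EUC-KR layout, in which both the lead and trail byte of a two-byte character fall in 0xA1-0xFE.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of a multibyte charset recognizer.
// This is NOT ICU's code; the threshold of 10 and the scoring formula
// are assumptions chosen to mirror the behavior described in the report.
struct EucKrStats {
    int singleByte = 0;  // ASCII bytes (0x00-0x7F)
    int doubleByte = 0;  // pairs whose lead and trail byte are both in 0xA1-0xFE
    int bad = 0;         // bytes that do not fit the EUC-KR byte layout
};

EucKrStats scanEucKr(const std::string& bytes) {
    EucKrStats s;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        auto lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {
            ++s.singleByte;
            continue;
        }
        if (lead >= 0xA1 && lead <= 0xFE && i + 1 < bytes.size()) {
            auto trail = static_cast<unsigned char>(bytes[i + 1]);
            if (trail >= 0xA1 && trail <= 0xFE) {
                ++s.doubleByte;
                ++i;  // skip the trail byte we just consumed
                continue;
            }
        }
        ++s.bad;  // invalid lead byte, invalid trail byte, or truncated pair
    }
    return s;
}

// Returns a confidence in the range [0, 100].
int confidenceEucKr(const std::string& bytes) {
    const EucKrStats s = scanEucKr(bytes);
    if (s.doubleByte == 0 || s.bad * 5 >= s.doubleByte)
        return 0;  // no multibyte evidence, or too many malformed bytes
    if (s.doubleByte <= 10)
        return 10;  // well-formed but short: confidence stays pinned at 10
    // Enough data: confidence grows with the number of double-byte characters
    // and is penalized by malformed bytes (hypothetical formula).
    const int c = 10 + 3 * (s.doubleByte - 10) - 10 * s.bad;
    return std::max(0, std::min(100, c));
}
```

With this model, five Hangul syllables such as 가 (0xB0 0xA1) score 10, and so does any other short run of well-formed pairs. Because a pair like 0xB0 0xA1 is also well-formed in GBK and EUC-JP, byte structure alone cannot separate these candidates at such lengths, so the detector has very little to go on.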
Attachments
A bad test case (245 bytes, text/html)
2012-09-18 18:57 PDT, Kangil Han
kangil.han: review-
kangil.han: commit-queue-
Safari 15.5 differs from other browsers (239.96 KB, image/png)
2022-06-05 04:00 PDT, Ahmad Saleem
no flags
yosin
Comment 1 2012-09-18 18:48:00 PDT
I think 10 characters is too few for encoding detection. Could you tell me which tests you mentioned? My feeling is that automatic encoding detection may be a feature of the browser rather than of WebKit. It may want to know the user's language preference list, the referrer page's encoding/language, the encoding/language of pages linked from the page, etc.
Kangil Han
Comment 2 2012-09-18 18:57:47 PDT
Created attachment 164643: A bad test case
Kangil Han
Comment 3 2012-09-18 19:00:33 PDT
(In reply to comment #1)
> I think 10 characters is too few for encode detecting.
> Could you tell me the tests you mentioned?
>
> In my feeling, auto encode detecting may be feature of browser rather than webkit. It may want to know, user's language preference list, referrer page encoding/language, encoding/language in pages in links of the page, etc.

I've attached a test case I worked on recently. I agree that language settings would be browser-level stuff. However, we can test the encoding detector with WebCore alone. :-)
yosin
Comment 4 2012-09-18 19:16:51 PDT
How about adding a method to window.internals to enable/disable automatic encoding detection? We may also want to be able to specify an encoding to boost. For the EUC-KR case, the detector may return GBK, Big5, EUC-JP, etc.
yosin
Comment 5 2012-09-18 19:22:51 PDT
We worked on encoding detection in https://bugs.webkit.org/show_bug.cgi?id=75594, although the patch wasn't landed. See WebCore::TextResourceDecoder::setUsesEncodingDetector() for how we tried to control automatic encoding detection. Hope this helps.
Kangil Han
Comment 6 2012-09-18 19:39:35 PDT
(In reply to comment #4)
> How about adding method to window.internal to enable/disable auto encoding detection?
>
> We may want to specify boosting encoding too. For EUC-KR case, detector may return GBK, BIG5, EUC-JP, etc.

window.internals is also a way of enabling the encoding detector through JavaScript. The problem I've found is that it won't work, because the encoding detector finishes its work at the stage where the input stream is read.
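The timing problem raised in this comment, that the detector has already made its decision by the time any script could call into window.internals, can be illustrated with a small C++ sketch. Everything in it is hypothetical: the class and method names, the placeholder detector, and the windows-1252 fallback are assumptions for illustration only and do not reflect how WebKit's TextResourceDecoder is actually implemented.

```cpp
#include <string>

// Hypothetical sketch (not WebKit's TextResourceDecoder): the encoding is
// chosen once, from the first chunk of the input stream, before the parser
// has had a chance to run any script on the page.
class HypotheticalDecoder {
public:
    void setDetectionEnabled(bool enabled) { detectionEnabled_ = enabled; }

    // Called for each chunk read from the input stream.
    void feed(const std::string& chunk) {
        if (!encodingChosen_) {
            encoding_ = detectionEnabled_ ? detect(chunk) : defaultEncoding_;
            encodingChosen_ = true;  // later chunks reuse this decision
        }
        // ... decode `chunk` with encoding_ and hand the text to the parser ...
    }

    const std::string& encoding() const { return encoding_; }

private:
    // Placeholder detector (assumption): any byte >= 0xA1 is taken as a hint of EUC-KR.
    static std::string detect(const std::string& bytes) {
        for (unsigned char b : bytes)
            if (b >= 0xA1) return "EUC-KR";
        return "windows-1252";
    }

    bool detectionEnabled_ = false;
    bool encodingChosen_ = false;
    std::string defaultEncoding_ = "windows-1252";  // assumed fallback encoding
    std::string encoding_;
};
```

In this model, enabling detection after the first `feed()` call (as a script running inside the already-decoded page would have to) leaves the encoding unchanged; only a decoder configured before the first chunk arrives ever runs the detector.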
Kangil Han
Comment 7 2012-09-18 19:46:41 PDT
(In reply to comment #5)
> We worked encoding detection: https://bugs.webkit.org/show_bug.cgi?id=75594
> Although, the patch wasn't landed.
>
> See WebCore::TextResourceDecoder::setUsesEncodingDetector(), how we tried to control auto encoding detection.
>
> Hope your help.

So huge... :P BTW, doesn't ICU support Kanji encodings?
yosin
Comment 8 2012-09-18 20:43:06 PDT
(In reply to comment #7)
> (In reply to comment #5)
> > We worked encoding detection: https://bugs.webkit.org/show_bug.cgi?id=75594
> > Although, the patch wasn't landed.
> >
> > See WebCore::TextResourceDecoder::setUsesEncodingDetector(), how we tried to control auto encoding detection.
> >
> > Hope your help.
>
> So huge.. :P

Most of it is refactoring. You can ignore WebKit/* and WebCore/platform/*.

> BTW, doesn't ICU support Kanji code?

Yes, ICU supports Kanji characters and Japanese encodings. For historical reasons, WebKit has a special detector for Japanese encodings.
Kangil Han
Comment 9 2012-09-18 22:17:23 PDT
(In reply to comment #8)
>
> Yes, ICU supports Kanji characters, Japanese encoding. By historical reasons, WebKit has special detector for Japanese encoding.

Oh, I see!
Ahmad Saleem
Comment 10 2022-06-05 04:00:22 PDT
Created attachment 460037: Safari 15.5 differs from other browsers

I am still able to reproduce this bug in Safari 15.5 on macOS 12.4. As shown in the attached screenshot, all other browsers handle this case correctly. Thanks!
Ahmad Saleem
Comment 11 2024-07-03 10:33:13 PDT