WebKit Bugzilla
NEW
Bug 97054
Encoding detector doesn't work on a specific euc-kr case.
https://bugs.webkit.org/show_bug.cgi?id=97054
Kangil Han
Reported
2012-09-18 18:10:56 PDT
The encoding detector fails on an HTML file that contains fewer than 10 characters of EUC-KR encoded text and has no charset declaration, because the ICU library always returns a confidence value of '10' in this case. I have uploaded a patch for this at
http://bugs.icu-project.org/trac/ticket/9585
and am waiting for review. To cover this case in a layout (regression) test, I tried to drive the detector from JavaScript, but realized that would not be easy because the encoding detector runs at the input-stream reading level, before any script executes. Therefore, I will ask webkit-dev for advice on how to resolve this.
Attachments
A bad test case (245 bytes, text/html), 2012-09-18 18:57 PDT, Kangil Han (kangil.han: review-, kangil.han: commit-queue-)
Safari 15.5 differs from other browsers (239.96 KB, image/png), 2022-06-05 04:00 PDT, Ahmad Saleem (no flags)
yosin
Comment 1
2012-09-18 18:48:00 PDT
I think 10 characters is too few for encoding detection. Could you tell me which tests you mentioned? In my view, automatic encoding detection may be a browser feature rather than a WebKit one. It may need to know the user's language preference list, the referrer page's encoding/language, the encoding/language of pages linked from the page, etc.
Kangil Han
Comment 2
2012-09-18 18:57:47 PDT
Created attachment 164643: A bad test case
Kangil Han
Comment 3
2012-09-18 19:00:33 PDT
(In reply to comment #1)
> I think 10 characters is too few for encoding detection.
> Could you tell me which tests you mentioned?
>
> In my view, automatic encoding detection may be a browser feature rather than a WebKit one. It may need to know the user's language preference list, the referrer page's encoding/language, the encoding/language of pages linked from the page, etc.
I attached the test case I have been working on. I agree that language settings are a browser matter. However, we can test the encoding detector with WebCore alone. :-)
yosin
Comment 4
2012-09-18 19:16:51 PDT
How about adding a method to window.internals to enable/disable automatic encoding detection? We may also want to specify which encoding to boost. For the EUC-KR case, the detector may return GBK, Big5, EUC-JP, etc.
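To illustrate why a short EUC-KR sample is ambiguous, here is a minimal Python sketch. It is not WebKit or ICU code: the helper `plausible_encodings` and the sample bytes are assumptions made for illustration. It only checks which candidate codecs decode the bytes without error, which is a much weaker test than the statistical matching a real detector performs.

```python
# Hypothetical helper, not WebKit or ICU code: list the candidate
# encodings under which a byte string decodes without error.
CANDIDATES = ["euc_kr", "gbk", "big5", "euc_jp"]

# "한글" encoded as EUC-KR: only two double-byte characters.
SAMPLE = b"\xc7\xd1\xb1\xdb"


def plausible_encodings(data, candidates=CANDIDATES):
    """Return the candidate codecs that decode `data` without error."""
    plausible = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not a valid sequence in this encoding.
            continue
        plausible.append(name)
    return plausible


if __name__ == "__main__":
    # Print each encoding that accepts the bytes, with the text it yields.
    for name in plausible_encodings(SAMPLE):
        print(name, repr(SAMPLE.decode(name)))
```

Because the same bytes are structurally valid in more than one legacy double-byte encoding, a detector has to rely on character-frequency statistics, and a handful of characters gives it very little to work with.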
yosin
Comment 5
2012-09-18 19:22:51 PDT
We worked on encoding detection in
https://bugs.webkit.org/show_bug.cgi?id=75594
although the patch never landed. See WebCore::TextResourceDecoder::setUsesEncodingDetector() for how we tried to control automatic encoding detection. Hope this helps.
Kangil Han
Comment 6
2012-09-18 19:39:35 PDT
(In reply to comment #4)
> How about adding a method to window.internals to enable/disable automatic encoding detection?
>
> We may also want to specify which encoding to boost. For the EUC-KR case, the detector may return GBK, Big5, EUC-JP, etc.
window.internals is also a JavaScript-driven way to enable the encoding detector. The problem I found is that it won't work, because the encoding detector has already finished its work at the input-stream reading stage.
Kangil Han
Comment 7
2012-09-18 19:46:41 PDT
(In reply to comment #5)
> We worked on encoding detection in https://bugs.webkit.org/show_bug.cgi?id=75594
> although the patch never landed.
>
> See WebCore::TextResourceDecoder::setUsesEncodingDetector() for how we tried to control automatic encoding detection.
>
> Hope this helps.
So huge.. :P
BTW, doesn't ICU support Kanji code?
yosin
Comment 8
2012-09-18 20:43:06 PDT
(In reply to comment #7)
> (In reply to comment #5)
> > We worked on encoding detection in https://bugs.webkit.org/show_bug.cgi?id=75594
> > although the patch never landed.
> >
> > See WebCore::TextResourceDecoder::setUsesEncodingDetector() for how we tried to control automatic encoding detection.
> >
> > Hope this helps.
>
> So huge.. :P
Most of the changes are refactoring. You can ignore WebKit/* and WebCore/platform/*.
> BTW, doesn't ICU support Kanji code?
Yes, ICU supports Kanji characters and Japanese encodings. For historical reasons, WebKit has a special detector for Japanese encodings.
Kangil Han
Comment 9
2012-09-18 22:17:23 PDT
(In reply to comment #8)
> Yes, ICU supports Kanji characters and Japanese encodings. For historical reasons, WebKit has a special detector for Japanese encodings.
Oh, I see!
Ahmad Saleem
Comment 10
2022-06-05 04:00:22 PDT
Created attachment 460037: Safari 15.5 differs from other browsers
I am still able to reproduce this bug in Safari 15.5 on macOS 12.4. As shown in the attached screenshots, all other browsers work correctly. Thanks!
Ahmad Saleem
Comment 11
2024-07-03 10:33:13 PDT
Blink removed its Japanese encoding detector here:
https://chromium.googlesource.com/chromium/blink/+/3b87a35b8ccb719156c4af78968915de96e23517