WebKit Bugzilla
NEW
Bug 245305
Implement general encoding sniffing
https://bugs.webkit.org/show_bug.cgi?id=245305
Sam Sneddon [:gsnedders]
Reported
2022-09-16 15:34:51 PDT
Every other major browser engine has some form of encoding sniffing, and has for years. We have, consistently, resisted having much, preferring to be more conservative with any sort of magic heuristics. (Currently, if the OS language is Japanese, we do a small amount of sniffing between different Japanese legacy encodings.)

To quote:

(In reply to Alexey Proskuryakov from comment #3 to bug 78584)
> > However, it should not prevent to run auto detector, if users enable auto detector.
>
> This is something I'll take issue with. Proliferation of encoding detection
> in one browser essentially randomizes what users and authors see. It's
> barely acceptable to sniff when there is no encoding indication at all, but
> not when there is an established behavior already.
>
> More encoding detection is bad for the Open Web, not good.
Henri Sivonen has previously written about this at https://hsivonen.fi/chardetng/, with regards to Firefox's modern character detection, and I know he had some interest in standardising this if anyone else was interested in implementing.
rdar://17033341
Radar WebKit Bug Importer
Comment 1
2022-09-16 15:35:06 PDT
<rdar://problem/100046263>
Sam Sneddon [:gsnedders]
Comment 2
2022-09-16 15:37:33 PDT
<rdar://17033341>
Alexey Proskuryakov
Comment 3
2022-09-16 16:30:20 PDT
Based on bug reports that we are (not) receiving, and also on usage level of encoding overrides in Mac Safari, there doesn't appear to be significant customer impact.
Karl Dubost
Comment 4
2022-09-21 02:35:27 PDT
Some examples.

C F S URL (C=Chrome, F=Firefox, S=Safari)
=========================================
1 1 0 http://sefer-li.net
0 1 0 http://next.nm.land.to/trip.cgi
1 1 0 http://soobcha.gramsk.ru/faq/Ovr0/index.html@topic=22&list=1

1 = Recover
0 = Fail

See also https://bugzilla.mozilla.org/show_bug.cgi?id=1551276
Alexey Proskuryakov
Comment 5
2022-09-21 19:21:39 PDT
> 1 1 0 http://soobcha.gramsk.ru/faq/Ovr0/index.html@topic=22&list=1

This kind of underlines why there are no bug reports. This is a catalog of a CD with documentation for novice computer users that was last issued in 2004 :)
Karl Dubost
Comment 6
2022-09-21 20:41:06 PDT
There are probably a couple of things that could be taken into consideration.

* The Web of today is massively better than 10 years ago for the encoding story. Many more people are working with UTF-8. The toolchain, the server, the OSes have been converted to using UTF-8 by default. Win for the open web.
* The legacy (not as irrelevant but unmaintained) Web will not be updated and will stay encoded with weird mistakes, hacks, etc. The person using such a site suffers when the page is broken and not readable. That's the theme of webcompat, where we try to fix things for the users.
* The probability of having reports about wrong encoding probably gradually goes down with time. It's hard to know if the bugs are reported at all, and it also depends on the market. As Sam mentioned, Japan, which has a huge iPhone market share, needs some form of encoding detection so the sites are not broken on these devices. Safari on iOS doesn't have a menu item to report bugs, which probably doesn't help either. Safari on macOS has "Send Safari feedback".

I do not have a strong feeling either way. It would be good if it could actually be measured to understand the extent of the issue.
Anne van Kesteren
Comment 7
2022-09-27 07:51:24 PDT
Perhaps we could approach Chrome and Firefox about jointly creating a standardized and deterministic sniffing algorithm. That would save WebKit some design work and would ensure we don't end up in a race to the bottom with regards to sniffing.
Alexey Proskuryakov
Comment 8
2022-09-27 10:54:47 PDT
My view is that it would be best to spend zero effort on this.
Karl Dubost
Comment 9
2023-01-22 20:14:16 PST
Another case I hit last week: https://www.yappe.net/yakumo/kuma/

While on desktop it is possible to change the text encoding through the View menu (note that you need to be aware of this type of issue), it's not possible (to the best of my knowledge) on iOS Safari. There is probably an accessibility issue, where a person not able to read the text would have a hard time knowing what next step is needed to fix the text.