WebKit Bugzilla
Bug 3556: black diamond question mark shown for invalid UTF-8 sequences
Status: VERIFIED FIXED
https://bugs.webkit.org/show_bug.cgi?id=3556
Reported by Darin Adler, 2005-06-15 21:03:09 PDT
The link above is one site that has invalid UTF-8 sequences. There are many others. Also seen on news.google.com. Other browsers just seem to ignore these sequences. So we should too.
Attachments
Patch to ignore U+FFFD characters coming out of the decoder (3.07 KB, patch), 2005-06-15 21:07 PDT, Darin Adler; sullivan: review+
Comment 1, Darin Adler, 2005-06-15 21:05:46 PDT
The bad sequences are partway down the page, where it says "including a 120-screen cube". I imagine they are em dashes, probably in Windows Latin-1 (Windows-1252) encoding.
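To illustrate the guess in Comment 1: in Windows-1252, the em dash is byte 0x97, which is not a valid UTF-8 sequence on its own. The Python sketch below uses a made-up byte string modeled on the quoted phrase (the actual bytes on the page are not recorded in this bug, so the 0x97 is an assumption) and decodes it both ways.

```python
# Hypothetical bytes modeled on the text quoted in Comment 1; the real page's
# bytes are not recorded in this bug, so the 0x97 byte is an assumption.
raw = b"including a 120-screen cube \x97 and more"

# Read as UTF-8, 0x97 is a continuation byte with no lead byte, so the decoder
# substitutes U+FFFD REPLACEMENT CHARACTER (the black diamond question mark).
as_utf8 = raw.decode("utf-8", errors="replace")

# Read as Windows-1252 (Python codec name "cp1252"), 0x97 is U+2014 EM DASH.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))    # 'including a 120-screen cube \ufffd and more'
print(ascii(as_cp1252))  # 'including a 120-screen cube \u2014 and more'
```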
Comment 2, Darin Adler, 2005-06-15 21:07:13 PDT
Created attachment 2379: Patch to ignore U+FFFD characters coming out of the decoder
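The patch title describes dropping U+FFFD characters after the decoder has produced them. The Python sketch below shows that idea next to the alternative of discarding malformed bytes inside the decoder itself. It is not the WebKit patch, and both function names are hypothetical. One difference is worth noting: filtering after decoding also removes a U+FFFD that the page really contained (encoded as the bytes EF BF BD), while the decoder-level approach keeps it.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def drop_replacements_after_decode(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode with substitution, then strip every U+FFFD.

    This mirrors the idea in the patch title, but it cannot tell a substituted
    U+FFFD apart from one that was actually present in the input.
    """
    text = data.decode(encoding, errors="replace")
    return text.replace(REPLACEMENT_CHARACTER, "")


def skip_invalid_in_decoder(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: let the decoder silently drop malformed bytes."""
    return data.decode(encoding, errors="ignore")


bad = b"cube \x97 dash"            # 0x97 alone is malformed UTF-8
real = "a\ufffdb".encode("utf-8")  # a genuine U+FFFD, encoded as EF BF BD

print(drop_replacements_after_decode(bad))          # cube  dash
print(skip_invalid_in_decoder(bad))                 # cube  dash
print(ascii(drop_replacements_after_decode(real)))  # 'ab'
print(ascii(skip_invalid_in_decoder(real)))         # 'a\ufffdb'
```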
Comment 3, Nicholas Shanks, 2005-06-16 07:10:07 PDT
I see these everywhere. Just hiding them is not really optimal, though. Try this:
1) Go to Safari preferences.
2) Set the default encoding to UTF-8.
3) Browse the internet for a bit.
You will see that many sites aren't sending encoding information, that Safari is ignoring the charset in the Content-Type HTTP header (which should override the <meta> tag), or that it's ignoring the XML charset information for XHTML served as text/html (or all of the above; I can't really tell). Whatever the cause, websites would become harder to read if the user were not aware that a character was missing or mis-encoded. Words would just appear with letters missing, and their meanings might change!
One solution I can think of would be to record all the invalid characters encountered, try to match up a likely encoding (perhaps based on the document's language), and then suggest reinterpreting the document in that encoding to the user. This should also be reported as an error in web developer mode.
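The encoding-guessing idea in Comment 3 could start from something as simple as the Python sketch below, which tries a hypothetical, ordered list of candidate encodings and keeps the first one that decodes the bytes without errors. It is only an illustration under that assumption: Windows-1252 accepts almost every byte value and ISO-8859-1 accepts all of them, so a real detector would need statistical or language-based evidence, as the comment suggests.

```python
def guess_encoding(data: bytes,
                   candidates=("utf-8", "cp1252", "iso-8859-1")) -> str:
    """Hypothetical helper: return the first candidate that decodes cleanly.

    Decoding is strict, so any malformed sequence raises UnicodeDecodeError
    and the next candidate is tried. ISO-8859-1 maps every byte, so with the
    default list the loop always returns something.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        return name
    return candidates[-1]


print(guess_encoding("caf\u00e9".encode("utf-8")))  # utf-8
print(guess_encoding(b"cube \x97 dash"))            # cp1252
print(guess_encoding(b"\x81"))                      # iso-8859-1
```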
Comment 4, Darin Adler, 2005-06-16 10:07:27 PDT
Yes, automatically determining the correct encoding for web pages would be pretty neat. But that's not what this bug is about. This bug is about matching other browsers' behavior on various sites. All the other browsers, and older versions of Safari, simply ignore those bytes. We stopped ignoring them and started putting in black diamond question marks because of a change in the underlying OS. Please file a new bug report with specific suggestions about your enhancement idea. I don't think that idea and the concept that "skipping these characters is not good enough" should prevent us from fixing this regression and once-again matching the behavior of other browsers. Let's not continue that discussion here unless there's a really good reason to do so.
Comment 5, John Sullivan, 2005-06-16 10:49:46 PDT
Comment on attachment 2379: Patch to ignore U+FFFD characters coming out of the decoder
r=me, excellent comment
Comment 6, Nicholas Shanks, 2005-06-16 11:39:19 PDT
(In reply to comment #4)
> I don't think that idea and the concept that "skipping these characters is not good enough"
> should prevent us from fixing this regression and once-again matching the behavior of
> other browsers.
Oh, I agree. I was just saying it was not optimal, and that further work could be done to improve the situation. I was definitely not suggesting that the patch shouldn't be applied! Apologies if I gave that impression. I shall open a bug about automatic encoding detection.
Comment 7, Joost de Valk (AlthA), 2005-07-03 08:10:28 PDT
Darin, please mark this as verified if you think it is ;).
Comment 8, Darin Adler, 2005-08-04 18:16:10 PDT
In Radar as <rdar://problem/4206050> 8A345: Bad (question mark in black diamond) characters in news.google.com
Comment 9, Alexey Proskuryakov, 2006-06-19 09:11:42 PDT
This change was reverted in bug 8972.
Comment 10, Lucas Forschler, 2019-02-06 09:04:03 PST
Mass moving XML DOM bugs to the "DOM" Component.