WebKit Bugzilla
RESOLVED FIXED
8972
REGRESSION: invalid UTF-8 sequences are not displayed
https://bugs.webkit.org/show_bug.cgi?id=8972
tim bates
Reported
2006-05-18 03:26:07 PDT
If you visit the URL in the Tiger release of Safari, you see a black diamond question mark character in the sentence "Store an unlimited amount of data for only 15". This is also shown in the source: "Store an unlimited amount of data for only 15� per gigabyte". Under 420+, the cents character is simply missing from both the view and the source. Not sure what the bug, if any, is here.
Attachments
Broken UTF-8 (121 bytes, text/html), 2006-05-21 03:39 PDT, Alexey Proskuryakov, no flags
windows-1252 (118 bytes, text/html), 2006-05-21 03:40 PDT, Alexey Proskuryakov, no flags
ShiftJIS (114 bytes, text/html), 2006-05-21 03:40 PDT, Alexey Proskuryakov, no flags
proposed patch (25.08 KB, patch), 2006-06-17 08:17 PDT, Alexey Proskuryakov, no flags
proposed patch (26.25 KB, patch), 2006-06-17 23:42 PDT, Alexey Proskuryakov, no flags
proposed patch (26.82 KB, patch), 2006-06-18 01:01 PDT, Alexey Proskuryakov, darin: review+
Alexey Proskuryakov
Comment 1
2006-05-18 04:40:54 PDT
This was an intentional change, see bug 3556. However, I cannot confirm the comment that other browsers ignore invalid UTF-8 sequences - WinIE and Mac Firefox do display either question marks or empty boxes at both bug URLs for me.
Darin Adler
Comment 2
2006-05-18 09:18:27 PDT
I'd like to match other browsers. I don't know how I could have gotten it wrong originally, though. I'm quite sure there were sites with black question marks in Safari only and nothing there in other browsers. Maybe there are different categories of illegal UTF-8 sequences that are handled differently? Someone should do some research on this and find out what I got wrong originally.
Darin Adler
Comment 3
2006-05-18 09:21:43 PDT
Strange, I tested the http://www.cheap-hotel-rooms.com/Reno/Peppermill-Hotel.htm page mentioned in the original bug. It shows plain old "?" characters in Firefox 1.5.0.3 on Macintosh where we used to use our black diamond question mark. (Around the text "including a 120-screen cube".) But I could have sworn I tested this back when I fixed the bug. Is there a chance Firefox changed its behavior? I probably never tested Windows Internet Explorer behavior.
Alexey Proskuryakov
Comment 4
2006-05-18 12:59:59 PDT
(In reply to comment #3)
> Is there a chance Firefox changed its behavior?
Firefox 1.0.3 and 1.0.5 also display question marks for me. I don't have other versions archived.
Alexey Proskuryakov
Comment 5
2006-05-21 03:39:40 PDT
Created attachment 8442: Broken UTF-8
It looks like various kinds of UTF-8 brokenness all give question marks in Firefox 1.5. Invalid WinLatin bytes get discarded (as they are in shipping Safari, too); recovery from broken ShiftJIS differs considerably among Firefox, shipping Safari, and ToT WebKit.
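For illustration, the difference between substituting a replacement character and silently discarding an invalid byte can be sketched with Python's standard codecs. This is not WebKit's decoder; the sample bytes are an assumption modelled on the sentence quoted in the report, with the cent sign stored as the single windows-1252 byte 0xA2, which is not valid on its own in UTF-8.

```python
# Assumed sample: the cent sign stored as the windows-1252 byte 0xA2
# inside text that is otherwise plain ASCII.
raw = b"only 15\xa2 per gigabyte"

# Substitute U+FFFD (the "black diamond question mark") for the invalid byte.
replaced = raw.decode("utf-8", errors="replace")

# Silently discard the invalid byte, so the reader cannot tell data is missing.
dropped = raw.decode("utf-8", errors="ignore")

# Decode with the encoding the page author presumably intended.
intended = raw.decode("cp1252")

print(repr(replaced))
print(repr(dropped))
print(repr(intended))
```

The `replace` handler keeps a visible marker where data was lost, while `ignore` leaves no trace of it; that is the distinction this bug is about.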
Alexey Proskuryakov
Comment 6
2006-05-21 03:40:04 PDT
Created attachment 8443: windows-1252
Alexey Proskuryakov
Comment 7
2006-05-21 03:40:54 PDT
Created attachment 8444: ShiftJIS
Alice Liu
Comment 8
2006-06-06 09:37:57 PDT
<rdar://problem/4575223>
Nicholas Shanks
Comment 9
2006-06-10 10:57:27 PDT
I would like to see this fixed. I believe the "fix" to bug 3556 should never have been authorised, as I don't believe it was a bug. It is very important to know that the page is not being displayed in the correct encoding, so that I can try alternatives manually. Not displaying the black diamonds disguises this and means the user is not aware that data is missing, which could potentially be very bad! Firefox's question marks are not very conspicuous, but at least they are there.
Alexey Proskuryakov
Comment 10
2006-06-17 08:17:03 PDT
Created attachment 8885: proposed patch
Darin Adler
Comment 11
2006-06-17 17:18:08 PDT
Comment on attachment 8885: proposed patch
appendOmittingUnwanted should be renamed to appendOmittingBOM -- that was its original name way back in the mists of time before we added null (now gone) and replacement character (now gone) to the list of characters to strip.
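A minimal sketch of what the renamed function is described as doing, written in Python for brevity. The real function is part of WebKit's C++ source and its signature is not shown in this thread, so the name `append_omitting_bom`, its parameters, and the list-based buffer are hypothetical.

```python
from typing import List

BOM = "\ufeff"  # U+FEFF, the byte order mark

def append_omitting_bom(buffer: List[str], decoded: str) -> None:
    """Append decoded text to buffer, stripping only U+FEFF.

    Null (U+0000) and the replacement character (U+FFFD) pass through
    untouched, matching the comment that both are gone from the strip list.
    """
    buffer.append(decoded.replace(BOM, ""))

# Usage: the BOM is dropped, while NUL and U+FFFD are kept.
chunks: List[str] = []
append_omitting_bom(chunks, "\ufeffa\x00b\ufffd")
result = "".join(chunks)
print(repr(result))
```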
Alexey Proskuryakov
Comment 12
2006-06-17 23:42:39 PDT
Created attachment 8895: proposed patch
Renamed appendOmittingUnwanted().
Alexey Proskuryakov
Comment 13
2006-06-18 00:33:22 PDT
Now that the layout tests work again, I've found that this change uncovers an apparent bug in XML entity handling, looking into it...
Alexey Proskuryakov
Comment 14
2006-06-18 01:01:07 PDT
Created attachment 8897: proposed patch
Now with a getXHTMLEntity() fix.
Darin Adler
Comment 15
2006-06-18 16:41:48 PDT
Comment on attachment 8897: proposed patch
r=me
Alexey Proskuryakov
Comment 16
2006-06-19 09:10:55 PDT
Committed revision 14911.