WebKit Bugzilla
RESOLVED FIXED
Bug 10697: REGRESSION (r16175): Errors in incremental decoding of UTF-8
https://bugs.webkit.org/show_bug.cgi?id=10697
Reported by mitz, 2006-09-02 16:27:07 PDT
Try reloading the attachment a few times: sometimes some of the Alephs are replaced by the Unicode replacement character (U+FFFD, looks like a question mark inside a black rhombus). Decreasing the network interface's MTU (say, to 150) can help achieve the result.

This is a regression from r16175 (fix for bug 10155 et al.). This happens because the UTF-8 decoder is destroyed and a new one is created mid-character. In fact, the decoder is replaced every time Decoder::decode() is called! I think the following is wrong:

    // If we still haven't found an encoding, assume latin1
    // (this can happen if an empty name is passed from outside).
    if (m_encodingName.isEmpty() || !m_encoding.isValid()) {
        m_encodingName = "iso8859-1";
        m_encoding = TextEncoding(Latin1Encoding);
    }
    m_decoder.set(StreamingTextDecoder::create(m_encoding));

The last line should go inside the braces too.
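To illustrate why replacing the decoder between calls corrupts multi-byte characters, here is a minimal sketch of a streaming UTF-8 decoder. It is not WebKit's StreamingTextDecoder: the class SketchUtf8Decoder and both helper functions are hypothetical names invented for this example, and the decoder skips the overlong-form and surrogate checks a real one needs. The only point is that the partially decoded character lives in the decoder object, so discarding the object mid-character loses it.

```cpp
#include <string>
#include <vector>

constexpr char32_t kReplacement = 0xFFFD;

// Hypothetical minimal streaming decoder; state survives between decode() calls.
class SketchUtf8Decoder {
public:
    // Decodes one chunk and appends every completed code point to out.
    void decode(const std::string& chunk, std::vector<char32_t>& out) {
        for (unsigned char byte : chunk) {
            if (m_remaining > 0) {
                if ((byte & 0xC0) == 0x80) {
                    // Continuation byte: add its 6 payload bits.
                    m_codePoint = (m_codePoint << 6) | (byte & 0x3F);
                    if (--m_remaining == 0)
                        out.push_back(m_codePoint);
                    continue;
                }
                // Sequence cut short; report it and reprocess byte as a lead byte.
                out.push_back(kReplacement);
                m_remaining = 0;
            }
            if (byte < 0x80) {
                out.push_back(static_cast<char32_t>(byte));
            } else if ((byte & 0xE0) == 0xC0) {
                m_codePoint = byte & 0x1F; m_remaining = 1;
            } else if ((byte & 0xF0) == 0xE0) {
                m_codePoint = byte & 0x0F; m_remaining = 2;
            } else if ((byte & 0xF8) == 0xF0) {
                m_codePoint = byte & 0x07; m_remaining = 3;
            } else {
                // Stray continuation byte or invalid lead byte.
                out.push_back(kReplacement);
            }
        }
    }

    // End of input: an unfinished sequence becomes U+FFFD.
    void flush(std::vector<char32_t>& out) {
        if (m_remaining > 0) {
            out.push_back(kReplacement);
            m_remaining = 0;
        }
    }

private:
    char32_t m_codePoint = 0;
    int m_remaining = 0;
};

// One decoder kept alive for the whole stream.
std::vector<char32_t> decodeWithPersistentDecoder(const std::vector<std::string>& chunks) {
    SketchUtf8Decoder decoder;
    std::vector<char32_t> out;
    for (const auto& chunk : chunks)
        decoder.decode(chunk, out);
    decoder.flush(out);
    return out;
}

// A new decoder for every chunk, mimicking the regression: any pending
// lead byte is silently lost when the previous decoder is discarded.
std::vector<char32_t> decodeWithFreshDecoderPerChunk(const std::vector<std::string>& chunks) {
    std::vector<char32_t> out;
    for (const auto& chunk : chunks) {
        SketchUtf8Decoder decoder;
        decoder.decode(chunk, out);
    }
    return out;
}
```

Aleph (U+05D0) is encoded as the two bytes 0xD7 0x90. If a network packet boundary splits them into the chunks "\xD7" and "\x90", decodeWithPersistentDecoder returns U+05D0, while decodeWithFreshDecoderPerChunk returns U+FFFD because the second decoder sees only a stray continuation byte. How many replacement characters the real decoder emits in this situation may differ from this sketch.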
Attachments
Test case (17.86 KB, text/html), 2006-09-02 16:28 PDT, mitz, no flags
proposed fix (31.76 KB, patch), 2006-09-03 01:48 PDT, Alexey Proskuryakov, eric: review+
mitz, Comment 1, 2006-09-02 16:28:06 PDT
Created attachment 10372: Test case
Alexey Proskuryakov, Comment 2, 2006-09-03 01:48:36 PDT
Created attachment 10376: proposed fix
Eric Seidel (no email), Comment 3, 2006-09-03 09:43:03 PDT
Comment on attachment 10376: proposed fix

ap and I talked about this over IRC. This looks sane. I complained to ap about the duplicated encoding-fallback logic, which seems to exist in both setEncodingName and the constructor here. It would be nice to get rid of that (less repeated code makes for cleaner, less fragile code). ap noted that there is an upcoming rewrite of TextEncoding to get rid of the custom encoding-to-ID tables. All in all, this looks sane. I just have to trust you that our test cases are adequate to cover this.
Alexey Proskuryakov, Comment 4, 2006-09-03 09:53:00 PDT
Committed revision 16198.