RESOLVED FIXED 10697
REGRESSION (r16175): Errors in incremental decoding of UTF-8
https://bugs.webkit.org/show_bug.cgi?id=10697
mitz
Reported 2006-09-02 16:27:07 PDT
Try reloading the attachment a few times - sometimes some of the Alephs are replaced by the Unicode replacement character (U+FFFD, looks like a question mark inside a black rhombus). Decreasing the network interface's MTU (say, to 150) can help achieve the result.

This is a regression from r16175 (fix for bug 10155 et al.). This happens because the UTF-8 decoder is destroyed and a new one is created mid-character. In fact, the decoder is replaced every time Decoder::decode() is called! I think the following is wrong:

    // If we still haven't found an encoding, assume latin1
    // (this can happen if an empty name is passed from outside).
    if (m_encodingName.isEmpty() || !m_encoding.isValid()) {
        m_encodingName = "iso8859-1";
        m_encoding = TextEncoding(Latin1Encoding);
    }
    m_decoder.set(StreamingTextDecoder::create(m_encoding));

The last line should go inside the braces too.
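To illustrate why replacing the decoder between chunks corrupts multi-byte characters, here is a minimal sketch. It assumes a simplified decoder that only tracks how many continuation bytes are still pending. `Utf8StreamDecoder`, `decodeWithPersistentDecoder`, and `decodeWithFreshDecoderPerChunk` are hypothetical names invented for this example and are not WebKit's `StreamingTextDecoder` API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// U+FFFD REPLACEMENT CHARACTER, emitted for bytes that cannot be decoded.
constexpr char32_t kReplacementCharacter = 0xFFFD;

// Hypothetical, simplified streaming UTF-8 decoder (NOT WebKit's
// StreamingTextDecoder). It does not reject overlong forms or surrogates;
// it only remembers a partially received sequence between calls.
class Utf8StreamDecoder {
public:
    // Decodes one chunk and appends the resulting code points to `out`.
    // A sequence cut off at the end of the chunk stays pending in the
    // decoder's state until the next call.
    void decode(const std::string& chunk, std::u32string& out) {
        for (unsigned char byte : chunk)
            consume(byte, out);
    }

private:
    void consume(unsigned char byte, std::u32string& out) {
        assert(m_needed <= 3); // a UTF-8 sequence has at most 3 continuation bytes
        if (m_needed == 0) {
            if (byte < 0x80) {
                out.push_back(byte);                       // ASCII
            } else if ((byte & 0xE0) == 0xC0) {
                m_codePoint = byte & 0x1F; m_needed = 1;   // 2-byte lead
            } else if ((byte & 0xF0) == 0xE0) {
                m_codePoint = byte & 0x0F; m_needed = 2;   // 3-byte lead
            } else if ((byte & 0xF8) == 0xF0) {
                m_codePoint = byte & 0x07; m_needed = 3;   // 4-byte lead
            } else {
                out.push_back(kReplacementCharacter);      // stray continuation or invalid lead
            }
        } else if ((byte & 0xC0) == 0x80) {
            m_codePoint = (m_codePoint << 6) | (byte & 0x3F);
            if (--m_needed == 0)
                out.push_back(m_codePoint);
        } else {
            // The pending sequence was interrupted: report it, then treat
            // the current byte as the start of a new sequence.
            out.push_back(kReplacementCharacter);
            m_needed = 0;
            consume(byte, out);
        }
    }

    char32_t m_codePoint = 0; // bits collected so far for the pending sequence
    unsigned m_needed = 0;    // continuation bytes still expected
};

// Intended behavior: one decoder lives for the whole stream.
std::u32string decodeWithPersistentDecoder(const std::vector<std::string>& chunks) {
    Utf8StreamDecoder decoder;
    std::u32string out;
    for (const std::string& chunk : chunks)
        decoder.decode(chunk, out);
    return out;
}

// Mimics the reported regression: a new decoder is created for every chunk,
// so any sequence pending at a chunk boundary is forgotten.
std::u32string decodeWithFreshDecoderPerChunk(const std::vector<std::string>& chunks) {
    std::u32string out;
    for (const std::string& chunk : chunks) {
        Utf8StreamDecoder decoder;
        decoder.decode(chunk, out);
    }
    return out;
}
```

Feeding both functions two Alephs (U+05D0, bytes 0xD7 0x90) with the chunk boundary falling between the two bytes of the second character should show the difference: the persistent decoder yields both Alephs, while the per-chunk decoder replaces the second one with U+FFFD. A smaller MTU produces more chunk boundaries, which is consistent with it making the problem easier to reproduce.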
Attachments
Test case (17.86 KB, text/html)
2006-09-02 16:28 PDT, mitz
no flags
proposed fix (31.76 KB, patch)
2006-09-03 01:48 PDT, Alexey Proskuryakov
eric: review+
mitz
Comment 1 2006-09-02 16:28:06 PDT
Created attachment 10372 [details] Test case
Alexey Proskuryakov
Comment 2 2006-09-03 01:48:36 PDT
Created attachment 10376 [details] proposed fix
Eric Seidel (no email)
Comment 3 2006-09-03 09:43:03 PDT
Comment on attachment 10376 [details] proposed fix

ap and I talked about this over IRC. This looks sane. I complained to ap about the duplicated logic for encoding fallback, which seems to exist in both setEncodingName and the constructor here. It would be nice to get rid of that (fewer code repetitions lead to cleaner and less fragile code). ap noted that there would be an upcoming rewrite of TextEncoding to get rid of the custom encoding-to-ID tables. All in all this looks sane. I just have to trust you that our test cases are adequate to cover this.
Alexey Proskuryakov
Comment 4 2006-09-03 09:53:00 PDT
Committed revision 16198.