Bug 234030: RESOLVED DUPLICATE of bug 233921
TextCodecUTF8 can skip characters after an invalid sequence near EOF
https://bugs.webkit.org/show_bug.cgi?id=234030
Andreu Botella
Reported 2021-12-08 12:55:22 PST
Created attachment 446414
Sample to show that this bug affects page loading.

WPT tests: https://wpt.fyi/results/encoding/textdecoder-eof.any.html?label=experimental&label=master&aligned (also tests for bug 233921).

When the TextCodecUTF8 decoder finds a non-ASCII lead byte, it waits until enough bytes are consumed to make a valid sequence starting at that position, before starting to process the bytes. But if the stream is flushed before that, the decoder assumes that the remaining bytes are part of a truncated partial sequence, and so discards them while emitting a single replacement character.

But this assumption doesn't necessarily hold, and it can result in non-replacement characters being skipped:

// "�A" in Firefox and Chromium 98, and according to the spec.
// "��A" in earlier versions of Chromium.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41]));

This can also result in fewer replacement characters being emitted than should be the case:

// "��A" in Firefox, Chrome, and according to the spec.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x80, 0x41]));

This bug also affects page loading, as with the attached sample.
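For reference, here is a minimal sketch of how the spec-mandated results for the two inputs above come about. It follows the UTF-8 decoder steps in the WHATWG Encoding Standard (non-fatal mode) as I understand them. It is not WebKit's TextCodecUTF8 code, and the function name decodeUtf8Lossy is made up for illustration. The key point is that a byte which fails the continuation-range check is not consumed: the decoder emits one replacement character for the broken sequence and then reprocesses that byte as the start of a new sequence.

```javascript
// Hypothetical helper: a sketch of the Encoding Standard's UTF-8 decoder
// in replacement (non-fatal) mode. Not taken from WebKit.
function decodeUtf8Lossy(bytes) {
  let out = "";
  let codePoint = 0, seen = 0, needed = 0;
  let lower = 0x80, upper = 0xBF; // allowed range for the next continuation byte
  let i = 0;

  while (i < bytes.length) {
    const b = bytes[i];

    if (needed === 0) {
      if (b <= 0x7F) {
        out += String.fromCharCode(b);
      } else if (b >= 0xC2 && b <= 0xDF) {
        needed = 1; codePoint = b & 0x1F;
      } else if (b >= 0xE0 && b <= 0xEF) {
        if (b === 0xE0) lower = 0xA0; // reject overlong forms
        if (b === 0xED) upper = 0x9F; // reject surrogates
        needed = 2; codePoint = b & 0x0F;
      } else if (b >= 0xF0 && b <= 0xF4) {
        if (b === 0xF0) lower = 0x90; // reject overlong forms
        if (b === 0xF4) upper = 0x8F; // reject values above U+10FFFF
        needed = 3; codePoint = b & 0x07;
      } else {
        out += "\uFFFD"; // stray continuation byte or invalid lead byte
      }
      i++;
      continue;
    }

    if (b < lower || b > upper) {
      // Broken sequence: emit one U+FFFD and reset, but do NOT advance i,
      // so this byte is reprocessed as the start of a new sequence.
      codePoint = seen = needed = 0;
      lower = 0x80; upper = 0xBF;
      out += "\uFFFD";
      continue;
    }

    lower = 0x80; upper = 0xBF;
    codePoint = (codePoint << 6) | (b & 0x3F);
    seen++;
    i++;
    if (seen === needed) {
      out += String.fromCodePoint(codePoint);
      codePoint = seen = needed = 0;
    }
  }

  // End of input (flush) in the middle of a sequence: one U+FFFD covers
  // only the bytes that were actually accepted into that sequence.
  if (needed !== 0) out += "\uFFFD";
  return out;
}

console.log(JSON.stringify(decodeUtf8Lossy([0xF0, 0x9F, 0x41])));
console.log(JSON.stringify(decodeUtf8Lossy([0xF0, 0x80, 0x41])));
```

In this sketch, 0x41 is never swallowed by the pending F0 sequence, because it is rejected as a continuation byte before the end of input is reached. The single replacement character at flush only applies when every buffered byte was a valid part of the truncated sequence, e.g. [0xF0, 0x9F].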
Attachments
Sample to show that this bug affects page loading. (50 bytes, text/html)
2021-12-08 12:55 PST, Andreu Botella
no flags
Alex Christensen
Comment 1 2021-12-09 09:50:17 PST
*** This bug has been marked as a duplicate of bug 233921 ***
Alex Christensen
Comment 2 2021-12-09 09:50:33 PST
This will be fixed with the same fix as bug 233921