Bug 234030 - TextCodecUTF8 can skip characters after an invalid sequence near EOF
Summary: TextCodecUTF8 can skip characters after an invalid sequence near EOF
Status: RESOLVED DUPLICATE of bug 233921
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-08 12:55 PST by Andreu Botella
Modified: 2021-12-09 09:50 PST
CC List: 4 users

See Also:


Attachments
Sample to show that this bug affects page loading. (50 bytes, text/html)
2021-12-08 12:55 PST, Andreu Botella

Description Andreu Botella 2021-12-08 12:55:22 PST
Created attachment 446414
Sample to show that this bug affects page loading.

WPT tests: https://wpt.fyi/results/encoding/textdecoder-eof.any.html?label=experimental&label=master&aligned (also tests for bug 233921).

When the TextCodecUTF8 decoder finds a non-ASCII lead byte, it waits until enough bytes have been consumed to form a complete sequence starting at that position before it begins processing them. If the stream is flushed before that point, the decoder assumes that all of the remaining bytes belong to one truncated sequence, so it discards them and emits a single replacement character. That assumption doesn't necessarily hold, and it can result in non-replacement characters being skipped:

// "�A" in Firefox and Chromium 98, and according to the spec.
// "��A" in earlier versions of Chromium.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41]));

This can also result in fewer replacement characters being emitted than should be the case:

// "��A" in Firefox, Chrome, and according to the spec.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x80, 0x41]));
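
For comparison, here is a minimal sketch of how the expected results above fall out of the UTF-8 decoder state machine in the Encoding Standard. It is not the TextCodecUTF8 implementation or the fix for this bug; `decodeUtf8Sketch` is a hypothetical, non-streaming helper written only for illustration. The key point is that a byte outside the allowed continuation range ends the pending sequence with one replacement character and is then reprocessed as the start of a new sequence, rather than being discarded.

```javascript
// Hypothetical helper, not WebKit code: a non-streaming UTF-8 decoder that
// follows the state machine described in the WHATWG Encoding Standard.
function decodeUtf8Sketch(bytes) {
  const REPLACEMENT = "\uFFFD";
  let out = "";
  let codePoint = 0;
  let bytesNeeded = 0; // continuation bytes expected for the current sequence
  let bytesSeen = 0;   // continuation bytes consumed so far
  let lower = 0x80;    // allowed range for the next continuation byte
  let upper = 0xBF;

  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];

    if (bytesNeeded === 0) {
      if (byte <= 0x7F) {
        out += String.fromCharCode(byte);
      } else if (byte >= 0xC2 && byte <= 0xDF) {
        bytesNeeded = 1;
        codePoint = byte & 0x1F;
      } else if (byte >= 0xE0 && byte <= 0xEF) {
        if (byte === 0xE0) lower = 0xA0; // reject overlong forms
        if (byte === 0xED) upper = 0x9F; // reject surrogates
        bytesNeeded = 2;
        codePoint = byte & 0x0F;
      } else if (byte >= 0xF0 && byte <= 0xF4) {
        if (byte === 0xF0) lower = 0x90; // reject overlong forms
        if (byte === 0xF4) upper = 0x8F; // reject values above U+10FFFF
        bytesNeeded = 3;
        codePoint = byte & 0x07;
      } else {
        out += REPLACEMENT; // stray continuation byte or invalid lead byte
      }
      continue;
    }

    if (byte < lower || byte > upper) {
      // The pending sequence is invalid: emit one replacement character,
      // reset the state, and reprocess this same byte as a fresh start.
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
      lower = 0x80;
      upper = 0xBF;
      out += REPLACEMENT;
      i--;
      continue;
    }

    lower = 0x80;
    upper = 0xBF;
    codePoint = (codePoint << 6) | (byte & 0x3F);
    bytesSeen++;
    if (bytesSeen === bytesNeeded) {
      out += String.fromCodePoint(codePoint);
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
    }
  }

  // End of input with a sequence still pending: a genuinely truncated
  // sequence yields exactly one replacement character.
  if (bytesNeeded !== 0) out += REPLACEMENT;
  return out;
}
```

With this sketch, `decodeUtf8Sketch([0xF0, 0x9F, 0x41])` returns "�A" and `decodeUtf8Sketch([0xF0, 0x80, 0x41])` returns "��A", matching the results listed for the spec above. A single replacement character is produced only when the input really does end in the middle of a sequence, as with `[0xF0, 0x9F]`.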

This bug also affects page loading, as with the attached sample.
Comment 1 Alex Christensen 2021-12-09 09:50:17 PST

*** This bug has been marked as a duplicate of bug 233921 ***
Comment 2 Alex Christensen 2021-12-09 09:50:33 PST
This will be fixed by the same fix as bug 233921.