Bug 216202

Summary: TextDecoder should properly handle streams
Product: WebKit
Reporter: Alex Christensen <achristensen>
Component: New Bugs
Assignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED
Severity: Normal
CC: ap, calvaris, cdumez, changseok, clopez, darin, eric.carlson, esprehn+autocc, ews-watchlist, glenn, gyuyoung.kim, japhet, jer.noble, kangil.han, philipj, sergio, webkit-bug-importer, youennf
Priority: P2
Keywords: InRadar
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
See Also: https://bugs.webkit.org/show_bug.cgi?id=233921
Attachments:
Patch (flags: none)
Patch (flags: darin: review+)

Description Alex Christensen 2020-09-04 17:01:50 PDT
Allow TextCodec::decode to properly handle streams
Comment 1 Alex Christensen 2020-09-04 17:05:36 PDT
Created attachment 408046
Patch
Comment 2 Darin Adler 2020-09-04 17:17:35 PDT
Comment on attachment 408046
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

> Source/WebCore/platform/text/DecodeFailure.h:35
> +struct DecodeFailure {
> +    String stringBeforeError;
> +    size_t bytesConsumed { 0 };
> +};

Looking at the call sites, Expected isn’t really doing its job to make the code clean. Most of the call sites seem to want the string whether there was an error or not, but it seems they only need to know how many bytes were consumed if it was an error. Maybe the return value should just be a simple structure:

    struct DecodeResult {
        String string;
        Optional<size_t> bytesConsumedBeforeError;
    };

Or if you want to be even more straightforward:

    struct DecodeResult {
        String string;
        size_t bytesConsumed { 0 };
        bool success { false };
    };

I think those might be better than the Expected for how this is actually used.
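
To make the first suggestion concrete, here is a minimal standalone sketch of how a call site might read with such a structure. It is not WebKit code: std::string and std::optional stand in for WTF's String and Optional, and decodeAscii and decodeAsciiLossy are hypothetical helpers invented for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Stand-in for the suggested structure, using standard library types.
struct DecodeResult {
    std::string string;
    std::optional<size_t> bytesConsumedBeforeError;
};

// Hypothetical decoder: accepts 7-bit ASCII and stops at the first other byte.
DecodeResult decodeAscii(const std::vector<uint8_t>& bytes)
{
    DecodeResult result;
    for (size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] > 0x7F) {
            // Report how many bytes were decoded before the failure.
            result.bytesConsumedBeforeError = i;
            return result;
        }
        result.string.push_back(static_cast<char>(bytes[i]));
    }
    return result;
}

// Hypothetical call site: replaces each undecodable byte with '?'.
std::string decodeAsciiLossy(std::vector<uint8_t> bytes)
{
    std::string output;
    while (true) {
        DecodeResult result = decodeAscii(bytes);
        output += result.string; // Used whether or not decoding failed.
        if (!result.bytesConsumedBeforeError)
            return output;
        output += '?';
        // Drop the decoded prefix plus the one offending byte, then retry.
        auto skip = static_cast<std::ptrdiff_t>(*result.bytesConsumedBeforeError + 1);
        bytes.erase(bytes.begin(), bytes.begin() + skip);
    }
}
```

The caller appends result.string unconditionally and only consults the byte count on the error path, which is the usage pattern described above.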
Comment 3 Alexey Proskuryakov 2020-09-04 19:22:42 PDT
Comment on attachment 408046
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

> Source/WebCore/ChangeLog:8
> +        In order to properly handle cases like a stream breaking in the middle of a surrogate pair

Doesn't text decoding already properly handle such cases when decoding content as it comes from the network?
Comment 4 Alex Christensen 2020-09-04 19:55:16 PDT
Comment on attachment 408046
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

>> Source/WebCore/ChangeLog:8
>> +        In order to properly handle cases like a stream breaking in the middle of a surrogate pair
> 
> Doesn't text decoding already properly handle such cases when decoding content as it comes from the network?

That uses TextResourceDecoder, which stores a std::unique_ptr<TextCodec> that keeps state, instead of keeping a buffer like TextDecoder currently does. While the buffer approach passes all existing web platform tests, it is incorrect. I need to keep a std::unique_ptr<TextCodec> instead of a buffer, and I should probably add a test that fails with this implementation and passes with a correct implementation. The only decoding failures in the existing tests are at the end of a stream block.
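
As a rough illustration of what "keeps state" means here, the sketch below shows a decoder object that remembers an incomplete code unit or an unpaired high surrogate between calls, so a chunk boundary in the middle of a surrogate pair does not produce an error. It is not WebKit's TextCodec: the class name StreamingUtf16LEDecoder, its flush parameter, and the choice of UTF-16LE input are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

constexpr char32_t replacementCharacter = 0xFFFD;

// Hypothetical stateful decoder: UTF-16LE bytes in, UTF-32 code points out.
class StreamingUtf16LEDecoder {
public:
    // Decodes one chunk. Incomplete input is carried over to the next call
    // unless `flush` is true, in which case it becomes one U+FFFD.
    std::u32string decode(const std::vector<uint8_t>& chunk, bool flush = false)
    {
        std::u32string out;
        for (uint8_t byte : chunk) {
            if (!m_pendingByte) {
                m_pendingByte = byte; // Low byte of the next code unit.
                continue;
            }
            auto unit = static_cast<char16_t>(*m_pendingByte | (byte << 8));
            m_pendingByte.reset();
            appendUnit(unit, out);
        }
        if (flush) {
            if (m_pendingByte || m_pendingHighSurrogate)
                out.push_back(replacementCharacter);
            m_pendingByte.reset();
            m_pendingHighSurrogate.reset();
        }
        return out;
    }

private:
    void appendUnit(char16_t unit, std::u32string& out)
    {
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        if (m_pendingHighSurrogate) {
            char16_t high = *m_pendingHighSurrogate;
            m_pendingHighSurrogate.reset();
            if (isLow) {
                // Combine the remembered high surrogate with this low one.
                out.push_back(static_cast<char32_t>(
                    0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)));
                return;
            }
            out.push_back(replacementCharacter); // Unpaired high surrogate.
        }
        if (isHigh)
            m_pendingHighSurrogate = unit; // Wait for the low surrogate.
        else if (isLow)
            out.push_back(replacementCharacter); // Unpaired low surrogate.
        else
            out.push_back(static_cast<char32_t>(unit));
    }

    std::optional<uint8_t> m_pendingByte;
    std::optional<char16_t> m_pendingHighSurrogate;
};
```

Feeding the bytes 3D D8 and then 00 DE in two separate calls yields nothing from the first call and U+1F600 from the second. A buffer-based approach has to re-decode or carefully track leftover bytes to get the same result.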
Comment 5 Alex Christensen 2020-09-05 00:23:11 PDT
Created attachment 408067
Patch
Comment 6 EWS Watchlist 2020-09-05 00:24:23 PDT
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment 7 Darin Adler 2020-09-05 08:38:08 PDT
Comment on attachment 408067
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408067&action=review

Excellent. This is exactly how the TextCodec class was designed to be used.

> Source/WebCore/dom/TextDecoder.h:54
> +    ~TextDecoder();

We typically put this before other member functions. Arbitrary, but it’s atypical to put it at the end of the public section.

> Source/WebCore/dom/TextDecoder.h:64
> +    std::unique_ptr<TextCodec> m_textCodec;

I would have named this just m_codec.
Comment 8 Alex Christensen 2020-09-05 12:48:30 PDT
http://trac.webkit.org/r266668
Comment 9 Radar WebKit Bug Importer 2020-09-05 12:49:22 PDT
<rdar://problem/68402719>
Comment 10 Darin Adler 2020-09-05 12:52:23 PDT
Comment on attachment 408067
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408067&action=review

> Source/WebCore/dom/TextDecoder.cpp:153
> +    auto oldBuffer = std::exchange(m_buffer, { });

Can we also come back here and delete this code?

> Source/WebCore/dom/TextDecoder.h:66
>      Vector<uint8_t> m_buffer;

And delete this?
Comment 11 Alex Christensen 2020-10-05 10:20:37 PDT
Those suggested improvements were done in https://trac.webkit.org/changeset/266681