Summary: | TextDecoder should properly handle streams | |
---|---|---|---
Product: | WebKit | Reporter: | Alex Christensen <achristensen>
Component: | New Bugs | Assignee: | Alex Christensen <achristensen>
Status: | RESOLVED FIXED | |
Severity: | Normal | CC: | ap, calvaris, cdumez, changseok, clopez, darin, eric.carlson, esprehn+autocc, ews-watchlist, glenn, gyuyoung.kim, japhet, jer.noble, kangil.han, philipj, sergio, webkit-bug-importer, youennf
Priority: | P2 | Keywords: | InRadar
Version: | WebKit Nightly Build | |
Hardware: | Unspecified | |
OS: | Unspecified | |
See Also: | https://bugs.webkit.org/show_bug.cgi?id=233921 | |
Description
Alex Christensen
2020-09-04 17:01:50 PDT
Created attachment 408046 [details]
Patch
Comment on attachment 408046 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

> Source/WebCore/platform/text/DecodeFailure.h:35
> +struct DecodeFailure {
> +    String stringBeforeError;
> +    size_t bytesConsumed { 0 };
> +};

Looking at the call sites, Expected isn’t really doing its job to make the code clean. Most of the call sites seem to want the string whether there was an error or not, but it seems they only need to know how many bytes were consumed if it was an error. Maybe the return value should just be a simple structure:

    struct DecodeResult {
        String string;
        Optional<size_t> bytesConsumedBeforeError;
    };

Or if you want to be even more straightforward:

    struct DecodeResult {
        String string;
        size_t bytesConsumed { 0 };
        bool success { false };
    };

I think those might be better than the Expected for how this is actually used.

Comment on attachment 408046 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

> Source/WebCore/ChangeLog:8
> +        In order to properly handle cases like a stream breaking in the middle of a surrogate pair

Doesn't text decoding already properly handle such cases when decoding content as it comes from the network?

Comment on attachment 408046 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408046&action=review

>> Source/WebCore/ChangeLog:8
>> +        In order to properly handle cases like a stream breaking in the middle of a surrogate pair
>
> Doesn't text decoding already properly handle such cases when decoding content as it comes from the network?

That uses TextResourceDecoder, which stores a std::unique_ptr<TextCodec> that keeps state instead of keeping a buffer like TextDecoder currently does. While this approach passes all existing web platform tests, it is incorrect.

I need to keep a std::unique_ptr<TextCodec> instead of a buffer, and I should probably add a test that fails with this implementation and passes with a correct implementation. The only decoding failures in the existing tests are at the end of a stream block.

Created attachment 408067 [details]
Patch
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess

Comment on attachment 408067 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408067&action=review

Excellent. This is exactly how the TextCodec class was designed to be used.

> Source/WebCore/dom/TextDecoder.h:54
> +    ~TextDecoder();

We typically put this before other member functions. Arbitrary, but it’s atypical to put it at the end of the public section.

> Source/WebCore/dom/TextDecoder.h:64
> +    std::unique_ptr<TextCodec> m_textCodec;

I would have named this just m_codec.

Comment on attachment 408067 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=408067&action=review

> Source/WebCore/dom/TextDecoder.cpp:153
> +    auto oldBuffer = std::exchange(m_buffer, { });

Can we also come back here and delete this code?

> Source/WebCore/dom/TextDecoder.h:66
>      Vector<uint8_t> m_buffer;

And delete this?

Those suggested improvements were done in https://trac.webkit.org/changeset/266681
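
For readers following the thread, here is a minimal sketch of what "a codec that keeps state" means for the surrogate-pair case mentioned in the ChangeLog. It is an illustration under stated assumptions, not WebKit's code: StreamingUtf16Decoder, its decode() signature, and the flush parameter are all hypothetical names, and the real TextCodec interface may differ. The only point is that a high surrogate at the end of one chunk is remembered and paired with the low surrogate that starts the next chunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration only; this is not WebKit's TextCodec interface.
constexpr char32_t replacementCharacter = 0xFFFD;

class StreamingUtf16Decoder {
public:
    // Decodes one chunk of UTF-16 code units into code points. When flush is
    // false, a trailing high surrogate is held back until the next chunk.
    std::u32string decode(const std::u16string& chunk, bool flush)
    {
        std::u32string output;
        for (char16_t unit : chunk) {
            if (m_pendingHighSurrogate) {
                char16_t high = m_pendingHighSurrogate;
                m_pendingHighSurrogate = 0;
                if (isLowSurrogate(unit)) {
                    // Combine the remembered high surrogate with this low one.
                    output.push_back(0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00));
                    continue;
                }
                output.push_back(replacementCharacter); // Unpaired high surrogate.
            }
            if (isHighSurrogate(unit))
                m_pendingHighSurrogate = unit;
            else if (isLowSurrogate(unit))
                output.push_back(replacementCharacter); // Unpaired low surrogate.
            else
                output.push_back(unit);
        }
        if (flush && m_pendingHighSurrogate) {
            // End of stream: nothing can complete the pair any more.
            m_pendingHighSurrogate = 0;
            output.push_back(replacementCharacter);
        }
        return output;
    }

private:
    static bool isHighSurrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
    static bool isLowSurrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

    // Zero means "nothing pending"; zero is never a high surrogate.
    char16_t m_pendingHighSurrogate { 0 };
};

// Usage: U+1F600 is the surrogate pair D83D DE00, split across two chunks.
inline void demonstrateSplitSurrogatePair()
{
    StreamingUtf16Decoder decoder;
    std::u32string first = decoder.decode(std::u16string(1, char16_t(0xD83D)), false);
    std::u32string second = decoder.decode(std::u16string(1, char16_t(0xDE00)), true);
    assert(first.empty());
    assert(second == std::u32string(1, char32_t(0x1F600)));
}
```

A split in the middle of a chunk sequence is the case the thread says the existing tests did not cover, since their only decoding failures were at the end of a stream block.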