ASSIGNED Bug 54444
Correct handling of end of buffer partial sequence in UTF-8 and UTF-16 decoders when flushing with zero length
https://bugs.webkit.org/show_bug.cgi?id=54444
Summary Correct handling of end of buffer partial sequence in UTF-8 and UTF-16 decode...
Antti Koivisto
Reported 2011-02-15 03:08:48 PST
http://trac.webkit.org/changeset/78541 added a zero length check to fix crashing bug 54418. This is not strictly correct as far as I understand as it should also set the error flag and possibly generate replacement characters.
Attachments
Patch (34.43 KB, patch)
2012-06-11 19:23 PDT, Darin Adler
ap: review+
buildbot: commit-queue-
Archive of layout-test-results from ec2-cr-linux-01 (540.81 KB, application/zip)
2012-06-11 23:29 PDT, WebKit Review Bot
no flags
Darin Adler
Comment 1 2011-02-15 08:51:34 PST
Thanks for adding the check Antti. I’ll tackle this soon.
Alexey Proskuryakov
Comment 2 2011-02-15 08:52:21 PST
As mentioned in bug 54418, TextCodecUTF16 has the same problem.
Darin Adler
Comment 3 2011-02-15 08:56:29 PST
TextCodecUTF16 has a couple other mistakes as well. The main thing I need here is a good way to test codecs, maybe by using XMLHttpRequest or something. A way to feed in a specific number of bytes at a time. And also a way to test both the “stop on error” and “use replacement character” code paths. Then we can construct a great regression test and then fix these problems. I spotted another problem in the UTF8-codec that I would like to fix at the same time; if a partial sequence is invalid, then the remainder of that is left as a partial sequence. But the remainder could contain plain ASCII characters, and the code does not handle that path correctly.
Darin Adler
Comment 4 2011-02-22 17:26:06 PST
I have this fixed for the UTF-8 codec now, but my attempt to make a test case for this was unsuccessful.
Darin Adler
Comment 5 2012-06-11 19:23:16 PDT
Build Bot
Comment 6 2012-06-11 20:02:50 PDT
Alexey Proskuryakov
Comment 7 2012-06-11 22:49:08 PDT
Comment on attachment 146994 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=146994&action=review Please fix build issues found by EWS. > Source/WebCore/platform/text/TextCodecUTF16.cpp:77 > + if (m_haveBufferedByte && flush) { > + sawError = true; > + if (!stopOnError) > + return String(&replacementCharacter, 1); > + } > + return String(); I only had a very brief look at other decoders, but it seems like we may need to reset to clean state when flush == true. > Source/WebCore/platform/text/TextCodecUTF16.cpp:128 > + if (flush) { > + sawError = true; > + if (!stopOnError) { > + size_t numCharsInBuffer = q - buffer.characters(); > + if (numCharsInBuffer == buffer.length()) { > + buffer.resize(numCharsInBuffer + 1); > + q = buffer.characters() + numCharsInBuffer; > + } > + *q++ = replacementCharacter; > + } Ditto. > Source/WebCore/testing/Internals.h:51 > -class Internals : public RefCounted<Internals>, > - public FrameDestructionObserver { > +class Internals : public RefCounted<Internals>, private FrameDestructionObserver { Someone else is working on removing inheritance from FrameDestructionObserver now. > LayoutTests/fast/encoding/incremental-decoding.html:220 > + debug("Test requires WebKit internals object"); This is helpful for other browser vendors trying to open the test, but necessarily for people opening it in a WebKit based browser. Also, I'm surprised that the main branch is an else.
Alexey Proskuryakov
Comment 8 2012-06-11 22:49:40 PDT
*not* necessarily.
WebKit Review Bot
Comment 9 2012-06-11 23:29:50 PDT
Comment on attachment 146994 [details] Patch Attachment 146994 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12937813 New failing tests: fast/parser/test-unicode-characters-in-attribute-name.html
WebKit Review Bot
Comment 10 2012-06-11 23:29:54 PDT
Created attachment 147017 [details] Archive of layout-test-results from ec2-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Darin Adler
Comment 11 2012-06-12 08:05:31 PDT
(In reply to comment #7) > fast/parser/test-unicode-characters-in-attribute-name.html I had overlooked this test failure, and it’s interesting. The test file has an extra byte at the end; it’s a newline character, but it’s a single byte. The old codec would ignore and drop that last byte. The new codec changes it into U+FFFD. I’m curious what other web browsers do with this stray byte. We should probably fix that test case and add coverage for this case to char-decoding.html by passing in a single byte.
Note You need to log in before you can comment on or make changes to this bug.