http://trac.webkit.org/changeset/78541 added a zero length check to fix crashing bug 54418. This is not strictly correct as far as I understand as it should also set the error flag and possibly generate replacement characters.
Thanks for adding the check Antti. I’ll tackle this soon.
As mentioned in bug 54418, TextCodecUTF16 has the same problem.
TextCodecUTF16 has a couple other mistakes as well. The main thing I need here is a good way to test codecs, maybe by using XMLHttpRequest or something. A way to feed in a specific number of bytes at a time. And also a way to test both the “stop on error” and “use replacement character” code paths. Then we can construct a great regression test and then fix these problems. I spotted another problem in the UTF8-codec that I would like to fix at the same time; if a partial sequence is invalid, then the remainder of that is left as a partial sequence. But the remainder could contain plain ASCII characters, and the code does not handle that path correctly.
I have this fixed for the UTF-8 codec now, but my attempt to make a test case for this was unsuccessful.
Created attachment 146994 [details] Patch
Comment on attachment 146994 [details] Patch Attachment 146994 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12949153
Comment on attachment 146994 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=146994&action=review Please fix build issues found by EWS. > Source/WebCore/platform/text/TextCodecUTF16.cpp:77 > + if (m_haveBufferedByte && flush) { > + sawError = true; > + if (!stopOnError) > + return String(&replacementCharacter, 1); > + } > + return String(); I only had a very brief look at other decoders, but it seems like we may need to reset to clean state when flush == true. > Source/WebCore/platform/text/TextCodecUTF16.cpp:128 > + if (flush) { > + sawError = true; > + if (!stopOnError) { > + size_t numCharsInBuffer = q - buffer.characters(); > + if (numCharsInBuffer == buffer.length()) { > + buffer.resize(numCharsInBuffer + 1); > + q = buffer.characters() + numCharsInBuffer; > + } > + *q++ = replacementCharacter; > + } Ditto. > Source/WebCore/testing/Internals.h:51 > -class Internals : public RefCounted<Internals>, > - public FrameDestructionObserver { > +class Internals : public RefCounted<Internals>, private FrameDestructionObserver { Someone else is working on removing inheritance from FrameDestructionObserver now. > LayoutTests/fast/encoding/incremental-decoding.html:220 > + debug("Test requires WebKit internals object"); This is helpful for other browser vendors trying to open the test, but necessarily for people opening it in a WebKit based browser. Also, I'm surprised that the main branch is an else.
*not* necessarily.
Comment on attachment 146994 [details] Patch Attachment 146994 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12937813 New failing tests: fast/parser/test-unicode-characters-in-attribute-name.html
Created attachment 147017 [details] Archive of layout-test-results from ec2-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
(In reply to comment #7) > fast/parser/test-unicode-characters-in-attribute-name.html I had overlooked this test failure, and it’s interesting. The test file has an extra byte at the end; it’s a newline character, but it’s a single byte. The old codec would ignore and drop that last byte. The new codec changes it into U+FFFD. I’m curious what other web browsers do with this stray byte. We should probably fix that test case and add coverage for this case to char-decoding.html by passing in a single byte.