Bug 54444

Summary:

Correct handling of end of buffer partial sequence in UTF-8 and UTF-16 decoders when flushing with zero length

Product:

WebKit

Reporter:

Antti Koivisto <koivisto>

Component:

Text

Assignee:

Darin Adler <darin>

Status:

ASSIGNED

Severity:

Normal

CC:

annevk, ap, darin, dglazkov, gustavo, jnd, mario, philn, webkit.review.bot, xan.lopez

Priority:

Version:

528+ (Nightly build)

Hardware:

All

OS:

All

Bug Depends on:

Bug Blocks:

179303

Attachments:

Description	Flags
Patch	ap: review+, buildbot: commit-queue-
Archive of layout-test-results from ec2-cr-linux-01	none

Antti Koivisto

Reported 2011-02-15 03:08:48 PST

http://trac.webkit.org/changeset/78541 added a zero length check to fix crashing bug 54418. This is not strictly correct as far as I understand as it should also set the error flag and possibly generate replacement characters.

Attachments
Patch (34.43 KB, patch) 2012-06-11 19:23 PDT, Darin Adler	ap: review+ buildbot: commit-queue-	Details Formatted Diff Diff
Archive of layout-test-results from ec2-cr-linux-01 (540.81 KB, application/zip) 2012-06-11 23:29 PDT, WebKit Review Bot	no flags	Details
View All Add attachment proposed patch, testcase, etc.

Darin Adler

Comment 1 2011-02-15 08:51:34 PST

Thanks for adding the check Antti. I’ll tackle this soon.

Alexey Proskuryakov

Comment 2 2011-02-15 08:52:21 PST

As mentioned in bug 54418, TextCodecUTF16 has the same problem.

Darin Adler

Comment 3 2011-02-15 08:56:29 PST

TextCodecUTF16 has a couple other mistakes as well. The main thing I need here is a good way to test codecs, maybe by using XMLHttpRequest or something. A way to feed in a specific number of bytes at a time. And also a way to test both the “stop on error” and “use replacement character” code paths. Then we can construct a great regression test and then fix these problems. I spotted another problem in the UTF8-codec that I would like to fix at the same time; if a partial sequence is invalid, then the remainder of that is left as a partial sequence. But the remainder could contain plain ASCII characters, and the code does not handle that path correctly.

Darin Adler

Comment 4 2011-02-22 17:26:06 PST

I have this fixed for the UTF-8 codec now, but my attempt to make a test case for this was unsuccessful.

Darin Adler

Comment 5 2012-06-11 19:23:16 PDT

Created attachment 146994 [details] Patch

Build Bot

Comment 6 2012-06-11 20:02:50 PDT

Comment on attachment 146994 [details] Patch Attachment 146994 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12949153

Alexey Proskuryakov

Comment 7 2012-06-11 22:49:08 PDT

Comment on attachment 146994 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=146994&action=review Please fix build issues found by EWS. > Source/WebCore/platform/text/TextCodecUTF16.cpp:77 > + if (m_haveBufferedByte && flush) { > + sawError = true; > + if (!stopOnError) > + return String(&replacementCharacter, 1); > + } > + return String(); I only had a very brief look at other decoders, but it seems like we may need to reset to clean state when flush == true. > Source/WebCore/platform/text/TextCodecUTF16.cpp:128 > + if (flush) { > + sawError = true; > + if (!stopOnError) { > + size_t numCharsInBuffer = q - buffer.characters(); > + if (numCharsInBuffer == buffer.length()) { > + buffer.resize(numCharsInBuffer + 1); > + q = buffer.characters() + numCharsInBuffer; > + } > + *q++ = replacementCharacter; > + } Ditto. > Source/WebCore/testing/Internals.h:51 > -class Internals : public RefCounted<Internals>, > - public FrameDestructionObserver { > +class Internals : public RefCounted<Internals>, private FrameDestructionObserver { Someone else is working on removing inheritance from FrameDestructionObserver now. > LayoutTests/fast/encoding/incremental-decoding.html:220 > + debug("Test requires WebKit internals object"); This is helpful for other browser vendors trying to open the test, but necessarily for people opening it in a WebKit based browser. Also, I'm surprised that the main branch is an else.

Alexey Proskuryakov

Comment 8 2012-06-11 22:49:40 PDT

*not* necessarily.

WebKit Review Bot

Comment 9 2012-06-11 23:29:50 PDT

Comment on attachment 146994 [details] Patch Attachment 146994 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12937813 New failing tests: fast/parser/test-unicode-characters-in-attribute-name.html

WebKit Review Bot

Comment 10 2012-06-11 23:29:54 PDT

Created attachment 147017 [details] Archive of layout-test-results from ec2-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick

Darin Adler

Comment 11 2012-06-12 08:05:31 PDT

(In reply to comment #7) > fast/parser/test-unicode-characters-in-attribute-name.html I had overlooked this test failure, and it’s interesting. The test file has an extra byte at the end; it’s a newline character, but it’s a single byte. The old codec would ignore and drop that last byte. The new codec changes it into U+FFFD. I’m curious what other web browsers do with this stray byte. We should probably fix that test case and add coverage for this case to char-decoding.html by passing in a single byte.

Note You need to log in before you can comment on or make changes to this bug.