WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
ASSIGNED
Bug 54444
Correct handling of end of buffer partial sequence in UTF-8 and UTF-16 decoders when flushing with zero length
https://bugs.webkit.org/show_bug.cgi?id=54444
Summary
Correct handling of end of buffer partial sequence in UTF-8 and UTF-16 decode...
Antti Koivisto
Reported
2011-02-15 03:08:48 PST
http://trac.webkit.org/changeset/78541
added a zero length check to fix crashing
bug 54418
. This is not strictly correct as far as I understand as it should also set the error flag and possibly generate replacement characters.
Attachments
Patch
(34.43 KB, patch)
2012-06-11 19:23 PDT
,
Darin Adler
ap
: review+
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from ec2-cr-linux-01
(540.81 KB, application/zip)
2012-06-11 23:29 PDT
,
WebKit Review Bot
no flags
Details
View All
Add attachment
proposed patch, testcase, etc.
Darin Adler
Comment 1
2011-02-15 08:51:34 PST
Thanks for adding the check Antti. I’ll tackle this soon.
Alexey Proskuryakov
Comment 2
2011-02-15 08:52:21 PST
As mentioned in
bug 54418
, TextCodecUTF16 has the same problem.
Darin Adler
Comment 3
2011-02-15 08:56:29 PST
TextCodecUTF16 has a couple other mistakes as well. The main thing I need here is a good way to test codecs, maybe by using XMLHttpRequest or something. A way to feed in a specific number of bytes at a time. And also a way to test both the “stop on error” and “use replacement character” code paths. Then we can construct a great regression test and then fix these problems. I spotted another problem in the UTF8-codec that I would like to fix at the same time; if a partial sequence is invalid, then the remainder of that is left as a partial sequence. But the remainder could contain plain ASCII characters, and the code does not handle that path correctly.
Darin Adler
Comment 4
2011-02-22 17:26:06 PST
I have this fixed for the UTF-8 codec now, but my attempt to make a test case for this was unsuccessful.
Darin Adler
Comment 5
2012-06-11 19:23:16 PDT
Created
attachment 146994
[details]
Patch
Build Bot
Comment 6
2012-06-11 20:02:50 PDT
Comment on
attachment 146994
[details]
Patch
Attachment 146994
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/12949153
Alexey Proskuryakov
Comment 7
2012-06-11 22:49:08 PDT
Comment on
attachment 146994
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=146994&action=review
Please fix build issues found by EWS.
> Source/WebCore/platform/text/TextCodecUTF16.cpp:77 > + if (m_haveBufferedByte && flush) { > + sawError = true; > + if (!stopOnError) > + return String(&replacementCharacter, 1); > + } > + return String();
I only had a very brief look at other decoders, but it seems like we may need to reset to clean state when flush == true.
> Source/WebCore/platform/text/TextCodecUTF16.cpp:128 > + if (flush) { > + sawError = true; > + if (!stopOnError) { > + size_t numCharsInBuffer = q - buffer.characters(); > + if (numCharsInBuffer == buffer.length()) { > + buffer.resize(numCharsInBuffer + 1); > + q = buffer.characters() + numCharsInBuffer; > + } > + *q++ = replacementCharacter; > + }
Ditto.
> Source/WebCore/testing/Internals.h:51 > -class Internals : public RefCounted<Internals>, > - public FrameDestructionObserver { > +class Internals : public RefCounted<Internals>, private FrameDestructionObserver {
Someone else is working on removing inheritance from FrameDestructionObserver now.
> LayoutTests/fast/encoding/incremental-decoding.html:220 > + debug("Test requires WebKit internals object");
This is helpful for other browser vendors trying to open the test, but necessarily for people opening it in a WebKit based browser. Also, I'm surprised that the main branch is an else.
Alexey Proskuryakov
Comment 8
2012-06-11 22:49:40 PDT
*not* necessarily.
WebKit Review Bot
Comment 9
2012-06-11 23:29:50 PDT
Comment on
attachment 146994
[details]
Patch
Attachment 146994
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/12937813
New failing tests: fast/parser/test-unicode-characters-in-attribute-name.html
WebKit Review Bot
Comment 10
2012-06-11 23:29:54 PDT
Created
attachment 147017
[details]
Archive of layout-test-results from ec2-cr-linux-01 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-01 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Darin Adler
Comment 11
2012-06-12 08:05:31 PDT
(In reply to
comment #7
)
> fast/parser/test-unicode-characters-in-attribute-name.html
I had overlooked this test failure, and it’s interesting. The test file has an extra byte at the end; it’s a newline character, but it’s a single byte. The old codec would ignore and drop that last byte. The new codec changes it into U+FFFD. I’m curious what other web browsers do with this stray byte. We should probably fix that test case and add coverage for this case to char-decoding.html by passing in a single byte.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug