63231 – SharedBufferChunkReader does not deal properly with non printable characters

RESOLVED FIXED Bug 63231

SharedBufferChunkReader does not deal properly with non printable characters

https://bugs.webkit.org/show_bug.cgi?id=63231

Summary SharedBufferChunkReader does not deal properly with non printable characters

Jay Civelli

Reported 2011-06-22 23:59:46 PDT

The SharedBUfferChunkReader class currently returns a String when reading a chunk of data. This does not work well with binary data. To deal properly with binary data it should use Vector<char>.

Attachments
Patch (10.34 KB, patch) 2011-06-23 00:50 PDT, Jay Civelli	no flags	Details Formatted Diff Diff
Patch (10.50 KB, patch) 2011-06-23 10:54 PDT, Jay Civelli	no flags	Details Formatted Diff Diff
Unit-tests (11.43 KB, application/octet-stream) 2011-06-23 13:07 PDT, Jay Civelli	no flags	Details
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Jay Civelli

Comment 1 2011-06-23 00:50:56 PDT

Created attachment 98320 [details] Patch

Adam Barth

Comment 2 2011-06-23 09:52:46 PDT

Comment on attachment 98320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > Source/WebCore/ChangeLog:8 > + Fixing the SharedBufferChunkReader API so it works correctly with > + binary data (non printable characters). > + Also adding a method to peek at the data (this is needed for MHTML > + with binary parts). I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? > Source/WebCore/platform/SharedBufferChunkReader.cpp:84 > + chunk.append(m_separator.data(), m_separatorIndex); Can you add an assert that m_separatorIndex <= m_separator.size() ? > Source/WebCore/platform/SharedBufferChunkReader.cpp:110 > + return false; // Compiler is unhappy without this. Rather than have this comment, we usually ASSERT_NOT_REACHED() > Source/WebCore/platform/SharedBufferChunkReader.cpp:113 > +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator) Should we say nextChunkAsUTF8StringWithLatin1Fallback ? > Source/WebCore/platform/SharedBufferChunkReader.h:57 > + // Reads size bytes at the current location in the buffer, without changing the buffer position. > + // Returns the number of bytes read. That number might be less than the specified size if the end of the buffer was reached. > + size_t peek(Vector<char>&, size_t); This function doesn't appear to be used yet.

Jay Civelli

Comment 3 2011-06-23 10:35:02 PDT

Comment on attachment 98320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review >> Source/WebCore/platform/SharedBufferChunkReader.cpp:84 >> + chunk.append(m_separator.data(), m_separatorIndex); > > Can you add an assert that m_separatorIndex <= m_separator.size() ? Done. >> Source/WebCore/platform/SharedBufferChunkReader.cpp:110 >> + return false; // Compiler is unhappy without this. > > Rather than have this comment, we usually ASSERT_NOT_REACHED() Done. >> Source/WebCore/platform/SharedBufferChunkReader.cpp:113 >> +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator) > > Should we say nextChunkAsUTF8StringWithLatin1Fallback ? Done. >> Source/WebCore/platform/SharedBufferChunkReader.h:57 >> + size_t peek(Vector<char>&, size_t); > > This function doesn't appear to be used yet. Yes, I need it for the MHTML binary part support. I have local unittest testing it. As soon as we have the unit-tests for WebCore working I'll add them. I'll upload them my unit-test to this bug so they are not lost.

Jay Civelli

Comment 4 2011-06-23 10:38:04 PDT

(In reply to comment #2) > (From update of attachment 98320 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > > > Source/WebCore/ChangeLog:8 > > + Fixing the SharedBufferChunkReader API so it works correctly with > > + binary data (non printable characters). > > + Also adding a method to peek at the data (this is needed for MHTML > > + with binary parts). > > I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? Sorry for the confusion. My goal here is to change the API so it deals correctly with binary data. This is needed for an upcoming patch that allows binary parts in MHTML files. There are several issues with using String in this API when dealing with binary data: - each byte is transformed to a UTF16 char, using 2 bytes when it could use only one - when getting the data out of the string non UTF characters get messed up The peek method is also something I introduce as I'll need it for my upcoming patch.

Jay Civelli

Comment 5 2011-06-23 10:54:46 PDT

Created attachment 98370 [details] Patch

Jay Civelli

Comment 6 2011-06-23 10:58:25 PDT

(In reply to comment #4) > (In reply to comment #2) > > (From update of attachment 98320 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > > > > > Source/WebCore/ChangeLog:8 > > > + Fixing the SharedBufferChunkReader API so it works correctly with > > > + binary data (non printable characters). > > > + Also adding a method to peek at the data (this is needed for MHTML > > > + with binary parts). > > > > I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? > > Sorry for the confusion. > My goal here is to change the API so it deals correctly with binary data. > This is needed for an upcoming patch that allows binary parts in MHTML files. > There are several issues with using String in this API when dealing with binary data: > - each byte is transformed to a UTF16 char, using 2 bytes when it could use only one > - when getting the data out of the string non UTF characters get messed up > > The peek method is also something I introduce as I'll need it for my upcoming patch. And also, this does not change any current behavior and the mhtml layout-tests ensure this did not break the MHTML support.

Adam Barth

Comment 7 2011-06-23 11:01:08 PDT

Comment on attachment 98370 [details] Patch Ok.

WebKit Review Bot

Comment 8 2011-06-23 12:29:49 PDT

Comment on attachment 98370 [details] Patch Clearing flags on attachment: 98370 Committed r89599: <http://trac.webkit.org/changeset/89599>

WebKit Review Bot

Comment 9 2011-06-23 12:29:53 PDT

All reviewed patches have been landed. Closing bug.

Jay Civelli

Comment 10 2011-06-23 13:07:26 PDT

Created attachment 98384 [details] Unit-tests Unit-tests that should be added to TestWebKitAPI once it has the WebCore dependency.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebCore Misc.

Assignee

Jay Civelli

Reported

2011-06-22 23:59 PDT

Modified

2011-06-23 13:07 PDT History

CC List

2 users Show

URL

Keywords

Depends on

Blocks