The SharedBUfferChunkReader class currently returns a String when reading a chunk of data. This does not work well with binary data. To deal properly with binary data it should use Vector<char>.
Created attachment 98320 [details] Patch
Comment on attachment 98320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > Source/WebCore/ChangeLog:8 > + Fixing the SharedBufferChunkReader API so it works correctly with > + binary data (non printable characters). > + Also adding a method to peek at the data (this is needed for MHTML > + with binary parts). I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? > Source/WebCore/platform/SharedBufferChunkReader.cpp:84 > + chunk.append(m_separator.data(), m_separatorIndex); Can you add an assert that m_separatorIndex <= m_separator.size() ? > Source/WebCore/platform/SharedBufferChunkReader.cpp:110 > + return false; // Compiler is unhappy without this. Rather than have this comment, we usually ASSERT_NOT_REACHED() > Source/WebCore/platform/SharedBufferChunkReader.cpp:113 > +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator) Should we say nextChunkAsUTF8StringWithLatin1Fallback ? > Source/WebCore/platform/SharedBufferChunkReader.h:57 > + // Reads size bytes at the current location in the buffer, without changing the buffer position. > + // Returns the number of bytes read. That number might be less than the specified size if the end of the buffer was reached. > + size_t peek(Vector<char>&, size_t); This function doesn't appear to be used yet.
Comment on attachment 98320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review >> Source/WebCore/platform/SharedBufferChunkReader.cpp:84 >> + chunk.append(m_separator.data(), m_separatorIndex); > > Can you add an assert that m_separatorIndex <= m_separator.size() ? Done. >> Source/WebCore/platform/SharedBufferChunkReader.cpp:110 >> + return false; // Compiler is unhappy without this. > > Rather than have this comment, we usually ASSERT_NOT_REACHED() Done. >> Source/WebCore/platform/SharedBufferChunkReader.cpp:113 >> +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator) > > Should we say nextChunkAsUTF8StringWithLatin1Fallback ? Done. >> Source/WebCore/platform/SharedBufferChunkReader.h:57 >> + size_t peek(Vector<char>&, size_t); > > This function doesn't appear to be used yet. Yes, I need it for the MHTML binary part support. I have local unittest testing it. As soon as we have the unit-tests for WebCore working I'll add them. I'll upload them my unit-test to this bug so they are not lost.
(In reply to comment #2) > (From update of attachment 98320 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > > > Source/WebCore/ChangeLog:8 > > + Fixing the SharedBufferChunkReader API so it works correctly with > > + binary data (non printable characters). > > + Also adding a method to peek at the data (this is needed for MHTML > > + with binary parts). > > I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? Sorry for the confusion. My goal here is to change the API so it deals correctly with binary data. This is needed for an upcoming patch that allows binary parts in MHTML files. There are several issues with using String in this API when dealing with binary data: - each byte is transformed to a UTF16 char, using 2 bytes when it could use only one - when getting the data out of the string non UTF characters get messed up The peek method is also something I introduce as I'll need it for my upcoming patch.
Created attachment 98370 [details] Patch
(In reply to comment #4) > (In reply to comment #2) > > (From update of attachment 98320 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review > > > > > Source/WebCore/ChangeLog:8 > > > + Fixing the SharedBufferChunkReader API so it works correctly with > > > + binary data (non printable characters). > > > + Also adding a method to peek at the data (this is needed for MHTML > > > + with binary parts). > > > > I'm slightly confused as to what this patch is doing. Is this changing any observable behavior? If so, we should add a test. This patch also appears to introduce some functions that aren't used. Is there a subsequent patch that will use these functions? > > Sorry for the confusion. > My goal here is to change the API so it deals correctly with binary data. > This is needed for an upcoming patch that allows binary parts in MHTML files. > There are several issues with using String in this API when dealing with binary data: > - each byte is transformed to a UTF16 char, using 2 bytes when it could use only one > - when getting the data out of the string non UTF characters get messed up > > The peek method is also something I introduce as I'll need it for my upcoming patch. And also, this does not change any current behavior and the mhtml layout-tests ensure this did not break the MHTML support.
Comment on attachment 98370 [details] Patch Ok.
Comment on attachment 98370 [details] Patch Clearing flags on attachment: 98370 Committed r89599: <http://trac.webkit.org/changeset/89599>
All reviewed patches have been landed. Closing bug.
Created attachment 98384 [details] Unit-tests Unit-tests that should be added to TestWebKitAPI once it has the WebCore dependency.