Bug 63231 - SharedBufferChunkReader does not deal properly with non printable characters
Summary: SharedBufferChunkReader does not deal properly with non printable characters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jay Civelli
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-22 23:59 PDT by Jay Civelli
Modified: 2011-06-23 13:07 PDT (History)
2 users (show)

See Also:


Attachments
Patch (10.34 KB, patch)
2011-06-23 00:50 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (10.50 KB, patch)
2011-06-23 10:54 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Unit-tests (11.43 KB, application/octet-stream)
2011-06-23 13:07 PDT, Jay Civelli
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jay Civelli 2011-06-22 23:59:46 PDT
The SharedBUfferChunkReader class currently returns a String when reading a chunk of data.
This does not work well with binary data.
To deal properly with binary data it should use Vector<char>.
Comment 1 Jay Civelli 2011-06-23 00:50:56 PDT
Created attachment 98320 [details]
Patch
Comment 2 Adam Barth 2011-06-23 09:52:46 PDT
Comment on attachment 98320 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review

> Source/WebCore/ChangeLog:8
> +        Fixing the SharedBufferChunkReader API so it works correctly with
> +        binary data (non printable characters).
> +        Also adding a method to peek at the data (this is needed for MHTML
> +        with binary parts).

I'm slightly confused as to what this patch is doing.  Is this changing any observable behavior?  If so, we should add a test.  This patch also appears to introduce some functions that aren't used.  Is there a subsequent patch that will use these functions?

> Source/WebCore/platform/SharedBufferChunkReader.cpp:84
> +                    chunk.append(m_separator.data(), m_separatorIndex);

Can you add an assert that m_separatorIndex <= m_separator.size() ?

> Source/WebCore/platform/SharedBufferChunkReader.cpp:110
> +    return false; // Compiler is unhappy without this.

Rather than have this comment, we usually ASSERT_NOT_REACHED()

> Source/WebCore/platform/SharedBufferChunkReader.cpp:113
> +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator)

Should we say nextChunkAsUTF8StringWithLatin1Fallback ?

> Source/WebCore/platform/SharedBufferChunkReader.h:57
> +    // Reads size bytes at the current location in the buffer, without changing the buffer position.
> +    // Returns the number of bytes read. That number might be less than the specified size if the end of the buffer was reached.
> +    size_t peek(Vector<char>&, size_t);

This function doesn't appear to be used yet.
Comment 3 Jay Civelli 2011-06-23 10:35:02 PDT
Comment on attachment 98320 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review

>> Source/WebCore/platform/SharedBufferChunkReader.cpp:84
>> +                    chunk.append(m_separator.data(), m_separatorIndex);
> 
> Can you add an assert that m_separatorIndex <= m_separator.size() ?

Done.

>> Source/WebCore/platform/SharedBufferChunkReader.cpp:110
>> +    return false; // Compiler is unhappy without this.
> 
> Rather than have this comment, we usually ASSERT_NOT_REACHED()

Done.

>> Source/WebCore/platform/SharedBufferChunkReader.cpp:113
>> +String SharedBufferChunkReader::nextChunkAsUTF8String(bool includeSeparator)
> 
> Should we say nextChunkAsUTF8StringWithLatin1Fallback ?

Done.

>> Source/WebCore/platform/SharedBufferChunkReader.h:57
>> +    size_t peek(Vector<char>&, size_t);
> 
> This function doesn't appear to be used yet.

Yes, I need it for the MHTML binary part support.
I have local unittest testing it. As soon as we have the unit-tests for WebCore working I'll add them.
I'll upload them my unit-test to this bug so they are not lost.
Comment 4 Jay Civelli 2011-06-23 10:38:04 PDT
(In reply to comment #2)
> (From update of attachment 98320 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review
> 
> > Source/WebCore/ChangeLog:8
> > +        Fixing the SharedBufferChunkReader API so it works correctly with
> > +        binary data (non printable characters).
> > +        Also adding a method to peek at the data (this is needed for MHTML
> > +        with binary parts).
> 
> I'm slightly confused as to what this patch is doing.  Is this changing any observable behavior?  If so, we should add a test.  This patch also appears to introduce some functions that aren't used.  Is there a subsequent patch that will use these functions?

Sorry for the confusion.
My goal here is to change the API so it deals correctly with binary data.
This is needed for an upcoming patch that allows binary parts in MHTML files.
There are several issues with using String in this API when dealing with binary data:
- each byte is transformed to a UTF16 char, using 2 bytes when it could use only one
- when getting the data out of the string non UTF characters get messed up

The peek method is also something I introduce as I'll need it for my upcoming patch.
Comment 5 Jay Civelli 2011-06-23 10:54:46 PDT
Created attachment 98370 [details]
Patch
Comment 6 Jay Civelli 2011-06-23 10:58:25 PDT
(In reply to comment #4)
> (In reply to comment #2)
> > (From update of attachment 98320 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=98320&action=review
> > 
> > > Source/WebCore/ChangeLog:8
> > > +        Fixing the SharedBufferChunkReader API so it works correctly with
> > > +        binary data (non printable characters).
> > > +        Also adding a method to peek at the data (this is needed for MHTML
> > > +        with binary parts).
> > 
> > I'm slightly confused as to what this patch is doing.  Is this changing any observable behavior?  If so, we should add a test.  This patch also appears to introduce some functions that aren't used.  Is there a subsequent patch that will use these functions?
> 
> Sorry for the confusion.
> My goal here is to change the API so it deals correctly with binary data.
> This is needed for an upcoming patch that allows binary parts in MHTML files.
> There are several issues with using String in this API when dealing with binary data:
> - each byte is transformed to a UTF16 char, using 2 bytes when it could use only one
> - when getting the data out of the string non UTF characters get messed up
> 
> The peek method is also something I introduce as I'll need it for my upcoming patch.

And also, this does not change any current behavior and the mhtml layout-tests ensure this did not break the MHTML support.
Comment 7 Adam Barth 2011-06-23 11:01:08 PDT
Comment on attachment 98370 [details]
Patch

Ok.
Comment 8 WebKit Review Bot 2011-06-23 12:29:49 PDT
Comment on attachment 98370 [details]
Patch

Clearing flags on attachment: 98370

Committed r89599: <http://trac.webkit.org/changeset/89599>
Comment 9 WebKit Review Bot 2011-06-23 12:29:53 PDT
All reviewed patches have been landed.  Closing bug.
Comment 10 Jay Civelli 2011-06-23 13:07:26 PDT
Created attachment 98384 [details]
Unit-tests

Unit-tests that should be added to TestWebKitAPI once it has the WebCore dependency.