Bug 59946 - We need a utility class that read lines out of SharedBuffers
Summary: We need a utility class that read lines out of SharedBuffers
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Jay Civelli
URL:
Keywords:
Depends on:
Blocks: 7168
  Show dependency treegraph
 
Reported: 2011-05-02 10:21 PDT by Jay Civelli
Modified: 2011-05-05 10:42 PDT (History)
7 users (show)

See Also:


Attachments
Patch (16.40 KB, patch)
2011-05-02 10:34 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (14.86 KB, patch)
2011-05-03 14:04 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (14.86 KB, patch)
2011-05-03 14:13 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Code used for unit-testing that class during development (5.10 KB, application/octet-stream)
2011-05-05 09:57 PDT, Jay Civelli
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jay Civelli 2011-05-02 10:21:56 PDT
A utility class that provides a way to read a SharedBuffer line by line (CR-LF terminated) is needed for parsing of MIME headers, which is required for MHTML support.
Comment 1 Jay Civelli 2011-05-02 10:34:36 PDT
Created attachment 91938 [details]
Patch
Comment 2 Adam Barth 2011-05-02 12:16:08 PDT
Comment on attachment 91938 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91938&action=review

Below are a ton of nit-picks.  I'm not super excited about the recursive approach to this problem.  Seems like we could get in trouble with a long line broken into many segments.  It seems like what you really want here is a state machine that gets fed a character at a time and keeps track of the CR/LF state but is oblivious to the segmentation of the shared buffer.  Driving that, you want a loop that's scanning through a buffer and driving the state machine.  Driving that, you want a loop that's scanning through the buffer segments.  That approach removes the recursion and most of the special cases for CR/LFs splitting over segment boundaries.

We need to do something similar for the InputStreamPreprocessor in the HTML parser:

http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTokenizer.h#L175

but I suspect it might be difficult to share code with the HTML parser because they're at different layers and the HTML parser as some specialized needs.

> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-22764
> -			developmentRegion = English;

I think this means you should to upgrade your version of Xcode (although just within the 3.x.x series).

> Source/WebCore/platform/SharedBufferLineReader.cpp:41
> +    , m_bufferPos(0)

m_bufferPos => m_bufferPosition

> Source/WebCore/platform/SharedBufferLineReader.cpp:57
> +void SharedBufferLineReader::readNextLine(StringBuilder* line)

WebKit uses reference parameters as out-params, not pointers.

> Source/WebCore/platform/SharedBufferLineReader.cpp:78
> +    int eolIndex = getEOLIndex();
> +    if (eolIndex != -1) {

Generally WebKit founds upon in-band signaling.

> Source/WebCore/platform/SharedBufferLineReader.cpp:87
> +    unsigned lenToCopy = m_segmentLength - m_segmentIndex - (m_previousSegmentEndedWithCR ? 1 : 0);

lenToCopy => lengthToCopy.

> Source/WebCore/platform/SharedBufferLineReader.h:34
> +#include "PlatformString.h"

#include <wtf/text/WTFString.h>

> Source/WebCore/platform/SharedBufferLineReader.h:45
> +// Utility class that allows to read CR-LF terminated lines from a SharedBuffer.
> +class SharedBufferLineReader {

I'd remove the comment and put the CR-LF notion in the name of the class somehow.

> Source/WebCore/platform/SharedBufferLineReader.h:49
> +    bool isEOF() const { return m_eof; }

I'm not sure you need this function.  You can tell if you've reached EOF because you'll get a null string (as distinguished from an empty string).

> Source/WebCore/platform/SharedBufferLineReader.h:52
> +    // Returns the next line read from the buffer.
> +    // Returns an empty string once the EOF has been reached

I'd remove this comment.  Also, you should return a null string at EOF rather than an empty string.

> Source/WebCore/platform/SharedBufferLineReader.h:56
> +    void readNextLine(WTF::StringBuilder* line);

WTF:: isn't needed here.

> Source/WebCore/platform/SharedBufferLineReader.h:58
> +    // Reads a new segment. Returns false if the end of the buffer was reached.

Probably should remove this comment.

> Source/WebCore/platform/SharedBufferLineReader.h:62
> +    // Returns the index of the next CR-LF, or -1 if the end of the segment was reached without a CR-LF.
> +    int getEOLIndex();

Please don't start function names with "get".  As mentioned earlier, we prefer not to have in-band signaling, but we do use it for find-type operations.  Consider:

size_t findEndOfLine()

where you return notFound when there is not end of line.  Notice that we should use size_t rather than int when talking about offsets in memory.

> Source/WebCore/platform/SharedBufferLineReader.h:65
> +    unsigned m_bufferPos;

Probably size_t

> Source/WebCore/platform/SharedBufferLineReader.h:69
> +    unsigned m_segmentLength;
> +    unsigned m_segmentIndex;

Also probably size_t

> Source/WebCore/platform/SharedBufferLineReader.h:70
> +    bool m_eof;

m_reachedEndOfFile ?

> Source/WebCore/platform/SharedBufferLineReader.h:76
> +#endif // SharedBufferLineReader_h

No need for this comment.
Comment 3 Jay Civelli 2011-05-03 14:04:20 PDT
Created attachment 92126 [details]
Patch
Comment 4 Jay Civelli 2011-05-03 14:13:58 PDT
Created attachment 92128 [details]
Patch
Comment 5 Jay Civelli 2011-05-03 14:14:44 PDT
Comment on attachment 91938 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91938&action=review

>> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-22764
>> -			developmentRegion = English;
> 
> I think this means you should to upgrade your version of Xcode (although just within the 3.x.x series).

Ah, done.

>> Source/WebCore/platform/SharedBufferLineReader.cpp:41
>> +    , m_bufferPos(0)
> 
> m_bufferPos => m_bufferPosition

Done.

>> Source/WebCore/platform/SharedBufferLineReader.cpp:57
>> +void SharedBufferLineReader::readNextLine(StringBuilder* line)
> 
> WebKit uses reference parameters as out-params, not pointers.

Method is now gone (but I'll try to remember it for next patch :-)

>> Source/WebCore/platform/SharedBufferLineReader.h:34
>> +#include "PlatformString.h"
> 
> #include <wtf/text/WTFString.h>

Fixed.

>> Source/WebCore/platform/SharedBufferLineReader.h:45
>> +class SharedBufferLineReader {
> 
> I'd remove the comment and put the CR-LF notion in the name of the class somehow.

OK, class renamed.

>> Source/WebCore/platform/SharedBufferLineReader.h:49
>> +    bool isEOF() const { return m_eof; }
> 
> I'm not sure you need this function.  You can tell if you've reached EOF because you'll get a null string (as distinguished from an empty string).

Good idea.

>> Source/WebCore/platform/SharedBufferLineReader.h:52
>> +    // Returns an empty string once the EOF has been reached
> 
> I'd remove this comment.  Also, you should return a null string at EOF rather than an empty string.

Done.

>> Source/WebCore/platform/SharedBufferLineReader.h:56
>> +    void readNextLine(WTF::StringBuilder* line);
> 
> WTF:: isn't needed here.

Method is gone.

>> Source/WebCore/platform/SharedBufferLineReader.h:58
>> +    // Reads a new segment. Returns false if the end of the buffer was reached.
> 
> Probably should remove this comment.

OK.

>> Source/WebCore/platform/SharedBufferLineReader.h:62
>> +    int getEOLIndex();
> 
> Please don't start function names with "get".  As mentioned earlier, we prefer not to have in-band signaling, but we do use it for find-type operations.  Consider:
> 
> size_t findEndOfLine()
> 
> where you return notFound when there is not end of line.  Notice that we should use size_t rather than int when talking about offsets in memory.

Method is gone.

>> Source/WebCore/platform/SharedBufferLineReader.h:65
>> +    unsigned m_bufferPos;
> 
> Probably size_t

Switched to size_t.
I was always unclear about that since in many places I found unsigned been used.

>> Source/WebCore/platform/SharedBufferLineReader.h:70
>> +    bool m_eof;
> 
> m_reachedEndOfFile ?

Renamed.

>> Source/WebCore/platform/SharedBufferLineReader.h:76
>> +#endif // SharedBufferLineReader_h
> 
> No need for this comment.

Removed.
Comment 6 Jay Civelli 2011-05-03 14:18:05 PDT
Adam,
I removed the recursion in the new patch as you suggested, it makes the patch simpler.
(I hacked some unit-tests locally to test this patch. I'll keep them around in case we get unit-tests in WebKit in the future).
Comment 7 Adam Barth 2011-05-04 16:45:14 PDT
Comment on attachment 92128 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92128&action=review

Looks great!  Thanks.

> Source/WebCore/platform/SharedBufferCRLFLineReader.cpp:67
> +                stringBuilder.append(currentCharacter);

I presume you looked also, but StringBuilder seems to handle character-at-a-time building fine.

> Source/WebCore/platform/SharedBufferCRLFLineReader.cpp:87
> +    return String();

This line of code is unreachable, right?  Can we remove it without the compiler getting sad?
Comment 8 WebKit Commit Bot 2011-05-04 23:41:53 PDT
Comment on attachment 92128 [details]
Patch

Clearing flags on attachment: 92128

Committed r85837: <http://trac.webkit.org/changeset/85837>
Comment 9 WebKit Commit Bot 2011-05-04 23:41:58 PDT
All reviewed patches have been landed.  Closing bug.
Comment 10 Jay Civelli 2011-05-05 09:57:35 PDT
Created attachment 92430 [details]
Code used for unit-testing that class during development

Attaching the code I used to unit-test the class while I was writing the code.
COuld be useful if/when WebKit gets unit-tests.
Comment 11 David Kilzer (:ddkilzer) 2011-05-05 10:30:23 PDT
(In reply to comment #10)
> Created an attachment (id=92430) [details]
> Code used for unit-testing that class during development
> 
> Attaching the code I used to unit-test the class while I was writing the code.
> COuld be useful if/when WebKit gets unit-tests.

TestWebKitAPI exists now.  I think there was a discussion about using gtest recently on webkit-dev.
Comment 12 Adam Barth 2011-05-05 10:38:57 PDT
> TestWebKitAPI exists now.  I think there was a discussion about using gtest recently on webkit-dev.

Can either of these test code in WebCore?
Comment 13 David Levin 2011-05-05 10:42:01 PDT
(In reply to comment #12)
> > TestWebKitAPI exists now.  I think there was a discussion about using gtest recently on webkit-dev.
> 
> Can either of these test code in WebCore?

dslomov is converting TestWebKitAPI to be gtest based, so I would imagine they will have similar restrictions. 

Jay, I would recommend that you trying using TestWebKitAPI right now and see if it will work for you (or make it work for you).