A utility class that provides a way to read a SharedBuffer line by line (CR-LF terminated) is needed for parsing of MIME headers, which is required for MHTML support.
Created attachment 91938 [details] Patch
Comment on attachment 91938 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=91938&action=review Below are a ton of nit-picks. I'm not super excited about the recursive approach to this problem. Seems like we could get in trouble with a long line broken into many segments. It seems like what you really want here is a state machine that gets fed a character at a time and keeps track of the CR/LF state but is oblivious to the segmentation of the shared buffer. Driving that, you want a loop that's scanning through a buffer and driving the state machine. Driving that, you want a loop that's scanning through the buffer segments. That approach removes the recursion and most of the special cases for CR/LFs splitting over segment boundaries. We need to do something similar for the InputStreamPreprocessor in the HTML parser: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTokenizer.h#L175 but I suspect it might be difficult to share code with the HTML parser because they're at different layers and the HTML parser as some specialized needs. > Source/WebCore/WebCore.xcodeproj/project.pbxproj:-22764 > - developmentRegion = English; I think this means you should to upgrade your version of Xcode (although just within the 3.x.x series). > Source/WebCore/platform/SharedBufferLineReader.cpp:41 > + , m_bufferPos(0) m_bufferPos => m_bufferPosition > Source/WebCore/platform/SharedBufferLineReader.cpp:57 > +void SharedBufferLineReader::readNextLine(StringBuilder* line) WebKit uses reference parameters as out-params, not pointers. > Source/WebCore/platform/SharedBufferLineReader.cpp:78 > + int eolIndex = getEOLIndex(); > + if (eolIndex != -1) { Generally WebKit founds upon in-band signaling. > Source/WebCore/platform/SharedBufferLineReader.cpp:87 > + unsigned lenToCopy = m_segmentLength - m_segmentIndex - (m_previousSegmentEndedWithCR ? 1 : 0); lenToCopy => lengthToCopy. > Source/WebCore/platform/SharedBufferLineReader.h:34 > +#include "PlatformString.h" #include <wtf/text/WTFString.h> > Source/WebCore/platform/SharedBufferLineReader.h:45 > +// Utility class that allows to read CR-LF terminated lines from a SharedBuffer. > +class SharedBufferLineReader { I'd remove the comment and put the CR-LF notion in the name of the class somehow. > Source/WebCore/platform/SharedBufferLineReader.h:49 > + bool isEOF() const { return m_eof; } I'm not sure you need this function. You can tell if you've reached EOF because you'll get a null string (as distinguished from an empty string). > Source/WebCore/platform/SharedBufferLineReader.h:52 > + // Returns the next line read from the buffer. > + // Returns an empty string once the EOF has been reached I'd remove this comment. Also, you should return a null string at EOF rather than an empty string. > Source/WebCore/platform/SharedBufferLineReader.h:56 > + void readNextLine(WTF::StringBuilder* line); WTF:: isn't needed here. > Source/WebCore/platform/SharedBufferLineReader.h:58 > + // Reads a new segment. Returns false if the end of the buffer was reached. Probably should remove this comment. > Source/WebCore/platform/SharedBufferLineReader.h:62 > + // Returns the index of the next CR-LF, or -1 if the end of the segment was reached without a CR-LF. > + int getEOLIndex(); Please don't start function names with "get". As mentioned earlier, we prefer not to have in-band signaling, but we do use it for find-type operations. Consider: size_t findEndOfLine() where you return notFound when there is not end of line. Notice that we should use size_t rather than int when talking about offsets in memory. > Source/WebCore/platform/SharedBufferLineReader.h:65 > + unsigned m_bufferPos; Probably size_t > Source/WebCore/platform/SharedBufferLineReader.h:69 > + unsigned m_segmentLength; > + unsigned m_segmentIndex; Also probably size_t > Source/WebCore/platform/SharedBufferLineReader.h:70 > + bool m_eof; m_reachedEndOfFile ? > Source/WebCore/platform/SharedBufferLineReader.h:76 > +#endif // SharedBufferLineReader_h No need for this comment.
Created attachment 92126 [details] Patch
Created attachment 92128 [details] Patch
Comment on attachment 91938 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=91938&action=review >> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-22764 >> - developmentRegion = English; > > I think this means you should to upgrade your version of Xcode (although just within the 3.x.x series). Ah, done. >> Source/WebCore/platform/SharedBufferLineReader.cpp:41 >> + , m_bufferPos(0) > > m_bufferPos => m_bufferPosition Done. >> Source/WebCore/platform/SharedBufferLineReader.cpp:57 >> +void SharedBufferLineReader::readNextLine(StringBuilder* line) > > WebKit uses reference parameters as out-params, not pointers. Method is now gone (but I'll try to remember it for next patch :-) >> Source/WebCore/platform/SharedBufferLineReader.h:34 >> +#include "PlatformString.h" > > #include <wtf/text/WTFString.h> Fixed. >> Source/WebCore/platform/SharedBufferLineReader.h:45 >> +class SharedBufferLineReader { > > I'd remove the comment and put the CR-LF notion in the name of the class somehow. OK, class renamed. >> Source/WebCore/platform/SharedBufferLineReader.h:49 >> + bool isEOF() const { return m_eof; } > > I'm not sure you need this function. You can tell if you've reached EOF because you'll get a null string (as distinguished from an empty string). Good idea. >> Source/WebCore/platform/SharedBufferLineReader.h:52 >> + // Returns an empty string once the EOF has been reached > > I'd remove this comment. Also, you should return a null string at EOF rather than an empty string. Done. >> Source/WebCore/platform/SharedBufferLineReader.h:56 >> + void readNextLine(WTF::StringBuilder* line); > > WTF:: isn't needed here. Method is gone. >> Source/WebCore/platform/SharedBufferLineReader.h:58 >> + // Reads a new segment. Returns false if the end of the buffer was reached. > > Probably should remove this comment. OK. >> Source/WebCore/platform/SharedBufferLineReader.h:62 >> + int getEOLIndex(); > > Please don't start function names with "get". As mentioned earlier, we prefer not to have in-band signaling, but we do use it for find-type operations. Consider: > > size_t findEndOfLine() > > where you return notFound when there is not end of line. Notice that we should use size_t rather than int when talking about offsets in memory. Method is gone. >> Source/WebCore/platform/SharedBufferLineReader.h:65 >> + unsigned m_bufferPos; > > Probably size_t Switched to size_t. I was always unclear about that since in many places I found unsigned been used. >> Source/WebCore/platform/SharedBufferLineReader.h:70 >> + bool m_eof; > > m_reachedEndOfFile ? Renamed. >> Source/WebCore/platform/SharedBufferLineReader.h:76 >> +#endif // SharedBufferLineReader_h > > No need for this comment. Removed.
Adam, I removed the recursion in the new patch as you suggested, it makes the patch simpler. (I hacked some unit-tests locally to test this patch. I'll keep them around in case we get unit-tests in WebKit in the future).
Comment on attachment 92128 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=92128&action=review Looks great! Thanks. > Source/WebCore/platform/SharedBufferCRLFLineReader.cpp:67 > + stringBuilder.append(currentCharacter); I presume you looked also, but StringBuilder seems to handle character-at-a-time building fine. > Source/WebCore/platform/SharedBufferCRLFLineReader.cpp:87 > + return String(); This line of code is unreachable, right? Can we remove it without the compiler getting sad?
Comment on attachment 92128 [details] Patch Clearing flags on attachment: 92128 Committed r85837: <http://trac.webkit.org/changeset/85837>
All reviewed patches have been landed. Closing bug.
Created attachment 92430 [details] Code used for unit-testing that class during development Attaching the code I used to unit-test the class while I was writing the code. COuld be useful if/when WebKit gets unit-tests.
(In reply to comment #10) > Created an attachment (id=92430) [details] > Code used for unit-testing that class during development > > Attaching the code I used to unit-test the class while I was writing the code. > COuld be useful if/when WebKit gets unit-tests. TestWebKitAPI exists now. I think there was a discussion about using gtest recently on webkit-dev.
> TestWebKitAPI exists now. I think there was a discussion about using gtest recently on webkit-dev. Can either of these test code in WebCore?
(In reply to comment #12) > > TestWebKitAPI exists now. I think there was a discussion about using gtest recently on webkit-dev. > > Can either of these test code in WebCore? dslomov is converting TestWebKitAPI to be gtest based, so I would imagine they will have similar restrictions. Jay, I would recommend that you trying using TestWebKitAPI right now and see if it will work for you (or make it work for you).