WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
60637
We should have a Content-Type parser
https://bugs.webkit.org/show_bug.cgi?id=60637
Summary
We should have a Content-Type parser
Jay Civelli
Reported
2011-05-11 10:15:26 PDT
We should have a Content-Type parser that can parse the MIME type and all specified parameters. This is needed for MHTML reading support. The code in HTTPParsers only parse the MIME type and charset parameter.
Attachments
Patch
(12.06 KB, patch)
2011-05-11 10:33 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(12.46 KB, patch)
2011-05-11 11:30 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(11.86 KB, patch)
2011-05-11 14:10 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
ContentTypeParser unit-tests
(5.14 KB, application/octet-stream)
2011-05-11 17:25 PDT
,
Jay Civelli
no flags
Details
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Jay Civelli
Comment 1
2011-05-11 10:33:07 PDT
Created
attachment 93146
[details]
Patch
Darin Adler
Comment 2
2011-05-11 10:47:03 PDT
Comment on
attachment 93146
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93146&action=review
Didn’t have chance for a whole review, but did have a couple comments.
> Source/JavaScriptCore/wtf/ASCIICType.h:70 > + inline bool isASCIIControl(char c) { return c <= ' '; }
All functions in this file should be overridden for all the same types, not just for char. This new function will return true for 0x80-0xFF on platforms where char is signed, so it’s incorrect and false for platforms where char is unsigned, so either way it’s incorrect. I’m not sure that “control character” is a clearly-enough-defined concept that this helper belongs in this file; I’m especially unsure that space should be considered a control character.
> Source/WebCore/platform/network/ContentTypeParser.cpp:41 > +static void skipWhiteSpaces(const String& input, size_t& startIndex)
This skips spaces only. The term “white spaces” is usually reserved for places where there are things other than a space.
Jay Civelli
Comment 3
2011-05-11 11:30:36 PDT
Created
attachment 93151
[details]
Patch
Jay Civelli
Comment 4
2011-05-11 11:31:27 PDT
Comment on
attachment 93146
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93146&action=review
>> Source/JavaScriptCore/wtf/ASCIICType.h:70 >> + inline bool isASCIIControl(char c) { return c <= ' '; } > > All functions in this file should be overridden for all the same types, not just for char. > > This new function will return true for 0x80-0xFF on platforms where char is signed, so it’s incorrect and false for platforms where char is unsigned, so either way it’s incorrect. > > I’m not sure that “control character” is a clearly-enough-defined concept that this helper belongs in this file; I’m especially unsure that space should be considered a control character.
You are right, space is not a control character (but DEL is that I had forgotten). I fixed that and tried to fix the signed/unsigned problems. I thought ASCII control characters were common enough (see
http://en.wikipedia.org/wiki/Ascii#ASCII_control_characters
) that the function could be helpful to others. Let me know if you feel strongly about it, in which case I'll move this function it to ContentTypeParser.cpp.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:41 >> +static void skipWhiteSpaces(const String& input, size_t& startIndex) > > This skips spaces only. The term “white spaces” is usually reserved for places where there are things other than a space.
Good point, changed.
Adam Barth
Comment 5
2011-05-11 11:50:15 PDT
Comment on
attachment 93151
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93151&action=review
I would have written this slightly differently, but this looks fine. The approach you've used makes it slightly difficult to verify that you're always bounds checking correctly. It would also be nice to put the grammar from the RFC into a comment in the code for documentation.
> Source/WebCore/platform/network/ContentTypeParser.cpp:95 > + , m_inputIndex(0)
That's the point of this member?
> Source/WebCore/platform/network/ContentTypeParser.cpp:123 > + size_t index = strlen(contentTypeParameterName);
It feels like contentTypeParameterName should be a DECLARE_STATIC_LOCAL(String, contentTypeParameterName, ("Content-Type")) in this function. Then you could just use contentTypeParameterName.length().
> Source/WebCore/platform/network/ContentTypeParser.cpp:146 > + // Should we tolerate spaces here?
What does the RFC say?
> Source/WebCore/platform/network/ContentTypeParser.cpp:170 > + m_parameters.add(key, value);
What happens if there are more than one parameter with the same name?
Adam Barth
Comment 6
2011-05-11 11:52:17 PDT
If you're interested, here's a parser I wrote recently that uses a slightly different style:
http://trac.webkit.org/browser/trunk/Source/WebCore/page/ContentSecurityPolicy.cpp
That parser is somewhat different that this one because it uses a segment-and-parse strategy to get the particular error handling we're supposed to use.
Adam Barth
Comment 7
2011-05-11 11:54:25 PDT
Comment on
attachment 93151
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93151&action=review
I'm going to mark this R+, but other folks should feel free to comment as well. Please address the (minor) feedback above before landing. I presume this is going to be tested in subsequent MHTML patchs.
> Source/JavaScriptCore/wtf/ASCIICType.h:76 > + inline bool isASCIIControl(char c) { return (c >= 0 && c < ' ') || c == 0x7F; } > + inline bool isASCIIControl(unsigned short c) { return c < ' ' || c == 0x7F; } > +#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED) > + inline bool isASCIIControl(wchar_t c) { return (c >= 0 && c < ' ') || c == 0x7F; } > +#endif > + inline bool isASCIIControl(int c) { return (c >= 0 && c < ' ') || c == 0x7F; } > + inline bool isASCIIControl(unsigned c) { return c < ' ' || c == 0x7F; }
This feels like a lot of copy/paste, but I guess that's how this file rolls.
Darin Adler
Comment 8
2011-05-11 12:03:48 PDT
Comment on
attachment 93151
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93151&action=review
>> Source/JavaScriptCore/wtf/ASCIICType.h:76 >> + inline bool isASCIIControl(unsigned c) { return c < ' ' || c == 0x7F; } > > This feels like a lot of copy/paste, but I guess that's how this file rolls.
The one call site using this function isn’t getting a lot of benefit from it, because it has a lot of other special cases. I don’t think adding this is a good idea. Also, this implementation assumes that char is signed. On platforms where char is unsigned we might get a warning, and a warning that may be treated as an error, that >= 0 is always true.
> Source/WebCore/platform/network/ContentTypeParser.cpp:49 > + return !isASCIIControl(c) && c != ' ' && c != '"' && c != '(' && c != ')' && c != ',' && c != '/' && (c < ':' || c > '@') && (c < '[' || c > ']');
With so many specific rules here already, I think that the isASCIIControl logic should be just inlined here. What does this function do for 0x80-0xFF. I can’t tell! I suggest not abbreviated character as Char in this function.
> Source/WebCore/platform/network/ContentTypeParser.h:42 > +// FIXME: add support for comments. > +class ContentTypeParser { > +public:
Is this class used at all yet? Where are the test cases? I don’t understand why m_inputIndex is a data member. It seem to be used only during the parsing process.
Adam Barth
Comment 9
2011-05-11 13:20:30 PDT
> Is this class used at all yet? Where are the test cases?
I asked Jay to split this part of the patch out from
Bug 7168
. The test cases are in that bug (and I'm going to ask him to add more when that patch is ready for review).
Jay Civelli
Comment 10
2011-05-11 14:09:50 PDT
Comment on
attachment 93151
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=93151&action=review
>>> Source/JavaScriptCore/wtf/ASCIICType.h:76 >>> + inline bool isASCIIControl(unsigned c) { return c < ' ' || c == 0x7F; } >> >> This feels like a lot of copy/paste, but I guess that's how this file rolls. > > The one call site using this function isn’t getting a lot of benefit from it, because it has a lot of other special cases. I don’t think adding this is a good idea. > > Also, this implementation assumes that char is signed. On platforms where char is unsigned we might get a warning, and a warning that may be treated as an error, that >= 0 is always true.
OK, I removed it and added it to the caller.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:49 >> + return !isASCIIControl(c) && c != ' ' && c != '"' && c != '(' && c != ')' && c != ',' && c != '/' && (c < ':' || c > '@') && (c < '[' || c > ']'); > > With so many specific rules here already, I think that the isASCIIControl logic should be just inlined here. > > What does this function do for 0x80-0xFF. I can’t tell! > > I suggest not abbreviated character as Char in this function.
Inlined isASCIIControl, now checking the token is ASCII, renamed without abbreviation.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:95 >> + , m_inputIndex(0) > > That's the point of this member?
Ah! Unused, removed.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:123 >> + size_t index = strlen(contentTypeParameterName); > > It feels like contentTypeParameterName should be a DECLARE_STATIC_LOCAL(String, contentTypeParameterName, ("Content-Type")) in this function. Then you could just use contentTypeParameterName.length().
Cool, I did not know about that macro. Now using it.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:146 >> + // Should we tolerate spaces here? > > What does the RFC say?
It says there should be no space. I was wondering wether we should be lenient or not.
>> Source/WebCore/platform/network/ContentTypeParser.cpp:170 >> + m_parameters.add(key, value); > > What happens if there are more than one parameter with the same name?
Changed the code to keep the last one, and added a comment in the header to document that behavior.
>> Source/WebCore/platform/network/ContentTypeParser.h:42 >> +public: > > Is this class used at all yet? Where are the test cases? > > I don’t understand why m_inputIndex is a data member. It seem to be used only during the parsing process.
m_inputIndex was unused and has been removed. This is going to be used by
https://bugs.webkit.org/show_bug.cgi?id=7168
where we need to parse multipart Content-Type, such as: Content-Type: multipart/related; boundary="boundary-example"; type="text/html" Also I have a bunch of unit-tests for this code that will make it to TestWebKitAPI at some point.
Jay Civelli
Comment 11
2011-05-11 14:10:16 PDT
Created
attachment 93182
[details]
Patch
Adam Barth
Comment 12
2011-05-11 14:40:38 PDT
Comment on
attachment 93182
[details]
Patch ok
WebKit Commit Bot
Comment 13
2011-05-11 17:17:36 PDT
Comment on
attachment 93182
[details]
Patch Clearing flags on attachment: 93182 Committed
r86289
: <
http://trac.webkit.org/changeset/86289
>
WebKit Commit Bot
Comment 14
2011-05-11 17:17:43 PDT
All reviewed patches have been landed. Closing bug.
Jay Civelli
Comment 15
2011-05-11 17:25:28 PDT
Created
attachment 93218
[details]
ContentTypeParser unit-tests These are unit-tests used when writing ContentTypeParser. They'll be integrated to TestWebKitAPI once dslomov is done with transtionning it to GTest.
WebKit Review Bot
Comment 16
2011-05-11 19:41:44 PDT
http://trac.webkit.org/changeset/86289
might have broken GTK Linux 32-bit Release
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug