Bug 100927

Summary: Improve ContentTypeParser, so that it could be used to validate mime type according to RFC
Product: WebKit Reporter: Alexander Shalamov <alexander.shalamov>
Component: XMLAssignee: Alexander Shalamov <alexander.shalamov>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, dglazkov, gyuyoung.kim, rakuco, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 99983    
Attachments:
Description Flags
Patch 1
ap: review-, ap: commit-queue-
Patch 2
none
Patch 3
webkit.review.bot: commit-queue-
Patch 4
webkit.review.bot: commit-queue-
Patch 5
webkit.review.bot: commit-queue-
Patch 6
ap: review+, ap: commit-queue-
Patch 7
none
Patch 8 none

Description Alexander Shalamov 2012-11-01 01:32:06 PDT
At the moment ContentTypeParser contains validation and tokenization logic.
HTTPValidation::isValidHTTPHeader provides basic validation, but that is not enough to validate Content-Type value for XHR request.

It would be beneficial to make ContentTypeParser more flexible and add e.g. ContentTypeValidator that would only validate mime types.
Comment 1 Alexander Shalamov 2012-11-01 01:44:22 PDT
Created attachment 171796 [details]
Patch 1

- Added ContentTypeValidator. Content type string could be validated using ContentTypeValidator<>(contentTypeString).isValid()
- Removed StringBuilder usage that was inefficient.
- Removed "Content-Type" prefix that is not required to validate and parse mime type value.
Comment 2 Alexey Proskuryakov 2012-11-01 10:16:07 PDT
Comment on attachment 171796 [details]
Patch 1

View in context: https://bugs.webkit.org/attachment.cgi?id=171796&action=review

This interface seems overly complicated. The task is to tell if a string is valid, so the client should just call a function. Figuring out how to combine the templates into something usable is an unnecessary complication in this case.

If templates are desired for implementation, that could be an implementation detail, no need to expose that at call sites.

> Source/WebCore/platform/network/ContentTypeParser.cpp:186
> +    const unsigned contentTypeLength = contentType.length();

As a matter of coding style, we don't use const with local variables.

> Source/WebCore/platform/network/ContentTypeParser.h:82
> -    KeyValuePairs m_parameters;
> -    String m_mimeType;
> +    String m_contentType;
> +    mutable KeyValuePairs m_parameters;
> +    mutable String m_mimeType;

If "ContentTypeParser" encapsulates parsed results, then it's not a parser, but a "ParsedContentType".
Comment 3 Alexander Shalamov 2012-11-02 04:40:54 PDT
Created attachment 172034 [details]
Patch 2

- Renamed ContentTypeParser to ParsedContentType.
- Moved implementation details to cpp.
- const unsigned contentTypeLength ... => unsigned contentTypeLength ...
- Added isValidContentType function that could be used to validate content type strings.
Comment 4 Alexander Shalamov 2012-11-02 04:51:33 PDT
Created attachment 172037 [details]
Patch 3

- rebased to master
Comment 5 WebKit Review Bot 2012-11-02 11:27:14 PDT
Comment on attachment 172037 [details]
Patch 3

Attachment 172037 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14687918

New failing tests:
mhtml/shared_buffer_bug.mht
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_binary.mht
mhtml/check_domain.mht
mhtml/page_with_image_unmht.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 6 Alexander Shalamov 2012-11-05 00:59:29 PST
Created attachment 172289 [details]
Patch 4

- Check for \r \n characters inside isValidContentType(const String&) function.
Comment 7 WebKit Review Bot 2012-11-05 01:52:56 PST
Comment on attachment 172289 [details]
Patch 4

Attachment 172289 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14722695

New failing tests:
mhtml/shared_buffer_bug.mht
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_binary.mht
mhtml/check_domain.mht
mhtml/page_with_image_unmht.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 8 Alexander Shalamov 2012-11-05 05:24:34 PST
Created attachment 172318 [details]
Patch 5

- fixed typo lenght => length
- fixed token length calculation
Comment 9 WebKit Review Bot 2012-11-05 07:41:03 PST
Comment on attachment 172318 [details]
Patch 5

Attachment 172318 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14748040

New failing tests:
mhtml/shared_buffer_bug.mht
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_binary.mht
mhtml/check_domain.mht
mhtml/page_with_image_unmht.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 10 Alexander Shalamov 2012-11-05 10:59:11 PST
Created attachment 172365 [details]
Patch 6

- exclude quotes from parsed value
Comment 11 Alexey Proskuryakov 2012-11-16 10:24:25 PST
Comment on attachment 172365 [details]
Patch 6

View in context: https://bugs.webkit.org/attachment.cgi?id=172365&action=review

Looks good to me. A few nits below.

> Source/WebCore/platform/network/ParsedContentType.cpp:2
> + * Copyright (C) 2011 Google Inc. All rights reserved.

Is it intentional that you did not add your copyright, or update the date if you work for Google?

> Source/WebCore/platform/network/ParsedContentType.cpp:39
> +class EmptyParsedContentType {

We normally use "dummy" for unused variables. Perhaps it would be more understandable here, too.

> Source/WebCore/platform/network/ParsedContentType.cpp:59
> +    const unsigned inputLength = input.length();
> +    const unsigned index = startIndex;

As a matter of coding style, we don't use const with local variables.

> Source/WebCore/platform/network/ParsedContentType.cpp:77
> +    const unsigned inputLength = input.length();
> +    const unsigned index = startIndex + 1;

Ditto.

"index" is not a good name for a variable other than a loop one.

> Source/WebCore/platform/network/ParsedContentType.cpp:153
> +bool parseContentType(const String& contentType, T& parserType)

parserTyoe does not seem to be accurate. I think that this should be "ReceiverType& receiver".

> Source/WebCore/platform/network/ParsedContentType.h:36
> +#include <wtf/text/WTFString.h>

This include should not be needed.
Comment 12 Alexander Shalamov 2012-11-19 02:25:44 PST
Created attachment 174924 [details]
Patch 7

- fixes according to review comments
- rebased to master
Comment 13 Alexander Shalamov 2012-11-19 04:27:23 PST
Created attachment 174950 [details]
Patch 8

- fixed typo
Comment 14 WebKit Review Bot 2012-11-19 10:58:18 PST
Comment on attachment 174950 [details]
Patch 8

Clearing flags on attachment: 174950

Committed r135176: <http://trac.webkit.org/changeset/135176>
Comment 15 WebKit Review Bot 2012-11-19 10:58:24 PST
All reviewed patches have been landed.  Closing bug.