Bug 193909

Summary: Implement serializing in MIME type parser
Product: WebKit Reporter: Rob Buis <rbuis>
Component: DOMAssignee: Rob Buis <rwlbuis>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, commit-queue, darin, ews-watchlist, rniwa, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 180526    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
ews-watchlist: commit-queue-
Archive of layout-test-results from ews101 for mac-highsierra none

Description Rob Buis 2019-01-28 08:09:36 PST
Implement:
https://mimesniff.spec.whatwg.org/#serialize-a-mime-type
Comment 1 Rob Buis 2019-02-07 12:49:08 PST
Created attachment 361430 [details]
Patch
Comment 2 Darin Adler 2019-02-09 12:25:47 PST
Comment on attachment 361430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361430&action=review

Looks great. Could probably pile on a few more test cases.

> Source/WebCore/platform/network/ParsedContentType.cpp:332
> -        m_mimeType.convertToASCIILowercase();
> +        m_mimeType = m_mimeType.convertToASCIILowercase();

Wow, this seems like a major bug. The old code didn’t do anything at all. I guess your new regression tests cover this.

> Source/WebCore/platform/network/ParsedContentType.cpp:351
> -        keyName.convertToASCIILowercase();
> +        name = name.convertToASCIILowercase();

Again!

> Source/WebCore/platform/network/ParsedContentType.cpp:361
> +    for (auto paramName : m_parameterNames) {

Using auto here causes unnecessary reference count churn. You could use auto& instead and there would not be any.

Goes against WebKit coding style to call this "paramName". Writing out the whole word "parameter" doesn’t make it that much longer. But also in the context of this sort function, I think just "name" would be even better than "parameterName" (or "paramName").

> Source/WebCore/platform/network/ParsedContentType.cpp:366
> +        if (value.isEmpty() || value.find(isNonTokenCharacter) != notFound) {

Unfortunate that we don’t have a "contains" overload, because find(x) != notFound seems a little harder to read.

> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:105
> +    return String();

Might be nice to return something like "ERROR" here so it’s easier to see the difference between a parse failure, empty string, and null string in test cases below. Might want to do similar for null vs. empty string?

> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:110
> +    EXPECT_EQ(serializeIfValid("text/plain"), "text/plain");

When tests are about strings, I find it better to use EXPECT_STREQ so that the errors are easier to understand. That requires a strategy for extending the lifetime of a const char*; some of our tests use global buffers since there is no need for the tests to work multi-threaded, others call utf8().data(), others even use std::string and c_str(). You can see examples of multiple approaches in various test files such as TextCodec.cpp, HashMap.cpp and many others.

> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:120
> +    EXPECT_EQ(serializeIfValid("/plain"), String());

I typically always include test cases like empty string and null string, even if they don't make any sense, just to make sure the function doesn’t cash in those cases. Similarly I’d suggest having test that is just ";".

> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:123
> +    EXPECT_EQ(serializeIfValid("text/plain;test"), "text/plain");

I don’t see any test cases covering uppercase in parameter names. I’d like to see failure case tests.
Comment 3 Darin Adler 2019-02-09 12:26:17 PST
Comment on attachment 361430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361430&action=review

>> Source/WebCore/platform/network/ParsedContentType.cpp:361
>> +    for (auto paramName : m_parameterNames) {
> 
> Using auto here causes unnecessary reference count churn. You could use auto& instead and there would not be any.
> 
> Goes against WebKit coding style to call this "paramName". Writing out the whole word "parameter" doesn’t make it that much longer. But also in the context of this sort function, I think just "name" would be even better than "parameterName" (or "paramName").

short function
Comment 4 Darin Adler 2019-02-09 12:27:18 PST
Comment on attachment 361430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361430&action=review

>> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:120
>> +    EXPECT_EQ(serializeIfValid("/plain"), String());
> 
> I typically always include test cases like empty string and null string, even if they don't make any sense, just to make sure the function doesn’t cash in those cases. Similarly I’d suggest having test that is just ";".

More worried about functions that "crash" than ones that "cash" ;)
Comment 5 Darin Adler 2019-02-09 12:28:05 PST
Comment on attachment 361430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361430&action=review

>> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:123
>> +    EXPECT_EQ(serializeIfValid("text/plain;test"), "text/plain");
> 
> I don’t see any test cases covering uppercase in parameter names. I’d like to see failure case tests.

Other test cases to add: non-ASCII characters, null characters, other control characters, leading spaces, not just trailing spaces.
Comment 6 Darin Adler 2019-02-09 12:28:31 PST
Comment on attachment 361430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361430&action=review

>>> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:123
>>> +    EXPECT_EQ(serializeIfValid("text/plain;test"), "text/plain");
>> 
>> I don’t see any test cases covering uppercase in parameter names. I’d like to see failure case tests.
> 
> Other test cases to add: non-ASCII characters, null characters, other control characters, leading spaces, not just trailing spaces.

Single quotes, unpaired surrogates.
Comment 7 Rob Buis 2019-02-10 05:23:11 PST
Created attachment 361627 [details]
Patch
Comment 8 Rob Buis 2019-02-10 12:50:35 PST
(In reply to Darin Adler from comment #2)
> Comment on attachment 361430 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=361430&action=review
> 
> Looks great. Could probably pile on a few more test cases.
> 
> > Source/WebCore/platform/network/ParsedContentType.cpp:332
> > -        m_mimeType.convertToASCIILowercase();
> > +        m_mimeType = m_mimeType.convertToASCIILowercase();
> 
> Wow, this seems like a major bug. The old code didn’t do anything at all. I
> guess your new regression tests cover this.

Yes, that was unfortunate, I simply did not look closely enough at the String API. No way to catch this when we just used ParsedContentType for validation, but when serializing it became quickly obvious.

> > Source/WebCore/platform/network/ParsedContentType.cpp:361
> > +    for (auto paramName : m_parameterNames) {
> 
> Using auto here causes unnecessary reference count churn. You could use
> auto& instead and there would not be any.

Done.

> Goes against WebKit coding style to call this "paramName". Writing out the
> whole word "parameter" doesn’t make it that much longer. But also in the
> context of this sort function, I think just "name" would be even better than
> "parameterName" (or "paramName").

I like name, also since I use value for the parameter value a bit below.

> > Source/WebCore/platform/network/ParsedContentType.cpp:366
> > +        if (value.isEmpty() || value.find(isNonTokenCharacter) != notFound) {
> 
> Unfortunate that we don’t have a "contains" overload, because find(x) !=
> notFound seems a little harder to read.

Agreed, not pretty. I don't have time for that right now but maybe later if somebody makes a bug for this.

> > Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:105
> > +    return String();
> 
> Might be nice to return something like "ERROR" here so it’s easier to see
> the difference between a parse failure, empty string, and null string in
> test cases below. Might want to do similar for null vs. empty string?

Good idea, I chose NOTVALID.

> > Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:110
> > +    EXPECT_EQ(serializeIfValid("text/plain"), "text/plain");
> 
> When tests are about strings, I find it better to use EXPECT_STREQ so that
> the errors are easier to understand. That requires a strategy for extending
> the lifetime of a const char*; some of our tests use global buffers since
> there is no need for the tests to work multi-threaded, others call
> utf8().data(), others even use std::string and c_str(). You can see examples
> of multiple approaches in various test files such as TextCodec.cpp,
> HashMap.cpp and many others.

I did not know EXPECT_STREQ, done.

> > Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:120
> > +    EXPECT_EQ(serializeIfValid("/plain"), String());
> 
> I typically always include test cases like empty string and null string,
> even if they don't make any sense, just to make sure the function doesn’t
> cash in those cases. Similarly I’d suggest having test that is just ";".

Done.

> > Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:123
> > +    EXPECT_EQ(serializeIfValid("text/plain;test"), "text/plain");
> 
> I don’t see any test cases covering uppercase in parameter names. I’d like
> to see failure case tests.

In general I now tried to add as many tests, taking your suggestions into account.
Comment 9 Darin Adler 2019-02-10 16:05:51 PST
Comment on attachment 361627 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361627&action=review

> Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:104
> +    if (Optional<ParsedContentType> parsedContentType = ParsedContentType::create(input))
> +        return parsedContentType->serialize().utf8().data();

This isn’t right. The return value will be deallocated and the test won‘t work if run under memory test tools like Address Sanitizer.

Need to put the return value somewhere that won’t be deallocated. See patterns in various other tests for the various approaches. I suggest the "have a global char array and copy the result into it inside this function" approach.
Comment 10 Rob Buis 2019-02-11 09:35:00 PST
Created attachment 361686 [details]
Patch
Comment 11 Rob Buis 2019-02-12 01:38:52 PST
Comment on attachment 361686 [details]
Patch

Fixed the deallocation problem by copying TextCoded.cpp approach.
Comment 12 WebKit Commit Bot 2019-02-12 02:04:44 PST
Comment on attachment 361686 [details]
Patch

Clearing flags on attachment: 361686

Committed r241291: <https://trac.webkit.org/changeset/241291>
Comment 13 WebKit Commit Bot 2019-02-12 02:04:45 PST
All reviewed patches have been landed.  Closing bug.
Comment 14 Rob Buis 2019-02-17 12:11:59 PST
Reopening to attach new patch.
Comment 15 Rob Buis 2019-02-17 12:12:02 PST
Created attachment 362241 [details]
Patch
Comment 16 Radar WebKit Bug Importer 2019-02-17 12:12:17 PST
<rdar://problem/48148464>
Comment 17 Radar WebKit Bug Importer 2019-02-17 12:12:18 PST
<rdar://problem/48148465>
Comment 18 EWS Watchlist 2019-02-17 13:13:17 PST
Comment on attachment 362241 [details]
Patch

Attachment 362241 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/11183743

New failing tests:
imported/w3c/web-platform-tests/xhr/overridemimetype-blob.html
Comment 19 EWS Watchlist 2019-02-17 13:13:20 PST
Created attachment 362244 [details]
Archive of layout-test-results from ews101 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 20 Rob Buis 2019-02-17 13:22:19 PST
Wrong bug...