Bug 213826 - MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types
Summary: MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Images (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Said Abou-Hallawa
URL:
Keywords: InRadar
Depends on:
Blocks: 213347
  Show dependency treegraph
 
Reported: 2020-06-30 21:42 PDT by Said Abou-Hallawa
Modified: 2020-07-02 20:20 PDT (History)
4 users (show)

See Also:


Attachments
Patch (12.05 KB, patch)
2020-06-30 21:53 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (11.30 KB, patch)
2020-07-01 19:37 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (2.19 KB, patch)
2020-07-02 17:03 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (2.17 KB, patch)
2020-07-02 19:30 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Said Abou-Hallawa 2020-06-30 21:42:31 PDT
For example for MIMEType = "image/*", MIMETypeRegistry::getExtensionsForMIMEType() should return all the extensions of all image files.
Comment 1 Said Abou-Hallawa 2020-06-30 21:53:59 PDT
Created attachment 403263 [details]
Patch
Comment 2 Darin Adler 2020-07-01 10:38:12 PDT
Comment on attachment 403263 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review

Saying r=me, but really concerned that this is all untested.

> Source/WebCore/ChangeLog:3
> +        MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types

Where are the tests?

I think this requires a lot more tests. I am simply guessing that maybe this code works.

> Source/WebCore/platform/MIMETypeRegistry.h:60
> +    WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type);

All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:39
> +        HashMap<String, HashSet<String>> extensionsForMIMETypeMap;

Seems like we could use a shorter name for this, in the context here.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:41
> +        auto addExtensionForMIMEType = [&](const String& type, const String& extension) {

Seems like we could use a shorter name for this, maybe just "add".

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:44
> +            auto addResult = extensionsForMIMETypeMap.ensure(type, []() {
> +                return HashSet<String>();
> +            });

Doesn’t seem like we need to use ensure here. I think this is the same as calling add({ }) and not necessarily more efficient. Can be written as this one-liner, I think.

    extensionsForMIMETypeMap.add(type, { }).iterator->value.add(extension);

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:48
> +        auto addExtensionsForMIMEType = [&](const String& type, const CFArrayRef& extensions) {

Why "const CArrayRef&"? Just CFArrayRef.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49
> +            auto wildcardMIMEType = makeString(type.split('/').first(), "/*"_s);

String::split().first() is an especially inefficient implementation of "get the part before the first slash". We can be much more efficient by using StringView::split or StringView::find/StringView::left.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:51
> +            for (CFIndex i = 0, count = CFArrayGetCount(extensions); i < count; i++) {

Seems like an Objective-C for loop would be nicer and likely also more efficient. We keep having to cast back and forth between CF and NS if we use that, but we already have to cast to convert CFTypeRef to CFStringRef, so it’s not so bad.

And now I see, looking at the old code, that we used to do that. I think it would be better.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53
> +                CFStringRef extension = reinterpret_cast<CFStringRef>(CFArrayGetValueAtIndex(extensions, i));

Why reinterpret_cast here and static_cast below?

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:59
> +                // Add extension to itsmimeType, for example add "png" to "image/png"

Typo: "itsmimeType".

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:64
> +        RetainPtr<CFArrayRef> allUTIs = adoptCF(_UTCopyDeclaredTypeIdentifiers());

auto

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:66
> +        for (CFIndex i = 0, count = CFArrayGetCount(allUTIs.get()); i < count; ++i) {

Same thought about Objective-C for loop.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:87
> +    auto iter = extensionsForMIMETypeMap().find(type);

Why "iter"? Lets use a word instead of an abbreviation.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:89
> +        extensions.appendRange(iter->value.begin(), iter->value.end());

Isn’t there a more idiomatic way to make a Vector from a HashSet?

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:101
> +    if (type.endsWith('*'))

This seems strangely broad. Why is it right to check only "ends with *"? Is it really valid to return this result for, say, "text/*"? Seems like a bad idea to me.
Comment 3 Said Abou-Hallawa 2020-07-01 19:33:25 PDT
Comment on attachment 403263 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review

>> Source/WebCore/ChangeLog:3
>> +        MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types
> 
> Where are the tests?
> 
> I think this requires a lot more tests. I am simply guessing that maybe this code works.

This work is towards webkit.org/b/213347. I will include tests there.

>> Source/WebCore/platform/MIMETypeRegistry.h:60
>> +    WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type);
> 
> All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new.

I will change all names in a separate patch.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:39
>> +        HashMap<String, HashSet<String>> extensionsForMIMETypeMap;
> 
> Seems like we could use a shorter name for this, in the context here.

I renamed it to 'map'

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:41
>> +        auto addExtensionForMIMEType = [&](const String& type, const String& extension) {
> 
> Seems like we could use a shorter name for this, maybe just "add".

I renamed it to 'addExtension'.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:44
>> +            });
> 
> Doesn’t seem like we need to use ensure here. I think this is the same as calling add({ }) and not necessarily more efficient. Can be written as this one-liner, I think.
> 
>     extensionsForMIMETypeMap.add(type, { }).iterator->value.add(extension);

I changed it as suggested. The only reason for using ensure was I wanted to save constructing the HashSet.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:48
>> +        auto addExtensionsForMIMEType = [&](const String& type, const CFArrayRef& extensions) {
> 
> Why "const CArrayRef&"? Just CFArrayRef.

This was changed to NSArray<NSString *> *.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49
>> +            auto wildcardMIMEType = makeString(type.split('/').first(), "/*"_s);
> 
> String::split().first() is an especially inefficient implementation of "get the part before the first slash". We can be much more efficient by using StringView::split or StringView::find/StringView::left.

I changed this to be

    size_t pos = type.reverseFind('/');
    auto wildcardMIMEType = makeString(type.left(pos), "/*"_s);

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:51
>> +            for (CFIndex i = 0, count = CFArrayGetCount(extensions); i < count; i++) {
> 
> Seems like an Objective-C for loop would be nicer and likely also more efficient. We keep having to cast back and forth between CF and NS if we use that, but we already have to cast to convert CFTypeRef to CFStringRef, so it’s not so bad.
> 
> And now I see, looking at the old code, that we used to do that. I think it would be better.

Done.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53
>> +                CFStringRef extension = reinterpret_cast<CFStringRef>(CFArrayGetValueAtIndex(extensions, i));
> 
> Why reinterpret_cast here and static_cast below?

Ater converting 'extensions' and 'allUTIs' to be of type NSArray<NSString *> *, no casting is needed.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:59
>> +                // Add extension to itsmimeType, for example add "png" to "image/png"
> 
> Typo: "itsmimeType".

Fixed.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:64
>> +        RetainPtr<CFArrayRef> allUTIs = adoptCF(_UTCopyDeclaredTypeIdentifiers());
> 
> auto

Fixed.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:66
>> +        for (CFIndex i = 0, count = CFArrayGetCount(allUTIs.get()); i < count; ++i) {
> 
> Same thought about Objective-C for loop.

Fixed.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:87
>> +    auto iter = extensionsForMIMETypeMap().find(type);
> 
> Why "iter"? Lets use a word instead of an abbreviation.

Fixed.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:89
>> +        extensions.appendRange(iter->value.begin(), iter->value.end());
> 
> Isn’t there a more idiomatic way to make a Vector from a HashSet?

I am not aware of one.

>> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:101
>> +    if (type.endsWith('*'))
> 
> This seems strangely broad. Why is it right to check only "ends with *"? Is it really valid to return this result for, say, "text/*"? Seems like a bad idea to me.

I do not see why it is bad. And why should we treat text/* different from image/*? According to https://tools.ietf.org/html/rfc7231#section-5.3.2: "type/*" indicating all subtypes of that type. This would translate "text/*" to { text/css, text/csv, text/html, text/php, text/plain, text/xml }. So "text/*" is just a short writing for this list.

But actually I wanted to delete this function and replace it with extensionsForWildcardMIMEType() since the later can handle all MIME types. But I was not sure whether NSURLFileTypeMappings.extensionsForMIMEType() would return the same list of extensions as extensionsForWildcardMIMEType() would return for a MIMEType.
Comment 4 Said Abou-Hallawa 2020-07-01 19:37:24 PDT
Created attachment 403338 [details]
Patch
Comment 5 EWS 2020-07-01 22:02:22 PDT
Committed r263832: <https://trac.webkit.org/changeset/263832>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 403338 [details].
Comment 6 Radar WebKit Bug Importer 2020-07-01 22:03:14 PDT
<rdar://problem/65018418>
Comment 7 Darin Adler 2020-07-02 09:12:54 PDT
Comment on attachment 403338 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403338&action=review

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49
> +            auto wildcardMIMEType = makeString(type.left(pos), "/*"_s);

This creates/destroys a String for the result of String::left. If we used StringView::left instead, then we would not do that additional unneeded memory allocation.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53
> +                if (!extension)
> +                    continue;

An NSArray can’t contain a nil, so this is dead code and is not needed.

> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:62
> +        auto allUTIs = adoptNS((__bridge NSArray<NSString *> *)_UTCopyDeclaredTypeIdentifiers());

This will work for now but is poor coding style and won’t work correctly if we switch to ARC. It’s not correct to adopt a CF object using adoptNS. Instead we need to do adoptCF, and do the cast where allUTIs is used.
Comment 8 Said Abou-Hallawa 2020-07-02 17:03:17 PDT
Reopening to attach new patch.
Comment 9 Said Abou-Hallawa 2020-07-02 17:03:18 PDT
Created attachment 403418 [details]
Patch
Comment 10 Said Abou-Hallawa 2020-07-02 18:03:02 PDT
Comment on attachment 403263 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review

>>> Source/WebCore/platform/MIMETypeRegistry.h:60
>>> +    WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type);
>> 
>> All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new.
> 
> I will change all names in a separate patch.

I filed bug 213912 to address this issue.
Comment 11 EWS 2020-07-02 19:16:23 PDT
/Volumes/Data/worker/Commit-Queue/build/Source/WebCore/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).
Comment 12 Said Abou-Hallawa 2020-07-02 19:30:31 PDT
Created attachment 403432 [details]
Patch
Comment 13 EWS 2020-07-02 20:19:59 PDT
Committed r263880: <https://trac.webkit.org/changeset/263880>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 403432 [details].