RESOLVED FIXED 213826
MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types
https://bugs.webkit.org/show_bug.cgi?id=213826
Summary MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types
Said Abou-Hallawa
Reported 2020-06-30 21:42:31 PDT
For example for MIMEType = "image/*", MIMETypeRegistry::getExtensionsForMIMEType() should return all the extensions of all image files.
Attachments
Patch (12.05 KB, patch)
2020-06-30 21:53 PDT, Said Abou-Hallawa
no flags
Patch (11.30 KB, patch)
2020-07-01 19:37 PDT, Said Abou-Hallawa
no flags
Patch (2.19 KB, patch)
2020-07-02 17:03 PDT, Said Abou-Hallawa
no flags
Patch (2.17 KB, patch)
2020-07-02 19:30 PDT, Said Abou-Hallawa
no flags
Said Abou-Hallawa
Comment 1 2020-06-30 21:53:59 PDT
Darin Adler
Comment 2 2020-07-01 10:38:12 PDT
Comment on attachment 403263 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review Saying r=me, but really concerned that this is all untested. > Source/WebCore/ChangeLog:3 > + MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types Where are the tests? I think this requires a lot more tests. I am simply guessing that maybe this code works. > Source/WebCore/platform/MIMETypeRegistry.h:60 > + WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type); All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:39 > + HashMap<String, HashSet<String>> extensionsForMIMETypeMap; Seems like we could use a shorter name for this, in the context here. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:41 > + auto addExtensionForMIMEType = [&](const String& type, const String& extension) { Seems like we could use a shorter name for this, maybe just "add". > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:44 > + auto addResult = extensionsForMIMETypeMap.ensure(type, []() { > + return HashSet<String>(); > + }); Doesn’t seem like we need to use ensure here. I think this is the same as calling add({ }) and not necessarily more efficient. Can be written as this one-liner, I think. extensionsForMIMETypeMap.add(type, { }).iterator->value.add(extension); > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:48 > + auto addExtensionsForMIMEType = [&](const String& type, const CFArrayRef& extensions) { Why "const CArrayRef&"? Just CFArrayRef. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49 > + auto wildcardMIMEType = makeString(type.split('/').first(), "/*"_s); String::split().first() is an especially inefficient implementation of "get the part before the first slash". We can be much more efficient by using StringView::split or StringView::find/StringView::left. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:51 > + for (CFIndex i = 0, count = CFArrayGetCount(extensions); i < count; i++) { Seems like an Objective-C for loop would be nicer and likely also more efficient. We keep having to cast back and forth between CF and NS if we use that, but we already have to cast to convert CFTypeRef to CFStringRef, so it’s not so bad. And now I see, looking at the old code, that we used to do that. I think it would be better. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53 > + CFStringRef extension = reinterpret_cast<CFStringRef>(CFArrayGetValueAtIndex(extensions, i)); Why reinterpret_cast here and static_cast below? > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:59 > + // Add extension to itsmimeType, for example add "png" to "image/png" Typo: "itsmimeType". > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:64 > + RetainPtr<CFArrayRef> allUTIs = adoptCF(_UTCopyDeclaredTypeIdentifiers()); auto > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:66 > + for (CFIndex i = 0, count = CFArrayGetCount(allUTIs.get()); i < count; ++i) { Same thought about Objective-C for loop. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:87 > + auto iter = extensionsForMIMETypeMap().find(type); Why "iter"? Lets use a word instead of an abbreviation. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:89 > + extensions.appendRange(iter->value.begin(), iter->value.end()); Isn’t there a more idiomatic way to make a Vector from a HashSet? > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:101 > + if (type.endsWith('*')) This seems strangely broad. Why is it right to check only "ends with *"? Is it really valid to return this result for, say, "text/*"? Seems like a bad idea to me.
Said Abou-Hallawa
Comment 3 2020-07-01 19:33:25 PDT
Comment on attachment 403263 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review >> Source/WebCore/ChangeLog:3 >> + MIMETypeRegistry::getExtensionsForMIMEType() needs to handle wildcard MIME types > > Where are the tests? > > I think this requires a lot more tests. I am simply guessing that maybe this code works. This work is towards webkit.org/b/213347. I will include tests there. >> Source/WebCore/platform/MIMETypeRegistry.h:60 >> + WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type); > > All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new. I will change all names in a separate patch. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:39 >> + HashMap<String, HashSet<String>> extensionsForMIMETypeMap; > > Seems like we could use a shorter name for this, in the context here. I renamed it to 'map' >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:41 >> + auto addExtensionForMIMEType = [&](const String& type, const String& extension) { > > Seems like we could use a shorter name for this, maybe just "add". I renamed it to 'addExtension'. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:44 >> + }); > > Doesn’t seem like we need to use ensure here. I think this is the same as calling add({ }) and not necessarily more efficient. Can be written as this one-liner, I think. > > extensionsForMIMETypeMap.add(type, { }).iterator->value.add(extension); I changed it as suggested. The only reason for using ensure was I wanted to save constructing the HashSet. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:48 >> + auto addExtensionsForMIMEType = [&](const String& type, const CFArrayRef& extensions) { > > Why "const CArrayRef&"? Just CFArrayRef. This was changed to NSArray<NSString *> *. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49 >> + auto wildcardMIMEType = makeString(type.split('/').first(), "/*"_s); > > String::split().first() is an especially inefficient implementation of "get the part before the first slash". We can be much more efficient by using StringView::split or StringView::find/StringView::left. I changed this to be size_t pos = type.reverseFind('/'); auto wildcardMIMEType = makeString(type.left(pos), "/*"_s); >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:51 >> + for (CFIndex i = 0, count = CFArrayGetCount(extensions); i < count; i++) { > > Seems like an Objective-C for loop would be nicer and likely also more efficient. We keep having to cast back and forth between CF and NS if we use that, but we already have to cast to convert CFTypeRef to CFStringRef, so it’s not so bad. > > And now I see, looking at the old code, that we used to do that. I think it would be better. Done. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53 >> + CFStringRef extension = reinterpret_cast<CFStringRef>(CFArrayGetValueAtIndex(extensions, i)); > > Why reinterpret_cast here and static_cast below? Ater converting 'extensions' and 'allUTIs' to be of type NSArray<NSString *> *, no casting is needed. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:59 >> + // Add extension to itsmimeType, for example add "png" to "image/png" > > Typo: "itsmimeType". Fixed. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:64 >> + RetainPtr<CFArrayRef> allUTIs = adoptCF(_UTCopyDeclaredTypeIdentifiers()); > > auto Fixed. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:66 >> + for (CFIndex i = 0, count = CFArrayGetCount(allUTIs.get()); i < count; ++i) { > > Same thought about Objective-C for loop. Fixed. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:87 >> + auto iter = extensionsForMIMETypeMap().find(type); > > Why "iter"? Lets use a word instead of an abbreviation. Fixed. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:89 >> + extensions.appendRange(iter->value.begin(), iter->value.end()); > > Isn’t there a more idiomatic way to make a Vector from a HashSet? I am not aware of one. >> Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:101 >> + if (type.endsWith('*')) > > This seems strangely broad. Why is it right to check only "ends with *"? Is it really valid to return this result for, say, "text/*"? Seems like a bad idea to me. I do not see why it is bad. And why should we treat text/* different from image/*? According to https://tools.ietf.org/html/rfc7231#section-5.3.2: "type/*" indicating all subtypes of that type. This would translate "text/*" to { text/css, text/csv, text/html, text/php, text/plain, text/xml }. So "text/*" is just a short writing for this list. But actually I wanted to delete this function and replace it with extensionsForWildcardMIMEType() since the later can handle all MIME types. But I was not sure whether NSURLFileTypeMappings.extensionsForMIMEType() would return the same list of extensions as extensionsForWildcardMIMEType() would return for a MIMEType.
Said Abou-Hallawa
Comment 4 2020-07-01 19:37:24 PDT
EWS
Comment 5 2020-07-01 22:02:22 PDT
Committed r263832: <https://trac.webkit.org/changeset/263832> All reviewed patches have been landed. Closing bug and clearing flags on attachment 403338 [details].
Radar WebKit Bug Importer
Comment 6 2020-07-01 22:03:14 PDT
Darin Adler
Comment 7 2020-07-02 09:12:54 PDT
Comment on attachment 403338 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=403338&action=review > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:49 > + auto wildcardMIMEType = makeString(type.left(pos), "/*"_s); This creates/destroys a String for the result of String::left. If we used StringView::left instead, then we would not do that additional unneeded memory allocation. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:53 > + if (!extension) > + continue; An NSArray can’t contain a nil, so this is dead code and is not needed. > Source/WebCore/platform/cocoa/MIMETypeRegistryCocoa.mm:62 > + auto allUTIs = adoptNS((__bridge NSArray<NSString *> *)_UTCopyDeclaredTypeIdentifiers()); This will work for now but is poor coding style and won’t work correctly if we switch to ARC. It’s not correct to adopt a CF object using adoptNS. Instead we need to do adoptCF, and do the cast where allUTIs is used.
Said Abou-Hallawa
Comment 8 2020-07-02 17:03:17 PDT
Reopening to attach new patch.
Said Abou-Hallawa
Comment 9 2020-07-02 17:03:18 PDT
Said Abou-Hallawa
Comment 10 2020-07-02 18:03:02 PDT
Comment on attachment 403263 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=403263&action=review >>> Source/WebCore/platform/MIMETypeRegistry.h:60 >>> + WEBCORE_EXPORT static Vector<String> getExtensionsForMIMEType(const String& type); >> >> All the "get" in the function names in this class annoy me since that’s not WebKit coding style, but not new. > > I will change all names in a separate patch. I filed bug 213912 to address this issue.
EWS
Comment 11 2020-07-02 19:16:23 PDT
/Volumes/Data/worker/Commit-Queue/build/Source/WebCore/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).
Said Abou-Hallawa
Comment 12 2020-07-02 19:30:31 PDT
EWS
Comment 13 2020-07-02 20:19:59 PDT
Committed r263880: <https://trac.webkit.org/changeset/263880> All reviewed patches have been landed. Closing bug and clearing flags on attachment 403432 [details].
Note You need to log in before you can comment on or make changes to this bug.