Bug 225251

Summary: Use SortedArrayMap in a few more places
Product: WebKit Reporter: Darin Adler <darin>
Component: Web Template FrameworkAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: alecflett, beidson, benjamin, calvaris, cdumez, changseok, cmarcelo, dino, drousso, eric.carlson, esprehn+autocc, ews-watchlist, fmalita, fred.wang, glenn, gyuyoung.kim, hta, jamesr, japhet, jer.noble, joepeck, jsbell, kangil.han, kondapallykalyan, luiz, macpherson, menard, mmaxfield, pdr, philipj, sabouhallawa, sam, schenney, sergio, tommyw, tonikitoo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
none
Patch
none
Patch sam: review+

Description Darin Adler 2021-04-30 14:08:51 PDT
Use SortedArrayMap in a few more places
Comment 1 Darin Adler 2021-04-30 15:49:44 PDT Comment hidden (obsolete)
Comment 2 Darin Adler 2021-04-30 16:00:53 PDT
Created attachment 427465 [details]
Patch
Comment 3 Eric Carlson 2021-04-30 16:40:41 PDT
Comment on attachment 427465 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427465&action=review

> Source/WebCore/platform/graphics/MIMETypeCache.cpp:2
> + * Copyright (C) 2021 Apple Inc. All rights reserved.

Did you mean to change this date or add a "-2021"?
Comment 4 Darin Adler 2021-04-30 16:46:38 PDT
Comment on attachment 427465 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427465&action=review

>> Source/WebCore/platform/graphics/MIMETypeCache.cpp:2
>> + * Copyright (C) 2021 Apple Inc. All rights reserved.
> 
> Did you mean to change this date or add a "-2021"?

Oops, will fix.
Comment 5 Darin Adler 2021-04-30 16:53:30 PDT
Created attachment 427469 [details]
Patch
Comment 6 Sam Weinig 2021-05-01 10:20:27 PDT
Comment on attachment 427469 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427469&action=review

> Source/WebCore/dom/ScriptElement.cpp:112
> +    static constexpr ComparableLettersLiteral languageArray[] = {

This is not new to this change, but I think it is a little harder to understand that this will be a case-folded comparison (at least, I think that is what ComparableLettersLiteral does) as opposed to HashSet<String, ASCIICaseInsensitiveHash> which really sold it.
Comment 7 Sam Weinig 2021-05-01 10:33:31 PDT
Comment on attachment 427469 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427469&action=review

> Source/WebCore/platform/gamepad/cocoa/GameControllerGamepadProvider.mm:64
> +    static constexpr SortedArraySet gameControllerCompatibleGamepadsSet { gameControllerCompatibleGamepadsArray };

I think maybe a switch statement would work just as well here.

uint32_t fullProductID = (((uint32_t)vendorID) << 16) | productID;
switch (fullProductID) {
    case StratusXL1:
    case StratusXL3:
    case Nimbus1:
    case XboxOne1:
    case XboxOne2:
    case XboxOne3:
    case Dualshock4_1:
    case Dualshock4_2:
    case GamesirM2:
    case HoripadUltimate:
    case StratusXL2:
    case StratusXL4:
    case Nimbus2:
        return true;

    default:
        return false;
}

And it would allow the compiler to be clever if it can find any math tricks.
Comment 8 Darin Adler 2021-05-01 12:09:59 PDT
Yes, I will go with the switch statement.

And I agree that ComparableLattersLiteral is not a clear enough name and would like to rename it. Sam, help me choose a new name and I’ll just do a global replace.
Comment 9 Darin Adler 2021-05-01 12:14:06 PDT
Comment on attachment 427469 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427469&action=review

>> Source/WebCore/dom/ScriptElement.cpp:112
>> +    static constexpr ComparableLettersLiteral languageArray[] = {
> 
> This is not new to this change, but I think it is a little harder to understand that this will be a case-folded comparison (at least, I think that is what ComparableLettersLiteral does) as opposed to HashSet<String, ASCIICaseInsensitiveHash> which really sold it.

Let me write this again, because I think this is important:

I suggest we solve this problem by renaming ComparableLettersLiteral. It’s only used in a few places, so it should be an easy quick rename once we choose a better name.

>> Source/WebCore/platform/gamepad/cocoa/GameControllerGamepadProvider.mm:64
>> +    static constexpr SortedArraySet gameControllerCompatibleGamepadsSet { gameControllerCompatibleGamepadsArray };
> 
> I think maybe a switch statement would work just as well here.
> 
> uint32_t fullProductID = (((uint32_t)vendorID) << 16) | productID;
> switch (fullProductID) {
>     case StratusXL1:
>     case StratusXL3:
>     case Nimbus1:
>     case XboxOne1:
>     case XboxOne2:
>     case XboxOne3:
>     case Dualshock4_1:
>     case Dualshock4_2:
>     case GamesirM2:
>     case HoripadUltimate:
>     case StratusXL2:
>     case StratusXL4:
>     case Nimbus2:
>         return true;
> 
>     default:
>         return false;
> }
> 
> And it would allow the compiler to be clever if it can find any math tricks.

I will do that here. Once nice thing about that is that I can also sort these alphabetically again instead of having to sort them numerically.

The irony is that the work to detect the "parse" member and work properly without it was driven by this use case. I wonder if I should rip that out or keep it.
Comment 10 Darin Adler 2021-05-01 20:09:43 PDT
Committed r276880 (237226@main): <https://commits.webkit.org/237226@main>
Comment 11 Radar WebKit Bug Importer 2021-05-01 20:10:16 PDT
<rdar://problem/77425601>
Comment 12 Sam Weinig 2021-05-02 09:23:20 PDT
(In reply to Darin Adler from comment #8)
> And I agree that ComparableLattersLiteral is not a clear enough name and
> would like to rename it. Sam, help me choose a new name and I’ll just do a
> global replace.

So the three types we have here are currently:

ComparableASCIILiteral
 - Input can be:
     Any ASCII character
 - Matches agains: 
     Any ASCII character. No case folding is performed.

ComparableCaseFoldingASCIILiteral
 - Input can be:
     Any ASCII character that is not an ASCII uppercase character
 - Matches agains: 
     Any ASCII character. Case folding is performed.

ComparableLettersLiteral
 - Input can be:
     Any ASCII character that is not an ASCII Uppercase or a control character with the 0x20 bit set.
 - Matches agains: 
     Any ASCII character. Case folding is performed.


(I think this is right so far).

The real trick here is what to call this input character set that only allow ASCII characters without the 0x20 bit set. In one place you call it NoUppercaseLettersOptimized. While that isn't perfectly descriptive, I think it might be the right direction to start looking.

So maybe, something like ComparableCaseFoldingASCIILiteralUsingOptimizedSubset? Gosh, that's not great is it :(.
Comment 13 Darin Adler 2021-05-02 11:05:37 PDT
Let me repeat your proposal back to you along with some tweaks and commentary.

The comparable strings themselves must be ASCII, but matching works against any character, so the "matches against" lines should not say "ASCII".

Another member of this family, currently in a .cpp file, but can be moved to the header, is PackedASCIILowerCodes. It’s the same as ComparableCaseFoldingASCIILiteral except that it packs the characters into an integer for comparison, and so has a limit based on the number of bytes in the integer. The cases it is currently used for could all be using the optimized subset, by the way.

To get rid of the "using", I rearranged word order. Not sure I got it best. Tried to think which made the best logical sense.

So here is a set of possible names:

    ComparableASCIILiteral
    ComparableCaseFoldingASCIILiteral
    ComparableOptimizedCaseFoldingASCIISubsetLiteral
    ComparablePackedCaseFoldingASCIILiteral<StorageType>

Corresponding equality comparison functions have these names:

    equalIgnoringASCIICase
    equalLettersIgnoringASCIICase

Not really sure that ComparableOptimizedCaseFoldingASCIISubsetLiteral and equalLettersIgnoringASCIICase are truly needed. Is this tiny bit of optimization (saving a branch per character, basically) worth possible confusion? Maybe instead of using a differently named identifier the right thing to do is to find a way to have the optimization be automatically done based on scanning the characters at compile time? Could be challenging to get that information "where it's needed" without any runtime cost.

Could also put the word "Packed" before the word "Comparable".