Bug 236532 - look up InputTypeFactoryMap with an ASCII lowercase string instead of using a ASCIICaseInsensitiveHash
Summary: look up InputTypeFactoryMap with an ASCII lowercase string instead of using a...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Cameron McCormack (:heycam)
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-02-11 15:47 PST by Cameron McCormack (:heycam)
Modified: 2022-02-13 10:29 PST (History)
9 users (show)

See Also:


Attachments
Patch (2.82 KB, patch)
2022-02-11 15:50 PST, Cameron McCormack (:heycam)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cameron McCormack (:heycam) 2022-02-11 15:47:40 PST
InputType::create looks up the InputTypeFactoryMap based on the AtomString value of the <input type> attribute.  The HashMap uses an ASCIICaseInsensitiveHash, but the AtomStrings stored in the map are all ASCII lowercase to begin with.  This means that we spend time doing an ASCII case insensitive hash computation on the query string.  Most content already supplies an ASCII lowercase type value, so it's less work to ASCII lowercase the type value and then look up the HashMap using the regular hash for AtomStrings (i.e., pulling the hash out of AtomString).

Doing this is a 0.5% improvement on a couple of Speedometer 2 subtests, and a 0.1% improvement to the overall score.
Comment 1 Cameron McCormack (:heycam) 2022-02-11 15:50:51 PST
Created attachment 451754 [details]
Patch
Comment 2 EWS 2022-02-12 07:09:29 PST
Committed r289691 (247176@main): <https://commits.webkit.org/247176@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 451754 [details].
Comment 3 Radar WebKit Bug Importer 2022-02-12 07:10:18 PST
<rdar://problem/88855498>
Comment 4 Darin Adler 2022-02-12 14:10:16 PST
Comment on attachment 451754 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=451754&action=review

> Source/WebCore/ChangeLog:3
> +        Look up InputTypeFactoryMap with an ASCII lowercase string instead of using a ASCIICaseInsensitiveHash

This seems like a great speedup, but if the cost is the hash computation there’s another approach we could have taken: We can change ASCIICaseInsensitiveHash::hash(StringImpl&) to check if there are any ASCII uppercase characters. If there are none, then it can use string.hash(). If we do this, it would optimize *all* uses of ASCIICaseInsensitiveHash that are typically used on already-lowercase strings. However, if the optimization here benefits from other savings as well, then perhaps my suggestion would not work as well as this did.

What do you think, Cameron?
Comment 5 Cameron McCormack (:heycam) 2022-02-12 17:45:42 PST
(In reply to Darin Adler from comment #4)
> This seems like a great speedup, but if the cost is the hash computation
> there’s another approach we could have taken: We can change
> ASCIICaseInsensitiveHash::hash(StringImpl&) to check if there are any ASCII
> uppercase characters. If there are none, then it can use string.hash(). If
> we do this, it would optimize *all* uses of ASCIICaseInsensitiveHash that
> are typically used on already-lowercase strings. However, if the
> optimization here benefits from other savings as well, then perhaps my
> suggestion would not work as well as this did.
> 
> What do you think, Cameron?

Nice idea, and I think it is worth trying that out.  In this particular case, I think it's still advantageous to move away from ASCIICaseInsensitiveHash, as it allows us to skip hashing the key upon insertion too (in createInputTypeFactoryMap).

Is it possible to configure HashMap in a way that hashing of keys on insertion is done differently from hashing of keys passed when looking up the map?
Comment 6 Cameron McCormack (:heycam) 2022-02-12 17:48:08 PST
Maybe we would only want to make your suggested ASCIICaseInsensitiveHash change if we were sure that all performance critical uses of ASCIICaseInsensitiveHash and AtomStrings tend to get called with already ASCII lowercase strings.

Or we could have a different hash trait type which means "compute an ASCII case insensitive hash but the string probably is already ASCII case insensitive".
Comment 7 Darin Adler 2022-02-13 10:21:41 PST
On reflection, I think I have a better idea. We should use SortedArrayMap instead of HashMap for InputType::create. Then there will be no hashing at all. I suspect it can match or exceed the speed of the HashMap; I believe the list is short enough that it will be a linear search, but maybe big enough for binary. Because this is not a dynamic data structure, but rather something that can be initialized at compile time, SortedArrayMap should be good.

I've learned to never doubt the speed one can get from hashing, so there is a slim chance that won't work, though!

(In reply to Cameron McCormack (:heycam) from comment #5)
> Nice idea, and I think it is worth trying that out.  In this particular
> case, I think it's still advantageous to move away from
> ASCIICaseInsensitiveHash, as it allows us to skip hashing the key upon
> insertion too (in createInputTypeFactoryMap).

Sorry, I don’t understand this.

As far as I can tell, neither the solution you implemented nor the ASCIICaseInsensitiveHash::hash solution I suggested will involve any extra hashing on insertion. In both cases, we detect that the String is already all lowercase, and use the hash stored in the String.

In the solution you implemented, we are relying on knowing that there are no ASCII uppercase characters in the string and so it uses String::hash.

In the solution I suggested, the hash table add function will call the ASCIICaseInsensitiveHash::hash function. That function will detect that the String lacks uppercase ASCII characters and then it will call String::hash.

It’s true that my suggestion incurs an "are there any uppercase ASCII letters" check on each string during insertion that yours did not, but no additional hashing.

> Is it possible to configure HashMap in a way that hashing of keys on
> insertion is done differently from hashing of keys passed when looking up
> the map?

I don’t think that’s needed here, but if we wanted to use it, the answer is yes.

HashMap provides HashTranslator versions of the functions find, contains, get, inlineGet, and add, and in all of these we can use a different type that hashes differently by providing HashTranslator traits.

This could be used to make a version of add that already knows there are no uppercase ASCII letters, and thus optimize the hash table building process. But that’s not an important optimization for a function that runs once, I don’t think.

If the speed of building the table matters at all, then my SortedArrayMap idea becomes even better, because the map is built at compile/load time, not at runtime.
Comment 8 Darin Adler 2022-02-13 10:24:18 PST
(In reply to Cameron McCormack (:heycam) from comment #6)
> Maybe we would only want to make your suggested ASCIICaseInsensitiveHash
> change if we were sure that all performance critical uses of
> ASCIICaseInsensitiveHash and AtomStrings tend to get called with already
> ASCII lowercase strings.
> 
> Or we could have a different hash trait type which means "compute an ASCII
> case insensitive hash but the string probably is already ASCII case
> insensitive".

I see your point.

If there are places where we use ASCIICaseInsensitiveHash, but the strings often have some uppercase ASCII letters in them, then the extra checking would be wasteful.

But since this check is quite inexpensive compared to the hash computation, I am guessing we wouldn’t need two differently tuned versions.
Comment 9 Darin Adler 2022-02-13 10:27:56 PST
One other thought: If this is so hot that all that hash computation is significant, maybe we also want to make a histogram and find out what values are actually used in practice and add special cases for the hottest most common values:

    if (string == "text")
        <do the thing and don't even bother with any fancy data structure>

Adding a case like this could speed things up further! All depends on this really being super-hot.
Comment 10 Darin Adler 2022-02-13 10:29:10 PST
I think I like SortedArrayMap best of the above ideas.

But also, we should probably think about the ASCIICaseInsensitiveHash optimization for the benefit of the other uses of it. I bet it will be a net win, maybe there are even other places where we can see a measurable speedup on some benchmark.