Bug 224138 - Use Hasher more, remove IntegerHasher, fix hashing-related mistakes
Summary: Use Hasher more, remove IntegerHasher, fix hashing-related mistakes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-04-02 15:50 PDT by Darin Adler
Modified: 2021-04-07 19:20 PDT (History)
26 users (show)

See Also:


Attachments
Patch (85.08 KB, patch)
2021-04-02 17:02 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (86.57 KB, patch)
2021-04-05 16:38 PDT, Darin Adler
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (91.99 KB, patch)
2021-04-05 17:08 PDT, Darin Adler
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (92.62 KB, patch)
2021-04-05 17:31 PDT, Darin Adler
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (93.37 KB, patch)
2021-04-05 18:30 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (93.34 KB, patch)
2021-04-06 14:43 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (94.81 KB, patch)
2021-04-07 13:18 PDT, Darin Adler
cdumez: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2021-04-02 15:50:17 PDT
Use Hasher more, remove IntegerHasher, fix hashing-related mistakes
Comment 1 Darin Adler 2021-04-02 17:02:54 PDT
Created attachment 425065 [details]
Patch
Comment 2 Darin Adler 2021-04-02 17:12:12 PDT
GTK build requires some changes to FontCacheFreeType.cpp
Comment 3 Darin Adler 2021-04-05 16:38:00 PDT
Created attachment 425221 [details]
Patch
Comment 4 Darin Adler 2021-04-05 17:08:14 PDT
Created attachment 425224 [details]
Patch
Comment 5 Darin Adler 2021-04-05 17:31:48 PDT
Created attachment 425227 [details]
Patch
Comment 6 Darin Adler 2021-04-05 18:30:04 PDT
Created attachment 425228 [details]
Patch
Comment 7 Darin Adler 2021-04-06 14:43:15 PDT
Created attachment 425328 [details]
Patch
Comment 8 Darin Adler 2021-04-07 13:18:56 PDT
Created attachment 425430 [details]
Patch
Comment 9 Chris Dumez 2021-04-07 14:30:21 PDT
Comment on attachment 425430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review

> Source/WebCore/ChangeLog:12
> +        hashes using exclusive or. The new one is almost certainly bettter.

typo: bettter

> Source/WTF/wtf/Hasher.h:100
> +    bool remainder = string.length() & 1;

Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.
Comment 10 Chris Dumez 2021-04-07 14:42:40 PDT
Comment on attachment 425430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review

> Source/WebCore/platform/graphics/FontTaggedSettings.h:43
> +inline void add(Hasher& hasher, std::array<char, 4> array)

Could use FontTag?
Comment 11 Darin Adler 2021-04-07 14:56:01 PDT
Comment on attachment 425430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review

>> Source/WTF/wtf/Hasher.h:100
>> +    bool remainder = string.length() & 1;
> 
> Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.

If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed.

But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string.

Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place.

>> Source/WebCore/platform/graphics/FontTaggedSettings.h:43
>> +inline void add(Hasher& hasher, std::array<char, 4> array)
> 
> Could use FontTag?

We could, but I intentionally did not because this function is not a hash specific to FontTag; it’s hash for any 4-element array of bytes. Could even make it a template so it works for all different 8-bit integer types. Maybe even move it to Hasher.h instead of leaving it here.
Comment 12 Chris Dumez 2021-04-07 15:00:37 PDT
(In reply to Darin Adler from comment #11)
> Comment on attachment 425430 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=425430&action=review
> 
> >> Source/WTF/wtf/Hasher.h:100
> >> +    bool remainder = string.length() & 1;
> > 
> > Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.
> 
> If we want to compute a hash by hashing a hash, then yes, that avoids
> scanning the characters again. But we’d have to always do it, not just in
> cases where the hash is already computed.
> 
> But generally common wisdom I have read in research about hashing is that
> it’s much better to hash characters than to hash an already-computed hash of
> the characters. I believe enough better that it’s worthwhile to re-scan the
> characters in the string.
> 
> Anyway, I’m open to either approach. What I like about Hasher is that we
> make the call here in one place.

I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing.

> 
> >> Source/WebCore/platform/graphics/FontTaggedSettings.h:43
> >> +inline void add(Hasher& hasher, std::array<char, 4> array)
> > 
> > Could use FontTag?
> 
> We could, but I intentionally did not because this function is not a hash
> specific to FontTag; it’s hash for any 4-element array of bytes. Could even
> make it a template so it works for all different 8-bit integer types. Maybe
> even move it to Hasher.h instead of leaving it here.

Ok.
Comment 13 Darin Adler 2021-04-07 17:00:28 PDT
Comment on attachment 425430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review

>>>> Source/WTF/wtf/Hasher.h:100
>>>> +    bool remainder = string.length() & 1;
>>> 
>>> Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.
>> 
>> If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed.
>> 
>> But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string.
>> 
>> Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place.
> 
> I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing.

I’ll land it like this, but it’s not my "final word" on the matter. It’s worth considering improvements and finding a way to measure.
Comment 14 Darin Adler 2021-04-07 19:19:28 PDT
Committed r275650 (236286@main): <https://commits.webkit.org/236286@main>
Comment 15 Radar WebKit Bug Importer 2021-04-07 19:20:17 PDT
<rdar://problem/76377558>