224138 – Use Hasher more, remove IntegerHasher, fix hashing-related mistakes

RESOLVED FIXED 224138

Use Hasher more, remove IntegerHasher, fix hashing-related mistakes

https://bugs.webkit.org/show_bug.cgi?id=224138

Summary Use Hasher more, remove IntegerHasher, fix hashing-related mistakes

Darin Adler

Reported 2021-04-02 15:50:17 PDT

Use Hasher more, remove IntegerHasher, fix hashing-related mistakes

Attachments
Patch (85.08 KB, patch) 2021-04-02 17:02 PDT, Darin Adler	no flags	Details Formatted Diff Diff
Patch (86.57 KB, patch) 2021-04-05 16:38 PDT, Darin Adler	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (91.99 KB, patch) 2021-04-05 17:08 PDT, Darin Adler	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (92.62 KB, patch) 2021-04-05 17:31 PDT, Darin Adler	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (93.37 KB, patch) 2021-04-05 18:30 PDT, Darin Adler	no flags	Details Formatted Diff Diff
Patch (93.34 KB, patch) 2021-04-06 14:43 PDT, Darin Adler	no flags	Details Formatted Diff Diff
Patch (94.81 KB, patch) 2021-04-07 13:18 PDT, Darin Adler	cdumez: review+	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Darin Adler

Comment 1 2021-04-02 17:02:54 PDT

Created attachment 425065 [details] Patch

Darin Adler

Comment 2 2021-04-02 17:12:12 PDT

GTK build requires some changes to FontCacheFreeType.cpp

Darin Adler

Comment 3 2021-04-05 16:38:00 PDT

Created attachment 425221 [details] Patch

Darin Adler

Comment 4 2021-04-05 17:08:14 PDT

Created attachment 425224 [details] Patch

Darin Adler

Comment 5 2021-04-05 17:31:48 PDT

Created attachment 425227 [details] Patch

Darin Adler

Comment 6 2021-04-05 18:30:04 PDT

Created attachment 425228 [details] Patch

Darin Adler

Comment 7 2021-04-06 14:43:15 PDT

Created attachment 425328 [details] Patch

Darin Adler

Comment 8 2021-04-07 13:18:56 PDT

Created attachment 425430 [details] Patch

Chris Dumez

Comment 9 2021-04-07 14:30:21 PDT

Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review > Source/WebCore/ChangeLog:12 > + hashes using exclusive or. The new one is almost certainly bettter. typo: bettter > Source/WTF/wtf/Hasher.h:100 > + bool remainder = string.length() & 1; Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.

Chris Dumez

Comment 10 2021-04-07 14:42:40 PDT

Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review > Source/WebCore/platform/graphics/FontTaggedSettings.h:43 > +inline void add(Hasher& hasher, std::array<char, 4> array) Could use FontTag?

Darin Adler

Comment 11 2021-04-07 14:56:01 PDT

Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review >> Source/WTF/wtf/Hasher.h:100 >> + bool remainder = string.length() & 1; > > Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed. But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string. Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place. >> Source/WebCore/platform/graphics/FontTaggedSettings.h:43 >> +inline void add(Hasher& hasher, std::array<char, 4> array) > > Could use FontTag? We could, but I intentionally did not because this function is not a hash specific to FontTag; it’s hash for any 4-element array of bytes. Could even make it a template so it works for all different 8-bit integer types. Maybe even move it to Hasher.h instead of leaving it here.

Chris Dumez

Comment 12 2021-04-07 15:00:37 PDT

(In reply to Darin Adler from comment #11) > Comment on attachment 425430 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=425430&action=review > > >> Source/WTF/wtf/Hasher.h:100 > >> + bool remainder = string.length() & 1; > > > > Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. > > If we want to compute a hash by hashing a hash, then yes, that avoids > scanning the characters again. But we’d have to always do it, not just in > cases where the hash is already computed. > > But generally common wisdom I have read in research about hashing is that > it’s much better to hash characters than to hash an already-computed hash of > the characters. I believe enough better that it’s worthwhile to re-scan the > characters in the string. > > Anyway, I’m open to either approach. What I like about Hasher is that we > make the call here in one place. I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing. > > >> Source/WebCore/platform/graphics/FontTaggedSettings.h:43 > >> +inline void add(Hasher& hasher, std::array<char, 4> array) > > > > Could use FontTag? > > We could, but I intentionally did not because this function is not a hash > specific to FontTag; it’s hash for any 4-element array of bytes. Could even > make it a template so it works for all different 8-bit integer types. Maybe > even move it to Hasher.h instead of leaving it here. Ok.

Darin Adler

Comment 13 2021-04-07 17:00:28 PDT

Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review >>>> Source/WTF/wtf/Hasher.h:100 >>>> + bool remainder = string.length() & 1; >>> >>> Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. >> >> If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed. >> >> But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string. >> >> Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place. > > I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing. I’ll land it like this, but it’s not my "final word" on the matter. It’s worth considering improvements and finding a way to measure.

Darin Adler

Comment 14 2021-04-07 19:19:28 PDT

Committed r275650 (236286@main): <https://commits.webkit.org/236286@main>

Radar WebKit Bug Importer

Comment 15 2021-04-07 19:20:17 PDT

<rdar://problem/76377558>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Template Framework

Assignee

Darin Adler

Reported

2021-04-02 15:50 PDT

Modified

2021-04-07 19:20 PDT History

CC List

26 users Show

URL

Keywords InRadar

Depends on

Blocks

benjamin
cdumez
cgarcia
changseok
clord
cmarcelo
eric.carlson
esprehn+autocc
ews-watchlist
glenn
gyuyoung.kim
jer.noble
keith_miller
kondapallykalyan
macpherson
mark.lam
menard
mifenton
mmaxfield
msaboff
pdr
philipj
saam
sergio
tzagallo
webkit-bug-importer