Bug 224138

Summary: Use Hasher more, remove IntegerHasher, fix hashing-related mistakes
Product: WebKit Reporter: Darin Adler <darin>
Component: Web Template FrameworkAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cgarcia, changseok, clord, cmarcelo, eric.carlson, esprehn+autocc, ews-watchlist, glenn, gyuyoung.kim, jer.noble, keith_miller, kondapallykalyan, macpherson, mark.lam, menard, mifenton, mmaxfield, msaboff, pdr, philipj, saam, sergio, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch cdumez: review+

Darin Adler
Reported 2021-04-02 15:50:17 PDT
Use Hasher more, remove IntegerHasher, fix hashing-related mistakes
Attachments
Patch (85.08 KB, patch)
2021-04-02 17:02 PDT, Darin Adler
no flags
Patch (86.57 KB, patch)
2021-04-05 16:38 PDT, Darin Adler
ews-feeder: commit-queue-
Patch (91.99 KB, patch)
2021-04-05 17:08 PDT, Darin Adler
ews-feeder: commit-queue-
Patch (92.62 KB, patch)
2021-04-05 17:31 PDT, Darin Adler
ews-feeder: commit-queue-
Patch (93.37 KB, patch)
2021-04-05 18:30 PDT, Darin Adler
no flags
Patch (93.34 KB, patch)
2021-04-06 14:43 PDT, Darin Adler
no flags
Patch (94.81 KB, patch)
2021-04-07 13:18 PDT, Darin Adler
cdumez: review+
Darin Adler
Comment 1 2021-04-02 17:02:54 PDT
Darin Adler
Comment 2 2021-04-02 17:12:12 PDT
GTK build requires some changes to FontCacheFreeType.cpp
Darin Adler
Comment 3 2021-04-05 16:38:00 PDT
Darin Adler
Comment 4 2021-04-05 17:08:14 PDT
Darin Adler
Comment 5 2021-04-05 17:31:48 PDT
Darin Adler
Comment 6 2021-04-05 18:30:04 PDT
Darin Adler
Comment 7 2021-04-06 14:43:15 PDT
Darin Adler
Comment 8 2021-04-07 13:18:56 PDT
Chris Dumez
Comment 9 2021-04-07 14:30:21 PDT
Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review > Source/WebCore/ChangeLog:12 > + hashes using exclusive or. The new one is almost certainly bettter. typo: bettter > Source/WTF/wtf/Hasher.h:100 > + bool remainder = string.length() & 1; Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash.
Chris Dumez
Comment 10 2021-04-07 14:42:40 PDT
Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review > Source/WebCore/platform/graphics/FontTaggedSettings.h:43 > +inline void add(Hasher& hasher, std::array<char, 4> array) Could use FontTag?
Darin Adler
Comment 11 2021-04-07 14:56:01 PDT
Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review >> Source/WTF/wtf/Hasher.h:100 >> + bool remainder = string.length() & 1; > > Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed. But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string. Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place. >> Source/WebCore/platform/graphics/FontTaggedSettings.h:43 >> +inline void add(Hasher& hasher, std::array<char, 4> array) > > Could use FontTag? We could, but I intentionally did not because this function is not a hash specific to FontTag; it’s hash for any 4-element array of bytes. Could even make it a template so it works for all different 8-bit integer types. Maybe even move it to Hasher.h instead of leaving it here.
Chris Dumez
Comment 12 2021-04-07 15:00:37 PDT
(In reply to Darin Adler from comment #11) > Comment on attachment 425430 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=425430&action=review > > >> Source/WTF/wtf/Hasher.h:100 > >> + bool remainder = string.length() & 1; > > > > Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. > > If we want to compute a hash by hashing a hash, then yes, that avoids > scanning the characters again. But we’d have to always do it, not just in > cases where the hash is already computed. > > But generally common wisdom I have read in research about hashing is that > it’s much better to hash characters than to hash an already-computed hash of > the characters. I believe enough better that it’s worthwhile to re-scan the > characters in the string. > > Anyway, I’m open to either approach. What I like about Hasher is that we > make the call here in one place. I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing. > > >> Source/WebCore/platform/graphics/FontTaggedSettings.h:43 > >> +inline void add(Hasher& hasher, std::array<char, 4> array) > > > > Could use FontTag? > > We could, but I intentionally did not because this function is not a hash > specific to FontTag; it’s hash for any 4-element array of bytes. Could even > make it a template so it works for all different 8-bit integer types. Maybe > even move it to Hasher.h instead of leaving it here. Ok.
Darin Adler
Comment 13 2021-04-07 17:00:28 PDT
Comment on attachment 425430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=425430&action=review >>>> Source/WTF/wtf/Hasher.h:100 >>>> + bool remainder = string.length() & 1; >>> >>> Wouldn't it be more efficient to call String::hash()? In cases where the String hash has already been computed, String::hash() just returns the cached hash. >> >> If we want to compute a hash by hashing a hash, then yes, that avoids scanning the characters again. But we’d have to always do it, not just in cases where the hash is already computed. >> >> But generally common wisdom I have read in research about hashing is that it’s much better to hash characters than to hash an already-computed hash of the characters. I believe enough better that it’s worthwhile to re-scan the characters in the string. >> >> Anyway, I’m open to either approach. What I like about Hasher is that we make the call here in one place. > > I see. Your call. I just thought it was sad to have hashing logic and caching of that hash in String, and yet not use it for hashing. I’ll land it like this, but it’s not my "final word" on the matter. It’s worth considering improvements and finding a way to measure.
Darin Adler
Comment 14 2021-04-07 19:19:28 PDT
Radar WebKit Bug Importer
Comment 15 2021-04-07 19:20:17 PDT
Note You need to log in before you can comment on or make changes to this bug.