WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
237049
Drop StringHasher::hashMemory() and use the modern Hasher instead
https://bugs.webkit.org/show_bug.cgi?id=237049
Summary
Drop StringHasher::hashMemory() and use the modern Hasher instead
Chris Dumez
Reported
2022-02-22 11:46:51 PST
Drop StringHasher::hashMemory() and use the modern Hasher instead.
Attachments
Patch
(27.55 KB, patch)
2022-02-22 11:48 PST
,
Chris Dumez
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(27.55 KB, patch)
2022-02-22 12:56 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(30.01 KB, patch)
2022-02-22 18:42 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(30.01 KB, patch)
2022-02-22 18:47 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(30.06 KB, patch)
2022-02-22 19:26 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2022-02-22 11:48:29 PST
Created
attachment 452888
[details]
Patch
Chris Dumez
Comment 2
2022-02-22 12:56:55 PST
Created
attachment 452895
[details]
Patch
Sam Weinig
Comment 3
2022-02-22 17:37:07 PST
Comment on
attachment 452895
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=452895&action=review
> Source/WTF/wtf/SchedulePair.h:77 > + return computeHash(reinterpret_cast<uintptr_t>(pair->runLoop()), pair->mode() ? CFHash(pair->mode()) : 0);
Not new, but we should probably not hash a CFHash() here and should just hash the CFStringRef (not needed for this change, just pointing it out).
> Source/WTF/wtf/UUID.h:101 > + add(hasher, high, low);
Probably worth just adding an overload of add(Hasher, ..) for UInt128. But I think you can also use the UInt128Low64/UInt128High64 helpers instead of the explicit bit operations.
> Source/WebCore/Modules/indexeddb/shared/IDBResourceIdentifier.h:93 > - static unsigned hash(const IDBResourceIdentifier& a) { return a.hash(); } > + static unsigned hash(const IDBResourceIdentifier& a) { return computeHash(a); }
Would be nice if we could avoid having this and instead have HashTable (or whatever is using these Hash structs) try to call add(Hasher,...) if it exists.
> Source/WebCore/page/SecurityOriginHash.h:41 > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0));
This will compute a different hash value than the old one, and will be more expensive due to no using cached hash values in the strings. This is likely a better hash overall since you get more entropy per-byte, but it might be a slow down due to rehashing each string.
> Source/WebKitLegacy/mac/History/BinaryPropertyList.cpp:81 > + auto* integers = array.integers(); > + for (size_t i = 0; i < array.size(); ++i) > + add(hasher, integers[i]);
I would be nice if this could be done instead: add(hasher, Span(array.integers(), array.size()));
Darin Adler
Comment 4
2022-02-22 17:44:32 PST
Comment on
attachment 452895
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=452895&action=review
Do we need to measure the performance effect of all these hashing changes?
> Source/WTF/wtf/SchedulePair.h:77 > + return computeHash(reinterpret_cast<uintptr_t>(pair->runLoop()), pair->mode() ? CFHash(pair->mode()) : 0);
Seems like it would be slightly more elegant to have add(hasher&, const SchedulePair&) and call "return computeHash(*pair)" here. This code is still doing a hash of hashes, by calling CFHash on CFStringRef. Is there an alternative? Should we teach Hasher how to hash a CFStringRef? Put that in a header somewhere so we can reuse it? Do we really need to convert the run loop pointer to an integer to hash it? Can we hash pointers without reinterpret_cast, or is it ambiguous whether that means to hash the address or the data structure at that address? Is converting to an integer the way to say we want to hash the address? Mainly, I think uintptr_t is a leftover from wanting the same type for two integers that we wanted to pass to hashMemory.
> Source/WTF/wtf/UUID.h:97 > +inline void add(Hasher& hasher, const UUID& uuid)
Probably should take UUID instead of const UUID&.
> Source/WTF/wtf/UUID.h:101 > + auto high = static_cast<uint64_t>(uuid.m_data >> 64); > + auto low = static_cast<uint64_t>(uuid.m_data & 0xffffffffffffffff); > + add(hasher, high, low);
Seems like we should have: void add(Hasher&, UInt128) And have this call that. Can still have it live in this header for now.
> Source/WebCore/dom/QualifiedName.h:123 > +inline void add(Hasher& hasher, const QualifiedName& name) > +{ > + auto impl = name.impl(); > + add(hasher, impl->m_prefix, impl->m_localName, impl->m_namespace); > +}
I often like to just overload and also have a second function: void add(Hasher&, const QualifiedNameImpl&) And then have this one be: add(hasher, *name.impl()); Not *sure* that is better, but it seems appealing to me.
> Source/WebCore/page/SecurityOriginHash.h:41 > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0));
I don’t think we need this value_or(0) here. Also seems like we’d like to have: add(Hasher&, const SecurityOrigin&); Then here just call computeHash(*origin);
> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:79 > +inline void add(Hasher& hasher, const ProcessQualified<UUID>& uuid) > +{ > + add(hasher, uuid.object()); > +}
Should this instead be a function template that works on all ProcessQualified<>?
> Source/WebCore/platform/graphics/cocoa/FontPlatformDataCocoa.mm:50 > + return computeHash(fontHash, flags);
The use of uintptr_t in this function doesn’t really make sense any more. We can hash any integer types that are appropriate, and seems unlikely we need a pointer-sized int for the flags, nor do we need to cast CFHashCode to uintptr_t. Also a little strange to put these bits all together in a flags integer when the hasher can just be passed multiple things; this is probably a bit more efficient, but why not just hash all the fields in the most straightforward way. Another example of hashing a CFStringRef by hashing a hash, and it’s not great.
> Source/WebCore/platform/graphics/win/FontPlatformDataCGWin.cpp:137 > unsigned fontHash = m_font ? m_font->hash() : 0;
Might need a hasher that knows how to dereference a pointer.
> Source/WebCore/platform/graphics/win/FontPlatformDataCGWin.cpp:140 > + return computeHash(fontHash, cgFontHash, ctFontHash, m_useGDI);
The "hashing a hash" pattern again, on both CFStringRef but also on m_font. But how much do we care about code in CGWin? Probably close to retirement at this point.
> Source/WebKitLegacy/mac/History/BinaryPropertyList.cpp:82 > +inline void add(Hasher& hasher, const IntegerArray& array) > +{ > + auto* integers = array.integers(); > + for (size_t i = 0; i < array.size(); ++i) > + add(hasher, integers[i]); > +}
A little strange that if there’s nothing in the array the hash doesn’t get changed at all. I am confused about what the best practice should be for adding empty collections to a hash.
Darin Adler
Comment 5
2022-02-22 17:44:53 PST
Nice that Sam and I made many of the same comments.
Chris Dumez
Comment 6
2022-02-22 17:58:31 PST
(In reply to Sam Weinig from
comment #3
)
> Comment on
attachment 452895
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=452895&action=review
> > > Source/WTF/wtf/SchedulePair.h:77 > > + return computeHash(reinterpret_cast<uintptr_t>(pair->runLoop()), pair->mode() ? CFHash(pair->mode()) : 0); > > Not new, but we should probably not hash a CFHash() here and should just > hash the CFStringRef (not needed for this change, just pointing it out).
Yes, I wondered about that but tried to be conservative since this is a pre-existing issue. Hashing the CFStringRef would produce a better hash but would be more expensive.
> > Source/WTF/wtf/UUID.h:101 > > + add(hasher, high, low); > > Probably worth just adding an overload of add(Hasher, ..) for UInt128. But I > think you can also use the UInt128Low64/UInt128High64 helpers instead of the > explicit bit operations.
The reason UUID uses the bit operations and not UInt128Low64()/UInt128High64() is that those functions work on a UInt128Impl, which is an implementation detail. On some platforms, UInt128 is a __uint128_t, not a UInt128Impl.
> > > Source/WebCore/Modules/indexeddb/shared/IDBResourceIdentifier.h:93 > > - static unsigned hash(const IDBResourceIdentifier& a) { return a.hash(); } > > + static unsigned hash(const IDBResourceIdentifier& a) { return computeHash(a); } > > Would be nice if we could avoid having this and instead have HashTable (or > whatever is using these Hash structs) try to call add(Hasher,...) if it > exists. > > > Source/WebCore/page/SecurityOriginHash.h:41 > > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0)); > > This will compute a different hash value than the old one, and will be more > expensive due to no using cached hash values in the strings. This is likely > a better hash overall since you get more entropy per-byte, but it might be a > slow down due to rehashing each string.
Well, technically, those are AtomStrings so we are not actually going to re-hash strings per se. We'll just hash pointers. Therefore, I think it isn't as expensive as it looks. You are right though that it is more expensive than using the cached hash. The alternative though would be to use the cached hash and give that to Hasher. Sadly, this would hash a hash, which is not optimal. Would we prefer that?
> > > Source/WebKitLegacy/mac/History/BinaryPropertyList.cpp:81 > > + auto* integers = array.integers(); > > + for (size_t i = 0; i < array.size(); ++i) > > + add(hasher, integers[i]); > > I would be nice if this could be done instead: > > add(hasher, Span(array.integers(), array.size()));
This may work already, I'll give it a try.
Chris Dumez
Comment 7
2022-02-22 18:26:16 PST
(In reply to Darin Adler from
comment #4
)
> Comment on
attachment 452895
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=452895&action=review
> > Do we need to measure the performance effect of all these hashing changes? > > > Source/WTF/wtf/SchedulePair.h:77 > > + return computeHash(reinterpret_cast<uintptr_t>(pair->runLoop()), pair->mode() ? CFHash(pair->mode()) : 0); > > Seems like it would be slightly more elegant to have add(hasher&, const > SchedulePair&) and call "return computeHash(*pair)" here. > > This code is still doing a hash of hashes, by calling CFHash on CFStringRef. > Is there an alternative? Should we teach Hasher how to hash a CFStringRef? > Put that in a header somewhere so we can reuse it? > > Do we really need to convert the run loop pointer to an integer to hash it? > Can we hash pointers without reinterpret_cast, or is it ambiguous whether > that means to hash the address or the data structure at that address? Is > converting to an integer the way to say we want to hash the address? Mainly, > I think uintptr_t is a leftover from wanting the same type for two integers > that we wanted to pass to hashMemory. > > > Source/WTF/wtf/UUID.h:97 > > +inline void add(Hasher& hasher, const UUID& uuid) > > Probably should take UUID instead of const UUID&. > > > Source/WTF/wtf/UUID.h:101 > > + auto high = static_cast<uint64_t>(uuid.m_data >> 64); > > + auto low = static_cast<uint64_t>(uuid.m_data & 0xffffffffffffffff); > > + add(hasher, high, low); > > Seems like we should have: > > void add(Hasher&, UInt128) > > And have this call that. Can still have it live in this header for now. > > > Source/WebCore/dom/QualifiedName.h:123 > > +inline void add(Hasher& hasher, const QualifiedName& name) > > +{ > > + auto impl = name.impl(); > > + add(hasher, impl->m_prefix, impl->m_localName, impl->m_namespace); > > +} > > I often like to just overload and also have a second function: > > void add(Hasher&, const QualifiedNameImpl&) > > And then have this one be: > > add(hasher, *name.impl()); > > Not *sure* that is better, but it seems appealing to me. > > > Source/WebCore/page/SecurityOriginHash.h:41 > > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0)); > > I don’t think we need this value_or(0) here. > > Also seems like we’d like to have: > > add(Hasher&, const SecurityOrigin&); > > Then here just call computeHash(*origin); > > > Source/WebCore/platform/ScriptExecutionContextIdentifier.h:79 > > +inline void add(Hasher& hasher, const ProcessQualified<UUID>& uuid) > > +{ > > + add(hasher, uuid.object()); > > +} > > Should this instead be a function template that works on all > ProcessQualified<>?
There is already one generic one for ProcessQualified<> and hashes both the object and the processIdentifier. However, the ProcessQualified<UUID> specialization really only needs to hash the object which it is a UUID. It maintains pre-exiting behavior and performance.
Darin Adler
Comment 8
2022-02-22 18:28:47 PST
(In reply to Chris Dumez from
comment #7
)
> However, the ProcessQualified<UUID> > specialization really only needs to hash the object which it is a UUID. It > maintains pre-exiting behavior and performance.
Definitely needs a comment, something like this: // Since UUIDs are unique on their own, optimize by not hashing the process identifier.
Chris Dumez
Comment 9
2022-02-22 18:41:55 PST
(In reply to Darin Adler from
comment #4
)
> Comment on
attachment 452895
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=452895&action=review
> > Do we need to measure the performance effect of all these hashing changes?
You are right that there is a slight risk. I'll monitor the perf bots after I land to make sure there is no obvious regression.
Chris Dumez
Comment 10
2022-02-22 18:42:45 PST
Created
attachment 452926
[details]
Patch
Chris Dumez
Comment 11
2022-02-22 18:47:51 PST
Created
attachment 452927
[details]
Patch
Darin Adler
Comment 12
2022-02-22 19:09:49 PST
Comment on
attachment 452927
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=452927&action=review
Looks great
> Source/WTF/wtf/Hasher.h:79 > +inline void add(Hasher& hasher, const UInt128& value)
I suggest UInt128 instead of const UInt128&.
> Source/WebCore/Modules/indexeddb/shared/IDBResourceIdentifier.h:89 > + add(hasher, identifier.connectionIdentifier(), identifier.m_resourceNumber);
Just realized this is a friend, so maybe we should use both data members instead of using a getter for one and the data member for the other.
Chris Dumez
Comment 13
2022-02-22 19:26:10 PST
Created
attachment 452929
[details]
Patch
EWS
Comment 14
2022-02-22 22:13:41 PST
Committed
r290349
(
247667@main
): <
https://commits.webkit.org/247667@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 452929
[details]
.
Radar WebKit Bug Importer
Comment 15
2022-02-22 22:14:22 PST
<
rdar://problem/89337284
>
Sam Weinig
Comment 16
2022-02-23 10:27:56 PST
> > > Source/WebCore/page/SecurityOriginHash.h:41 > > > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0)); > > > > This will compute a different hash value than the old one, and will be more > > expensive due to no using cached hash values in the strings. This is likely > > a better hash overall since you get more entropy per-byte, but it might be a > > slow down due to rehashing each string. > > Well, technically, those are AtomStrings so we are not actually going to > re-hash strings per se. We'll just hash pointers. > Therefore, I think it isn't as expensive as it looks. You are right though > that it is more expensive than using the cached hash. > The alternative though would be to use the cached hash and give that to > Hasher. Sadly, this would hash a hash, which is not optimal. > > Would we prefer that?
I don't think those are AtomStrings, though perhaps I am missing something. SecurityOrigin::protocol() returns a const String&. I actually think this is likely better better, but that you should just be on the lookout for a possible perf regression.
Chris Dumez
Comment 17
2022-02-23 10:30:01 PST
(In reply to Sam Weinig from
comment #16
)
> > > > Source/WebCore/page/SecurityOriginHash.h:41 > > > > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0)); > > > > > > This will compute a different hash value than the old one, and will be more > > > expensive due to no using cached hash values in the strings. This is likely > > > a better hash overall since you get more entropy per-byte, but it might be a > > > slow down due to rehashing each string. > > > > Well, technically, those are AtomStrings so we are not actually going to > > re-hash strings per se. We'll just hash pointers. > > Therefore, I think it isn't as expensive as it looks. You are right though > > that it is more expensive than using the cached hash. > > The alternative though would be to use the cached hash and give that to > > Hasher. Sadly, this would hash a hash, which is not optimal. > > > > Would we prefer that? > > I don't think those are AtomStrings, though perhaps I am missing something. > SecurityOrigin::protocol() returns a const String&.
Right, those are not. My comment was meant for QualifiedNameComponents / QualifiedName.
> I actually think this is likely better better, but that you should just be > on the lookout for a possible perf regression.
Agreed. Will check the bots today.
Chris Dumez
Comment 18
2022-02-23 12:00:40 PST
(In reply to Chris Dumez from
comment #17
)
> (In reply to Sam Weinig from
comment #16
) > > > > > Source/WebCore/page/SecurityOriginHash.h:41 > > > > > + return computeHash(origin->protocol(), origin->host(), origin->port().value_or(0)); > > > > > > > > This will compute a different hash value than the old one, and will be more > > > > expensive due to no using cached hash values in the strings. This is likely > > > > a better hash overall since you get more entropy per-byte, but it might be a > > > > slow down due to rehashing each string. > > > > > > Well, technically, those are AtomStrings so we are not actually going to > > > re-hash strings per se. We'll just hash pointers. > > > Therefore, I think it isn't as expensive as it looks. You are right though > > > that it is more expensive than using the cached hash. > > > The alternative though would be to use the cached hash and give that to > > > Hasher. Sadly, this would hash a hash, which is not optimal. > > > > > > Would we prefer that? > > > > I don't think those are AtomStrings, though perhaps I am missing something. > > SecurityOrigin::protocol() returns a const String&. > > Right, those are not. My comment was meant for QualifiedNameComponents / > QualifiedName. > > > I actually think this is likely better better, but that you should just be > > on the lookout for a possible perf regression. > > Agreed. Will check the bots today.
I've just checked the perf bots and it looks to be perf-neutral.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug