Bug 141356

Summary: Use longer hashes for cache keys
Product: WebKit Reporter: Antti Koivisto <koivisto>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, kling
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch
none
patch2 darin: review+

Description Antti Koivisto 2015-02-07 04:00:58 PST
Current key hashes are 32bit. We should use longer hashes to reduce collisions.
Comment 1 Antti Koivisto 2015-02-07 04:55:59 PST
Created attachment 246208 [details]
patch
Comment 2 Antti Koivisto 2015-02-07 06:16:26 PST
Created attachment 246209 [details]
patch2
Comment 3 Darin Adler 2015-02-07 08:33:05 PST
Comment on attachment 246209 [details]
patch2

View in context: https://bugs.webkit.org/attachment.cgi?id=246209&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.cpp:179
> +    auto data64 = reinterpret_cast<const uint64_t*>(digest.data());

Do we really need to do this? It makes things endian-dependent for no good reason. Can’t we write these out as a stream of bytes instead of two 64-bit integers?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:77
> +    hashString(md5, m_method);
> +    hashString(md5, m_partition);
> +    hashString(md5, m_identifier);

It’s a little strange to hash the characters in these strings, but no string boundaries or lengths. I suppose it’s OK, though.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:86
> +    auto* data64 = reinterpret_cast<const uint64_t*>(m_hash.data());
> +    return String::format("%016llx%016llx", data64[0], data64[1]);

I suggest treating this as a sequence of 16 bytes, not do this endian-specific thing:

    return String::format("%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x", data[0], ...)

Could use StringBuilder instead of String::format to avoid having to write it all out like this. It would be really easy to write a tiny function that adds a byte to a StringBuilder as hex.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:93
> +    auto* data64 = reinterpret_cast<uint64_t*>(hash.data());

Again, I don’t like this endian-dependent strategy here any more than elsewhere. We should do it a byte at a time.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorageCocoa.mm:45
> +static const char* versionDirectoryPrefix = "Version ";

I think it’s slightly more efficient to use const char[] for these things instead of const char*. Also, if not switching to const char[] it should be const char* const since we want the pointers to be constants, not variables.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorageCocoa.mm:122
> +    String versionSubdirectory = String(versionDirectoryPrefix) + String::number(NetworkCacheStorage::version);

No need to cast versionDirectoryPrefix to a String explicitly. Should just work without that.

I sure wish makeString or some other function could do this without allocating an intermediate string with String::number.
Comment 4 Antti Koivisto 2015-02-07 11:22:29 PST
https://trac.webkit.org/r179779
Comment 5 Antti Koivisto 2015-02-07 11:45:59 PST
and a followup https://trac.webkit.org/r179781