Bug 141356 - Use longer hashes for cache keys
Summary: Use longer hashes for cache keys
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-07 04:00 PST by Antti Koivisto
Modified: 2015-02-07 11:45 PST (History)
2 users (show)

See Also:


Attachments
patch (15.28 KB, patch)
2015-02-07 04:55 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
patch2 (16.95 KB, patch)
2015-02-07 06:16 PST, Antti Koivisto
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koivisto 2015-02-07 04:00:58 PST
Current key hashes are 32bit. We should use longer hashes to reduce collisions.
Comment 1 Antti Koivisto 2015-02-07 04:55:59 PST
Created attachment 246208 [details]
patch
Comment 2 Antti Koivisto 2015-02-07 06:16:26 PST
Created attachment 246209 [details]
patch2
Comment 3 Darin Adler 2015-02-07 08:33:05 PST
Comment on attachment 246209 [details]
patch2

View in context: https://bugs.webkit.org/attachment.cgi?id=246209&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.cpp:179
> +    auto data64 = reinterpret_cast<const uint64_t*>(digest.data());

Do we really need to do this? It makes things endian-dependent for no good reason. Can’t we write these out as a stream of bytes instead of two 64-bit integers?

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:77
> +    hashString(md5, m_method);
> +    hashString(md5, m_partition);
> +    hashString(md5, m_identifier);

It’s a little strange to hash the characters in these strings, but no string boundaries or lengths. I suppose it’s OK, though.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:86
> +    auto* data64 = reinterpret_cast<const uint64_t*>(m_hash.data());
> +    return String::format("%016llx%016llx", data64[0], data64[1]);

I suggest treating this as a sequence of 16 bytes, not do this endian-specific thing:

    return String::format("%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x", data[0], ...)

Could use StringBuilder instead of String::format to avoid having to write it all out like this. It would be really easy to write a tiny function that adds a byte to a StringBuilder as hex.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheKey.cpp:93
> +    auto* data64 = reinterpret_cast<uint64_t*>(hash.data());

Again, I don’t like this endian-dependent strategy here any more than elsewhere. We should do it a byte at a time.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorageCocoa.mm:45
> +static const char* versionDirectoryPrefix = "Version ";

I think it’s slightly more efficient to use const char[] for these things instead of const char*. Also, if not switching to const char[] it should be const char* const since we want the pointers to be constants, not variables.

> Source/WebKit2/NetworkProcess/cache/NetworkCacheStorageCocoa.mm:122
> +    String versionSubdirectory = String(versionDirectoryPrefix) + String::number(NetworkCacheStorage::version);

No need to cast versionDirectoryPrefix to a String explicitly. Should just work without that.

I sure wish makeString or some other function could do this without allocating an intermediate string with String::number.
Comment 4 Antti Koivisto 2015-02-07 11:22:29 PST
https://trac.webkit.org/r179779
Comment 5 Antti Koivisto 2015-02-07 11:45:59 PST
and a followup https://trac.webkit.org/r179781