Bug 186747
| Summary: | Reduce HashTable capacity wastage | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Simon Fraser (smfr) <simon.fraser> |
| Component: | WebKit Misc. | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW | ||
| Severity: | Normal | CC: | darin, sam, simon.fraser, webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | WebKit Nightly Build | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
Simon Fraser (smfr)
Using the patch in bug 186698 we know that there is significant wastage in HashTable capacity. Wastage on some pages:
nytimes.com: 7MB
cnn.com: 10MB
theverge.com: 4MB
HashMap<> doesn't have shrink() like Vector does. It looks like we could expose a rehash() which will halve the table size, but only if less than 50% of table capacity is used (?).
Maybe the policy should be that we don't store large objects in HashMaps, and store pointers instead?
| Attachments | ||
|---|---|---|
| Add attachment proposed patch, testcase, etc. |
Darin Adler
I’m not 100% sure exactly what part of the memory we consider the "wasted" part, but our policy for hash tables is based on these constants:
static const unsigned m_maxLoad = 2;
static const unsigned m_minLoad = 6;
The units here are fraction of table full. So we expand a hash table as soon as 50% (1/2) of the buckets are full and we shrink a hash table if only 17% (1/6) of the buckets are full.
When we shrink, we always shrink to half the current size and when we expand, we typically expand to double the current size.
I agree that we can make improvements by changing these policies.
Darin Adler
This policy means that growing hash tables vary between 25% and 50% full, getting expanded each time they get up to 50%. I can imagine an explicit shrink-to-fit operation that would take us to 50% full once we knew we were doing growing and I don’t see much downside in implementing it.
Darin Adler
(In reply to Simon Fraser (smfr) from comment #0)
> Maybe the policy should be that we don't store large objects in HashMaps,
> and store pointers instead?
The tradeoff is likely the extra overhead for the separate memory blocks for each object vs. the extra overhead for unused hash table buckets. It would be good to have a principled way to do the math on that to figure out which costs more.
Radar WebKit Bug Importer
<rdar://problem/41217978>