Using the patch in bug 186698 we know that there is significant wastage in HashTable capacity. Wastage on some pages:

nytimes.com: 7MB
cnn.com: 10MB
theverge.com: 4MB

HashMap<> doesn't have a shrink() like Vector does. It looks like we could expose a rehash() that halves the table size, but only if less than 50% of the table capacity is used (?). Maybe the policy should be that we don't store large objects in HashMaps, and store pointers instead?
I'm not 100% sure exactly what part of the memory we consider the "wasted" part, but our policy for hash tables is based on these constants:

static const unsigned m_maxLoad = 2;
static const unsigned m_minLoad = 6;

These constants are the reciprocals of the load-factor thresholds. So we expand a hash table as soon as 50% (1/2) of the buckets are full, and we shrink a hash table if fewer than 17% (1/6) of the buckets are full. When we shrink, we always shrink to half the current size, and when we expand, we typically expand to double the current size. I agree that we can make improvements by changing these policies.
This policy means that a growing hash table varies between 25% and 50% full, getting expanded each time it reaches 50%. I can imagine an explicit shrink-to-fit operation that would take us to 50% full once we knew we were done growing, and I don't see much downside in implementing it.
(In reply to Simon Fraser (smfr) from comment #0)
> Maybe the policy should be that we don't store large objects in HashMaps,
> and store pointers instead?

The tradeoff is likely the extra overhead of a separate memory block for each object vs. the extra overhead of unused hash table buckets. It would be good to have a principled way to do the math on that to figure out which costs more.
<rdar://problem/41217978>