Bug 186747 - Reduce HashTable capacity wastage
Summary: Reduce HashTable capacity wastage
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-06-16 17:33 PDT by Simon Fraser (smfr)
Modified: 2018-06-18 11:31 PDT
CC List: 4 users

Description Simon Fraser (smfr) 2018-06-16 17:33:22 PDT
Using the patch in bug 186698, we can see that there is significant wastage in HashTable capacity. Wastage on some pages:

nytimes.com:     7MB
cnn.com:        10MB
theverge.com:    4MB

HashMap<> doesn't have a shrink() like Vector does. It looks like we could expose a rehash() that halves the table size, but only when less than 50% of table capacity is in use (?).

Maybe the policy should be that we don't store large objects in HashMaps, and store pointers instead?
Comment 1 Darin Adler 2018-06-16 23:30:36 PDT
I’m not 100% sure exactly what part of the memory we consider the "wasted" part, but our policy for hash tables is based on these constants:

    static const unsigned m_maxLoad = 2;
    static const unsigned m_minLoad = 6;

These constants are denominators of the load fraction. So we expand a hash table as soon as 50% (1/2) of the buckets are full, and we shrink a hash table if fewer than 17% (1/6) of the buckets are full.

When we shrink, we always shrink to half the current size and when we expand, we typically expand to double the current size.

I agree that we can make improvements by changing these policies.
Comment 2 Darin Adler 2018-06-17 00:04:30 PDT
This policy means that growing hash tables vary between 25% and 50% full, getting expanded each time they reach 50%. I can imagine an explicit shrink-to-fit operation that would take us to 50% full once we knew we were done growing, and I don't see much downside in implementing it.
Comment 3 Darin Adler 2018-06-17 10:57:03 PDT
(In reply to Simon Fraser (smfr) from comment #0)
> Maybe the policy should be that we don't store large objects in HashMaps,
> and store pointers instead?

The tradeoff is likely the extra overhead for the separate memory blocks for each object vs. the extra overhead for unused hash table buckets. It would be good to have a principled way to do the math on that to figure out which costs more.
Comment 4 Radar WebKit Bug Importer 2018-06-18 11:31:18 PDT
<rdar://problem/41217978>