Using the patch in bug 186698 we know that there is significant wastage in HashTable capacity. Wastage on some pages:

nytimes.com: 7MB
cnn.com: 10MB
theverge.com: 4MB

HashMap<> doesn't have a shrink() like Vector does. It looks like we could expose a rehash() that halves the table size, but only if less than 50% of the table capacity is used (?). Maybe the policy should be that we don't store large objects in HashMaps, and store pointers instead?
I'm not 100% sure exactly what part of the memory we consider the "wasted" part, but our policy for hash tables is based on these constants:

static const unsigned m_maxLoad = 2;
static const unsigned m_minLoad = 6;

These constants are the reciprocals of the load-factor thresholds. So we expand a hash table as soon as 50% (1/2) of the buckets are full, and we shrink a hash table if fewer than 17% (1/6) of the buckets are full. When we shrink, we always shrink to half the current size, and when we expand, we typically expand to double the current size. I agree that we can make improvements by changing these policies.
This policy means that a growing hash table varies between 25% and 50% full, getting expanded each time it reaches 50%. I can imagine an explicit shrink-to-fit operation that would take us to 50% full once we knew we were done growing, and I don't see much downside in implementing it.
(In reply to Simon Fraser (smfr) from comment #0)
> Maybe the policy should be that we don't store large objects in HashMaps,
> and store pointers instead?

The tradeoff is likely the extra overhead of a separate memory block for each object vs. the extra overhead of unused hash table buckets. It would be good to have a principled way to do the math on that to figure out which costs more.
<rdar://problem/41217978>