Bug 143158

Summary: HashTable should have inline capacity
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: Web Template FrameworkAssignee: Alex Christensen <achristensen>
Status: NEW ---    
Severity: Normal CC: barraclough, benjamin, buildbot, cmarcelo, commit-queue, ggaren, kling, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews105 for mac-mavericks-wk2
none
Patch
none
Patch achristensen: review-

Description Alex Christensen 2015-03-27 15:36:07 PDT
Small HashSets have terrible performance compared to an array.  Why not use an array with small sets?  This is a step towards getting that working.
Comment 1 Alex Christensen 2015-03-27 15:40:48 PDT
Created attachment 249614 [details]
Patch
Comment 2 Alex Christensen 2015-03-27 16:36:06 PDT
Created attachment 249622 [details]
Patch
Comment 3 Alex Christensen 2015-03-27 17:43:00 PDT
Created attachment 249626 [details]
Patch
Comment 4 Build Bot 2015-03-27 18:32:59 PDT
Comment on attachment 249626 [details]
Patch

Attachment 249626 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5911759877570560

Number of test failures exceeded the failure limit.
Comment 5 Build Bot 2015-03-27 18:33:02 PDT
Created attachment 249629 [details]
Archive of layout-test-results from ews105 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 6 Alex Christensen 2015-03-27 18:37:47 PDT
Created attachment 249630 [details]
Patch
Comment 7 Alex Christensen 2015-03-27 19:17:33 PDT
Created attachment 249635 [details]
Patch
Comment 8 Alex Christensen 2015-03-27 20:19:29 PDT
Feel free to review, but I'm going to gather some data about HashTable sizes before landing this to have some speculative evidence that this is a good direction to go.
Comment 9 Alex Christensen 2015-03-30 15:47:39 PDT
In the destructor of HashTable I printed out the largest m_keyCount a HashTable had seen, and I found this histogram data from opening Gmail and Facebook in MiniBrowser:

Max Size    Hash Table Count
       0             1187217
       1              113932
       2               64251
       3               36962
       4               23451
       5               14731
       6               12945
       7                8240
       8                6282
      >8               39911

Most HashTables never get an entry.  We already optimized for this case.
>30% of HashTables that get entries only get 1 entry before being destroyed.
>80% of HashTables that get entries never grow beyond a size of 8.

Based on this data, I think it is worth looking into optimizing small HashTables and this is a step in the right direction.
Comment 10 Geoffrey Garen 2015-03-30 15:57:59 PDT
(In reply to comment #9)
It seems like it would be especially valuable to optimize for the zero case. I don't think we quite do that right now, since an empty has table has a pretty large constant overhead. 

Perhaps a good first step is to make a table just a pointer, and use the null pointer to indicate an empty table. That would make the empty table 3X smaller.

Side note: if we want to specialize for, say, the <= 3 case, the best specialization is probably to use a vector rather than a hash table.
Comment 11 Alex Christensen 2015-03-30 16:46:41 PDT
Comment on attachment 249635 [details]
Patch

I'm going to look into making a HashTable just a pointer.  Inline capacity is too specific to my one use case, and it probably shouldn't go in WTF.