Bug 143158 - HashTable should have inline capacity
Summary: HashTable should have inline capacity
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-27 15:36 PDT by Alex Christensen
Modified: 2015-04-06 14:28 PDT (History)
8 users (show)

See Also:


Attachments
Patch (98.27 KB, patch)
2015-03-27 15:40 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (98.49 KB, patch)
2015-03-27 16:36 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (98.44 KB, patch)
2015-03-27 17:43 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-mavericks-wk2 (246.00 KB, application/zip)
2015-03-27 18:33 PDT, Build Bot
no flags Details
Patch (98.46 KB, patch)
2015-03-27 18:37 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (98.48 KB, patch)
2015-03-27 19:17 PDT, Alex Christensen
achristensen: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2015-03-27 15:36:07 PDT
Small HashSets have terrible performance compared to an array.  Why not use an array with small sets?  This is a step towards getting that working.
Comment 1 Alex Christensen 2015-03-27 15:40:48 PDT
Created attachment 249614 [details]
Patch
Comment 2 Alex Christensen 2015-03-27 16:36:06 PDT
Created attachment 249622 [details]
Patch
Comment 3 Alex Christensen 2015-03-27 17:43:00 PDT
Created attachment 249626 [details]
Patch
Comment 4 Build Bot 2015-03-27 18:32:59 PDT
Comment on attachment 249626 [details]
Patch

Attachment 249626 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5911759877570560

Number of test failures exceeded the failure limit.
Comment 5 Build Bot 2015-03-27 18:33:02 PDT
Created attachment 249629 [details]
Archive of layout-test-results from ews105 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 6 Alex Christensen 2015-03-27 18:37:47 PDT
Created attachment 249630 [details]
Patch
Comment 7 Alex Christensen 2015-03-27 19:17:33 PDT
Created attachment 249635 [details]
Patch
Comment 8 Alex Christensen 2015-03-27 20:19:29 PDT
Feel free to review, but I'm going to gather some data about HashTable sizes before landing this to have some speculative evidence that this is a good direction to go.
Comment 9 Alex Christensen 2015-03-30 15:47:39 PDT
In the destructor of HashTable I printed out the largest m_keyCount a HashTable had seen, and I found this histogram data from opening Gmail and Facebook in MiniBrowser:

Max Size    Hash Table Count
       0             1187217
       1              113932
       2               64251
       3               36962
       4               23451
       5               14731
       6               12945
       7                8240
       8                6282
      >8               39911

Most HashTables never get an entry.  We already optimized for this case.
>30% of HashTables that get entries only get 1 entry before being destroyed.
>80% of HashTables that get entries never grow beyond a size of 8.

Based on this data, I think it is worth looking into optimizing small HashTables and this is a step in the right direction.
Comment 10 Geoffrey Garen 2015-03-30 15:57:59 PDT
(In reply to comment #9)
It seems like it would be especially valuable to optimize for the zero case. I don't think we quite do that right now, since an empty has table has a pretty large constant overhead. 

Perhaps a good first step is to make a table just a pointer, and use the null pointer to indicate an empty table. That would make the empty table 3X smaller.

Side note: if we want to specialize for, say, the <= 3 case, the best specialization is probably to use a vector rather than a hash table.
Comment 11 Alex Christensen 2015-03-30 16:46:41 PDT
Comment on attachment 249635 [details]
Patch

I'm going to look into making a HashTable just a pointer.  Inline capacity is too specific to my one use case, and it probably shouldn't go in WTF.