Summary: | IdentifierRepHash has a mismatch between hash() and equal() | ||
---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW --- | ||
Severity: | Normal | CC: | darin, msaboff, ysuzuki |
Priority: | P2 | ||
Version: | WebKit Nightly Build | ||
Hardware: | All | ||
OS: | All |
Description
Filip Pizlo
2015-10-14 13:16:16 PDT
I don't remember the exact bug number (maybe, in the context of Symbol and IdentifierRepHash), IIRC, when I attempt to change this to PtrHash, I heard the following reason why we did not use PtrHash; We intentionally use StringHasher instead of PtrHash because it gives nicer distribution for Strings and reduces collisions in the hash table. So when changing this, I think measuring collision and performance is nice. (In reply to comment #1) > I don't remember the exact bug number (maybe, in the context of Symbol and > IdentifierRepHash), IIRC, when I attempt to change this to PtrHash, I heard > the following reason why we did not use PtrHash; We intentionally use > StringHasher instead of PtrHash because it gives nicer distribution for > Strings and reduces collisions in the hash table. > So when changing this, I think measuring collision and performance is nice. Personally I guess PtrHash gives nice distribution. PtrHash maps intptr_t => unsigned. StringHasher maps UChar[arbitrary length] => unsigned. If PtrHash works well, it could provide nice distribution in this case I think. (In reply to comment #2) > (In reply to comment #1) > > I don't remember the exact bug number (maybe, in the context of Symbol and > > IdentifierRepHash), IIRC, when I attempt to change this to PtrHash, I heard > > the following reason why we did not use PtrHash; We intentionally use > > StringHasher instead of PtrHash because it gives nicer distribution for > > Strings and reduces collisions in the hash table. > > So when changing this, I think measuring collision and performance is nice. > > Personally I guess PtrHash gives nice distribution. PtrHash maps intptr_t => > unsigned. StringHasher maps UChar[arbitrary length] => unsigned. If PtrHash > works well, it could provide nice distribution in this case I think. Do you recommend measuring distribution directly, or do you think it's OK to just see how performance is affected overall? We may encounter a trade-off between a fast to compute hash function that gives worse distribution and a slow to compute hash function that gives better distribution. I don't have a philosophical preference between the two, but I just suspect that unless the distribution is really bad, the faster-to-compute hash function will give better overall performance. Maybe the best thing to do is just to look at end-to-end performance. Probably, it doesn't matter at all. In that case, PtrHash is nicer because it's the less surprising choice. As I recall, we were originally using PtrHash for this purpose (or a similar one in WebCore?) and Gavin changed to use the string hash instead to get a measurable performance boost. But I don’t think the change was to recompute a string hash each time; our strings memoize the result of the string hash function. (In reply to comment #3) > Do you recommend measuring distribution directly, or do you think it's OK to > just see how performance is affected overall? I was thinking about HashTable::dumpStats. DUMP_HASHTABLE_STATS can dump the statistics including collisions. > We may encounter a trade-off between a fast to compute hash function that > gives worse distribution and a slow to compute hash function that gives > better distribution. I don't have a philosophical preference between the > two, but I just suspect that unless the distribution is really bad, the > faster-to-compute hash function will give better overall performance. Maybe > the best thing to do is just to look at end-to-end performance. Probably, > it doesn't matter at all. In that case, PtrHash is nicer because it's the > less surprising choice. Agreed. If the change does not pose any performance regression & there are enough tests, I think it indirectly indicates the distribution of PtrHash is not so bad. |