91891 – QualifiedName's HashSet should be big enough to hold at least all the static names

RESOLVED FIXED 91891

QualifiedName's HashSet should be big enough to hold at least all the static names

https://bugs.webkit.org/show_bug.cgi?id=91891

Summary QualifiedName's HashSet should be big enough to hold at least all the static ...

Benjamin Poulain

Reported 2012-07-20 12:28:45 PDT

QualifiedName's table has a standard size of 64 buckets. When initializing WebKit, we create 850 static QualifiedName for the standard names (HTMLNames, SVGNames etc). The small base size forces us to grow and rehash the table several time on startup.

Attachments
Patch (1.89 KB, patch) 2012-07-20 12:34 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
Patch (6.06 KB, patch) 2012-07-20 17:07 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
Patch (6.07 KB, patch) 2012-07-20 17:17 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
[PROGRAM] Simple cpp program comparing the two approaches (3.38 KB, application/octet-stream) 2012-07-20 23:15 PDT, Joseph Pecoraro	no flags	Details
Patch (5.96 KB, patch) 2012-07-23 12:28 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
Patch (13.98 KB, patch) 2012-07-23 14:58 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
Patch (11.66 KB, patch) 2012-07-23 16:44 PDT, Benjamin Poulain	no flags	Details Formatted Diff Diff
Patch (14.69 KB, patch) 2012-07-23 17:55 PDT, Benjamin Poulain	darin: review+	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Benjamin Poulain

Comment 1 2012-07-20 12:34:41 PDT

Created attachment 153566 [details] Patch

Simon Fraser (smfr)

Comment 2 2012-07-20 12:37:48 PDT

Comment on attachment 153566 [details] Patch What is SVG is compiled out?

Benjamin Poulain

Comment 3 2012-07-20 17:07:08 PDT

Created attachment 153627 [details] Patch

Simon Fraser (smfr)

Comment 4 2012-07-20 17:09:41 PDT

Comment on attachment 153627 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=153627&action=review > Source/WTF/ChangeLog:9 > + This allow us to create HashTraints giving the minimumSize without hardcoding the values. HashTraints

Benjamin Poulain

Comment 5 2012-07-20 17:17:31 PDT

Created attachment 153628 [details] Patch

Joseph Pecoraro

Comment 6 2012-07-20 23:11:21 PDT

Comment on attachment 153628 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=153628&action=review > Source/WTF/ChangeLog:9 > + This allow us to create HashTraints giving the minimumSize without hardcoding the values. Nit (typo): "HashTraints" => "HashTraits" > Source/WTF/wtf/HashTable.h:458 > + // The following compute the upper power of two capacity to hold the size parameter. Nit (typo): "compute" => "computes" > Source/WTF/wtf/HashTable.h:468 > + template<unsigned size, unsigned shift> struct hashTableCapacityShifts; > + template<unsigned size> > + struct hashTableCapacityShifts<size, 1> { > + static const unsigned value = (size - 1) | (size - 1) >> 1; > + }; > + template<unsigned size, unsigned shift> > + struct hashTableCapacityShifts { > + static const unsigned value = hashTableCapacityShifts<size, (shift / 2)>::value | hashTableCapacityShifts<size, (shift / 2)>::value >> shift; > + }; So, I think I finally understand this. You're doing the round up to power of 2 algorithm on wikipedia, but without assignment you need to do some tricks? <http://en.wikipedia.org/wiki/Power_of_two#Algorithm_to_round_up_to_power_of_two> > n = n - 1; > n = n | (n >> 1); > n = n | (n >> 2); > n = n | (n >> 4); > n = n | (n >> 8); > n = n | (n >> 16); > ... > n = n | (n >> (bitspace / 2)); > n = n + 1; It looks like because you can't re-assign a value of n in each step that you're actually computing what that assigned value of n would be at each step. Causing double recursion, which the original algorithm does not have. The right side of your recursion is the same as the wikipedia algorithm. The left side of the recursion is this added complexity. This seems unnecessarily complex. You can get away with something much simpler by walk the high bit down to the end and OR it all up. With assignment you can calculate the result in log2(bitspace) assignments and ORs, but with the simple approach without assignment you can compute it in max(bitspace) ORs, and I think your approach is 2^log2(bitspace) ORs. I think this approach is significantly easier to follow this. It special cases an input size of 0, but with your compiler assert that value should never be input. So that is just for completeness: template<unsigned size> struct simpleHashTableCapacityForSizeUpperPowerOfTwoMinusOne; template<> struct simpleHashTableCapacityForSizeUpperPowerOfTwoMinusOne<0> { static const unsigned value = 0; }; template<unsigned size> struct simpleHashTableCapacityForSizeUpperPowerOfTwoMinusOne { static const unsigned value = size | simpleHashTableCapacityForSizeUpperPowerOfTwoMinusOne<(size >> 1)>::value; }; template<unsigned size> struct simpleHashTableCapacityForSize; template<> struct simpleHashTableCapacityForSize<0> { static const unsigned value = 0; }; template<unsigned size> struct simpleHashTableCapacityForSize { static const unsigned value = (simpleHashTableCapacityForSizeUpperPowerOfTwoMinusOne<size - 1>::value + 1) << 1; }; I'll attach a sample program you can test with. > Source/WTF/wtf/HashTable.h:472 > + static const unsigned value = hashTableCapacityShifts<size, 16>::value; If you do decide to go this route, instead of the alternative, this 16 should not be hardcoded. It would be "sizeof(unsigned) * 8 / 2". > Source/WebCore/dom/make_names.pl:603 > + print F "const size_t $parameters{namespace}TagsCount = ", scalar(keys %allTags), ";\n"; How about unsigned instead of size_t? Then you might not need to cast when you're adding up these values. It would also be ridiculous for this to be too big =) > Source/WebCore/dom/make_names.pl:608 > + print F "const size_t $parameters{namespace}AttrsCount = ", scalar(keys %allAttrs), ";\n"; Ditto.

Joseph Pecoraro

Comment 7 2012-07-20 23:15:24 PDT

Created attachment 153649 [details] [PROGRAM] Simple cpp program comparing the two approaches Here is a small C++ program that compares the two approaches. I prefer the simple approach. What do you think?

Benjamin Poulain

Comment 8 2012-07-23 12:28:19 PDT

Created attachment 153839 [details] Patch

Benjamin Poulain

Comment 9 2012-07-23 14:58:54 PDT

Created attachment 153870 [details] Patch

Darin Adler

Comment 10 2012-07-23 15:30:13 PDT

Comment on attachment 153870 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=153870&action=review r=me, but I think there is some room for improvement > Source/WTF/wtf/HashTable.h:478 > + // The following computes the upper power of two capacity to hold the size parameter. > + // This is done at compile time to initialize the HashTraits. > + template<unsigned size> struct hashTableCapacityForSizeUpperPowerOfTwoMinusOne; > + template<> > + struct hashTableCapacityForSizeUpperPowerOfTwoMinusOne<0> { > + static const unsigned value = 0; > + }; > + template<unsigned size> > + struct hashTableCapacityForSizeUpperPowerOfTwoMinusOne { > + static const unsigned value = size | hashTableCapacityForSizeUpperPowerOfTwoMinusOne<(size >> 1)>::value; > + }; > + > + template<unsigned size, bool isPowerOfTwo> struct hashTableCapacityForSizeDecider; > + template<unsigned size> > + struct hashTableCapacityForSizeDecider<size, true> { > + static const unsigned value = size << 2; > + }; > + template<unsigned size> > + struct hashTableCapacityForSizeDecider<size, false> { > + static const unsigned value = (hashTableCapacityForSizeUpperPowerOfTwoMinusOne<size - 1>::value + 1) << 1; > + }; There must be a less confusing way to code this. I have no idea what “upper power of two minus one” means, nor do I why powers of two need a special case. I think the code is doing some kind of smearing technique to construct a binary integer of all ones, but this is so non-obvious that it requires a comment. I suggest not repeating “ForSize” in all these struct names, and there seems no reason for hashTableCapacityForSizeUpperPowerOfTwoMinusOne to have hashTableCapacity in its name. It’s just an arithmetic function. > Source/WTF/wtf/HashTable.h:481 > + struct hashTableCapacityForSize { Name should be capitalized. All the other traits structs and classes are. Even though they are “function-like”. > Source/WTF/wtf/HashTable.h:485 > + COMPILE_ASSERT(value > (2 * size), HashTableCapacityHoldsContentSize); Shouldn’t this be >=? > Source/WebCore/dom/QualifiedName.cpp:35 > +#if ENABLE(MATHML) > +#include "MathMLNames.h" > +#endif > +#if ENABLE(SVG) > +#include "SVGNames.h" > +#endif Normally we put conditional includes in their own separate paragraphs. > Source/WebCore/dom/QualifiedName.cpp:47 > + static_cast<const unsigned>(HTMLNames::HTMLTagsCount + HTMLNames::HTMLAttrsCount I don’t think the const here is needed. > Tools/TestWebKitAPI/Tests/WTF/HashSet.cpp:54 > + // Adding one more item may change the capacity. > + testSet.add(initialCapacity); > + EXPECT_GE(static_cast<unsigned>(testSet.capacity()), initialCapacity); I think this would be a better test if it added more than just one more item. In fact, might be good to keep adding until the capacity goes up and make some assertion about when that happens.

Darin Adler

Comment 11 2012-07-23 15:44:02 PDT

This approach, setting minimum table size when creating the hash table, is going to cut down on code sharing and increase code bloat for these templates, because the entire template will be separate. I don’t think our tools are smart enough to coalesce the many, many identical functions in otherwise-identical HashSet and HashTable templates. An alternate approach might be to come up with some sort of “reserve initial capacity” function that we could call at runtime.

Benjamin Poulain

Comment 12 2012-07-23 15:53:51 PDT

(In reply to comment #11) > This approach, setting minimum table size when creating the hash table, is going to cut down on code sharing and increase code bloat for these templates, because the entire template will be separate. I don’t think our tools are smart enough to coalesce the many, many identical functions in otherwise-identical HashSet and HashTable templates. > > An alternate approach might be to come up with some sort of “reserve initial capacity” function that we could call at runtime. That is a good point. A “reserve initial capacity” would increase the size of every HashTable by 32bits, I am not sure that is a better trade off. I think sparingly using the KeyTraits should be ok. In this case, the usage is unique, there is no other HashSet instantiated for the same types.

Darin Adler

Comment 13 2012-07-23 16:41:42 PDT

(In reply to comment #12) > A “reserve initial capacity” would increase the size of every HashTable by 32bits Are you sure? I think there’s a way to do it without changing the size of HashTable. > In this case, the usage is unique, there is no other HashSet instantiated for the same types. Good point. No worries then.

Benjamin Poulain

Comment 14 2012-07-23 16:44:44 PDT

Created attachment 153897 [details] Patch

Benjamin Poulain

Comment 15 2012-07-23 16:48:53 PDT

This alternative greatly simplify the problem by changing HashTable::shouldExpand(). I also tried to make the template clearer. The OneifyLowBits is a bit poor. What is does is set all the bits to one after the most significant bit: 00110101010 -> 00111111111 I welcome any better name for this.

Darin Adler

Comment 16 2012-07-23 16:57:09 PDT

Comment on attachment 153897 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=153897&action=review Somehow this version of the patch is missing the HashSet.cpp unit test source file. > Source/WTF/ChangeLog:20 > + (WTF): You should remove junk lines like this from the ChangeLog. > Source/WebCore/ChangeLog:18 > + (WebCore): Another pure-junk line. > Source/WTF/wtf/HashTable.h:408 > - bool shouldExpand() const { return (m_keyCount + m_deletedCount) * m_maxLoad >= m_tableSize; } > + bool shouldExpand() const { return (m_keyCount + m_deletedCount) * m_maxLoad > m_tableSize; } Despite my suggestion below that you should change the COMPILE_ASSERT, I was not proposing we change the HashTable class’s own rule! I suggest backing this out to make the change lower risk. Unless there is some significant benefit of making this change that I didn’t notice. > Source/WTF/wtf/HashTable.h:472 > + template<unsigned number> > + struct UpperPowerOfTwoBound { > + static const unsigned value = (OneifyLowBits<number - 1>::value + 1) << 1; > + }; It’s nice to have this function name for documentation. But note that we don’t really need this template. We could just have this same code below in HashTableCapacityForSize. I think it’s clearer like this, not sure. > Source/WebCore/dom/QualifiedName.cpp:41 > #include "QualifiedName.h" > +#include "HTMLNames.h" > +#if ENABLE(MATHML) > +#include "MathMLNames.h" > +#endif > +#if ENABLE(SVG) > +#include "SVGNames.h" > +#endif > +#include "XLinkNames.h" > +#include "XMLNSNames.h" > +#include "XMLNames.h" > #include <wtf/Assertions.h> > #include <wtf/HashSet.h> > #include <wtf/StaticConstructors.h> My comment about paragraphs means that I’d put them like this: ... #include <wtf/StaticConstructors.h> #if ENABLE(MATHML) #include "MathMLNames.h" #endif #if ENABLE(SVG) #include "SVGNames.h" #endif > Source/WebCore/dom/QualifiedName.cpp:56 > + static_cast<const unsigned>(HTMLNames::HTMLTagsCount + HTMLNames::HTMLAttrsCount > +#if ENABLE(MATHML) > + + MathMLNames::MathMLTagsCount + MathMLNames::MathMLAttrsCount > +#endif > +#if ENABLE(SVG) > + + SVGNames::SVGTagsCount + SVGNames::SVGAttrsCount > +#endif > + + XLinkNames::XLinkAttrsCount > + + XMLNSNames::XMLNSAttrsCount > + + XMLNames::XMLAttrsCount)>::value; Might be cleaner to have the total number of qualified names be a separate named constant outside the struct. That would eliminate the need for static_cast and make this easier to read.

Benjamin Poulain

Comment 17 2012-07-23 17:11:19 PDT

Comment on attachment 153897 [details] Patch Oh, I am sorry. I just wanted your opinion on basing the change on shouldExpand(). I'll make a real patch addressing your other comments. > It’s nice to have this function name for documentation. But note that we don’t really need this template. We could just have this same code below in HashTableCapacityForSize. I think it’s clearer like this, not sure. It was purely for documentation.

Benjamin Poulain

Comment 18 2012-07-23 17:55:15 PDT

Created attachment 153922 [details] Patch

Darin Adler

Comment 19 2012-07-23 18:06:44 PDT

(In reply to comment #17) > Oh, I am sorry. I just wanted your opinion on basing the change on shouldExpand(). I think we should do/consider that longer term. Since it’s a higher risk change to do that, I like having this lower risk first change in the bag.

Benjamin Poulain

Comment 20 2012-07-25 00:24:03 PDT

Committed r123582: <http://trac.webkit.org/changeset/123582>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebCore Misc.

Assignee

Benjamin Poulain

Reported

2012-07-20 12:28 PDT

Modified

2012-07-25 00:24 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks