StringHasher has methods to calculate the hash of a null terminated character array which internally calculate the length by searching for the null value at the end. These methods are only used by CStringTranslator in AtomicString to calculate the hash of a new AtomicString made by a string literal. These methods are both unnecessary and a bit costly because CStringTranslator, which is used by Hash* classes in WTF, searches for the end of the character array more than once and then once again when creating the StringImpl that backs the AtomicString.
Created attachment 233248 [details] Patch
Blink review: <https://codereview.chromium.org/328453003> Blink commit: <https://src.chromium.org/viewvc/blink?revision=175826&view=revision>
(In reply to comment #2) > Blink review: <https://codereview.chromium.org/328453003> > Blink commit: <https://src.chromium.org/viewvc/blink?revision=175826&view=revision> Can you have a look please or suggest someone else to review?
Comment on attachment 233248 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=233248&action=review > Source/WTF/ChangeLog:6 > + Refactor to remove unnecessary code from the string hashing functions > + https://bugs.webkit.org/show_bug.cgi?id=133990 > + > + Previously, StringHasher contained methods for calculating the hash of a pure C We have an convention of referring to the original Blink commit when merging a Blink patch into WebKit by adding a remark to the ChangeLog entry/commit message of the form: Merged from Blink: https://src.chromium.org/viewvc/blink?revision=175826&view=revision You can see an example of ChangeLog entry/commit messages that reference this at <http://trac.webkit.org/changeset/166668>. > Source/WTF/ChangeLog:17 > + Reviewed by NOBODY (OOPS!). Please move the Reviewed by line to be on its own line under the WebKit bug URL. > Source/WTF/wtf/text/AtomicString.cpp:94 > - return addToStringTable<const LChar*, CStringTranslator>(c); > + return add(c, strlen(reinterpret_cast<const char*>(c))); This seems like a weird and indirect approach because AtomicString::add(const LChar*, unsigned) duplicates the precondition checks done in this function and otherwise calls through to AtomicString::addToStringTable<LCharBuffer, LCharBufferTranslator>(). We should take a similar approach as AtomicString::add(const UChar*) and stack allocate an LCharBuffer with the specified string and computed length and directly call AtomicString::addToStringTable<LCharBuffer, LCharBufferTranslator>(...), which may be inlined. > Tools/ChangeLog:6 > + Refactor to remove unnecessary code from the string hashing functions > + https://bugs.webkit.org/show_bug.cgi?id=133990 > + > + Previously, StringHasher contained methods for calculating the hash of a pure C Please also update this ChangeLog entry.
Comment on attachment 233248 [details] Patch As far as I can tell, this patch removes a “don’t iterate the string twice” optimization.
(In reply to comment #5) > (From update of attachment 233248 [details]) > As far as I can tell, this patch removes a “don’t iterate the string twice” optimization. I was thinking the same :) Any performance numbers Adam? How did you measure the improvement?
(In reply to comment #5) > (From update of attachment 233248 [details]) > As far as I can tell, this patch removes a “don’t iterate the string twice” optimization. Just the opposite unless I'm very mistaken. This patch removes additional iterations of the string in favor of one iteration up front.
(In reply to comment #6) > (In reply to comment #5) > > (From update of attachment 233248 [details] [details]) > > As far as I can tell, this patch removes a “don’t iterate the string twice” optimization. > > I was thinking the same :) > > Any performance numbers Adam? How did you measure the improvement? I did not measure any improvement. The reason for the patch is mostly a refactor to remove unnecessary code. I do believe it removes additional iterations of the string, but I haven't tried to measure it.
Comment on attachment 233248 [details] Patch Looking over the patch again, I agree with Darin Adler's and Benjamin Poulain's remarks in comment #5 and comment #6, respectively, that this patch removes an optimization to avoid an iteration over the specified LChar string to be added to the table. Specifically this patch increases the algorithmic complexity of add() by a factor of N (the length of the LChar string to add to the table) when the string already exists in the table. Without the proposed patch the cost to add a string S with length N that already exists in the table is O(2N) = "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). With the proposed patch the cost becomes O(3N) = "cost to compute strlen(S)" (O(N)) + "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)).
(In reply to comment #9) > (From update of attachment 233248 [details]) > Looking over the patch again, I agree with Darin Adler's and Benjamin Poulain's remarks in comment #5 and comment #6, respectively, that this patch removes an optimization to avoid an iteration over the specified LChar string to be added to the table. Specifically this patch increases the algorithmic complexity of add() by a factor of N (the length of the LChar string to add to the table) when the string already exists in the table. > > Without the proposed patch the cost to add a string S with length N that already exists in the table is O(2N) = "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). With the proposed patch the cost becomes O(3N) = "cost to compute strlen(S)" (O(N)) + "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). It also reduces by a factor of N when the string is not in the table, but hash clashes with another string of different length, right?
(In reply to comment #10) > (In reply to comment #9) > > (From update of attachment 233248 [details] [details]) > > Looking over the patch again, I agree with Darin Adler's and Benjamin Poulain's remarks in comment #5 and comment #6, respectively, that this patch removes an optimization to avoid an iteration over the specified LChar string to be added to the table. Specifically this patch increases the algorithmic complexity of add() by a factor of N (the length of the LChar string to add to the table) when the string already exists in the table. > > > > Without the proposed patch the cost to add a string S with length N that already exists in the table is O(2N) = "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). With the proposed patch the cost becomes O(3N) = "cost to compute strlen(S)" (O(N)) + "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). > > It also reduces by a factor of N when the string is not in the table, but hash clashes with another string of different length, right? Can you elaborate? As far as I can tell, the proposed patch doesn't reduce the cost by a factor of N for such a situation.
(In reply to comment #11) > (In reply to comment #10) > > (In reply to comment #9) > > > (From update of attachment 233248 [details] [details] [details]) > > > Looking over the patch again, I agree with Darin Adler's and Benjamin Poulain's remarks in comment #5 and comment #6, respectively, that this patch removes an optimization to avoid an iteration over the specified LChar string to be added to the table. Specifically this patch increases the algorithmic complexity of add() by a factor of N (the length of the LChar string to add to the table) when the string already exists in the table. > > > > > > Without the proposed patch the cost to add a string S with length N that already exists in the table is O(2N) = "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). With the proposed patch the cost becomes O(3N) = "cost to compute strlen(S)" (O(N)) + "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)). > > > > It also reduces by a factor of N when the string is not in the table, but hash clashes with another string of different length, right? > > Can you elaborate? > > As far as I can tell, the proposed patch doesn't reduce the cost by a factor of N for such a situation. In such a situation, the proposed patch will consist of: "cost to compute strlen(S)" (O(N) + "cost to hash S" (O(N)) + "cost to determine the two strings don't match which is early return 'S.length != S1.length" whereas without the patch it would be: "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)) + cost to create new StringImpl by calling strlen at the end correct?
(In reply to comment #12) > > > [...] > > > It also reduces by a factor of N when the string is not in the table, but hash clashes with another string of different length, right? > > > > Can you elaborate? > > > > As far as I can tell, the proposed patch doesn't reduce the cost by a factor of N for such a situation. > > In such a situation, the proposed patch will consist of: > > "cost to compute strlen(S)" (O(N) + "cost to hash S" (O(N)) + "cost to determine the two strings don't match which is early return 'S.length != S1.length" > > whereas without the patch it would be: > > "cost to hash S" (O(N)) + "cost to perform an equality operation to ensure that a match with the same hash is identical to S" (O(N)) + cost to create new StringImpl by calling strlen at the end > > correct? You're right for this case! As aforementioned, the proposed patch increases the algorithmic complexity of add() by a factor of N when the string already exists in the table. I'm unclear which case is more likely to happen in practice. Are you able to measure it?
Let S and S' be strings. We stay S == S' if the contents of the strings are identical. The following are different cases and their worst-case algorithmic complexities with and without the proposed patch: Case 1: Suppose S != S', strlen(S) == strlen(S'), hash(S) == hash(S') and S' is in the atomic string table. AtomicString::add(S): Without patch: O(4N) = "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(N)) + "cost to compute strlen(S)" (O(N)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N)) With patch: O(4N) = "cost to compute strlen(S)" (O(N)) + "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(N)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N)) Case 2: Suppose S != S', strlen(S) != strlen(S'), hash(S) == hash(S') and S' is in the atomic string table. AtomicString::add(S): Without patch: O(4N) = "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(N)) + "cost to compute strlen(S)" (O(N)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N)) With patch: O(3N) = "cost to compute strlen(S)" (O(N)) + "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(1)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N)) Case 3: Suppose S == S', strlen(S) == strlen(S'), hash(S) == hash(S') and S' is in the atomic string table. AtomicString::add(S): Without patch: O(2N) = "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(N)) With patch: O(3N) = "cost to compute strlen(S)" (O(N)) + "cost of hash(S)" (O(N)) + "cost to compare S and S'" (O(N)) Case 4: Suppose hash(S) != hash(S_i) for every string S_i in the atomic string table. AtomicString::add(S): Without patch: O(3N) = "cost of hash(S)" (O(N)) + "cost to compute strlen(S)" (O(N)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N)) With patch: O(3N) = "cost to compute strlen(S)" (O(N)) + "cost of hash(S)" (O(N)) + "cost to copy the buffer in StringImpl::createInternal()" (O(N))