Bug 52934 - Use WTF::StringHasher in WebCore
Summary: Use WTF::StringHasher in WebCore
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Patrick R. Gansterer
URL:
Keywords:
Depends on:
Blocks: 49894
  Show dependency treegraph
 
Reported: 2011-01-21 15:34 PST by Patrick R. Gansterer
Modified: 2011-03-13 12:10 PDT (History)
3 users (show)

See Also:


Attachments
Patch (5.61 KB, patch)
2011-01-21 15:51 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (6.40 KB, patch)
2011-01-21 21:33 PST, Patrick R. Gansterer
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff
Patch (6.40 KB, patch)
2011-01-23 12:46 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
IntegerArray fix (1.17 KB, patch)
2011-03-13 03:52 PDT, Patrick R. Gansterer
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick R. Gansterer 2011-01-21 15:34:00 PST
Use WTF::StringHasher in WebCore
Comment 1 Patrick R. Gansterer 2011-01-21 15:51:07 PST
Created attachment 79806 [details]
Patch
Comment 2 Patrick R. Gansterer 2011-01-21 21:33:04 PST
Created attachment 79823 [details]
Patch
Comment 3 Darin Adler 2011-01-23 10:26:19 PST
Comment on attachment 79823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review

In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute.

I don’t think that “blob” is a good name for a range of memory. I know we are using it to mean something like that in the web API.

As far as I am concerned these functions should all be named either computeHash or just hash, and should not have other words in them.

I think it is not good to have an object called StringHasher and then use it to hash things that are not strings.

The mechanics of this patch and the others are fine, but the aesthetics, naming, and semantics all seems subtly wrong.

> Source/JavaScriptCore/wtf/StringHasher.h:144
> +    static inline unsigned createBlobHash(const void* data, unsigned size)

What does the word blob mean in this function name?
Comment 4 Patrick R. Gansterer 2011-01-23 10:55:31 PST
(In reply to comment #3)
> (From update of attachment 79823 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review
> 
> In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute.
You're right! All the other methods in the file have the same "problem" and there is already a createBlobHash with different parameters: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/wtf/StringHasher.h?rev=74855#L138.

Can we land this patch so all calls to StringHasher are done vie create*Hash?
If ok I'll post a follow up patch with a rename of the file (to e.g. Hasher.h, HashComputer.h or HashCalculator.h???) and all methods to computeHash when this is landed.
Comment 5 Darin Adler 2011-01-23 11:17:56 PST
Comment on attachment 79823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review

review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine.

> Source/JavaScriptCore/wtf/StringHasher.h:146
> +        ASSERT(!(size % 4));

Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4.

> Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92
> -    return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart));
> +    return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart));

There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix.
Comment 6 Patrick R. Gansterer 2011-01-23 12:46:16 PST
Created attachment 79873 [details]
Patch

(In reply to comment #5)
> (From update of attachment 79823 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review
> 
> review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine.
> 
> > Source/JavaScriptCore/wtf/StringHasher.h:146
> > +        ASSERT(!(size % 4));
> 
> Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4.
Fixed.

> > Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92
> > -    return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart));
> > +    return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart));
> 
> There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix.
Is ok to adress this too in the follow up patch? StringHaser.h has no using WTF:StringHasher, and we use it already with the WTF prefix everywhere. see http://trac.webkit.org/changeset/70705
Comment 7 Darin Adler 2011-01-23 16:22:37 PST
(In reply to comment #6)
> > There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix.
> Is ok to adress this too in the follow up patch?

Yes.
Comment 8 Darin Adler 2011-01-23 16:23:15 PST
Comment on attachment 79873 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review

> Source/JavaScriptCore/wtf/StringHasher.h:146
> +        ASSERT(!(size % 2));

This is OK, but we could also use sizeof(UChar) here instead of the constant 2.
Comment 9 WebKit Commit Bot 2011-01-23 17:00:22 PST
The commit-queue encountered the following flaky tests while processing attachment 79873 [details]:

http/tests/xmlhttprequest/failed-auth.html bug 51835 (author: ap@webkit.org)
The commit-queue is continuing to process your patch.
Comment 10 WebKit Commit Bot 2011-01-23 17:01:05 PST
Comment on attachment 79873 [details]
Patch

Clearing flags on attachment: 79873

Committed r76474: <http://trac.webkit.org/changeset/76474>
Comment 11 WebKit Commit Bot 2011-01-23 17:01:10 PST
All reviewed patches have been landed.  Closing bug.
Comment 12 Patrick R. Gansterer 2011-01-23 23:50:05 PST
(In reply to comment #8)
> (From update of attachment 79873 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review
> 
> > Source/JavaScriptCore/wtf/StringHasher.h:146
> > +        ASSERT(!(size % 2));
> 
> This is OK, but we could also use sizeof(UChar) here instead of the constant 2.

I'll address this too in the follow up patch.

@darin:
Can you please give me your opinion for the names (so I don't need to change it more than once ;-):
What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO.
createHash should be renamed to comuteHash (that's easy :-)
If we rename createBlobHash to computeHash too, this will result in ugly overloads. I prefer computeBinaryDataHash, since you don't like "blob".
Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem".
Comment 13 Darin Adler 2011-01-24 14:25:00 PST
(In reply to comment #12)
> Can you please give me your opinion for the names (so I don't need to change it more than once ;-):
> What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO.

There are two ways to think of this:

    1) General purpose hasher for arbitrary data that we also use for strings.

    2) String hasher that we repurpose for arbitrary data.

If we think it’s (2), then I think StringHasher is a fine name. If we think it’s (1) then Hasher is a better name.

> createHash should be renamed to computeHash (that's easy :-)

Yes. The other candidate would just be hash, but as long as we have a function named hash that gives the current hash of an instance, it’s bad style to use the same name using the verb form of “hash”.

> If we rename createBlobHash to computeHash too, this will result in ugly overloads.

The strange overloading here is the one where we hash single-byte characters and expand them to UChars treating them as Latin-1. This is a useful operation, but I don’t think it should be hidden in an overload. This is the function of the “defaultCoverter“ (sic) for char, and it’s weak that this unusual behavior occurs automatically. It would do the wrong thing if the string had UTF-8 sequences in it, computing an incorrect hash.

I also think that overloading is a bit too subtle for the "null-terminated string" vs. "arbitrary array of character" versions. That seems the kind of thing I’d want to express in function names.

I think that overloading based on the type of data we want to hash is not ugly at all. It’s what overloading is best at!

But you are probably thinking of is the ambiguity between a size that is a number of bytes and a size that is a number of UChars. That is indeed a problem. We can’t use overloading if that will be the difference.

> I prefer computeBinaryDataHash, since you don't like "blob".

We definitely shouldn’t include the word “binary” since all our data is binary.

The standard library calls this kind of thing “memory”. As in “memcpy” vs. “strcpy” and the C++ header <memory>. So maybe we can use that term. Or “raw data” perhaps.

I think one good thing here is to acknowledge that we are making a hash by treating the data as characters. A name that acknowledged that would also make clear why the length needs to be a multiple of sizeof(UChar).

> Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem".

I think it’s a close call. Normally I like free functions much better than static members. But in the case of StringHasher, I think that:

    StringHasher::hashMemory

makes it clear that we are reusing the string hash to hash memory, which is something I suggested we do above.

Sorry I couldn’t give you straighter answers. Those are my thoughts.
Comment 14 Darin Adler 2011-03-12 19:27:43 PST
The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing!
Comment 15 Patrick R. Gansterer 2011-03-13 03:52:16 PDT
Created attachment 85616 [details]
IntegerArray fix

(In reply to comment #14)
> The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing!
The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time.

BTW: Can you review the follow up patch at bug 53532?
Comment 16 Darin Adler 2011-03-13 12:10:31 PDT
(In reply to comment #15)
> > The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing!
> The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time.

No, that’s incorrect. The old code was right. This was broken during the conversion process by removing the old code without understanding it.

The old code did math involving both sizeof(int) and sizeof(UChar) to convert from count of integers to count of UChar.
Comment 17 Darin Adler 2011-03-13 12:10:57 PDT
Comment on attachment 85616 [details]
IntegerArray fix

Patch is fine, but I already dealt with this yesterday.