Use WTF::StringHasher in WebCore
Created attachment 79806 [details] Patch
Created attachment 79823 [details] Patch
Comment on attachment 79823 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute. I don’t think that “blob” is a good name for a range of memory. I know we are using it to mean something like that in the web API. As far as I am concerned these functions should all be named either computeHash or just hash, and should not have other words in them. I think it is not good to have an object called StringHasher and then use it to hash things that are not strings. The mechanics of this patch and the others are fine, but the aesthetics, naming, and semantics all seems subtly wrong. > Source/JavaScriptCore/wtf/StringHasher.h:144 > + static inline unsigned createBlobHash(const void* data, unsigned size) What does the word blob mean in this function name?
(In reply to comment #3) > (From update of attachment 79823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review > > In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute. You're right! All the other methods in the file have the same "problem" and there is already a createBlobHash with different parameters: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/wtf/StringHasher.h?rev=74855#L138. Can we land this patch so all calls to StringHasher are done vie create*Hash? If ok I'll post a follow up patch with a rename of the file (to e.g. Hasher.h, HashComputer.h or HashCalculator.h???) and all methods to computeHash when this is landed.
Comment on attachment 79823 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine. > Source/JavaScriptCore/wtf/StringHasher.h:146 > + ASSERT(!(size % 4)); Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4. > Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92 > - return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart)); > + return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart)); There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix.
Created attachment 79873 [details] Patch (In reply to comment #5) > (From update of attachment 79823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review > > review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine. > > > Source/JavaScriptCore/wtf/StringHasher.h:146 > > + ASSERT(!(size % 4)); > > Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4. Fixed. > > Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92 > > - return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart)); > > + return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart)); > > There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix. Is ok to adress this too in the follow up patch? StringHaser.h has no using WTF:StringHasher, and we use it already with the WTF prefix everywhere. see http://trac.webkit.org/changeset/70705
(In reply to comment #6) > > There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix. > Is ok to adress this too in the follow up patch? Yes.
Comment on attachment 79873 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review > Source/JavaScriptCore/wtf/StringHasher.h:146 > + ASSERT(!(size % 2)); This is OK, but we could also use sizeof(UChar) here instead of the constant 2.
The commit-queue encountered the following flaky tests while processing attachment 79873 [details]: http/tests/xmlhttprequest/failed-auth.html bug 51835 (author: ap@webkit.org) The commit-queue is continuing to process your patch.
Comment on attachment 79873 [details] Patch Clearing flags on attachment: 79873 Committed r76474: <http://trac.webkit.org/changeset/76474>
All reviewed patches have been landed. Closing bug.
(In reply to comment #8) > (From update of attachment 79873 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review > > > Source/JavaScriptCore/wtf/StringHasher.h:146 > > + ASSERT(!(size % 2)); > > This is OK, but we could also use sizeof(UChar) here instead of the constant 2. I'll address this too in the follow up patch. @darin: Can you please give me your opinion for the names (so I don't need to change it more than once ;-): What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO. createHash should be renamed to comuteHash (that's easy :-) If we rename createBlobHash to computeHash too, this will result in ugly overloads. I prefer computeBinaryDataHash, since you don't like "blob". Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem".
(In reply to comment #12) > Can you please give me your opinion for the names (so I don't need to change it more than once ;-): > What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO. There are two ways to think of this: 1) General purpose hasher for arbitrary data that we also use for strings. 2) String hasher that we repurpose for arbitrary data. If we think it’s (2), then I think StringHasher is a fine name. If we think it’s (1) then Hasher is a better name. > createHash should be renamed to computeHash (that's easy :-) Yes. The other candidate would just be hash, but as long as we have a function named hash that gives the current hash of an instance, it’s bad style to use the same name using the verb form of “hash”. > If we rename createBlobHash to computeHash too, this will result in ugly overloads. The strange overloading here is the one where we hash single-byte characters and expand them to UChars treating them as Latin-1. This is a useful operation, but I don’t think it should be hidden in an overload. This is the function of the “defaultCoverter“ (sic) for char, and it’s weak that this unusual behavior occurs automatically. It would do the wrong thing if the string had UTF-8 sequences in it, computing an incorrect hash. I also think that overloading is a bit too subtle for the "null-terminated string" vs. "arbitrary array of character" versions. That seems the kind of thing I’d want to express in function names. I think that overloading based on the type of data we want to hash is not ugly at all. It’s what overloading is best at! But you are probably thinking of is the ambiguity between a size that is a number of bytes and a size that is a number of UChars. That is indeed a problem. We can’t use overloading if that will be the difference. > I prefer computeBinaryDataHash, since you don't like "blob". We definitely shouldn’t include the word “binary” since all our data is binary. The standard library calls this kind of thing “memory”. As in “memcpy” vs. “strcpy” and the C++ header <memory>. So maybe we can use that term. Or “raw data” perhaps. I think one good thing here is to acknowledge that we are making a hash by treating the data as characters. A name that acknowledged that would also make clear why the length needs to be a multiple of sizeof(UChar). > Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem". I think it’s a close call. Normally I like free functions much better than static members. But in the case of StringHasher, I think that: StringHasher::hashMemory makes it clear that we are reusing the string hash to hash memory, which is something I suggested we do above. Sorry I couldn’t give you straighter answers. Those are my thoughts.
The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing!
Created attachment 85616 [details] IntegerArray fix (In reply to comment #14) > The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing! The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time. BTW: Can you review the follow up patch at bug 53532?
(In reply to comment #15) > > The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing! > The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time. No, that’s incorrect. The old code was right. This was broken during the conversion process by removing the old code without understanding it. The old code did math involving both sizeof(int) and sizeof(UChar) to convert from count of integers to count of UChar.
Comment on attachment 85616 [details] IntegerArray fix Patch is fine, but I already dealt with this yesterday.