Bug 52934

Summary: Use WTF::StringHasher in WebCore
Product: WebKit Reporter: Patrick R. Gansterer <paroga>
Component: New BugsAssignee: Patrick R. Gansterer <paroga>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, darin, eric
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 49894    
Attachments:
Description Flags
Patch
none
Patch
darin: review-, darin: commit-queue-
Patch
none
IntegerArray fix darin: review-, darin: commit-queue-

Patrick R. Gansterer
Reported 2011-01-21 15:34:00 PST
Use WTF::StringHasher in WebCore
Attachments
Patch (5.61 KB, patch)
2011-01-21 15:51 PST, Patrick R. Gansterer
no flags
Patch (6.40 KB, patch)
2011-01-21 21:33 PST, Patrick R. Gansterer
darin: review-
darin: commit-queue-
Patch (6.40 KB, patch)
2011-01-23 12:46 PST, Patrick R. Gansterer
no flags
IntegerArray fix (1.17 KB, patch)
2011-03-13 03:52 PDT, Patrick R. Gansterer
darin: review-
darin: commit-queue-
Patrick R. Gansterer
Comment 1 2011-01-21 15:51:07 PST
Patrick R. Gansterer
Comment 2 2011-01-21 21:33:04 PST
Darin Adler
Comment 3 2011-01-23 10:26:19 PST
Comment on attachment 79823 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute. I don’t think that “blob” is a good name for a range of memory. I know we are using it to mean something like that in the web API. As far as I am concerned these functions should all be named either computeHash or just hash, and should not have other words in them. I think it is not good to have an object called StringHasher and then use it to hash things that are not strings. The mechanics of this patch and the others are fine, but the aesthetics, naming, and semantics all seems subtly wrong. > Source/JavaScriptCore/wtf/StringHasher.h:144 > + static inline unsigned createBlobHash(const void* data, unsigned size) What does the word blob mean in this function name?
Patrick R. Gansterer
Comment 4 2011-01-23 10:55:31 PST
(In reply to comment #3) > (From update of attachment 79823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review > > In WebKit we normally use the word “create” to refer to allocating a new object. Thus, I don’t think that create is a good verb to use in hash functions. I prefer compute. You're right! All the other methods in the file have the same "problem" and there is already a createBlobHash with different parameters: http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/wtf/StringHasher.h?rev=74855#L138. Can we land this patch so all calls to StringHasher are done vie create*Hash? If ok I'll post a follow up patch with a rename of the file (to e.g. Hasher.h, HashComputer.h or HashCalculator.h???) and all methods to computeHash when this is landed.
Darin Adler
Comment 5 2011-01-23 11:17:56 PST
Comment on attachment 79823 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine. > Source/JavaScriptCore/wtf/StringHasher.h:146 > + ASSERT(!(size % 4)); Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4. > Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92 > - return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart)); > + return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart)); There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix.
Patrick R. Gansterer
Comment 6 2011-01-23 12:46:16 PST
Created attachment 79873 [details] Patch (In reply to comment #5) > (From update of attachment 79823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79823&action=review > > review- because of the WTF prefix issue. Other than this and the function naming that we decided to handle outside the scope of this patch, this seems fine. > > > Source/JavaScriptCore/wtf/StringHasher.h:146 > > + ASSERT(!(size % 4)); > > Where does the number 4 come from here? The hasher can handle any object with a size that’s multiple of 2. It is true that 4 is the size of int and the size of pointers on 32-bit systems, but it seems a bit arbitrary to specifically support only ranges of memory that are multiples of 4. Fixed. > > Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp:92 > > - return AlreadyHashed::avoidDeletedValue(StringImpl::computeHash(url.string().characters() + hostStart, hostEnd - hostStart)); > > + return AlreadyHashed::avoidDeletedValue(WTF::StringHasher::createHash(url.string().characters() + hostStart, hostEnd - hostStart)); > > There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix. Is ok to adress this too in the follow up patch? StringHaser.h has no using WTF:StringHasher, and we use it already with the WTF prefix everywhere. see http://trac.webkit.org/changeset/70705
Darin Adler
Comment 7 2011-01-23 16:22:37 PST
(In reply to comment #6) > > There should be no need to use the namespace prefix WTF. The WTF header files make use of using declarations, so files that use things defined in the WTF namespace should almost never need to explicitly include the WTF:: prefix. > Is ok to adress this too in the follow up patch? Yes.
Darin Adler
Comment 8 2011-01-23 16:23:15 PST
Comment on attachment 79873 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review > Source/JavaScriptCore/wtf/StringHasher.h:146 > + ASSERT(!(size % 2)); This is OK, but we could also use sizeof(UChar) here instead of the constant 2.
WebKit Commit Bot
Comment 9 2011-01-23 17:00:22 PST
The commit-queue encountered the following flaky tests while processing attachment 79873 [details]: http/tests/xmlhttprequest/failed-auth.html bug 51835 (author: ap@webkit.org) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 10 2011-01-23 17:01:05 PST
Comment on attachment 79873 [details] Patch Clearing flags on attachment: 79873 Committed r76474: <http://trac.webkit.org/changeset/76474>
WebKit Commit Bot
Comment 11 2011-01-23 17:01:10 PST
All reviewed patches have been landed. Closing bug.
Patrick R. Gansterer
Comment 12 2011-01-23 23:50:05 PST
(In reply to comment #8) > (From update of attachment 79873 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=79873&action=review > > > Source/JavaScriptCore/wtf/StringHasher.h:146 > > + ASSERT(!(size % 2)); > > This is OK, but we could also use sizeof(UChar) here instead of the constant 2. I'll address this too in the follow up patch. @darin: Can you please give me your opinion for the names (so I don't need to change it more than once ;-): What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO. createHash should be renamed to comuteHash (that's easy :-) If we rename createBlobHash to computeHash too, this will result in ugly overloads. I prefer computeBinaryDataHash, since you don't like "blob". Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem".
Darin Adler
Comment 13 2011-01-24 14:25:00 PST
(In reply to comment #12) > Can you please give me your opinion for the names (so I don't need to change it more than once ;-): > What is the preferred name for the file+classname? I would suggest Hasher, HashCalculator and my favorite HashComputer. But the resulting HashComputer::computeHash() would look kind of strange IMHO. There are two ways to think of this: 1) General purpose hasher for arbitrary data that we also use for strings. 2) String hasher that we repurpose for arbitrary data. If we think it’s (2), then I think StringHasher is a fine name. If we think it’s (1) then Hasher is a better name. > createHash should be renamed to computeHash (that's easy :-) Yes. The other candidate would just be hash, but as long as we have a function named hash that gives the current hash of an instance, it’s bad style to use the same name using the verb form of “hash”. > If we rename createBlobHash to computeHash too, this will result in ugly overloads. The strange overloading here is the one where we hash single-byte characters and expand them to UChars treating them as Latin-1. This is a useful operation, but I don’t think it should be hidden in an overload. This is the function of the “defaultCoverter“ (sic) for char, and it’s weak that this unusual behavior occurs automatically. It would do the wrong thing if the string had UTF-8 sequences in it, computing an incorrect hash. I also think that overloading is a bit too subtle for the "null-terminated string" vs. "arbitrary array of character" versions. That seems the kind of thing I’d want to express in function names. I think that overloading based on the type of data we want to hash is not ugly at all. It’s what overloading is best at! But you are probably thinking of is the ambiguity between a size that is a number of bytes and a size that is a number of UChars. That is indeed a problem. We can’t use overloading if that will be the difference. > I prefer computeBinaryDataHash, since you don't like "blob". We definitely shouldn’t include the word “binary” since all our data is binary. The standard library calls this kind of thing “memory”. As in “memcpy” vs. “strcpy” and the C++ header <memory>. So maybe we can use that term. Or “raw data” perhaps. I think one good thing here is to acknowledge that we are making a hash by treating the data as characters. A name that acknowledged that would also make clear why the length needs to be a multiple of sizeof(UChar). > Should I move the static computeHash() directly into WTF namespace and add using WTF::computeHash; ? This would solve the HashComputer::computeHash "problem". I think it’s a close call. Normally I like free functions much better than static members. But in the case of StringHasher, I think that: StringHasher::hashMemory makes it clear that we are reusing the string hash to hash memory, which is something I suggested we do above. Sorry I couldn’t give you straighter answers. Those are my thoughts.
Darin Adler
Comment 14 2011-03-12 19:27:43 PST
The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing!
Patrick R. Gansterer
Comment 15 2011-03-13 03:52:16 PDT
Created attachment 85616 [details] IntegerArray fix (In reply to comment #14) > The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing! The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time. BTW: Can you review the follow up patch at bug 53532?
Darin Adler
Comment 16 2011-03-13 12:10:31 PDT
(In reply to comment #15) > > The IntegerArrayHash::hash change here was wrong; passing the wrong length. Not sure how I missed it when reviewing! > The IntegerArray::size returns the count of integers and not the memory size in bytes, so the hashing was wrong all the time. No, that’s incorrect. The old code was right. This was broken during the conversion process by removing the old code without understanding it. The old code did math involving both sizeof(int) and sizeof(UChar) to convert from count of integers to count of UChar.
Darin Adler
Comment 17 2011-03-13 12:10:57 PDT
Comment on attachment 85616 [details] IntegerArray fix Patch is fine, but I already dealt with this yesterday.
Note You need to log in before you can comment on or make changes to this bug.