Bug 45970 - Add StringHasher class
Summary: Add StringHasher class
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Patrick R. Gansterer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-17 10:06 PDT by Patrick R. Gansterer
Modified: 2010-09-26 11:58 PDT (History)
8 users (show)

See Also:


Attachments
Patch (8.30 KB, patch)
2010-09-17 10:23 PDT, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (use one 31 bit) (8.39 KB, patch)
2010-09-17 10:55 PDT, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (use one 31 bit) (8.39 KB, patch)
2010-09-18 15:37 PDT, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Fix typo (2.01 KB, patch)
2010-09-25 01:31 PDT, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick R. Gansterer 2010-09-17 10:06:47 PDT
This is the the first patch required for adding AtomicString::fromUTF8.
(see https://bugs.webkit.org/show_bug.cgi?id=45594#c10)

required work after this patch (1-3 is cleanup):
1) Use StringHasher::createHash directly insted of WTF::stringHash.
2) Use StringHasher::createHash in StringHash.h and differenct WebCore files (e.g. QualifiedName).
3) Rename StringHashFunctions.h into StringHasher.h
4) Create a function for calculating the hash directly out of null terminated UTF-8 data
5) Use this UTF8->hash function in AtomicString::fromUTF8.
6) Use AtomicString::fromUTF8 in XMLParser.
Comment 1 Patrick R. Gansterer 2010-09-17 10:23:57 PDT
Created attachment 67916 [details]
Patch

> +        if (m_cachedCharacter != invalidCharacterValue) {
> +            addCharactersToHash(m_cachedCharacter, ch);
> +            m_cachedCharacter = invalidCharacterValue;
> +            return;
> +        }
Will be used later in the UTF8 to hash function to add each single character (avoids caching there).


> -    hash &= 0x7fffffff;
> -
> -    // this avoids ever returning a hash code of 0, since that is used to
> -    // signal "hash not computed yet", using a value that is likely to be
> -    // effectively the same as 0 when the low bits are masked
> -    if (hash == 0)
> -        hash = 0x40000000;
> +        if (!result)
> +            return 0x80000000;
This is the same algorithm as in StringHash.h.


> +    template<typename T, UChar Coverter(T)> static inline unsigned createHash(const T* data, unsigned length)
"UChar Coverter(T)" will be used later in CaseFoldingHash::hash to do a createHash<UChar, WTF::Unicode::foldCase>().
Comment 2 Darin Adler 2010-09-17 10:35:38 PDT
(In reply to comment #1)
> > -    hash &= 0x7fffffff;
> > -
> > -    // this avoids ever returning a hash code of 0, since that is used to
> > -    // signal "hash not computed yet", using a value that is likely to be
> > -    // effectively the same as 0 when the low bits are masked
> > -    if (hash == 0)
> > -        hash = 0x40000000;
> > +        if (!result)
> > +            return 0x80000000;
> This is the same algorithm as in StringHash.h.

Changing this will almost certainly break JavaScriptCore, which uses the high bit for something else. Please talk to Gavin Barraclough about this.
Comment 3 Patrick R. Gansterer 2010-09-17 10:55:49 PDT
Created attachment 67920 [details]
Patch (use one 31 bit)

(In reply to comment #2)
> Changing this will almost certainly break JavaScriptCore, which uses the high bit for something else. Please talk to Gavin Barraclough about this.
http://trac.webkit.org/changeset/53456: I don't know what "it causes much sadness in this world" means exactly ;-)
http://trac.webkit.org/changeset/52856: "change string hash result from 32-bit to 31-bit, to free a bit in UStringImpl for m_isIdentifier"

I changed it to use only 31 bits and added a comment.
Comment 4 Early Warning System Bot 2010-09-18 15:31:09 PDT
Attachment 67920 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4054067
Comment 5 Patrick R. Gansterer 2010-09-18 15:37:03 PDT
Created attachment 68017 [details]
Patch (use one 31 bit)

Make Qt build happy.
Comment 6 Patrick R. Gansterer 2010-09-24 10:32:52 PDT
ping?
Comment 7 Gavin Barraclough 2010-09-24 12:05:03 PDT
Comment on attachment 68017 [details]
Patch (use one 31 bit)

Sorry for the delay - looks like a nice improvement!
We could probably switch back to a 32-bit hash now, if we wanted - I'd have to check, but I don't think we're stealing a bit for anything at the minute.  However the hash algorithm is used in a compile stage, adding constant hashes to autogenerated tables, so we'd have to change that too. r+.

G.
Comment 8 WebKit Commit Bot 2010-09-24 14:06:41 PDT
Comment on attachment 68017 [details]
Patch (use one 31 bit)

Clearing flags on attachment: 68017

Committed r68289: <http://trac.webkit.org/changeset/68289>
Comment 9 WebKit Commit Bot 2010-09-24 14:06:46 PDT
All reviewed patches have been landed.  Closing bug.
Comment 10 WebKit Review Bot 2010-09-24 15:37:13 PDT
http://trac.webkit.org/changeset/68289 might have broken GTK Linux 64-bit Debug
Comment 11 Darin Adler 2010-09-24 18:35:44 PDT
There’s a typo: "coverter" in this patch.
Comment 12 Patrick R. Gansterer 2010-09-25 01:31:04 PDT
Created attachment 68816 [details]
Fix typo
Comment 13 Patrick R. Gansterer 2010-09-25 01:34:38 PDT
Reopen for commit queue.
Comment 14 WebKit Commit Bot 2010-09-25 02:07:15 PDT
Comment on attachment 68816 [details]
Fix typo

Clearing flags on attachment: 68816

Committed r68330: <http://trac.webkit.org/changeset/68330>
Comment 15 WebKit Commit Bot 2010-09-25 02:07:21 PDT
All reviewed patches have been landed.  Closing bug.
Comment 16 WebKit Review Bot 2010-09-25 09:10:30 PDT
http://trac.webkit.org/changeset/68330 might have broken Qt Linux Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/68329
http://trac.webkit.org/changeset/68330
http://trac.webkit.org/changeset/68331
Comment 17 Mihai Parparita 2010-09-25 13:51:02 PDT
This change causes the DRT to crash on fast/js/encode-URI-test.html (I can reproduce on Snow Leopard locally, but you can also see it on the Leopard bot (not sure why the SL one isn't crashing): http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r68336%20(20697)/results.html).

The culprit appears to be the "\uFFFE" string literal in the test, just entering that string into the JS console in a build with this change is enough to trigger the assert on line 50 of StringHashFunctions.h.
Comment 18 Patrick R. Gansterer 2010-09-25 19:02:44 PDT
(In reply to comment #17)
> This change causes the DRT to crash on fast/js/encode-URI-test.html (I can reproduce on Snow Leopard locally, but you can also see it on the Leopard bot (not sure why the SL one isn't crashing): http://build.webkit.org/results/Leopard%20Intel%20Debug%20(Tests)/r68336%20(20697)/results.html).
Sorry for that! The patch at bug 46553 (needs r+) should fix it.
Comment 19 Adam Barth 2010-09-26 11:54:12 PDT
Reopen for commit-queue to see the patch.
Comment 20 WebKit Commit Bot 2010-09-26 11:55:17 PDT
Comment on attachment 68816 [details]
Fix typo

Rejecting patch 68816 from commit-queue.

Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'apply-attachment', '--force-clean', '--non-interactive', '--quiet', 68816]" exit_code: 2
Cleaning working directory
Updating working directory
Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=68816&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=45970&ctype=xml
Processing 1 patch from 1 bug.
Processing patch 68816 from bug 45970.
Failed to run "[u'/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1

Full output: http://queues.webkit.org/results/4049167
Comment 21 Patrick R. Gansterer 2010-09-26 11:58:17 PDT
Comment on attachment 68816 [details]
Fix typo

Clearing flags on attachment: 68816

Already committed r68330: <http://trac.webkit.org/changeset/68330>