Bug 21818 - Should be able to use AtomicString as the key for a HashMap and HashSet
Summary: Should be able to use AtomicString as the key for a HashMap and HashSet
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Simon Fraser (smfr)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-22 17:20 PDT by Simon Fraser (smfr)
Modified: 2008-10-31 08:24 PDT (History)
2 users (show)

See Also:


Attachments
WIP patch (2.84 KB, patch)
2008-10-22 18:15 PDT, Simon Fraser (smfr)
no flags Details | Formatted Diff | Diff
Patch, changelog (4.21 KB, patch)
2008-10-23 11:20 PDT, Simon Fraser (smfr)
no flags Details | Formatted Diff | Diff
Patch, changelog (5.64 KB, patch)
2008-10-23 15:28 PDT, Simon Fraser (smfr)
darin: review+
Details | Formatted Diff | Diff
Patch adding assert back (9.94 KB, patch)
2008-10-23 21:40 PDT, Simon Fraser (smfr)
no flags Details | Formatted Diff | Diff
Final patch (10.65 KB, patch)
2008-10-24 09:58 PDT, Simon Fraser (smfr)
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Fraser (smfr) 2008-10-22 17:20:38 PDT
It's currently not possible to use an AtomicString as the key for a HashMap or HashSet; AtomicStringImpl is used instead (which is just a StringImpl).

This has the unfortunate side-effect that, even while an item in the hash is live, all AtomicStrings referencing the key can go out of scope, leaving the key dangling. The next matching AtomicString will pick up a new AtomicStringImpl, but this will no longer match the hash entry.

The solution is to allow AtomicString as the hash key. This would require implementing hash functions for AtomicString, which should be easy (they can just call through to the impl).
Comment 1 Simon Fraser (smfr) 2008-10-22 18:15:18 PDT
Created attachment 24584 [details]
WIP patch

Something along these lines?

constructDeletedValue() is wrong; AtomicString actually dereferences the impl, so we can't use HashTableDeletedValue as the impl. address.
Comment 2 Alexey Proskuryakov 2008-10-23 06:20:35 PDT
FWIW, we use RefPtr<AtomicString> when there is no external guarantee that the string will persist. I do not know if there is any reason for such a mismatch with String, which can be used directly.
Comment 3 Simon Fraser (smfr) 2008-10-23 11:20:01 PDT
Created attachment 24604 [details]
Patch, changelog

Final patch. This uses an AtomicString with a NULL impl as the constructDeletedValue, which seems to work OK.

I tested it with this code:

diff --git a/WebCore/dom/Document.cpp b/WebCore/dom/Document.cpp
index f43eda1..c1984e6 100644
--- a/WebCore/dom/Document.cpp
+++ b/WebCore/dom/Document.cpp
@@ -279,6 +279,53 @@ private:
     RefPtr<Document> m_document;
 };
 
+
+#include <wtf/HashMap.h>
+#include "AtomicString.h"
+#include "AtomicStringHash.h"
+
+void testAtomicStringHashMap()
+{
+    HashMap<AtomicString, bool> testMap;
+    
+    testMap.add("test1", true);
+    testMap.add("test2", false);
+    
+    assert(testMap.contains("test1") && testMap.get("test1"));
+    assert(testMap.contains("test2") && !testMap.get("test2"));
+    assert(!testMap.contains("test3"));
+
+    testMap.remove("test2");
+    assert(!testMap.contains("test2"));
+    
+    AtomicString test1("test1");
+    assert(testMap.contains(test1) && testMap.get("test1"));
+
+    testMap.remove("test1");
+    assert(!testMap.contains(test1));
+
+    testMap.clear();
+    
+    {
+        AtomicString one("foobar_one");
+        testMap.add(one, true);
+
+        AtomicString two("foobar_two");
+        testMap.add(two, true);
+    }
+    
+    {
+        AtomicString one("foobar_one");
+        AtomicString two("foobar_two");
+        AtomicString three("foobar_three");
+        assert(testMap.contains(one));
+        assert(testMap.contains(two));
+        assert(!testMap.contains(three));
+    }
+}
+
+
+
 Document::Document(Frame* frame, bool isXHTML)
     : ContainerNode(0)
     , m_domtree_version(0)
@@ -320,6 +367,9 @@ Document::Document(Frame* frame, bool isXHTML)
     , m_inLowBandwidthDisplay(false)
 #endif
 {
+
+testAtomicStringHashMap();
+
     m_document.resetSkippingRef(this);
 
     m_printing = false;
Comment 4 Simon Fraser (smfr) 2008-10-23 11:26:46 PDT
Comment on attachment 24604 [details]
Patch, changelog

Patch is bad; isDeletedValue() and nullAtom compare to be the same thing.
Comment 5 Simon Fraser (smfr) 2008-10-23 15:28:25 PDT
Created attachment 24622 [details]
Patch, changelog
Comment 6 Darin Adler 2008-10-23 16:04:04 PDT
Comment on attachment 24622 [details]
Patch, changelog

> +    unsigned existingHash() const { return m_hash; }

You need an ASSERT(m_hash) in here.

How did you test?

r=me
Comment 7 Simon Fraser (smfr) 2008-10-23 16:30:38 PDT
(In reply to comment #6)
> (From update of attachment 24622 [details] [edit])
> > +    unsigned existingHash() const { return m_hash; }
> 
> You need an ASSERT(m_hash) in here.

I had, but that fires if someone uses "" as a key.

> How did you test?

Using the code pasted above.
Comment 8 Darin Adler 2008-10-23 16:41:59 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > You need an ASSERT(m_hash) in here.
> 
> I had, but that fires if someone uses "" as a key.

If so, that indicates a bug. The hash value of an empty string should be 0x80000000 due to this code in StringImpl::computeHash:

    // This avoids ever returning a hash code of 0, since that is used to
    // signal "hash not computed yet", using a value that is likely to be
    // effectively the same as 0 when the low bits are masked.
    hash |= !hash << 31;

We need to investigate why this happens and fix it.
Comment 9 Simon Fraser (smfr) 2008-10-23 16:45:21 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > You need an ASSERT(m_hash) in here.
> > 
> > I had, but that fires if someone uses "" as a key.
> 
> If so, that indicates a bug. The hash value of an empty string should be
> 0x80000000 due to this code in StringImpl::computeHash:
> 
>     // This avoids ever returning a hash code of 0, since that is used to
>     // signal "hash not computed yet", using a value that is likely to be
>     // effectively the same as 0 when the low bits are masked.
>     hash |= !hash << 31;
> 
> We need to investigate why this happens and fix it.

I'm sorry, I misspoke; I saw that code too. It does NOT fire for empty sting, but it does for nullAtom, but we agreed that nullAtom was not a valid key.

I'll add the assert back and re-test.
Comment 10 Simon Fraser (smfr) 2008-10-23 21:40:53 PDT
Created attachment 24633 [details]
Patch adding assert back

So the assert in existingHash() does fire in one particular case: when the empty string is used as a hash key. The reason is that AtomicString::add(const char* c) doesn't actually add the StringImpl::empty() to the HashSet: StringImpl::empty() returns a singleton, so it's guaranteed to self-compare. Because of this, hash() may never be called on this instance.

So in order to avoid the assertion in existingHash(), I force the hash to be computed on the StringImpl returned by StringImpl::empty() in AtomicString::add(). Is this the best place to do this, or should I do it in the StringImpl() ctor, or in the emptyString() method?
Comment 11 Darin Adler 2008-10-24 09:36:55 PDT
(In reply to comment #10)
> So in order to avoid the assertion in existingHash(), I force the hash to be
> computed on the StringImpl returned by StringImpl::empty() in
> AtomicString::add(). Is this the best place to do this, or should I do it in
> the StringImpl() ctor, or in the emptyString() method?

In the StringImpl() constructor. I would probably just initialize the hash to 0x80000000 rather than actually calling the hash function, and maybe call the hash function in an assertion. But calling the hash function would also be fine.
Comment 12 Simon Fraser (smfr) 2008-10-24 09:58:07 PDT
Created attachment 24641 [details]
Final patch

Call hash() in the StringImpl() ctor for empty strings (the hash is actually 0x4ec889e for an empty string).

Add AtomicStringHash.h to other build systems.
Comment 13 Darin Adler 2008-10-24 12:03:30 PDT
Comment on attachment 24641 [details]
Final patch

+        * platform/text/AtomicString.h:
+        (WebCore::AtomicString::AtomicString):
+        (WebCore::AtomicString::isHashTableDeletedValue):
+        (WTF::): specialize DefaultHash for AtomicString to use AtomicStringHash
+        * platform/text/AtomicStringHash.h: Added.
+        (WebCore::AtomicStringHash::hash):
+        (WebCore::AtomicStringHash::equal):
+        (WTF::):

For future reference, there's no need to leave bogus things like "(WTF::)" in the ChangeLog just because a buggy prepare-ChangeLog script put them in there!

@@ -44,6 +44,7 @@
 		0A4844990CA44CB200B7BD48 /* SoftLinking.h in Headers */ = {isa = PBXBuildFile; fileRef = 0A4844980CA44CB200B7BD48 /* SoftLinking.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		0F56028F0E4B76580065B038 /* RenderMarquee.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F56028D0E4B76580065B038 /* RenderMarquee.h */; };
 		0F5602900E4B76580065B038 /* RenderMarquee.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F56028E0E4B76580065B038 /* RenderMarquee.cpp */; };
+		0FC705210EB1815600B90AD8 /* AtomicStringHash.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FC705200EB1815600B90AD8 /* AtomicStringHash.h */; };
 		1402645E0AFDC19B005919E2 /* LoggingMac.mm in Sources */ = {isa = PBXBuildFile; fileRef = 1402645D0AFDC19B005919E2 /* LoggingMac.mm */; };
 		1403B99709EB13AF00797C7F /* DOMWindow.h in Headers */ = {isa = PBXBuildFile; fileRef = 1403B99509EB13AF00797C7F /* DOMWindow.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		1403B99809EB13AF00797C7F /* DOMWindow.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1403B99609EB13AF00797C7F /* DOMWindow.cpp */; };
@@ -4716,6 +4717,7 @@
 		0A4844980CA44CB200B7BD48 /* SoftLinking.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SoftLinking.h; sourceTree = "<group>"; };
 		0F56028D0E4B76580065B038 /* RenderMarquee.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RenderMarquee.h; sourceTree = "<group>"; };
 		0F56028E0E4B76580065B038 /* RenderMarquee.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RenderMarquee.cpp; sourceTree = "<group>"; };
+		0FC705200EB1815600B90AD8 /* AtomicStringHash.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AtomicStringHash.h; sourceTree = "<group>"; };
 		1402645D0AFDC19B005919E2 /* LoggingMac.mm */ = {isa = PBXFileReference; fileEncoding = 30; lastKnownFileType = sourcecode.cpp.objcpp; path = LoggingMac.mm; sourceTree = "<group>"; };
 		1403B90C09EB124500797C7F /* DOMWindow.idl */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = DOMWindow.idl; sourceTree = "<group>"; };
 		1403B99509EB13AF00797C7F /* DOMWindow.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DOMWindow.h; sourceTree = "<group>"; };

What is this change? It's not mentioned in the ChangeLog.

 {
+    hash();
 }

I think you should add a comment. Calling a function that's normally used for its result, but just for its side effect, is unusual enough that I think you should call it out.

r=me
Comment 14 Simon Fraser (smfr) 2008-10-24 12:33:50 PDT
Committed, with changes to address the review comments.

	M	WebCore/GNUmakefile.am
	A	WebCore/platform/text/AtomicStringHash.h
	M	WebCore/platform/text/StringImpl.h
	M	WebCore/platform/text/AtomicString.h
	M	WebCore/platform/text/StringImpl.cpp
	M	WebCore/WebCore.xcodeproj/project.pbxproj
	M	WebCore/ChangeLog
	M	WebCore/WebCore.vcproj/WebCore.vcproj
r37852 = 912ac415fe8d6672047de379d807eefebee98158 (trunk)
Comment 15 Alexey Proskuryakov 2008-10-31 08:24:45 PDT
I just found a case where HashMap key was actually destroyed: bug 22001.