It's currently not possible to use an AtomicString as the key for a HashMap or HashSet; AtomicStringImpl is used instead (which is just a StringImpl). This has the unfortunate side-effect that, even while an item in the hash is live, all AtomicStrings referencing the key can go out of scope, leaving the key dangling. The next matching AtomicString will pick up a new AtomicStringImpl, but this will no longer match the hash entry. The solution is to allow AtomicString as the hash key. This would require implementing hash functions for AtomicString, which should be easy (they can just call through to the impl).
Created attachment 24584 [details] WIP patch Something along these lines? constructDeletedValue() is wrong; AtomicString actually dereferences the impl, so we can't use HashTableDeletedValue as the impl. address.
FWIW, we use RefPtr<AtomicString> when there is no external guarantee that the string will persist. I do not know if there is any reason for such a mismatch with String, which can be used directly.
Created attachment 24604 [details] Patch, changelog Final patch. This uses an AtomicString with a NULL impl as the constructDeletedValue, which seems to work OK. I tested it with this code: diff --git a/WebCore/dom/Document.cpp b/WebCore/dom/Document.cpp index f43eda1..c1984e6 100644 --- a/WebCore/dom/Document.cpp +++ b/WebCore/dom/Document.cpp @@ -279,6 +279,53 @@ private: RefPtr<Document> m_document; }; + +#include <wtf/HashMap.h> +#include "AtomicString.h" +#include "AtomicStringHash.h" + +void testAtomicStringHashMap() +{ + HashMap<AtomicString, bool> testMap; + + testMap.add("test1", true); + testMap.add("test2", false); + + assert(testMap.contains("test1") && testMap.get("test1")); + assert(testMap.contains("test2") && !testMap.get("test2")); + assert(!testMap.contains("test3")); + + testMap.remove("test2"); + assert(!testMap.contains("test2")); + + AtomicString test1("test1"); + assert(testMap.contains(test1) && testMap.get("test1")); + + testMap.remove("test1"); + assert(!testMap.contains(test1)); + + testMap.clear(); + + { + AtomicString one("foobar_one"); + testMap.add(one, true); + + AtomicString two("foobar_two"); + testMap.add(two, true); + } + + { + AtomicString one("foobar_one"); + AtomicString two("foobar_two"); + AtomicString three("foobar_three"); + assert(testMap.contains(one)); + assert(testMap.contains(two)); + assert(!testMap.contains(three)); + } +} + + + Document::Document(Frame* frame, bool isXHTML) : ContainerNode(0) , m_domtree_version(0) @@ -320,6 +367,9 @@ Document::Document(Frame* frame, bool isXHTML) , m_inLowBandwidthDisplay(false) #endif { + +testAtomicStringHashMap(); + m_document.resetSkippingRef(this); m_printing = false;
Comment on attachment 24604 [details] Patch, changelog Patch is bad; isDeletedValue() and nullAtom compare to be the same thing.
Created attachment 24622 [details] Patch, changelog
Comment on attachment 24622 [details] Patch, changelog > + unsigned existingHash() const { return m_hash; } You need an ASSERT(m_hash) in here. How did you test? r=me
(In reply to comment #6) > (From update of attachment 24622 [details] [edit]) > > + unsigned existingHash() const { return m_hash; } > > You need an ASSERT(m_hash) in here. I had, but that fires if someone uses "" as a key. > How did you test? Using the code pasted above.
(In reply to comment #7) > (In reply to comment #6) > > You need an ASSERT(m_hash) in here. > > I had, but that fires if someone uses "" as a key. If so, that indicates a bug. The hash value of an empty string should be 0x80000000 due to this code in StringImpl::computeHash: // This avoids ever returning a hash code of 0, since that is used to // signal "hash not computed yet", using a value that is likely to be // effectively the same as 0 when the low bits are masked. hash |= !hash << 31; We need to investigate why this happens and fix it.
(In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6) > > > You need an ASSERT(m_hash) in here. > > > > I had, but that fires if someone uses "" as a key. > > If so, that indicates a bug. The hash value of an empty string should be > 0x80000000 due to this code in StringImpl::computeHash: > > // This avoids ever returning a hash code of 0, since that is used to > // signal "hash not computed yet", using a value that is likely to be > // effectively the same as 0 when the low bits are masked. > hash |= !hash << 31; > > We need to investigate why this happens and fix it. I'm sorry, I misspoke; I saw that code too. It does NOT fire for empty sting, but it does for nullAtom, but we agreed that nullAtom was not a valid key. I'll add the assert back and re-test.
Created attachment 24633 [details] Patch adding assert back So the assert in existingHash() does fire in one particular case: when the empty string is used as a hash key. The reason is that AtomicString::add(const char* c) doesn't actually add the StringImpl::empty() to the HashSet: StringImpl::empty() returns a singleton, so it's guaranteed to self-compare. Because of this, hash() may never be called on this instance. So in order to avoid the assertion in existingHash(), I force the hash to be computed on the StringImpl returned by StringImpl::empty() in AtomicString::add(). Is this the best place to do this, or should I do it in the StringImpl() ctor, or in the emptyString() method?
(In reply to comment #10) > So in order to avoid the assertion in existingHash(), I force the hash to be > computed on the StringImpl returned by StringImpl::empty() in > AtomicString::add(). Is this the best place to do this, or should I do it in > the StringImpl() ctor, or in the emptyString() method? In the StringImpl() constructor. I would probably just initialize the hash to 0x80000000 rather than actually calling the hash function, and maybe call the hash function in an assertion. But calling the hash function would also be fine.
Created attachment 24641 [details] Final patch Call hash() in the StringImpl() ctor for empty strings (the hash is actually 0x4ec889e for an empty string). Add AtomicStringHash.h to other build systems.
Comment on attachment 24641 [details] Final patch + * platform/text/AtomicString.h: + (WebCore::AtomicString::AtomicString): + (WebCore::AtomicString::isHashTableDeletedValue): + (WTF::): specialize DefaultHash for AtomicString to use AtomicStringHash + * platform/text/AtomicStringHash.h: Added. + (WebCore::AtomicStringHash::hash): + (WebCore::AtomicStringHash::equal): + (WTF::): For future reference, there's no need to leave bogus things like "(WTF::)" in the ChangeLog just because a buggy prepare-ChangeLog script put them in there! @@ -44,6 +44,7 @@ 0A4844990CA44CB200B7BD48 /* SoftLinking.h in Headers */ = {isa = PBXBuildFile; fileRef = 0A4844980CA44CB200B7BD48 /* SoftLinking.h */; settings = {ATTRIBUTES = (Private, ); }; }; 0F56028F0E4B76580065B038 /* RenderMarquee.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F56028D0E4B76580065B038 /* RenderMarquee.h */; }; 0F5602900E4B76580065B038 /* RenderMarquee.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F56028E0E4B76580065B038 /* RenderMarquee.cpp */; }; + 0FC705210EB1815600B90AD8 /* AtomicStringHash.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FC705200EB1815600B90AD8 /* AtomicStringHash.h */; }; 1402645E0AFDC19B005919E2 /* LoggingMac.mm in Sources */ = {isa = PBXBuildFile; fileRef = 1402645D0AFDC19B005919E2 /* LoggingMac.mm */; }; 1403B99709EB13AF00797C7F /* DOMWindow.h in Headers */ = {isa = PBXBuildFile; fileRef = 1403B99509EB13AF00797C7F /* DOMWindow.h */; settings = {ATTRIBUTES = (Private, ); }; }; 1403B99809EB13AF00797C7F /* DOMWindow.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1403B99609EB13AF00797C7F /* DOMWindow.cpp */; }; @@ -4716,6 +4717,7 @@ 0A4844980CA44CB200B7BD48 /* SoftLinking.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SoftLinking.h; sourceTree = "<group>"; }; 0F56028D0E4B76580065B038 /* RenderMarquee.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RenderMarquee.h; sourceTree = "<group>"; }; 0F56028E0E4B76580065B038 /* RenderMarquee.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RenderMarquee.cpp; sourceTree = "<group>"; }; + 0FC705200EB1815600B90AD8 /* AtomicStringHash.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AtomicStringHash.h; sourceTree = "<group>"; }; 1402645D0AFDC19B005919E2 /* LoggingMac.mm */ = {isa = PBXFileReference; fileEncoding = 30; lastKnownFileType = sourcecode.cpp.objcpp; path = LoggingMac.mm; sourceTree = "<group>"; }; 1403B90C09EB124500797C7F /* DOMWindow.idl */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = DOMWindow.idl; sourceTree = "<group>"; }; 1403B99509EB13AF00797C7F /* DOMWindow.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DOMWindow.h; sourceTree = "<group>"; }; What is this change? It's not mentioned in the ChangeLog. { + hash(); } I think you should add a comment. Calling a function that's normally used for its result, but just for its side effect, is unusual enough that I think you should call it out. r=me
Committed, with changes to address the review comments. M WebCore/GNUmakefile.am A WebCore/platform/text/AtomicStringHash.h M WebCore/platform/text/StringImpl.h M WebCore/platform/text/AtomicString.h M WebCore/platform/text/StringImpl.cpp M WebCore/WebCore.xcodeproj/project.pbxproj M WebCore/ChangeLog M WebCore/WebCore.vcproj/WebCore.vcproj r37852 = 912ac415fe8d6672047de379d807eefebee98158 (trunk)
I just found a case where HashMap key was actually destroyed: bug 22001.