Bug 112336

Summary: HTMLIdentifier uses 8-12 bytes when it only needs 4-8
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED CONFIGURATION CHANGED    
Severity: Normal CC: abarth, annevk, benjamin, tonyg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 111645    
Attachments:
Description Flags
About what I had in mind none

Eric Seidel (no email)
Reported 2013-03-14 02:45:59 PDT
HTMLIdentifier uses 8-12 bytes when it only needs 4-8 http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLIdentifier.h#L80 // FIXME: This could be a union. unsigned m_index; String m_string; (I no longer believe that it needs to be a Union, but rather just a StringImpl* with one bit stolen -- perhaps even there is a PointerWithMask template class to help us with this?) The one masked bit would tell you which mode it was in. Either we could treat the "known identifier" mode as a StringImpl* to a const (non-reffed) AtomicStringImpl*, or we could use an index (like the current implementation does). Using a StringImpl* for both the owned and non-owned case is nice because then the nameForIndex function completely disappears! Basically we'd just be implementing a static version of the AtomicString hash table, which held a known set of AtomicStrings and only exposed StringImpl* access to them. (And magically acted like a String in the case where the character buffer in question did not match one of our known strings.) We've sorta implemented a "ThreadPortableAtomicString" of sorts. :) Making this smaller will also make the maximum memory consumed during speculation smaller, which may affect bug 112335.
Attachments
About what I had in mind (20.34 KB, patch)
2013-03-14 16:34 PDT, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2013-03-14 02:51:08 PDT
This sentence was confusing: > Either we could treat the "known identifier" mode as a StringImpl* to a const (non-reffed) > AtomicStringImpl*, or we could use an index (like the current implementation does). I've fully given up on the "index" based approach. It shoudl always be a pointer to a StringImpl*, teh question is just if we own (ref) that StringImpl or not. If it's a known identifier (AtomicStringImpl*) then we don't own it, and depend on something else to keep it alive (and in the AtomicStringTable on the main thread) forever. If we own it, then it's just a plain old StringImpl which we need to manually ref/deref since we can't steal a bit and still use RefPtr or String. :) This class would only ever provide String access to its storage from the main thread, but could provide const StringImpl* access on any thread.
Benjamin Poulain
Comment 2 2013-03-14 10:33:04 PDT
> I've fully given up on the "index" based approach. It shoudl always be a pointer to a StringImpl*, teh question is just if we own (ref) that StringImpl or not. If it's a known identifier (AtomicStringImpl*) then we don't own it, and depend on something else to keep it alive (and in the AtomicStringTable on the main thread) forever. If we own it, then it's just a plain old StringImpl which we need to manually ref/deref since we can't steal a bit and still use RefPtr or String. :) I am unfamiliar with the ownership here. Is it that if it is a AtomicStringImpl*, it must be a HTMLName, if not, it is a custom string?
Eric Seidel (no email)
Comment 3 2013-03-14 11:48:22 PDT
I wasn't very clear, sorry. I'll just code up an implementation and post it. :)
Eric Seidel (no email)
Comment 4 2013-03-14 16:34:52 PDT
Created attachment 193200 [details] About what I had in mind
Adam Barth
Comment 5 2013-03-15 12:18:06 PDT
Comment on attachment 193200 [details] About what I had in mind View in context: https://bugs.webkit.org/attachment.cgi?id=193200&action=review > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:64 > -static void checkThatTokensAreSafeToSendToAnotherThread(const CompactHTMLTokenStream* tokens) > +static void checkThatTokensAreSafeToSendToAnotherThread(CompactHTMLTokenStream* tokens) Should we rename this function to make it clear that it has side-effects? I presume those side effects are only relevant in debug. > Source/WebCore/html/parser/CompactHTMLToken.h:61 > + void markForTransferToMainThread(); Can you make this a DEBUG only function? > Source/WebCore/html/parser/HTMLIdentifier.cpp:71 > + const_cast<StringImpl*>(m_pointerWithFlags.ptr())->deref(); Woah. > Source/WebCore/html/parser/HTMLIdentifier.h:45 > +// This is basically just a bitfield for a pointer. > +template <typename T> > +class PointerWithFlags { This should probably be in WTF. > Source/WebCore/html/parser/HTMLIdentifier.h:56 > + bool flag1() const { return maskedValue(kFlag1Mask); } > + void setFlag1(bool flag) { setMaskedValue(flag, kFlag1Mask); } > + > + bool flag2() const { return maskedValue(kFlag2Mask); } > + void setFlag2(bool flag) { setMaskedValue(flag, kFlag2Mask); } Should these methods be protected? Presumably the way to use this class is to subclass it and give these flags reasonable names. > Source/WebCore/html/parser/HTMLIdentifier.h:116 > +#ifndef NDEBUG > + bool isMarkedForTransferToMainThread() { return m_pointerWithFlags.flag2(); } > +#endif If this is debug-only, why not introduce debug-only storage for this value?
Anne van Kesteren
Comment 6 2023-04-01 00:53:15 PDT
This no longer exists.
Note You need to log in before you can comment on or make changes to this bug.