Bug 112336 - HTMLIdentifier uses 8-12 bytes when it only needs 4-8
Summary: HTMLIdentifier uses 8-12 bytes when it only needs 4-8
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 111645
  Show dependency treegraph
 
Reported: 2013-03-14 02:45 PDT by Eric Seidel (no email)
Modified: 2013-03-15 12:18 PDT (History)
3 users (show)

See Also:


Attachments
About what I had in mind (20.34 KB, patch)
2013-03-14 16:34 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2013-03-14 02:45:59 PDT
HTMLIdentifier uses 8-12 bytes when it only needs 4-8

http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLIdentifier.h#L80

    // FIXME: This could be a union.
    unsigned m_index;
    String m_string;

(I no longer believe that it needs to be a Union, but rather just a StringImpl* with one bit stolen -- perhaps even there is a PointerWithMask template class to help us with this?)

The one masked bit would tell you which mode it was in.  Either we could treat the "known identifier" mode as a StringImpl* to a const (non-reffed) AtomicStringImpl*, or we could use an index (like the current implementation does).

Using a StringImpl* for both the owned and non-owned case is nice because then the nameForIndex function completely disappears!

Basically we'd just be implementing a static version of the AtomicString hash table, which held a known set of AtomicStrings and only exposed StringImpl* access to them.  (And magically acted like a String in the case where the character buffer in question did not match one of our known strings.)

We've sorta implemented a "ThreadPortableAtomicString" of sorts. :)

Making this smaller will also make the maximum memory consumed during speculation smaller, which may affect bug 112335.
Comment 1 Eric Seidel (no email) 2013-03-14 02:51:08 PDT
This sentence was confusing:
> Either we could treat the "known identifier" mode as a StringImpl* to a const (non-reffed)
> AtomicStringImpl*, or we could use an index (like the current implementation does).

I've fully given up on the "index" based approach.  It shoudl always be a pointer to a StringImpl*, teh question is just if we own (ref) that StringImpl or not.  If it's a known identifier (AtomicStringImpl*) then we don't own it, and depend on something else to keep it alive (and in the AtomicStringTable on the main thread) forever.  If we own it, then it's just a plain old StringImpl which we need to manually ref/deref since we can't steal a bit and still use RefPtr or String. :)

This class would only ever provide String access to its storage from the main thread, but could provide const StringImpl* access on any thread.
Comment 2 Benjamin Poulain 2013-03-14 10:33:04 PDT
> I've fully given up on the "index" based approach.  It shoudl always be a pointer to a StringImpl*, teh question is just if we own (ref) that StringImpl or not.  If it's a known identifier (AtomicStringImpl*) then we don't own it, and depend on something else to keep it alive (and in the AtomicStringTable on the main thread) forever.  If we own it, then it's just a plain old StringImpl which we need to manually ref/deref since we can't steal a bit and still use RefPtr or String. :)

I am unfamiliar with the ownership here.

Is it that if it is a AtomicStringImpl*, it must be a HTMLName, if not, it is a custom string?
Comment 3 Eric Seidel (no email) 2013-03-14 11:48:22 PDT
I wasn't very clear, sorry.  I'll just code up an implementation and post it. :)
Comment 4 Eric Seidel (no email) 2013-03-14 16:34:52 PDT
Created attachment 193200 [details]
About what I had in mind
Comment 5 Adam Barth 2013-03-15 12:18:06 PDT
Comment on attachment 193200 [details]
About what I had in mind

View in context: https://bugs.webkit.org/attachment.cgi?id=193200&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:64
> -static void checkThatTokensAreSafeToSendToAnotherThread(const CompactHTMLTokenStream* tokens)
> +static void checkThatTokensAreSafeToSendToAnotherThread(CompactHTMLTokenStream* tokens)

Should we rename this function to make it clear that it has side-effects?  I presume those side effects are only relevant in debug.

> Source/WebCore/html/parser/CompactHTMLToken.h:61
> +    void markForTransferToMainThread();

Can you make this a DEBUG only function?

> Source/WebCore/html/parser/HTMLIdentifier.cpp:71
> +    const_cast<StringImpl*>(m_pointerWithFlags.ptr())->deref();

Woah.

> Source/WebCore/html/parser/HTMLIdentifier.h:45
> +// This is basically just a bitfield for a pointer.
> +template <typename T>
> +class PointerWithFlags {

This should probably be in WTF.

> Source/WebCore/html/parser/HTMLIdentifier.h:56
> +    bool flag1() const { return maskedValue(kFlag1Mask); }
> +    void setFlag1(bool flag) { setMaskedValue(flag, kFlag1Mask); }
> +
> +    bool flag2() const { return maskedValue(kFlag2Mask); }
> +    void setFlag2(bool flag) { setMaskedValue(flag, kFlag2Mask); }

Should these methods be protected?  Presumably the way to use this class is to subclass it and give these flags reasonable names.

> Source/WebCore/html/parser/HTMLIdentifier.h:116
> +#ifndef NDEBUG
> +    bool isMarkedForTransferToMainThread() { return m_pointerWithFlags.flag2(); }
> +#endif

If this is debug-only, why not introduce debug-only storage for this value?