Bug 111250 - AtomicHTMLToken should not be heap allocated or RefCounted
Summary: AtomicHTMLToken should not be heap allocated or RefCounted
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 111645
  Show dependency treegraph
 
Reported: 2013-03-02 03:09 PST by Eric Seidel (no email)
Modified: 2013-03-08 11:17 PST (History)
6 users (show)

See Also:


Attachments
Remove infinity mallocs from parsing (24.27 KB, patch)
2013-03-08 00:58 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2013-03-02 03:09:48 PST
AtomicHTMLToken should not be heap allocated or RefCounted

This is done so that it can be saved on the HTMLElementStack, but it appears we only use this in 3 places:

/Users/eseidel/Projects/WebKit/Source/WebCore/html/parser/HTMLElementStack.cpp:
  291          return false;
  292      if (item->hasTagName(MathMLNames::annotation_xmlTag)) {
  293:         Attribute* encodingAttr = item->token()->getAttributeItem(MathMLNames::encodingAttr);
  294          if (encodingAttr) {
  295              const String& encoding = encodingAttr->value();

/Users/eseidel/Projects/WebKit/Source/WebCore/html/parser/HTMLFormattingElementList.cpp:
  179      remainingCandidates.reserveInitialCapacity(candidates.size());
  180  
  181:     const Vector<Attribute>& attributes = newItem->token()->attributes();
  182      for (size_t i = 0; i < attributes.size(); ++i) {
  183          const Attribute& attribute = attributes[i];
  ...
  190              ASSERT(newItem->localName() == candidate->localName() && newItem->namespaceURI() == candidate->namespaceURI());
  191  
  192:             Attribute* candidateAttribute = candidate->token()->getAttributeItem(attribute.name());
  193              if (candidateAttribute && candidateAttribute->value() == attribute.value())
  194                  remainingCandidates.append(candidate);

This just adds an extra malloc, and some complexity to AtomicHTMLToken we don't need, including the ability to clear it's externalCharacterBuffer pointers so we don't grab at memory after its freed.  See bug 111248.
Comment 1 Eric Seidel (no email) 2013-03-07 16:11:30 PST
I believe this is the fastMalloc showing up as .6% of time profiling Parser/html-parser-threaded.html with bug 107236 applied:

Running Time	Self		Symbol Name
108.5ms    6.2%	108.5	 	WTF::fastMalloc(unsigned long)
12.1ms    0.6%	0.0	 	 WebCore::HTMLDocumentParser::processParsedChunkFromBackgroundParser(WTF::PassOwnPtr<WebCore::HTMLDocumentParser::ParsedChunk>)
Comment 2 Eric Seidel (no email) 2013-03-08 00:54:09 PST
Before:

RESULT Parser: html-parser-threaded: Time= 450.164849998 ms
median= 450.482999993 ms, stdev= 7.2381436538 ms, min= 436.084999994 ms, max= 464.968999964 ms
RESULT Parser: html-parser-threaded: JSHeap= 3522054.8 bytes
median= 3603242.0 bytes, stdev= 623535.479063 bytes, min= 2623588.0 bytes, max= 4378336.0 bytes
RESULT Parser: html-parser-threaded: Malloc= 53670691.6 bytes
median= 56351944.0 bytes, stdev= 11597207.0499 bytes, min= 36349272.0 bytes, max= 69981056.0 bytes

After:

RESULT Parser: html-parser-threaded: Time= 436.115300003 ms
median= 436.41600004 ms, stdev= 5.71435647554 ms, min= 427.160999971 ms, max= 446.753000026 ms
RESULT Parser: html-parser-threaded: JSHeap= 3548257.8 bytes
median= 3603324.0 bytes, stdev= 635649.475906 bytes, min= 2623588.0 bytes, max= 4509212.0 bytes
RESULT Parser: html-parser-threaded: Malloc= 54642561.6 bytes
median= 56337700.0 bytes, stdev= 11804768.9883 bytes, min= 36077416.0 bytes, max= 69851960.0 bytes

I makes it faster.
Comment 3 Eric Seidel (no email) 2013-03-08 00:58:48 PST
Created attachment 192173 [details]
Remove infinity mallocs from parsing
Comment 4 Eric Seidel (no email) 2013-03-08 01:00:50 PST
Comment on attachment 192173 [details]
Remove infinity mallocs from parsing

View in context: https://bugs.webkit.org/attachment.cgi?id=192173&action=review

> Source/WebCore/html/parser/HTMLConstructionSite.cpp:542
> +    // NOTE: Moving from item -> token -> item copies the Attribute vector twice!

These mallocs do show up on the profile.  But there are *way* fewer of them than all the AtomicHTMLTokens we were mallocing before. :)

> Source/WebCore/html/parser/HTMLStackItem.h:-211
> -            // Create a fake token for a document fragment node. This looks ugly but required for performance
> -            // because we want to use m_token->name() in localName(), hasLocalName() and hasTagName() without
> -            // checking m_isDocumentFragmentNode flag.
> -            m_token = AtomicHTMLToken::create(HTMLToken::StartTag, nullAtom);

It's possible I'm subtly changing behavior here... or that this code wasn't actually doing what it thought it was doing.  The new code should still return nullAtom from localName() like the old code did. :)
Comment 5 Adam Barth 2013-03-08 10:51:49 PST
Comment on attachment 192173 [details]
Remove infinity mallocs from parsing

View in context: https://bugs.webkit.org/attachment.cgi?id=192173&action=review

> Source/WebCore/html/parser/HTMLStackItem.h:227
> +        , m_tokenAttributes(token->attributes())

If we were clever, we could probably swap the attributes in and out of this Vector.

> Source/WebCore/html/parser/HTMLStackItem.h:237
> +    AtomicString m_tokenLocalName;
> +    Vector<Attribute> m_tokenAttributes;
>      AtomicString m_namespaceURI;

The names m_tokenLocalName and m_namespaceURI aren't quite symmetric.  Maybe they're different subtly?
Comment 6 Eric Seidel (no email) 2013-03-08 11:07:37 PST
Comment on attachment 192173 [details]
Remove infinity mallocs from parsing

View in context: https://bugs.webkit.org/attachment.cgi?id=192173&action=review

>> Source/WebCore/html/parser/HTMLStackItem.h:227
>> +        , m_tokenAttributes(token->attributes())
> 
> If we were clever, we could probably swap the attributes in and out of this Vector.

Definitely something to explore. Will require changing the api contract with the token.

>> Source/WebCore/html/parser/HTMLStackItem.h:237
>>      AtomicString m_namespaceURI;
> 
> The names m_tokenLocalName and m_namespaceURI aren't quite symmetric.  Maybe they're different subtly?

Namespace comes from the context.  Token is only set for formatting elements iirc.
Comment 7 Eric Seidel (no email) 2013-03-08 11:09:05 PST
Comment on attachment 192173 [details]
Remove infinity mallocs from parsing

Thanks!
Comment 8 WebKit Review Bot 2013-03-08 11:17:49 PST
Comment on attachment 192173 [details]
Remove infinity mallocs from parsing

Clearing flags on attachment: 192173

Committed r145248: <http://trac.webkit.org/changeset/145248>
Comment 9 WebKit Review Bot 2013-03-08 11:17:52 PST
All reviewed patches have been landed.  Closing bug.