<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>111708</bug_id>
          
          <creation_ts>2013-03-07 03:53:14 -0800</creation_ts>
          <short_desc>SegmentedString copy constructor is 1% of total time for background html parser?</short_desc>
          <delta_ts>2023-12-25 10:04:59 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>New Bugs</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>111645</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Eric Seidel (no email)">eric</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>abarth</cc>
    
    <cc>annevk</cc>
    
    <cc>tonyg</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>849943</commentid>
    <comment_count>0</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-07 03:53:14 -0800</bug_when>
    <thetext>SegmentedString copy constructor is 1% of total time for background html parser?

Something is wrong here.

Running Time	Self		Symbol Name
18.6ms    0.9%	0.0	 	 WTF::Deque&lt;WebCore::SegmentedSubstring, 0ul&gt;::Deque(WTF::Deque&lt;WebCore::SegmentedSubstring, 0ul&gt; const&amp;)
18.3ms    0.8%	0.0	 	  WebCore::SegmentedString::operator=(WebCore::SegmentedString const&amp;)
18.3ms    0.8%	0.0	 	   WebCore::HTMLSourceTracker::start(WebCore::SegmentedString&amp;, WebCore::HTMLTokenizer*, WebCore::HTMLToken&amp;)
18.3ms    0.8%	0.0	 	    WebCore::BackgroundHTMLParser::pumpTokenizer()
18.3ms    0.8%	0.0	 	     WTF::BoundFunctionImpl&lt;WTF::FunctionWrapper&lt;void (WebCore::BackgroundHTMLParser::*)(WTF::String const&amp;)&gt;, void (WTF::WeakPtr&lt;WebCore::BackgroundHTMLParser&gt;, WTF::String)&gt;::operator()()
18.3ms    0.8%	0.0	 	      WebCore::HTMLParserThread::runLoop()

That sample was taken with bug 107236 applied (so that I could actually see the parser amongst the rendering noise).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>849965</commentid>
    <comment_count>1</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-07 04:41:12 -0800</bug_when>
    <thetext>I feel like I fixed this identical bug back when I was doing all the malloc(0) removal in WebKit.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>849966</commentid>
    <comment_count>2</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-07 04:44:09 -0800</bug_when>
    <thetext>The bug I&apos;m thinking of is bug 55005.  I suspect this is just a new variant of that. :(</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>849970</commentid>
    <comment_count>3</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-07 04:45:29 -0800</bug_when>
    <thetext>I see.  That was never actually needed because bug 55091 solved things.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>851765</commentid>
    <comment_count>4</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-09 02:06:41 -0800</bug_when>
    <thetext>I added some logging. It looks like, in parsing the whole of the HTML5 spec, we only copy the Deque from this callsite 10 times or so?  That makes me very surprised to see it be 2% of total time, as I did in my sample just now (other parts have gotten faster since I filed this bug).

Not sure what&apos;s up.  It&apos;s possible my methods are flawed.

I&apos;ll try changing Deque&lt;SegmentedSubstring&gt; m_substrings to use an inline capacity of 2 tomorrow and see if that makes things faster.  I worry that, since SegmentedSubstring can&apos;t be copied with memcpy, that will just shift the slowness elsewhere.

It&apos;s not immediately clear to me why HTMLSourceTracker needs to do this copying in the first place. :(</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>851801</commentid>
    <comment_count>5</comment_count>
    <who name="Adam Barth">abarth</who>
    <bug_when>2013-03-09 11:23:24 -0800</bug_when>
    <thetext>&gt; It&apos;s not immediately clear to me why HTMLSourceTracker needs to do this copying in the first place. :(

It needs to remember where in the input stream the token started so that it can later provide the source string that generated the token.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>851818</commentid>
    <comment_count>6</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2013-03-09 13:14:39 -0800</bug_when>
    <thetext>I tried changing to use a Deque with inline capacity 2, but that just causes VectorBuffer::swap to be hot.  Presumably from:

    template&lt;typename T, size_t inlineCapacity&gt;
    inline Deque&lt;T, inlineCapacity&gt;&amp; Deque&lt;T, inlineCapacity&gt;::operator=(const Deque&lt;T, inlineCapacity&gt;&amp; other)
    {
        // FIXME: This is inefficient if we&apos;re using an inline buffer and T is
        // expensive to copy since it will copy the buffer twice instead of once.
        Deque&lt;T, inlineCapacity&gt; copy(other);
        swap(copy);
        return *this;
    }

I suspect there is a way to get what HTMLSourceTracker wants without the malloc; I just have to study more closely what it&apos;s trying to do.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2001928</commentid>
    <comment_count>7</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2023-12-25 10:04:59 -0800</bug_when>
    <thetext>Threaded HTML parser was removed.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>