<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>58123</bug_id>
          
          <creation_ts>2011-04-08 01:36:21 -0700</creation_ts>
          <short_desc>Document source preload scanned repeatedly</short_desc>
          <delta_ts>2011-04-11 12:59:34 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Page Loading</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>PC</rep_platform>
          <op_sys>OS X 10.5</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Antti Koivisto">koivisto</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>abarth</cc>
    
    <cc>tonyg</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>382077</commentid>
    <comment_count>0</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2011-04-08 01:36:21 -0700</bug_when>
    <thetext>Clearing the preload scanner after parsing resumes (from http://trac.webkit.org/changeset/63154)
 
void HTMLDocumentParser::resumeParsingAfterScriptExecution()
{
    ASSERT(!inScriptExecution());
    ASSERT(!m_treeBuilder-&gt;isPaused());

    m_preloadScanner.clear()
...

makes us rescan the document text multiple times if the parser gets blocked repeatedly during loading. For example when reloading nytimes.com, we scan almost the entire 120kb main document 5 times, causing us to spent way more time in the preload scanner than the main tokenizer.

This happens because when the scanner is cleared we lose information about how far we have already scanned. When the parser gets blocked again, the preload scanner is re-constructed and the scanning starts from the current position of the main parser:

void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
{
...
    if (isWaitingForScripts()) {
        ASSERT(m_tokenizer-&gt;state() == HTMLTokenizer::DataState);
        if (!m_preloadScanner) {
            m_preloadScanner.set(new HTMLPreloadScanner(document()));
            m_preloadScanner-&gt;appendToEnd(m_input.current());
        }
        m_preloadScanner-&gt;scan();
...

If the blocking happens early in the document (as it often does) most of the document text gets re-scanned.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383148</commentid>
    <comment_count>1</comment_count>
      <attachid>89001</attachid>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2011-04-11 06:47:48 -0700</bug_when>
    <thetext>Created attachment 89001
patch</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383214</commentid>
    <comment_count>2</comment_count>
    <who name="Tony Gentilcore">tonyg</who>
    <bug_when>2011-04-11 09:30:37 -0700</bug_when>
    <thetext>The fix looks good, but I&apos;m slightly sad that we can&apos;t add a test to make sure this doesn&apos;t regress. I can&apos;t think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383257</commentid>
    <comment_count>3</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2011-04-11 10:39:06 -0700</bug_when>
    <thetext>(In reply to comment #2)
&gt; The fix looks good, but I&apos;m slightly sad that we can&apos;t add a test to make sure this doesn&apos;t regress. I can&apos;t think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614

It is kinda hard since you need to run against a (throttled) web server to trigger the preload scanner, need a huge test content to make tokenization time observable and need to construct measurement so that the load throttling itself won&apos;t dominate. I don&apos;t think it would be worth the effort.

On the other hand the issue is easily observable with a sampling profiler on real world web sites. We would probably notice if it regressed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383299</commentid>
    <comment_count>4</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2011-04-11 11:23:01 -0700</bug_when>
    <thetext>http://trac.webkit.org/changeset/83456</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383346</commentid>
    <comment_count>5</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2011-04-11 12:08:37 -0700</bug_when>
    <thetext>Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>383399</commentid>
    <comment_count>6</comment_count>
    <who name="Tony Gentilcore">tonyg</who>
    <bug_when>2011-04-11 12:59:34 -0700</bug_when>
    <thetext>(In reply to comment #5)
&gt; Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.

Oh, good point. Are you putting together a follow-up?</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>89001</attachid>
            <date>2011-04-11 06:47:48 -0700</date>
            <delta_ts>2011-04-11 10:48:52 -0700</delta_ts>
            <desc>patch</desc>
            <filename>preload-clear.patch</filename>
            <type>text/plain</type>
            <size>3412</size>
            <attacher name="Antti Koivisto">koivisto</attacher>
            
              <data encoding="base64">SW5kZXg6IFNvdXJjZS9XZWJDb3JlL0NoYW5nZUxvZwo9PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBTb3VyY2UvV2Vi
Q29yZS9DaGFuZ2VMb2cJKHJldmlzaW9uIDgzNDM2KQorKysgU291cmNlL1dlYkNvcmUvQ2hhbmdl
TG9nCSh3b3JraW5nIGNvcHkpCkBAIC0xLDMgKzEsMjcgQEAKKzIwMTEtMDQtMTEgIEFudHRpIEtv
aXZpc3RvICA8YW50dGlAYXBwbGUuY29tPgorCisgICAgICAgIFJldmlld2VkIGJ5IE5PQk9EWSAo
T09QUyEpLgorCisgICAgICAgIERvY3VtZW50IHNvdXJjZSBwcmVsb2FkIHNjYW5uZWQgcmVwZWF0
ZWRseQorICAgICAgICBodHRwczovL2J1Z3Mud2Via2l0Lm9yZy9zaG93X2J1Zy5jZ2k/aWQ9NTgx
MjMKKyAgICAgICAgCisgICAgICAgIERvbid0IGNsZWFyIHRoZSBwcmVsb2FkIHNjYW5uZXIgYWZ0
ZXIgZXhlY3V0aW9uIHJlc3VtZXMuIFRoaXMgd291bGQgbG9zZSB0aGUgY3VycmVudAorICAgICAg
ICBzY2FubmluZyBwb2ludCBhbmQgbGVhZCB0byByZXNjYW5uaW5nIHdoZW4gdGhlIHByZWxvYWQg
c2Nhbm5lciB3b3VsZCBnZXQgcmVjb25zdHJ1Y3RlZAorICAgICAgICBkdWUgdG8gbWFpbiBwYXJz
ZXIgYmxvY2tpbmcgYWdhaW4uCisKKyAgICAgICAgSW5zdGVhZCBjbGVhciB0aGUgc2Nhbm5lciBv
bmx5IGluIHRoZSBzcGVjaWZpYyBjYXNlIG9mIHJlY2VpdmluZyBuZXcgZGF0YSB3aGlsZSB0aGUg
bWFpbgorICAgICAgICBwYXJzZXIgaGFzIGFscmVhZHkgcmVhY2hlZCB0aGUgZW5kIG9mIHRoZSBj
dXJyZW50IGlucHV0LgorCisgICAgICAgIEFsc28gc3dpdGNoZWQgdG8gdXNpbmcgaXNXYWl0aW5n
Rm9yU2NyaXB0cygpIGluc3RlYWQgb2YgbV90cmVlQnVpbGRlci0+aXNQYXVzZWQoKSBmb3IgY29u
c2lzdGVuY3kuCisKKyAgICAgICAgVGhlIGNhc2UgdGhlIGNsZWFyaW5nIGluIHJlc3VtZVBhcnNp
bmdBZnRlclNjcmlwdEV4ZWN1dGlvbigpIHdhcyBhZGRlZCBmb3IgaXMgY292ZXJlZCBieSAKKyAg
ICAgICAgaHR0cC90ZXN0cy9sb2FkaW5nL3ByZWxvYWQtc2xvdy1sb2FkaW5nLnBocC4KKworICAg
ICAgICAqIGh0bWwvcGFyc2VyL0hUTUxEb2N1bWVudFBhcnNlci5jcHA6CisgICAgICAgIChXZWJD
b3JlOjpIVE1MRG9jdW1lbnRQYXJzZXI6Omluc2VydCk6CisgICAgICAgIChXZWJDb3JlOjpIVE1M
RG9jdW1lbnRQYXJzZXI6OmFwcGVuZCk6CisgICAgICAgIChXZWJDb3JlOjpIVE1MRG9jdW1lbnRQ
YXJzZXI6OnJlc3VtZVBhcnNpbmdBZnRlclNjcmlwdEV4ZWN1dGlvbik6CisKIDIwMTEtMDQtMTEg
IFBhdmVsIEZlbGRtYW4gIDxwZmVsZG1hbkBjaHJvbWl1bS5vcmc+CiAKICAgICAgICAgUmV2aWV3
ZWQgYnkgWXVyeSBTZW1pa2hhdHNreS4KSW5kZXg6IFNvdXJjZS9XZWJDb3JlL2h0bWwvcGFyc2Vy
L0hUTUxEb2N1bWVudFBhcnNlci5jcHAKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gU291cmNlL1dlYkNvcmUvaHRt
bC9wYXJzZXIvSFRNTERvY3VtZW50UGFyc2VyLmNwcAkocmV2aXNpb24gODMzMjEpCisrKyBTb3Vy
Y2UvV2ViQ29yZS9odG1sL3BhcnNlci9IVE1MRG9jdW1lbnRQYXJzZXIuY3BwCSh3b3JraW5nIGNv
cHkpCkBAIC0zMjQsNyArMzI0LDcgQEAgdm9pZCBIVE1MRG9jdW1lbnRQYXJzZXI6Omluc2VydChj
b25zdCBTZQogICAgIG1faW5wdXQuaW5zZXJ0QXRDdXJyZW50SW5zZXJ0aW9uUG9pbnQoZXhjbHVk
ZWRMaW5lTnVtYmVyU291cmNlKTsKICAgICBwdW1wVG9rZW5pemVySWZQb3NzaWJsZShGb3JjZVN5
bmNocm9ub3VzKTsKICAgICAKLSAgICBpZiAobV90cmVlQnVpbGRlci0+aXNQYXVzZWQoKSkgewor
ICAgIGlmIChpc1dhaXRpbmdGb3JTY3JpcHRzKCkpIHsKICAgICAgICAgLy8gQ2hlY2sgdGhlIGRv
Y3VtZW50LndyaXRlKCkgb3V0cHV0IHdpdGggYSBzZXBhcmF0ZSBwcmVsb2FkIHNjYW5uZXIgYXMK
ICAgICAgICAgLy8gdGhlIG1haW4gc2Nhbm5lciBjYW4ndCBkZWFsIHdpdGggaW5zZXJ0aW9ucy4K
ICAgICAgICAgSFRNTFByZWxvYWRTY2FubmVyIHByZWxvYWRTY2FubmVyKGRvY3VtZW50KCkpOwpA
QCAtMzQ0LDEzICszNDQsMjAgQEAgdm9pZCBIVE1MRG9jdW1lbnRQYXJzZXI6OmFwcGVuZChjb25z
dCBTZQogICAgIC8vIGJ1dCB3ZSBuZWVkIHRvIGVuc3VyZSBpdCBpc24ndCBkZWxldGVkIHlldC4K
ICAgICBSZWZQdHI8SFRNTERvY3VtZW50UGFyc2VyPiBwcm90ZWN0KHRoaXMpOwogCi0gICAgbV9p
bnB1dC5hcHBlbmRUb0VuZChzb3VyY2UpOwogICAgIGlmIChtX3ByZWxvYWRTY2FubmVyKSB7Ci0g
ICAgICAgIG1fcHJlbG9hZFNjYW5uZXItPmFwcGVuZFRvRW5kKHNvdXJjZSk7Ci0gICAgICAgIGlm
IChtX3RyZWVCdWlsZGVyLT5pc1BhdXNlZCgpKQotICAgICAgICAgICAgbV9wcmVsb2FkU2Nhbm5l
ci0+c2NhbigpOworICAgICAgICBpZiAobV9pbnB1dC5jdXJyZW50KCkuaXNFbXB0eSgpICYmICFp
c1dhaXRpbmdGb3JTY3JpcHRzKCkpIHsKKyAgICAgICAgICAgIC8vIFdlIGhhdmUgcGFyc2VkIHVu
dGlsIHRoZSBlbmQgb2YgdGhlIGN1cnJlbnQgaW5wdXQgYW5kIHNvIGFyZSBub3cgbW92aW5nIGFo
ZWFkIG9mIHRoZSBwcmVsb2FkIHNjYW5uZXIuCisgICAgICAgICAgICAvLyBDbGVhciB0aGUgc2Nh
bm5lciBzbyB3ZSBrbm93IHRvIHNjYW4gc3RhcnRpbmcgZnJvbSB0aGUgY3VycmVudCBpbnB1dCBw
b2ludCBpZiB3ZSBibG9jayBhZ2Fpbi4KKyAgICAgICAgICAgIG1fcHJlbG9hZFNjYW5uZXIuY2xl
YXIoKTsKKyAgICAgICAgfSBlbHNlIHsKKyAgICAgICAgICAgIG1fcHJlbG9hZFNjYW5uZXItPmFw
cGVuZFRvRW5kKHNvdXJjZSk7CisgICAgICAgICAgICBpZiAoaXNXYWl0aW5nRm9yU2NyaXB0cygp
KQorICAgICAgICAgICAgICAgIG1fcHJlbG9hZFNjYW5uZXItPnNjYW4oKTsKKyAgICAgICAgfQog
ICAgIH0KIAorICAgIG1faW5wdXQuYXBwZW5kVG9FbmQoc291cmNlKTsKKwogICAgIGlmIChpblB1
bXBTZXNzaW9uKCkpIHsKICAgICAgICAgLy8gV2UndmUgZ290dGVuIGRhdGEgb2ZmIHRoZSBuZXR3
b3JrIGluIGEgbmVzdGVkIHdyaXRlLgogICAgICAgICAvLyBXZSBkb24ndCB3YW50IHRvIGNvbnN1
bWUgYW55IG1vcmUgb2YgdGhlIGlucHV0IHN0cmVhbSBub3cuICBEbwpAQCAtNDcwLDcgKzQ3Nyw2
IEBAIHZvaWQgSFRNTERvY3VtZW50UGFyc2VyOjpyZXN1bWVQYXJzaW5nQWYKICAgICBBU1NFUlQo
IWluU2NyaXB0RXhlY3V0aW9uKCkpOwogICAgIEFTU0VSVCghbV90cmVlQnVpbGRlci0+aXNQYXVz
ZWQoKSk7CiAKLSAgICBtX3ByZWxvYWRTY2FubmVyLmNsZWFyKCk7CiAgICAgcHVtcFRva2VuaXpl
cklmUG9zc2libGUoQWxsb3dZaWVsZCk7CiAgICAgZW5kSWZEZWxheWVkKCk7CiB9Cg==
</data>
<flag name="review"
          id="81624"
          type_id="1"
          status="+"
          setter="tonyg"
    />
          </attachment>
      

    </bug>

</bugzilla>