Bug 40417 - HTML5Tokenizer needs to tell the InspectorTimelineAgent before and after it writes
Summary: HTML5Tokenizer needs to tell the InspectorTimelineAgent before and after it w...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 39259
  Show dependency treegraph
 
Reported: 2010-06-10 03:57 PDT by Eric Seidel (no email)
Modified: 2010-06-12 00:23 PDT (History)
6 users (show)

See Also:


Attachments
Patch (2.92 KB, patch)
2010-06-10 04:01 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Another wip patch (3.03 KB, patch)
2010-06-10 18:25 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Updated after conversation with James, ready for review (5.18 KB, patch)
2010-06-11 15:16 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-06-10 03:57:05 PDT
HTML5Tokenizer needs to tell the InspectorTimelineAgent before and after it writes
Comment 1 Eric Seidel (no email) 2010-06-10 04:00:54 PDT
I'm not 100% certain if we're supposed to call the willWrite/didWrite methods before every time we pump the lexer, or just during the write() call.  a write() call might not actually finish the block in question.  If the timeline agent is trying to know every time we're pumping the lexer it will want slightly different calls.

Then again, I dont' think the old tokenizer "correctly" recorded resumes etc. either, so I think this code may just need some further tweaking from the inspector timeline folks.  It would be easy to move these calls to inside pumpLexer() if that's what's desired.
Comment 2 Eric Seidel (no email) 2010-06-10 04:01:03 PDT
Created attachment 58359 [details]
Patch
Comment 3 Eric Seidel (no email) 2010-06-10 04:03:55 PDT
Comment on attachment 58359 [details]
Patch

Thinking about this more, I think we do want to move this to inside pumpLexer.  We'd need a more complicated test which caused the parser to pause due to script loads to see any difference.

I think the old code reported the wrong source since when the parser resumed it would report an empty SegmentedString to the inspector.
Comment 4 Adam Barth 2010-06-10 10:16:37 PDT
Comment on attachment 58359 [details]
Patch

http://wkrietveld.appspot.com/40417/diff/1/3
File WebCore/html/HTML5Tokenizer.cpp (right):

http://wkrietveld.appspot.com/40417/diff/1/3#newcode160
WebCore/html/HTML5Tokenizer.cpp:160: willWriteHTML(source);
Are we supposed to call this when appendData is true?  If so, why don't we call when write() is nested and appendData is true?
Comment 5 Eric Seidel (no email) 2010-06-10 11:16:43 PDT
I think we're supposed to call this in pumpLexer.  The question is what SegementedString we're supposed to pass it then.

The old tokenizer used write() as the pump lexer, basically.  I also think that even in the old system this code commonly got strange results for the SegementedString.
Comment 6 Eric Seidel (no email) 2010-06-10 18:25:02 PDT
Created attachment 58431 [details]
Another wip patch
Comment 7 Eric Seidel (no email) 2010-06-10 18:28:37 PDT
This current WIP causes this diff:

--- /tmp/layout-test-results/inspector/timeline-script-tag-1-expected.txt	2010-06-10 18:18:48.000000000 -0700
+++ /tmp/layout-test-results/inspector/timeline-script-tag-1-actual.txt	2010-06-10 18:18:48.000000000 -0700
@@ -1,6 +1,7 @@
 Tests the Timeline API instrumentation of an HTML script tag.
 
 ParseHTML
+ParseHTML
 ----> EvaluateScript
 --------> MarkTimeline : SCRIPT TAG
 EvaluateScript Properties:
@@ -16,4 +17,5 @@
     usedHeapSize : <number>
     totalHeapSize : <number>
 }
+ParseHTML
 

I think we could avoid that by changing the pump loop into a do {} while, and having it check to see if we have any tokens to process before we inform the timeline agent.

Then again, I'm still not really sure what the InspectorTimelineAgent is looking for here.  CCing some inspector folks.
Comment 8 James Robinson 2010-06-11 13:16:58 PDT
The intent is to bracket HTML processing (initiated from the network or document.write() or .innerHTML) with willwrite/didwrite calls so that something reasonable can be placed on the timeline.  The string itself is just used for char count, which is supposed to be the number of characters processed but I'm pretty sure it's buggy in some cases due to pause/resumes.  The char count is of marginal utility anyway - the important bit is having some way for the developer looking at the timeline to know what piece of HTML was being processed if it ended up taking a long time.
Comment 9 Eric Seidel (no email) 2010-06-11 14:05:27 PDT
I talked with James on #webkit.  He agrees we should give the inspector information about every pump and it should ignore any 0-length pumps, etc.  So the current patch is correct.  I'll upload a copy with more comments/explanation.

James and I also discussed that eventually the inspector should move the charCount from willWrite to didWrite at which time the HTML5Lexer could provide exact numbers of how many chars it processed in that pump, instead of the current number which is always the count of how many chars it has left in its buffer.  The old Tokenizer always provided the number of chars passed to write() which was 0 in the case of parser resumes.  So both are wrong in different ways with the current behavior of passing the charCount to willWrite() instead of didWrite().  But that's a separate issue. :)
Comment 10 Eric Seidel (no email) 2010-06-11 15:16:30 PDT
Created attachment 58512 [details]
Updated after conversation with James, ready for review
Comment 11 Adam Barth 2010-06-11 15:39:20 PDT
Comment on attachment 58512 [details]
Updated after conversation with James, ready for review

okiedokes
Comment 12 WebKit Commit Bot 2010-06-12 00:23:29 PDT
Comment on attachment 58512 [details]
Updated after conversation with James, ready for review

Clearing flags on attachment: 58512

Committed r61055: <http://trac.webkit.org/changeset/61055>
Comment 13 WebKit Commit Bot 2010-06-12 00:23:37 PDT
All reviewed patches have been landed.  Closing bug.