Bug 40645 - Fragment parsing needs to go through the HTML5 Parser code path
Summary: Fragment parsing needs to go through the HTML5 Parser code path
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on: 40657
Blocks: 46692 39259
  Show dependency treegraph
 
Reported: 2010-06-15 16:49 PDT by Eric Seidel (no email)
Modified: 2011-01-19 19:07 PST (History)
9 users (show)

See Also:


Attachments
work in progress (18.24 KB, patch)
2010-06-15 16:52 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (37.34 KB, patch)
2010-06-21 17:00 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (38.13 KB, patch)
2010-06-21 17:20 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Updated per Adam's review comments (46.01 KB, patch)
2010-06-21 21:30 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Hopefully fix the EWS bots (47.41 KB, patch)
2010-06-21 21:51 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
helps if you git add the new file (49.43 KB, patch)
2010-06-21 22:12 PDT, Eric Seidel (no email)
abarth: review+
abarth: commit-queue+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-06-15 16:49:05 PDT
Fragment parsing needs to go through the HTML5 Parser code path
Comment 1 Eric Seidel (no email) 2010-06-15 16:52:55 PDT
Created attachment 58837 [details]
work in progress
Comment 2 Eric Seidel (no email) 2010-06-15 16:58:10 PDT
The current patch crashes in a few tests at:
bool peek(SegmentedString& source, int& lineNumber)
{
    m_nextInputCharacter = *source;

under the fragment parsing path.  I do not yet know why.

This change could probably be simplified some.  The fragment parsing path is a mess anyway.  Not really sure how to clean it up.  Part of me feels like these parse*DocumentFragment methods belong in their own header or on DocumentFragment iself, not sure.  For the HTML5 codepath I put the call on HTML5DocumentParser so I could make the DocumentFragment* version of the constructor private.
Comment 3 Eric Seidel (no email) 2010-06-15 16:59:08 PDT
editing/pasteboard/copy-crash.html	stderr
fast/forms/textarea-textlength.html	stderr
platform/mac/fast/text/ligature-subdivision.html	expected	actual	diff	pretty diff	stderr

Are the tests which crash.

I suspect we might end up calling end() on the DocumentParser, and thus trying to delete a stack pointer.  Although I would expect the delete to fail, not random later crashes.
Comment 4 Eric Seidel (no email) 2010-06-15 17:06:46 PDT
My end() theory is bogus.  The Document doesn't own this DocumentParser, so it never deletes it. :)
Comment 5 Eric Seidel (no email) 2010-06-16 00:33:38 PDT
The crash is caused by bug 40657.
Comment 6 Eric Seidel (no email) 2010-06-16 16:07:06 PDT
I've documented the failures here:
https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en#gid=6

I may find a way to land this code "off" so that we can attack those failures in a distributed fashion like we did for the main parser conversion.
Comment 7 Eric Seidel (no email) 2010-06-21 17:00:57 PDT
Created attachment 59314 [details]
Patch
Comment 8 WebKit Review Bot 2010-06-21 17:02:53 PDT
Attachment 59314 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/dom/Element.cpp:45:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/html/HTML5TreeBuilder.cpp:34:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/xml/XSLTProcessor.cpp:36:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/html/HTMLElement.cpp:42:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 4 in 23 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Eric Seidel (no email) 2010-06-21 17:20:30 PDT
Created attachment 59317 [details]
Patch
Comment 10 Adam Barth 2010-06-21 17:25:28 PDT
Comment on attachment 59314 [details]
Patch

WebCore/html/HTML5DocumentParser.cpp:372
 +  // This call is virtual.
call -> function

WebCore/html/HTML5DocumentParser.cpp:378
 +  // This call is used throughout the implementation.
call -> function

WebCore/html/HTML5DocumentParser.cpp:494
 +          && document->page()->settings()->html5ParserEnabled();
We should reverse the default when we can't find the Settings object at some point.  This occurs during boot.

WebCore/html/HTML5DocumentParser.cpp:503
 +      parser.write(source, false);
false here because we're not from the network.  I guess that interacts properly with the preload scanner.

WebCore/html/HTML5DocumentParser.h:56
 +      static void parseHTMLDocumentFragment(const String&, DocumentFragment*, FragmentScriptingPermission = FragmentScriptingAllowed);
I feel like this should be a method of the DocumentFragment object.  (Followup patch?)

WebCore/html/HTML5TreeBuilder.cpp:207
 +                      m_lastScriptElement->removeChildren();
Crazy

WebCore/html/HTML5TreeBuilder.h:30
 +  #include "MappedAttributeEntry.h" // For FragmentScriptingPermission!?
We should move FragmentScriptingPermission
Comment 11 Eric Seidel (no email) 2010-06-21 17:26:44 PDT
Btw, the horrible FragmentScriptingPermission hack exists just for:
http://trac.webkit.org/browser/trunk/LayoutTests/editing/pasteboard/paste-visible-script.html
Which came from:
http://trac.webkit.org/changeset/53659
https://bugs.webkit.org/show_bug.cgi?id=33970
Comment 12 Adam Barth 2010-06-21 17:31:21 PDT
That patch makes zero sense.  It doesn't seem to block <img src="about:blank" onerror="alert(1)">.
Comment 13 Adam Barth 2010-06-21 17:33:14 PDT
It also randomly nukes <dog happyhref="javascript:potato"></dog>
Comment 14 Adam Barth 2010-06-21 17:35:28 PDT
In any case, that doesn't block us here, but we should either make that check bulletproof or get rid of it entirely.  A half-working hack helps no one.
Comment 15 Eric Seidel (no email) 2010-06-21 21:30:07 PDT
Created attachment 59333 [details]
Updated per Adam's review comments
Comment 16 Adam Barth 2010-06-21 21:32:28 PDT
(In reply to comment #12)
> That patch makes zero sense.  It doesn't seem to block <img src="about:blank" onerror="alert(1)">.

I could be wrong about this one.  :)
Comment 17 Adam Barth 2010-06-21 21:38:50 PDT
Comment on attachment 59333 [details]
Updated per Adam's review comments

Politics aside, this is a good patch.  Thanks.  :)

LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5
 +  FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo>
Can you double-check this in minefield?

WebCore/html/HTML5TreeBuilder.h:113
 +      // FIXME: FragmentScriptingPermission is a HACK for platform/Pasteboard.
This seems a bit over the top, but ok.
Comment 18 Eric Seidel (no email) 2010-06-21 21:39:37 PDT
(In reply to comment #17)
> LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5
>  +  FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo>
> Can you double-check this in minefield?

Yup.  Minefield agrees.
Comment 19 Early Warning System Bot 2010-06-21 21:44:11 PDT
Attachment 59333 [details] did not build on qt:
Build output: http://webkit-commit-queue.appspot.com/results/3306541
Comment 20 Eric Seidel (no email) 2010-06-21 21:51:13 PDT
Created attachment 59334 [details]
Hopefully fix the EWS bots
Comment 21 Early Warning System Bot 2010-06-21 22:04:20 PDT
Attachment 59334 [details] did not build on qt:
Build output: http://webkit-commit-queue.appspot.com/results/3286565
Comment 22 Eric Seidel (no email) 2010-06-21 22:08:03 PDT
OK, so what am I missing?  I edited the .pro file to include the header...
Comment 23 WebKit Review Bot 2010-06-21 22:08:55 PDT
Attachment 59333 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3330559
Comment 24 Eric Seidel (no email) 2010-06-21 22:12:32 PDT
Created attachment 59337 [details]
helps if you git add the new file
Comment 25 WebKit Review Bot 2010-06-21 22:25:23 PDT
Attachment 59333 [details] did not build on gtk:
Build output: http://webkit-commit-queue.appspot.com/results/3287557
Comment 26 WebKit Review Bot 2010-06-21 22:35:11 PDT
Attachment 59334 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3338594
Comment 27 Adam Barth 2010-06-21 22:53:38 PDT
Comment on attachment 59337 [details]
helps if you git add the new file

I trust you got all the compile demons out of the patch.
Comment 28 Adam Barth 2010-06-22 16:44:17 PDT
Committed r61637: <http://trac.webkit.org/changeset/61637>
Comment 29 WebKit Review Bot 2010-06-22 17:11:14 PDT
http://trac.webkit.org/changeset/61637 might have broken Qt Linux Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/61636
http://trac.webkit.org/changeset/61637
Comment 30 Eric Seidel (no email) 2010-06-22 17:20:09 PDT
This checkin exposed a bug in the Qt clipboard code:
https://bugs.webkit.org/show_bug.cgi?id=41025