Bug 40645

Summary: Fragment parsing needs to go through the HTML5 Parser code path
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, gustavo, ossy, tonyg, webkit-ews, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on: 40657    
Bug Blocks: 46692, 39259    
Attachments:
Description Flags
work in progress
none
Patch
none
Patch
none
Updated per Adam's review comments
none
Hopefully fix the EWS bots
none
helps if you git add the new file abarth: review+, abarth: commit-queue+

Description Eric Seidel (no email) 2010-06-15 16:49:05 PDT
Fragment parsing needs to go through the HTML5 Parser code path
Comment 1 Eric Seidel (no email) 2010-06-15 16:52:55 PDT
Created attachment 58837 [details]
work in progress
Comment 2 Eric Seidel (no email) 2010-06-15 16:58:10 PDT
The current patch crashes in a few tests at:
bool peek(SegmentedString& source, int& lineNumber)
{
    m_nextInputCharacter = *source;

under the fragment parsing path.  I do not yet know why.

This change could probably be simplified some.  The fragment parsing path is a mess anyway.  Not really sure how to clean it up.  Part of me feels like these parse*DocumentFragment methods belong in their own header or on DocumentFragment iself, not sure.  For the HTML5 codepath I put the call on HTML5DocumentParser so I could make the DocumentFragment* version of the constructor private.
Comment 3 Eric Seidel (no email) 2010-06-15 16:59:08 PDT
editing/pasteboard/copy-crash.html	stderr
fast/forms/textarea-textlength.html	stderr
platform/mac/fast/text/ligature-subdivision.html	expected	actual	diff	pretty diff	stderr

Are the tests which crash.

I suspect we might end up calling end() on the DocumentParser, and thus trying to delete a stack pointer.  Although I would expect the delete to fail, not random later crashes.
Comment 4 Eric Seidel (no email) 2010-06-15 17:06:46 PDT
My end() theory is bogus.  The Document doesn't own this DocumentParser, so it never deletes it. :)
Comment 5 Eric Seidel (no email) 2010-06-16 00:33:38 PDT
The crash is caused by bug 40657.
Comment 6 Eric Seidel (no email) 2010-06-16 16:07:06 PDT
I've documented the failures here:
https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en#gid=6

I may find a way to land this code "off" so that we can attack those failures in a distributed fashion like we did for the main parser conversion.
Comment 7 Eric Seidel (no email) 2010-06-21 17:00:57 PDT
Created attachment 59314 [details]
Patch
Comment 8 WebKit Review Bot 2010-06-21 17:02:53 PDT
Attachment 59314 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/dom/Element.cpp:45:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/html/HTML5TreeBuilder.cpp:34:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/xml/XSLTProcessor.cpp:36:  Alphabetical sorting problem.  [build/include_order] [4]
WebCore/html/HTMLElement.cpp:42:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 4 in 23 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Eric Seidel (no email) 2010-06-21 17:20:30 PDT
Created attachment 59317 [details]
Patch
Comment 10 Adam Barth 2010-06-21 17:25:28 PDT
Comment on attachment 59314 [details]
Patch

WebCore/html/HTML5DocumentParser.cpp:372
 +  // This call is virtual.
call -> function

WebCore/html/HTML5DocumentParser.cpp:378
 +  // This call is used throughout the implementation.
call -> function

WebCore/html/HTML5DocumentParser.cpp:494
 +          && document->page()->settings()->html5ParserEnabled();
We should reverse the default when we can't find the Settings object at some point.  This occurs during boot.

WebCore/html/HTML5DocumentParser.cpp:503
 +      parser.write(source, false);
false here because we're not from the network.  I guess that interacts properly with the preload scanner.

WebCore/html/HTML5DocumentParser.h:56
 +      static void parseHTMLDocumentFragment(const String&, DocumentFragment*, FragmentScriptingPermission = FragmentScriptingAllowed);
I feel like this should be a method of the DocumentFragment object.  (Followup patch?)

WebCore/html/HTML5TreeBuilder.cpp:207
 +                      m_lastScriptElement->removeChildren();
Crazy

WebCore/html/HTML5TreeBuilder.h:30
 +  #include "MappedAttributeEntry.h" // For FragmentScriptingPermission!?
We should move FragmentScriptingPermission
Comment 11 Eric Seidel (no email) 2010-06-21 17:26:44 PDT
Btw, the horrible FragmentScriptingPermission hack exists just for:
http://trac.webkit.org/browser/trunk/LayoutTests/editing/pasteboard/paste-visible-script.html
Which came from:
http://trac.webkit.org/changeset/53659
https://bugs.webkit.org/show_bug.cgi?id=33970
Comment 12 Adam Barth 2010-06-21 17:31:21 PDT
That patch makes zero sense.  It doesn't seem to block <img src="about:blank" onerror="alert(1)">.
Comment 13 Adam Barth 2010-06-21 17:33:14 PDT
It also randomly nukes <dog happyhref="javascript:potato"></dog>
Comment 14 Adam Barth 2010-06-21 17:35:28 PDT
In any case, that doesn't block us here, but we should either make that check bulletproof or get rid of it entirely.  A half-working hack helps no one.
Comment 15 Eric Seidel (no email) 2010-06-21 21:30:07 PDT
Created attachment 59333 [details]
Updated per Adam's review comments
Comment 16 Adam Barth 2010-06-21 21:32:28 PDT
(In reply to comment #12)
> That patch makes zero sense.  It doesn't seem to block <img src="about:blank" onerror="alert(1)">.

I could be wrong about this one.  :)
Comment 17 Adam Barth 2010-06-21 21:38:50 PDT
Comment on attachment 59333 [details]
Updated per Adam's review comments

Politics aside, this is a good patch.  Thanks.  :)

LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5
 +  FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo>
Can you double-check this in minefield?

WebCore/html/HTML5TreeBuilder.h:113
 +      // FIXME: FragmentScriptingPermission is a HACK for platform/Pasteboard.
This seems a bit over the top, but ok.
Comment 18 Eric Seidel (no email) 2010-06-21 21:39:37 PDT
(In reply to comment #17)
> LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5
>  +  FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo>
> Can you double-check this in minefield?

Yup.  Minefield agrees.
Comment 19 Early Warning System Bot 2010-06-21 21:44:11 PDT
Attachment 59333 [details] did not build on qt:
Build output: http://webkit-commit-queue.appspot.com/results/3306541
Comment 20 Eric Seidel (no email) 2010-06-21 21:51:13 PDT
Created attachment 59334 [details]
Hopefully fix the EWS bots
Comment 21 Early Warning System Bot 2010-06-21 22:04:20 PDT
Attachment 59334 [details] did not build on qt:
Build output: http://webkit-commit-queue.appspot.com/results/3286565
Comment 22 Eric Seidel (no email) 2010-06-21 22:08:03 PDT
OK, so what am I missing?  I edited the .pro file to include the header...
Comment 23 WebKit Review Bot 2010-06-21 22:08:55 PDT
Attachment 59333 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3330559
Comment 24 Eric Seidel (no email) 2010-06-21 22:12:32 PDT
Created attachment 59337 [details]
helps if you git add the new file
Comment 25 WebKit Review Bot 2010-06-21 22:25:23 PDT
Attachment 59333 [details] did not build on gtk:
Build output: http://webkit-commit-queue.appspot.com/results/3287557
Comment 26 WebKit Review Bot 2010-06-21 22:35:11 PDT
Attachment 59334 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3338594
Comment 27 Adam Barth 2010-06-21 22:53:38 PDT
Comment on attachment 59337 [details]
helps if you git add the new file

I trust you got all the compile demons out of the patch.
Comment 28 Adam Barth 2010-06-22 16:44:17 PDT
Committed r61637: <http://trac.webkit.org/changeset/61637>
Comment 29 WebKit Review Bot 2010-06-22 17:11:14 PDT
http://trac.webkit.org/changeset/61637 might have broken Qt Linux Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/61636
http://trac.webkit.org/changeset/61637
Comment 30 Eric Seidel (no email) 2010-06-22 17:20:09 PDT
This checkin exposed a bug in the Qt clipboard code:
https://bugs.webkit.org/show_bug.cgi?id=41025