Bug 40645

Summary: Fragment parsing needs to go through the HTML5 Parser code path
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, gustavo, ossy, tonyg, webkit-ews, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on: 40657    
Bug Blocks: 46692, 39259    
Attachments:
Description Flags
work in progress
none
Patch
none
Patch
none
Updated per Adam's review comments
none
Hopefully fix the EWS bots
none
helps if you git add the new file abarth: review+, abarth: commit-queue+

Eric Seidel (no email)
Reported 2010-06-15 16:49:05 PDT
Fragment parsing needs to go through the HTML5 Parser code path
Attachments
work in progress (18.24 KB, patch)
2010-06-15 16:52 PDT, Eric Seidel (no email)
no flags
Patch (37.34 KB, patch)
2010-06-21 17:00 PDT, Eric Seidel (no email)
no flags
Patch (38.13 KB, patch)
2010-06-21 17:20 PDT, Eric Seidel (no email)
no flags
Updated per Adam's review comments (46.01 KB, patch)
2010-06-21 21:30 PDT, Eric Seidel (no email)
no flags
Hopefully fix the EWS bots (47.41 KB, patch)
2010-06-21 21:51 PDT, Eric Seidel (no email)
no flags
helps if you git add the new file (49.43 KB, patch)
2010-06-21 22:12 PDT, Eric Seidel (no email)
abarth: review+
abarth: commit-queue+
Eric Seidel (no email)
Comment 1 2010-06-15 16:52:55 PDT
Created attachment 58837 [details] work in progress
Eric Seidel (no email)
Comment 2 2010-06-15 16:58:10 PDT
The current patch crashes in a few tests at: bool peek(SegmentedString& source, int& lineNumber) { m_nextInputCharacter = *source; under the fragment parsing path. I do not yet know why. This change could probably be simplified some. The fragment parsing path is a mess anyway. Not really sure how to clean it up. Part of me feels like these parse*DocumentFragment methods belong in their own header or on DocumentFragment iself, not sure. For the HTML5 codepath I put the call on HTML5DocumentParser so I could make the DocumentFragment* version of the constructor private.
Eric Seidel (no email)
Comment 3 2010-06-15 16:59:08 PDT
editing/pasteboard/copy-crash.html stderr fast/forms/textarea-textlength.html stderr platform/mac/fast/text/ligature-subdivision.html expected actual diff pretty diff stderr Are the tests which crash. I suspect we might end up calling end() on the DocumentParser, and thus trying to delete a stack pointer. Although I would expect the delete to fail, not random later crashes.
Eric Seidel (no email)
Comment 4 2010-06-15 17:06:46 PDT
My end() theory is bogus. The Document doesn't own this DocumentParser, so it never deletes it. :)
Eric Seidel (no email)
Comment 5 2010-06-16 00:33:38 PDT
The crash is caused by bug 40657.
Eric Seidel (no email)
Comment 6 2010-06-16 16:07:06 PDT
I've documented the failures here: https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en#gid=6 I may find a way to land this code "off" so that we can attack those failures in a distributed fashion like we did for the main parser conversion.
Eric Seidel (no email)
Comment 7 2010-06-21 17:00:57 PDT
WebKit Review Bot
Comment 8 2010-06-21 17:02:53 PDT
Attachment 59314 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebCore/dom/Element.cpp:45: Alphabetical sorting problem. [build/include_order] [4] WebCore/html/HTML5TreeBuilder.cpp:34: Alphabetical sorting problem. [build/include_order] [4] WebCore/xml/XSLTProcessor.cpp:36: Alphabetical sorting problem. [build/include_order] [4] WebCore/html/HTMLElement.cpp:42: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 4 in 23 files If any of these errors are false positives, please file a bug against check-webkit-style.
Eric Seidel (no email)
Comment 9 2010-06-21 17:20:30 PDT
Adam Barth
Comment 10 2010-06-21 17:25:28 PDT
Comment on attachment 59314 [details] Patch WebCore/html/HTML5DocumentParser.cpp:372 + // This call is virtual. call -> function WebCore/html/HTML5DocumentParser.cpp:378 + // This call is used throughout the implementation. call -> function WebCore/html/HTML5DocumentParser.cpp:494 + && document->page()->settings()->html5ParserEnabled(); We should reverse the default when we can't find the Settings object at some point. This occurs during boot. WebCore/html/HTML5DocumentParser.cpp:503 + parser.write(source, false); false here because we're not from the network. I guess that interacts properly with the preload scanner. WebCore/html/HTML5DocumentParser.h:56 + static void parseHTMLDocumentFragment(const String&, DocumentFragment*, FragmentScriptingPermission = FragmentScriptingAllowed); I feel like this should be a method of the DocumentFragment object. (Followup patch?) WebCore/html/HTML5TreeBuilder.cpp:207 + m_lastScriptElement->removeChildren(); Crazy WebCore/html/HTML5TreeBuilder.h:30 + #include "MappedAttributeEntry.h" // For FragmentScriptingPermission!? We should move FragmentScriptingPermission
Eric Seidel (no email)
Comment 11 2010-06-21 17:26:44 PDT
Adam Barth
Comment 12 2010-06-21 17:31:21 PDT
That patch makes zero sense. It doesn't seem to block <img src="about:blank" onerror="alert(1)">.
Adam Barth
Comment 13 2010-06-21 17:33:14 PDT
It also randomly nukes <dog happyhref="javascript:potato"></dog>
Adam Barth
Comment 14 2010-06-21 17:35:28 PDT
In any case, that doesn't block us here, but we should either make that check bulletproof or get rid of it entirely. A half-working hack helps no one.
Eric Seidel (no email)
Comment 15 2010-06-21 21:30:07 PDT
Created attachment 59333 [details] Updated per Adam's review comments
Adam Barth
Comment 16 2010-06-21 21:32:28 PDT
(In reply to comment #12) > That patch makes zero sense. It doesn't seem to block <img src="about:blank" onerror="alert(1)">. I could be wrong about this one. :)
Adam Barth
Comment 17 2010-06-21 21:38:50 PDT
Comment on attachment 59333 [details] Updated per Adam's review comments Politics aside, this is a good patch. Thanks. :) LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5 + FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo> Can you double-check this in minefield? WebCore/html/HTML5TreeBuilder.h:113 + // FIXME: FragmentScriptingPermission is a HACK for platform/Pasteboard. This seems a bit over the top, but ok.
Eric Seidel (no email)
Comment 18 2010-06-21 21:39:37 PDT
(In reply to comment #17) > LayoutTests/fast/tokenizer/lessthan-terminates-tags-and-attrs-expected.txt:5 > + FAIL: <foo<bar> parsed as <foo<bar></foo<bar> instead of <foo><bar></bar></foo> > Can you double-check this in minefield? Yup. Minefield agrees.
Early Warning System Bot
Comment 19 2010-06-21 21:44:11 PDT
Eric Seidel (no email)
Comment 20 2010-06-21 21:51:13 PDT
Created attachment 59334 [details] Hopefully fix the EWS bots
Early Warning System Bot
Comment 21 2010-06-21 22:04:20 PDT
Eric Seidel (no email)
Comment 22 2010-06-21 22:08:03 PDT
OK, so what am I missing? I edited the .pro file to include the header...
WebKit Review Bot
Comment 23 2010-06-21 22:08:55 PDT
Eric Seidel (no email)
Comment 24 2010-06-21 22:12:32 PDT
Created attachment 59337 [details] helps if you git add the new file
WebKit Review Bot
Comment 25 2010-06-21 22:25:23 PDT
WebKit Review Bot
Comment 26 2010-06-21 22:35:11 PDT
Adam Barth
Comment 27 2010-06-21 22:53:38 PDT
Comment on attachment 59337 [details] helps if you git add the new file I trust you got all the compile demons out of the patch.
Adam Barth
Comment 28 2010-06-22 16:44:17 PDT
WebKit Review Bot
Comment 29 2010-06-22 17:11:14 PDT
http://trac.webkit.org/changeset/61637 might have broken Qt Linux Release The following changes are on the blame list: http://trac.webkit.org/changeset/61636 http://trac.webkit.org/changeset/61637
Eric Seidel (no email)
Comment 30 2010-06-22 17:20:09 PDT
This checkin exposed a bug in the Qt clipboard code: https://bugs.webkit.org/show_bug.cgi?id=41025
Note You need to log in before you can comment on or make changes to this bug.