Bug 41453

Summary: HTMLTreeBuilder needs an adoption agency
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dglazkov, eric, ian, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 41123, 41581    
Attachments:
Description Flags
work in progress
none
compiles, but crashes. nearly there
none
runs, changes behavior, not sure if its right yet
none
another step forward
none
ready for review, intrested in your comments
none
fix cr-linux
none
Addressed Adam's feedback abarth: review+

Eric Seidel (no email)
Reported 2010-07-01 03:44:54 PDT
HTMLTreeBuilder needs an adoption agency
Attachments
work in progress (16.66 KB, patch)
2010-07-01 03:46 PDT, Eric Seidel (no email)
no flags
compiles, but crashes. nearly there (31.48 KB, patch)
2010-07-02 03:36 PDT, Eric Seidel (no email)
no flags
runs, changes behavior, not sure if its right yet (34.31 KB, patch)
2010-07-02 03:45 PDT, Eric Seidel (no email)
no flags
another step forward (34.00 KB, patch)
2010-07-02 20:39 PDT, Eric Seidel (no email)
no flags
ready for review, intrested in your comments (41.91 KB, patch)
2010-07-03 08:05 PDT, Eric Seidel (no email)
no flags
fix cr-linux (42.55 KB, patch)
2010-07-03 08:56 PDT, Eric Seidel (no email)
no flags
Addressed Adam's feedback (42.26 KB, patch)
2010-07-03 23:30 PDT, Eric Seidel (no email)
abarth: review+
Eric Seidel (no email)
Comment 1 2010-07-01 03:46:04 PDT
Created attachment 60219 [details] work in progress
Eric Seidel (no email)
Comment 2 2010-07-02 03:36:06 PDT
Created attachment 60354 [details] compiles, but crashes. nearly there
Eric Seidel (no email)
Comment 3 2010-07-02 03:45:37 PDT
Created attachment 60355 [details] runs, changes behavior, not sure if its right yet
Eric Seidel (no email)
Comment 4 2010-07-02 20:39:44 PDT
Created attachment 60434 [details] another step forward
Eric Seidel (no email)
Comment 5 2010-07-03 08:05:07 PDT
Created attachment 60450 [details] ready for review, intrested in your comments
Eric Seidel (no email)
Comment 6 2010-07-03 08:08:06 PDT
The sad part is, I suspect this change is one or two typos away from fixing a zillion sub-tests. However this change is already too large, and keeping it locally longer doesn't help us move forward. The adoption01.dat tests I wrote show that at least the foster parenting algorithm has a bug. Getting the "bookmarking" functionality added to the list of active formatting elements will probably require moving to our own list instead of using a vector (so that we can insert elements w/o destroying all pointers.)
WebKit Review Bot
Comment 7 2010-07-03 08:20:46 PDT
Eric Seidel (no email)
Comment 8 2010-07-03 08:56:30 PDT
Created attachment 60451 [details] fix cr-linux
Adam Barth
Comment 9 2010-07-03 16:46:47 PDT
Comment on attachment 60451 [details] fix cr-linux WebCore/html/HTMLElementStack.cpp:167 + recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right? I think so. WebCore/html/HTMLElementStack.cpp:245 + #if ENABLE(SVG_FOREIGN_OBJECT) We need to remove this ifdef eventually. WebCore/html/HTMLElementStack.h:55 + ElementRecord* next() const { return m_next.get(); } Blank line below. WebCore/html/HTMLElementStack.cpp:229 + return find(element); !!find ? WebCore/html/HTMLFormattingElementList.cpp:106 + // This is somewhat of a hack, and is why this method can't be const. Yeah, I'd just use a loop here. It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc. WebCore/html/HTMLTreeBuilder.cpp:810 + // !phrasing && !formatting == special || scoping Nit: The comment is in a different order than the code. WebCore/html/HTMLTreeBuilder.cpp:831 + fosterParentElement = lastTableElementRecord->next()->element(); This code seems strange. Do we really prefer the lastTableElement's actual parent in the DOM? Keep in mind that DOM nodes can be moved around during parsing. Usually we refer only to our own stack of open elements. WebCore/html/HTMLTreeBuilder.cpp:846 + newParent->appendChild(child, ec); No, this is no good. You need to collect these up into a vector and hold references to them while you twiddle the DOM. This fires mutation events, which can do all manner of bad things to you here.
Adam Barth
Comment 10 2010-07-03 16:47:13 PDT
I still need to read "call the adoption agency", but I need to run now.
Adam Barth
Comment 11 2010-07-03 20:44:25 PDT
Comment on attachment 60451 [details] fix cr-linux Sorry I don't see your bug. Here are some comments though. WebCore/html/HTMLTreeBuilder.cpp:860 + notImplemented(); // Check the stack of open elements for a more specific parse error. "if that node is also in the stack of open elements but the element is not in scope" You seem to be missing this text. WebCore/html/HTMLTreeBuilder.cpp:805 + HTMLElementStack::ElementRecord* record = m_openElements.topRecord(); "Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one." Wow, that's super confusing. Let me re-phrase. We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element. You're walking the stack from the youngest (our top) and bailing when you hit the formatting element. Ok. That seems right. WebCore/html/HTMLTreeBuilder.cpp:875 + m_openElements.pop(); You can just call popUntil(formattingElement) WebCore/html/HTMLTreeBuilder.cpp:881 + HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement); Lame that we need to search m_openElements again. We just did that on line 863. WebCore/html/HTMLTreeBuilder.cpp:897 + m_openElements.remove(node->element()); Here we're depending on some very specific semantics of remove. Also, this is quite inefficient since we're going to traverse the stack again. WebCore/html/HTMLTreeBuilder.cpp:909 + AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName()); We're missing the attributes here. WebCore/html/HTMLTreeBuilder.cpp:913 + HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element()); Another search. Once we get this working, we need to get rid of all these extra searches. WebCore/html/HTMLTreeBuilder.cpp:919 + node->element()->appendChild(lastNode->element(), ec); Wow, that's scary. I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here. I wish we could twiddle the DOM without firing off mutation events here. WebCore/html/HTMLTreeBuilder.cpp:925 + const AtomicString& commonAncestorTag = commonAncestor->localName(); commonAncestorName ? WebCore/html/HTMLTreeBuilder.cpp:939 + AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName()); Again, we're missing the attributes.
Adam Barth
Comment 12 2010-07-03 20:52:23 PDT
Comment on attachment 60451 [details] fix cr-linux WebCore/html/HTMLTreeBuilder.cpp:832 + } else // fragment case Maybe ASSERT that we're parsing a fragment? WebCore/html/HTMLTreeBuilder.cpp:821 + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." Why do you think |last| means topmost? It seems ambiguous to me. WebCore/html/HTMLTreeBuilder.cpp:831 + fosterParentElement = lastTableElementRecord->next()->element(); I see. You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent". That means "last" would be "most our-above," which means closet to our-top, which means topmost. Maybe ask Hixie to clarify? At least his-above and his-below is clear (if backwards).
Adam Barth
Comment 13 2010-07-03 22:13:24 PDT
Rather than saving off the tokens, you might be able to use cloneNode(false). :)
Eric Seidel (no email)
Comment 14 2010-07-03 22:44:44 PDT
(In reply to comment #9) > (From update of attachment 60451 [details]) > WebCore/html/HTMLElementStack.cpp:167 > + recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right? > I think so. OK, removed. > WebCore/html/HTMLElementStack.cpp:245 > + #if ENABLE(SVG_FOREIGN_OBJECT) > We need to remove this ifdef eventually. Yup. > WebCore/html/HTMLElementStack.h:55 > + ElementRecord* next() const { return m_next.get(); } > Blank line below. Added. > WebCore/html/HTMLElementStack.cpp:229 > + return find(element); > !!find ? Changed. > WebCore/html/HTMLFormattingElementList.cpp:106 > + // This is somewhat of a hack, and is why this method can't be const. > Yeah, I'd just use a loop here. It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc. It's not the loop. Its the &m_entries[x]. m_entries[x] returns a const reference because "this" is const. > WebCore/html/HTMLTreeBuilder.cpp:810 > + // !phrasing && !formatting == special || scoping > Nit: The comment is in a different order than the code. Fixed. > WebCore/html/HTMLTreeBuilder.cpp:831 > + fosterParentElement = lastTableElementRecord->next()->element(); > This code seems strange. Do we really prefer the lastTableElement's actual parent in the DOM? Keep in mind that DOM nodes can be moved around during parsing. Usually we refer only to our own stack of open elements. I think so. "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element" I filed http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063 about the illegibility of that paragraph. > WebCore/html/HTMLTreeBuilder.cpp:846 > + newParent->appendChild(child, ec); > No, this is no good. You need to collect these up into a vector and hold references to them while you twiddle the DOM. This fires mutation events, which can do all manner of bad things to you here. Yeah, it sucks. We need a parserAddChild which is reparenting aware. I could write a wrapper around parserAddChild instead.
Eric Seidel (no email)
Comment 15 2010-07-03 23:10:39 PDT
(In reply to comment #11) > (From update of attachment 60451 [details]) > Sorry I don't see your bug. Here are some comments though. It's OK. I'll see it eventually. :) > WebCore/html/HTMLTreeBuilder.cpp:860 > + notImplemented(); // Check the stack of open elements for a more specific parse error. > "if that node is also in the stack of open elements but the element is not in scope" > > You seem to be missing this text. Ahha! You might have found my bug. ;) > WebCore/html/HTMLTreeBuilder.cpp:805 > + HTMLElementStack::ElementRecord* record = m_openElements.topRecord(); > "Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one." > > Wow, that's super confusing. Let me re-phrase. We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element. You're walking the stack from the youngest (our top) and bailing when you hit the formatting element. Ok. That seems right. Yup. I wrote it wrong the first time. :) I shouted a few profanities at Hixie's broken stack and then wrote it two more times. I think it's right now. :) > WebCore/html/HTMLTreeBuilder.cpp:875 > + m_openElements.pop(); > You can just call popUntil(formattingElement) Fixed. > WebCore/html/HTMLTreeBuilder.cpp:881 > + HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement); > Lame that we need to search m_openElements again. We just did that on line 863. Fixed. (We have lots of over-searching. I was just trying to get the damn thing *right* first.... still not quite there.) > WebCore/html/HTMLTreeBuilder.cpp:897 > + m_openElements.remove(node->element()); > Here we're depending on some very specific semantics of remove. Also, this is quite inefficient since we're going to traverse the stack again. What? That it doesn't invalidate our saved node->next() pointer? I think it's OK to depend on that. It's impossible to avoid the second traversal unless we either make our list items hold bi-directional pointers, or we save "node->previous()" around so we can fix up the pointers. > WebCore/html/HTMLTreeBuilder.cpp:909 > + AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName()); > We're missing the attributes here. Yes, that's what the FIXME is about. > WebCore/html/HTMLTreeBuilder.cpp:913 > + HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element()); > Another search. Once we get this working, we need to get rid of all these extra searches. This is a different data structure we're searching in here. :) > WebCore/html/HTMLTreeBuilder.cpp:919 > + node->element()->appendChild(lastNode->element(), ec); > Wow, that's scary. I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here. I wish we could twiddle the DOM without firing off mutation events here. We don't want to use appendChild. We want a parserAddChild which can handle reparenting. > WebCore/html/HTMLTreeBuilder.cpp:925 > + const AtomicString& commonAncestorTag = commonAncestor->localName(); > commonAncestorName ? I'd rather use tag to be consistent with other code. Also, theoretically we could/should use qualifiedNames here which would be called "tags". Actually, this check should just be a helper function. I just didn't have a good name for it. > WebCore/html/HTMLTreeBuilder.cpp:939 > + AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName()); > Again, we're missing the attributes. Yup, same FIXME.
Eric Seidel (no email)
Comment 16 2010-07-03 23:14:37 PDT
(In reply to comment #12) > (From update of attachment 60451 [details]) > WebCore/html/HTMLTreeBuilder.cpp:832 > + } else // fragment case > Maybe ASSERT that we're parsing a fragment? Fixed. > WebCore/html/HTMLTreeBuilder.cpp:821 > + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); > "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." > > Why do you think |last| means topmost? It seems ambiguous to me. I think so. But I could be reading wrong. Pile your lovin on http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063. Also CCing Ian so he can witness our stack-induced tears. :) > WebCore/html/HTMLTreeBuilder.cpp:831 > + fosterParentElement = lastTableElementRecord->next()->element(); > I see. You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent". That means "last" would be "most our-above," which means closet to our-top, which means topmost. Maybe ask Hixie to clarify? At least his-above and his-below is clear (if backwards). CC'd Hixie. I'm clearly getting some foster parenting cases wrong, so something is wrong with my current implementation.
Eric Seidel (no email)
Comment 17 2010-07-03 23:20:52 PDT
(In reply to comment #15) > > WebCore/html/HTMLTreeBuilder.cpp:860 > > + notImplemented(); // Check the stack of open elements for a more specific parse error. > > "if that node is also in the stack of open elements but the element is not in scope" > > > > You seem to be missing this text. > > Ahha! You might have found my bug. ;) That fix changed behavior... positively I think.
Eric Seidel (no email)
Comment 18 2010-07-03 23:30:12 PDT
Created attachment 60468 [details] Addressed Adam's feedback
Eric Seidel (no email)
Comment 19 2010-07-03 23:32:34 PDT
(In reply to comment #12) > WebCore/html/HTMLTreeBuilder.cpp:821 > + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); > "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." > > Why do you think |last| means topmost? It seems ambiguous to me. I read "last" as "last added", which would be our "topmost" or hixie's "bottommost"
Eric Seidel (no email)
Comment 20 2010-07-03 23:33:18 PDT
(In reply to comment #13) > Rather than saving off the tokens, you might be able to use cloneNode(false). :) Yeah, but that would get us in trouble if a <script> got to run in between (which it could do, I think...).
Eric Seidel (no email)
Comment 21 2010-07-04 11:54:56 PDT
I found out that at least one of my new adoption tests is failing due to lack of InTable mode. :) I believe the rest may be failing due to the lack of bookmark support in my adoption agency algorithm. I can't (easily) add bookmark support until we change the list of active formatting elements to actually be a list instead of a vector. I think I'm done with this patch and going to start work on the other pieces (like InTable mode) before trying to find more bugs in this new code.
Adam Barth
Comment 22 2010-07-04 13:29:55 PDT
Comment on attachment 60468 [details] Addressed Adam's feedback Ok. There's still work to do here around mutation events, etc, but we can do that in a subsequent patch. Feel free to cq+ if you removed that notImplemented() mentioned below on purpose. WebCore/html/HTMLTreeBuilder.cpp:  + notImplemented(); I thought there was a reason I had this notImplemented here. WebCore/html/HTMLTreeBuilder.cpp:843 + void HTMLTreeBuilder::reparentChildren(Element* oldParent, Element* newParent) I'm sad that this is still subject to mutation event tricks.
Eric Seidel (no email)
Comment 23 2010-07-04 14:10:31 PDT
Yes, we'll go back and fix all the mutation event stuff in a separate patch. :)
Eric Seidel (no email)
Comment 24 2010-07-04 14:11:49 PDT
(In reply to comment #22) > (From update of attachment 60468 [details]) > Ok. There's still work to do here around mutation events, etc, but we can do that in a subsequent patch. Feel free to cq+ if you removed that notImplemented() mentioned below on purpose. > > WebCore/html/HTMLTreeBuilder.cpp:  > + notImplemented(); > I thought there was a reason I had this notImplemented here. Oops. I assume you mean line 424. That was unintentional. Will add it back. :)
Eric Seidel (no email)
Comment 25 2010-07-04 14:16:36 PDT
WebKit Review Bot
Comment 26 2010-07-04 14:22:32 PDT
http://trac.webkit.org/changeset/62468 might have broken Qt Linux Release minimal
Eric Seidel (no email)
Comment 27 2010-07-04 15:10:01 PDT
WebKit Review Bot
Comment 28 2010-07-04 16:38:45 PDT
http://trac.webkit.org/changeset/62470 might have broken Leopard Intel Release (Tests)
Note You need to log in before you can comment on or make changes to this bug.