HTMLTreeBuilder needs an adoption agency
Created attachment 60219 [details] work in progress
Created attachment 60354 [details] compiles, but crashes. nearly there
Created attachment 60355 [details] runs, changes behavior, not sure if its right yet
Created attachment 60434 [details] another step forward
Created attachment 60450 [details] ready for review, intrested in your comments
The sad part is, I suspect this change is one or two typos away from fixing a zillion sub-tests. However this change is already too large, and keeping it locally longer doesn't help us move forward. The adoption01.dat tests I wrote show that at least the foster parenting algorithm has a bug. Getting the "bookmarking" functionality added to the list of active formatting elements will probably require moving to our own list instead of using a vector (so that we can insert elements w/o destroying all pointers.)
Attachment 60450 [details] did not build on chromium: Build output: http://webkit-commit-queue.appspot.com/results/3391144
Created attachment 60451 [details] fix cr-linux
Comment on attachment 60451 [details] fix cr-linux WebCore/html/HTMLElementStack.cpp:167 + recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right? I think so. WebCore/html/HTMLElementStack.cpp:245 + #if ENABLE(SVG_FOREIGN_OBJECT) We need to remove this ifdef eventually. WebCore/html/HTMLElementStack.h:55 + ElementRecord* next() const { return m_next.get(); } Blank line below. WebCore/html/HTMLElementStack.cpp:229 + return find(element); !!find ? WebCore/html/HTMLFormattingElementList.cpp:106 + // This is somewhat of a hack, and is why this method can't be const. Yeah, I'd just use a loop here. It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc. WebCore/html/HTMLTreeBuilder.cpp:810 + // !phrasing && !formatting == special || scoping Nit: The comment is in a different order than the code. WebCore/html/HTMLTreeBuilder.cpp:831 + fosterParentElement = lastTableElementRecord->next()->element(); This code seems strange. Do we really prefer the lastTableElement's actual parent in the DOM? Keep in mind that DOM nodes can be moved around during parsing. Usually we refer only to our own stack of open elements. WebCore/html/HTMLTreeBuilder.cpp:846 + newParent->appendChild(child, ec); No, this is no good. You need to collect these up into a vector and hold references to them while you twiddle the DOM. This fires mutation events, which can do all manner of bad things to you here.
I still need to read "call the adoption agency", but I need to run now.
Comment on attachment 60451 [details] fix cr-linux Sorry I don't see your bug. Here are some comments though. WebCore/html/HTMLTreeBuilder.cpp:860 + notImplemented(); // Check the stack of open elements for a more specific parse error. "if that node is also in the stack of open elements but the element is not in scope" You seem to be missing this text. WebCore/html/HTMLTreeBuilder.cpp:805 + HTMLElementStack::ElementRecord* record = m_openElements.topRecord(); "Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one." Wow, that's super confusing. Let me re-phrase. We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element. You're walking the stack from the youngest (our top) and bailing when you hit the formatting element. Ok. That seems right. WebCore/html/HTMLTreeBuilder.cpp:875 + m_openElements.pop(); You can just call popUntil(formattingElement) WebCore/html/HTMLTreeBuilder.cpp:881 + HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement); Lame that we need to search m_openElements again. We just did that on line 863. WebCore/html/HTMLTreeBuilder.cpp:897 + m_openElements.remove(node->element()); Here we're depending on some very specific semantics of remove. Also, this is quite inefficient since we're going to traverse the stack again. WebCore/html/HTMLTreeBuilder.cpp:909 + AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName()); We're missing the attributes here. WebCore/html/HTMLTreeBuilder.cpp:913 + HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element()); Another search. Once we get this working, we need to get rid of all these extra searches. WebCore/html/HTMLTreeBuilder.cpp:919 + node->element()->appendChild(lastNode->element(), ec); Wow, that's scary. I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here. I wish we could twiddle the DOM without firing off mutation events here. WebCore/html/HTMLTreeBuilder.cpp:925 + const AtomicString& commonAncestorTag = commonAncestor->localName(); commonAncestorName ? WebCore/html/HTMLTreeBuilder.cpp:939 + AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName()); Again, we're missing the attributes.
Comment on attachment 60451 [details] fix cr-linux WebCore/html/HTMLTreeBuilder.cpp:832 + } else // fragment case Maybe ASSERT that we're parsing a fragment? WebCore/html/HTMLTreeBuilder.cpp:821 + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." Why do you think |last| means topmost? It seems ambiguous to me. WebCore/html/HTMLTreeBuilder.cpp:831 + fosterParentElement = lastTableElementRecord->next()->element(); I see. You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent". That means "last" would be "most our-above," which means closet to our-top, which means topmost. Maybe ask Hixie to clarify? At least his-above and his-below is clear (if backwards).
Rather than saving off the tokens, you might be able to use cloneNode(false). :)
(In reply to comment #9) > (From update of attachment 60451 [details]) > WebCore/html/HTMLElementStack.cpp:167 > + recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right? > I think so. OK, removed. > WebCore/html/HTMLElementStack.cpp:245 > + #if ENABLE(SVG_FOREIGN_OBJECT) > We need to remove this ifdef eventually. Yup. > WebCore/html/HTMLElementStack.h:55 > + ElementRecord* next() const { return m_next.get(); } > Blank line below. Added. > WebCore/html/HTMLElementStack.cpp:229 > + return find(element); > !!find ? Changed. > WebCore/html/HTMLFormattingElementList.cpp:106 > + // This is somewhat of a hack, and is why this method can't be const. > Yeah, I'd just use a loop here. It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc. It's not the loop. Its the &m_entries[x]. m_entries[x] returns a const reference because "this" is const. > WebCore/html/HTMLTreeBuilder.cpp:810 > + // !phrasing && !formatting == special || scoping > Nit: The comment is in a different order than the code. Fixed. > WebCore/html/HTMLTreeBuilder.cpp:831 > + fosterParentElement = lastTableElementRecord->next()->element(); > This code seems strange. Do we really prefer the lastTableElement's actual parent in the DOM? Keep in mind that DOM nodes can be moved around during parsing. Usually we refer only to our own stack of open elements. I think so. "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element" I filed http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063 about the illegibility of that paragraph. > WebCore/html/HTMLTreeBuilder.cpp:846 > + newParent->appendChild(child, ec); > No, this is no good. You need to collect these up into a vector and hold references to them while you twiddle the DOM. This fires mutation events, which can do all manner of bad things to you here. Yeah, it sucks. We need a parserAddChild which is reparenting aware. I could write a wrapper around parserAddChild instead.
(In reply to comment #11) > (From update of attachment 60451 [details]) > Sorry I don't see your bug. Here are some comments though. It's OK. I'll see it eventually. :) > WebCore/html/HTMLTreeBuilder.cpp:860 > + notImplemented(); // Check the stack of open elements for a more specific parse error. > "if that node is also in the stack of open elements but the element is not in scope" > > You seem to be missing this text. Ahha! You might have found my bug. ;) > WebCore/html/HTMLTreeBuilder.cpp:805 > + HTMLElementStack::ElementRecord* record = m_openElements.topRecord(); > "Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one." > > Wow, that's super confusing. Let me re-phrase. We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element. You're walking the stack from the youngest (our top) and bailing when you hit the formatting element. Ok. That seems right. Yup. I wrote it wrong the first time. :) I shouted a few profanities at Hixie's broken stack and then wrote it two more times. I think it's right now. :) > WebCore/html/HTMLTreeBuilder.cpp:875 > + m_openElements.pop(); > You can just call popUntil(formattingElement) Fixed. > WebCore/html/HTMLTreeBuilder.cpp:881 > + HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement); > Lame that we need to search m_openElements again. We just did that on line 863. Fixed. (We have lots of over-searching. I was just trying to get the damn thing *right* first.... still not quite there.) > WebCore/html/HTMLTreeBuilder.cpp:897 > + m_openElements.remove(node->element()); > Here we're depending on some very specific semantics of remove. Also, this is quite inefficient since we're going to traverse the stack again. What? That it doesn't invalidate our saved node->next() pointer? I think it's OK to depend on that. It's impossible to avoid the second traversal unless we either make our list items hold bi-directional pointers, or we save "node->previous()" around so we can fix up the pointers. > WebCore/html/HTMLTreeBuilder.cpp:909 > + AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName()); > We're missing the attributes here. Yes, that's what the FIXME is about. > WebCore/html/HTMLTreeBuilder.cpp:913 > + HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element()); > Another search. Once we get this working, we need to get rid of all these extra searches. This is a different data structure we're searching in here. :) > WebCore/html/HTMLTreeBuilder.cpp:919 > + node->element()->appendChild(lastNode->element(), ec); > Wow, that's scary. I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here. I wish we could twiddle the DOM without firing off mutation events here. We don't want to use appendChild. We want a parserAddChild which can handle reparenting. > WebCore/html/HTMLTreeBuilder.cpp:925 > + const AtomicString& commonAncestorTag = commonAncestor->localName(); > commonAncestorName ? I'd rather use tag to be consistent with other code. Also, theoretically we could/should use qualifiedNames here which would be called "tags". Actually, this check should just be a helper function. I just didn't have a good name for it. > WebCore/html/HTMLTreeBuilder.cpp:939 > + AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName()); > Again, we're missing the attributes. Yup, same FIXME.
(In reply to comment #12) > (From update of attachment 60451 [details]) > WebCore/html/HTMLTreeBuilder.cpp:832 > + } else // fragment case > Maybe ASSERT that we're parsing a fragment? Fixed. > WebCore/html/HTMLTreeBuilder.cpp:821 > + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); > "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." > > Why do you think |last| means topmost? It seems ambiguous to me. I think so. But I could be reading wrong. Pile your lovin on http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063. Also CCing Ian so he can witness our stack-induced tears. :) > WebCore/html/HTMLTreeBuilder.cpp:831 > + fosterParentElement = lastTableElementRecord->next()->element(); > I see. You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent". That means "last" would be "most our-above," which means closet to our-top, which means topmost. Maybe ask Hixie to clarify? At least his-above and his-below is clear (if backwards). CC'd Hixie. I'm clearly getting some foster parenting cases wrong, so something is wrong with my current implementation.
(In reply to comment #15) > > WebCore/html/HTMLTreeBuilder.cpp:860 > > + notImplemented(); // Check the stack of open elements for a more specific parse error. > > "if that node is also in the stack of open elements but the element is not in scope" > > > > You seem to be missing this text. > > Ahha! You might have found my bug. ;) That fix changed behavior... positively I think.
Created attachment 60468 [details] Addressed Adam's feedback
(In reply to comment #12) > WebCore/html/HTMLTreeBuilder.cpp:821 > + HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName()); > "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element." > > Why do you think |last| means topmost? It seems ambiguous to me. I read "last" as "last added", which would be our "topmost" or hixie's "bottommost"
(In reply to comment #13) > Rather than saving off the tokens, you might be able to use cloneNode(false). :) Yeah, but that would get us in trouble if a <script> got to run in between (which it could do, I think...).
I found out that at least one of my new adoption tests is failing due to lack of InTable mode. :) I believe the rest may be failing due to the lack of bookmark support in my adoption agency algorithm. I can't (easily) add bookmark support until we change the list of active formatting elements to actually be a list instead of a vector. I think I'm done with this patch and going to start work on the other pieces (like InTable mode) before trying to find more bugs in this new code.
Comment on attachment 60468 [details] Addressed Adam's feedback Ok. There's still work to do here around mutation events, etc, but we can do that in a subsequent patch. Feel free to cq+ if you removed that notImplemented() mentioned below on purpose. WebCore/html/HTMLTreeBuilder.cpp: + notImplemented(); I thought there was a reason I had this notImplemented here. WebCore/html/HTMLTreeBuilder.cpp:843 + void HTMLTreeBuilder::reparentChildren(Element* oldParent, Element* newParent) I'm sad that this is still subject to mutation event tricks.
Yes, we'll go back and fix all the mutation event stuff in a separate patch. :)
(In reply to comment #22) > (From update of attachment 60468 [details]) > Ok. There's still work to do here around mutation events, etc, but we can do that in a subsequent patch. Feel free to cq+ if you removed that notImplemented() mentioned below on purpose. > > WebCore/html/HTMLTreeBuilder.cpp: > + notImplemented(); > I thought there was a reason I had this notImplemented here. Oops. I assume you mean line 424. That was unintentional. Will add it back. :)
Committed r62468: <http://trac.webkit.org/changeset/62468>
http://trac.webkit.org/changeset/62468 might have broken Qt Linux Release minimal
Committed r62470: <http://trac.webkit.org/changeset/62470>
http://trac.webkit.org/changeset/62470 might have broken Leopard Intel Release (Tests)