Bug 41453 - HTMLTreeBuilder needs an adoption agency
Summary: HTMLTreeBuilder needs an adoption agency
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks: 41123 41581
  Show dependency treegraph
 
Reported: 2010-07-01 03:44 PDT by Eric Seidel (no email)
Modified: 2011-01-22 19:51 PST (History)
5 users (show)

See Also:


Attachments
work in progress (16.66 KB, patch)
2010-07-01 03:46 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
compiles, but crashes. nearly there (31.48 KB, patch)
2010-07-02 03:36 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
runs, changes behavior, not sure if its right yet (34.31 KB, patch)
2010-07-02 03:45 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
another step forward (34.00 KB, patch)
2010-07-02 20:39 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
ready for review, intrested in your comments (41.91 KB, patch)
2010-07-03 08:05 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
fix cr-linux (42.55 KB, patch)
2010-07-03 08:56 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Addressed Adam's feedback (42.26 KB, patch)
2010-07-03 23:30 PDT, Eric Seidel (no email)
abarth: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-07-01 03:44:54 PDT
HTMLTreeBuilder needs an adoption agency
Comment 1 Eric Seidel (no email) 2010-07-01 03:46:04 PDT
Created attachment 60219 [details]
work in progress
Comment 2 Eric Seidel (no email) 2010-07-02 03:36:06 PDT
Created attachment 60354 [details]
compiles, but crashes.  nearly there
Comment 3 Eric Seidel (no email) 2010-07-02 03:45:37 PDT
Created attachment 60355 [details]
runs, changes behavior, not sure if its right yet
Comment 4 Eric Seidel (no email) 2010-07-02 20:39:44 PDT
Created attachment 60434 [details]
another step forward
Comment 5 Eric Seidel (no email) 2010-07-03 08:05:07 PDT
Created attachment 60450 [details]
ready for review, intrested in your comments
Comment 6 Eric Seidel (no email) 2010-07-03 08:08:06 PDT
The sad part is, I suspect this change is one or two typos away from fixing a zillion sub-tests.  However this change is already too large, and keeping it locally longer doesn't help us move forward.

The adoption01.dat tests I wrote show that at least the foster parenting algorithm has a bug.

Getting the "bookmarking" functionality added to the list of active formatting elements will probably require moving to our own list instead of using a vector (so that we can insert elements w/o destroying all pointers.)
Comment 7 WebKit Review Bot 2010-07-03 08:20:46 PDT
Attachment 60450 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3391144
Comment 8 Eric Seidel (no email) 2010-07-03 08:56:30 PDT
Created attachment 60451 [details]
fix cr-linux
Comment 9 Adam Barth 2010-07-03 16:46:47 PDT
Comment on attachment 60451 [details]
fix cr-linux

WebCore/html/HTMLElementStack.cpp:167
 +          recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right?
I think so.

WebCore/html/HTMLElementStack.cpp:245
 +  #if ENABLE(SVG_FOREIGN_OBJECT)
We need to remove this ifdef eventually.

WebCore/html/HTMLElementStack.h:55
 +          ElementRecord* next() const { return m_next.get(); }
Blank line below.

WebCore/html/HTMLElementStack.cpp:229
 +      return find(element);
!!find ?

WebCore/html/HTMLFormattingElementList.cpp:106
 +          // This is somewhat of a hack, and is why this method can't be const.
Yeah, I'd just use a loop here.  It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc.

WebCore/html/HTMLTreeBuilder.cpp:810
 +          // !phrasing && !formatting == special || scoping
Nit: The comment is in a different order than the code.

WebCore/html/HTMLTreeBuilder.cpp:831
 +          fosterParentElement = lastTableElementRecord->next()->element();
This code seems strange.  Do we really prefer the lastTableElement's actual parent in the DOM?  Keep in mind that DOM nodes can be moved around during parsing.  Usually we refer only to our own stack of open elements.

WebCore/html/HTMLTreeBuilder.cpp:846
 +          newParent->appendChild(child, ec);
No, this is no good.  You need to collect these up into a vector and hold references to them while you twiddle the DOM.  This fires mutation events, which can do all manner of bad things to you here.
Comment 10 Adam Barth 2010-07-03 16:47:13 PDT
I still need to read "call the adoption agency", but I need to run now.
Comment 11 Adam Barth 2010-07-03 20:44:25 PDT
Comment on attachment 60451 [details]
fix cr-linux

Sorry I don't see your bug.  Here are some comments though.

WebCore/html/HTMLTreeBuilder.cpp:860
 +              notImplemented(); // Check the stack of open elements for a more specific parse error.
"if that node is also in the stack of open elements but the element is not in scope"

You seem to be missing this text.

WebCore/html/HTMLTreeBuilder.cpp:805
 +      HTMLElementStack::ElementRecord* record = m_openElements.topRecord();
"Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one."

Wow, that's super confusing.  Let me re-phrase.  We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element.  You're walking the stack from the youngest (our top) and bailing when you hit the formatting element.  Ok.  That seems right.

WebCore/html/HTMLTreeBuilder.cpp:875
 +                  m_openElements.pop();
You can just call popUntil(formattingElement)

WebCore/html/HTMLTreeBuilder.cpp:881
 +          HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement);
Lame that we need to search m_openElements again.  We just did that on line 863.

WebCore/html/HTMLTreeBuilder.cpp:897
 +                  m_openElements.remove(node->element());
Here we're depending on some very specific semantics of remove.  Also, this is quite inefficient since we're going to traverse the stack again.

WebCore/html/HTMLTreeBuilder.cpp:909
 +              AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName());
We're missing the attributes here.

WebCore/html/HTMLTreeBuilder.cpp:913
 +              HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element());
Another search.  Once we get this working, we need to get rid of all these extra searches.

WebCore/html/HTMLTreeBuilder.cpp:919
 +              node->element()->appendChild(lastNode->element(), ec);
Wow, that's scary.  I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here.  I wish we could twiddle the DOM without firing off mutation events here.

WebCore/html/HTMLTreeBuilder.cpp:925
 +          const AtomicString& commonAncestorTag = commonAncestor->localName();
commonAncestorName ?

WebCore/html/HTMLTreeBuilder.cpp:939
 +          AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName());
Again, we're missing the attributes.
Comment 12 Adam Barth 2010-07-03 20:52:23 PDT
Comment on attachment 60451 [details]
fix cr-linux

WebCore/html/HTMLTreeBuilder.cpp:832
 +      } else // fragment case
Maybe ASSERT that we're parsing a fragment?

WebCore/html/HTMLTreeBuilder.cpp:821
 +      HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName());
"The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element."

Why do you think |last| means topmost?  It seems ambiguous to me.

WebCore/html/HTMLTreeBuilder.cpp:831
 +          fosterParentElement = lastTableElementRecord->next()->element();
I see.  You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent".  That means "last" would be "most our-above," which means closet to our-top, which means topmost.  Maybe ask Hixie to clarify?  At least his-above and his-below is clear (if backwards).
Comment 13 Adam Barth 2010-07-03 22:13:24 PDT
Rather than saving off the tokens, you might be able to use cloneNode(false).  :)
Comment 14 Eric Seidel (no email) 2010-07-03 22:44:44 PDT
(In reply to comment #9)
> (From update of attachment 60451 [details])
> WebCore/html/HTMLElementStack.cpp:167
>  +          recordAbove->next()->element()->beginParsingChildren(); // FIXME: Is this right?
> I think so.

OK, removed.

> WebCore/html/HTMLElementStack.cpp:245
>  +  #if ENABLE(SVG_FOREIGN_OBJECT)
> We need to remove this ifdef eventually.

Yup.

> WebCore/html/HTMLElementStack.h:55
>  +          ElementRecord* next() const { return m_next.get(); }
> Blank line below.

Added.

> WebCore/html/HTMLElementStack.cpp:229
>  +      return find(element);
> !!find ?

Changed.

> WebCore/html/HTMLFormattingElementList.cpp:106
>  +          // This is somewhat of a hack, and is why this method can't be const.
> Yeah, I'd just use a loop here.  It's not like Vector::find is any smarter, and then we wouldn't need operator==, etc.

It's not the loop.  Its the &m_entries[x].  m_entries[x] returns a const reference because "this" is const.

> WebCore/html/HTMLTreeBuilder.cpp:810
>  +          // !phrasing && !formatting == special || scoping
> Nit: The comment is in a different order than the code.

Fixed.

> WebCore/html/HTMLTreeBuilder.cpp:831
>  +          fosterParentElement = lastTableElementRecord->next()->element();
> This code seems strange.  Do we really prefer the lastTableElement's actual parent in the DOM?  Keep in mind that DOM nodes can be moved around during parsing.  Usually we refer only to our own stack of open elements.

I think so.  "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element"

I filed http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063 about the illegibility of that paragraph.

> WebCore/html/HTMLTreeBuilder.cpp:846
>  +          newParent->appendChild(child, ec);
> No, this is no good.  You need to collect these up into a vector and hold references to them while you twiddle the DOM.  This fires mutation events, which can do all manner of bad things to you here.

Yeah, it sucks.  We need a parserAddChild which is reparenting aware.  I could write a wrapper around parserAddChild instead.
Comment 15 Eric Seidel (no email) 2010-07-03 23:10:39 PDT
(In reply to comment #11)
> (From update of attachment 60451 [details])
> Sorry I don't see your bug.  Here are some comments though.

It's OK.  I'll see it eventually. :)

> WebCore/html/HTMLTreeBuilder.cpp:860
>  +              notImplemented(); // Check the stack of open elements for a more specific parse error.
> "if that node is also in the stack of open elements but the element is not in scope"
> 
> You seem to be missing this text.

Ahha!  You might have found my bug. ;)

> WebCore/html/HTMLTreeBuilder.cpp:805
>  +      HTMLElementStack::ElementRecord* record = m_openElements.topRecord();
> "Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one."
> 
> Wow, that's super confusing.  Let me re-phrase.  We want the "oldest" (nee topmost) one that's "younger" (nee lower) that the formatting element.  You're walking the stack from the youngest (our top) and bailing when you hit the formatting element.  Ok.  That seems right.

Yup.  I wrote it wrong the first time.  :)  I shouted a few profanities at Hixie's broken stack and then wrote it two more times.  I think it's right now. :)

> WebCore/html/HTMLTreeBuilder.cpp:875
>  +                  m_openElements.pop();
> You can just call popUntil(formattingElement)

Fixed.

> WebCore/html/HTMLTreeBuilder.cpp:881
>  +          HTMLElementStack::ElementRecord* formattingElementRecord = m_openElements.find(formattingElement);
> Lame that we need to search m_openElements again.  We just did that on line 863.

Fixed.  (We have lots of over-searching.  I was just trying to get the damn thing *right* first.... still not quite there.)

> WebCore/html/HTMLTreeBuilder.cpp:897
>  +                  m_openElements.remove(node->element());
> Here we're depending on some very specific semantics of remove.  Also, this is quite inefficient since we're going to traverse the stack again.

What? That it doesn't invalidate our saved node->next() pointer?   I think it's OK to depend on that.
It's impossible to avoid the second traversal unless we either make our list items hold bi-directional pointers, or we save "node->previous()" around so we can fix up the pointers.

> WebCore/html/HTMLTreeBuilder.cpp:909
>  +              AtomicHTMLToken fakeToken(HTMLToken::StartTag, node->element()->localName());
> We're missing the attributes here.

Yes, that's what the FIXME is about.

> WebCore/html/HTMLTreeBuilder.cpp:913
>  +              HTMLFormattingElementList::Entry* nodeEntry = m_activeFormattingElements.find(node->element());
> Another search.  Once we get this working, we need to get rid of all these extra searches.

This is a different data structure we're searching in here. :)

> WebCore/html/HTMLTreeBuilder.cpp:919
>  +              node->element()->appendChild(lastNode->element(), ec);
> Wow, that's scary.  I think it's ok because we're holding onto our datastructures here (and not the DOM's datastructures), but we to think really carefully about what can happen here.  I wish we could twiddle the DOM without firing off mutation events here.

We don't want to use appendChild.  We want a parserAddChild which can handle reparenting.

> WebCore/html/HTMLTreeBuilder.cpp:925
>  +          const AtomicString& commonAncestorTag = commonAncestor->localName();
> commonAncestorName ?

I'd rather use tag to be consistent with other code.  Also, theoretically we could/should use qualifiedNames here which would be called "tags".

Actually, this check should just be a helper function.  I just didn't have a good name for it.

> WebCore/html/HTMLTreeBuilder.cpp:939
>  +          AtomicHTMLToken fakeToken(HTMLToken::StartTag, formattingElement->localName());
> Again, we're missing the attributes.

Yup, same FIXME.
Comment 16 Eric Seidel (no email) 2010-07-03 23:14:37 PDT
(In reply to comment #12)
> (From update of attachment 60451 [details])
> WebCore/html/HTMLTreeBuilder.cpp:832
>  +      } else // fragment case
> Maybe ASSERT that we're parsing a fragment?

Fixed.

> WebCore/html/HTMLTreeBuilder.cpp:821
>  +      HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName());
> "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element."
> 
> Why do you think |last| means topmost?  It seems ambiguous to me.

I think so.  But I could be reading wrong.  Pile your lovin on http://www.w3.org/Bugs/Public/show_bug.cgi?id=10063.

Also CCing Ian so he can witness our stack-induced tears. :)

> WebCore/html/HTMLTreeBuilder.cpp:831
>  +          fosterParentElement = lastTableElementRecord->next()->element();
> I see.  You've translated "before" to "our-below", which is intuitive in this case since we're looking for the "parent".  That means "last" would be "most our-above," which means closet to our-top, which means topmost.  Maybe ask Hixie to clarify?  At least his-above and his-below is clear (if backwards).

CC'd Hixie.  I'm clearly getting some foster parenting cases wrong, so something is wrong with my current implementation.
Comment 17 Eric Seidel (no email) 2010-07-03 23:20:52 PDT
(In reply to comment #15)
> > WebCore/html/HTMLTreeBuilder.cpp:860
> >  +              notImplemented(); // Check the stack of open elements for a more specific parse error.
> > "if that node is also in the stack of open elements but the element is not in scope"
> > 
> > You seem to be missing this text.
> 
> Ahha!  You might have found my bug. ;)

That fix changed behavior... positively I think.
Comment 18 Eric Seidel (no email) 2010-07-03 23:30:12 PDT
Created attachment 60468 [details]
Addressed Adam's feedback
Comment 19 Eric Seidel (no email) 2010-07-03 23:32:34 PDT
(In reply to comment #12)
> WebCore/html/HTMLTreeBuilder.cpp:821
>  +      HTMLElementStack::ElementRecord* lastTableElementRecord = m_openElements.topmost(tableTag.localName());
> "The foster parent element is the parent element of the last table element in the stack of open elements, if there is a table element and it has such a parent element."
> 
> Why do you think |last| means topmost?  It seems ambiguous to me.

I read "last" as "last added", which would be our "topmost" or hixie's "bottommost"
Comment 20 Eric Seidel (no email) 2010-07-03 23:33:18 PDT
(In reply to comment #13)
> Rather than saving off the tokens, you might be able to use cloneNode(false).  :)

Yeah, but that would get us in trouble if a <script> got to run in between (which it could do, I think...).
Comment 21 Eric Seidel (no email) 2010-07-04 11:54:56 PDT
I found out that at least one of my new adoption tests is failing due to lack of InTable mode. :)

I believe the rest may be failing due to the lack of bookmark support in my adoption agency algorithm.  I can't (easily) add bookmark support until we change the list of active formatting elements to actually be a list instead of a vector.

I think I'm done with this patch and going to start work on the other pieces (like InTable mode) before trying to find more bugs in this new code.
Comment 22 Adam Barth 2010-07-04 13:29:55 PDT
Comment on attachment 60468 [details]
Addressed Adam's feedback

Ok.  There's still work to do here around mutation events, etc, but we can do that in a subsequent patch.  Feel free to cq+ if you removed that notImplemented() mentioned below on purpose.

WebCore/html/HTMLTreeBuilder.cpp: 
 +              notImplemented();
I thought there was a reason I had this notImplemented here.

WebCore/html/HTMLTreeBuilder.cpp:843
 +  void HTMLTreeBuilder::reparentChildren(Element* oldParent, Element* newParent)
I'm sad that this is still subject to mutation event tricks.
Comment 23 Eric Seidel (no email) 2010-07-04 14:10:31 PDT
Yes, we'll go back and fix all the mutation event stuff in a separate patch. :)
Comment 24 Eric Seidel (no email) 2010-07-04 14:11:49 PDT
(In reply to comment #22)
> (From update of attachment 60468 [details])
> Ok.  There's still work to do here around mutation events, etc, but we can do that in a subsequent patch.  Feel free to cq+ if you removed that notImplemented() mentioned below on purpose.
> 
> WebCore/html/HTMLTreeBuilder.cpp: 
>  +              notImplemented();
> I thought there was a reason I had this notImplemented here.

Oops.  I assume you mean line 424.  That was unintentional.  Will add it back. :)
Comment 25 Eric Seidel (no email) 2010-07-04 14:16:36 PDT
Committed r62468: <http://trac.webkit.org/changeset/62468>
Comment 26 WebKit Review Bot 2010-07-04 14:22:32 PDT
http://trac.webkit.org/changeset/62468 might have broken Qt Linux Release minimal
Comment 27 Eric Seidel (no email) 2010-07-04 15:10:01 PDT
Committed r62470: <http://trac.webkit.org/changeset/62470>
Comment 28 WebKit Review Bot 2010-07-04 16:38:45 PDT
http://trac.webkit.org/changeset/62470 might have broken Leopard Intel Release (Tests)