RESOLVED FIXED 74828
The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
https://bugs.webkit.org/show_bug.cgi?id=74828
Summary The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
Adam Barth
Reported 2011-12-18 17:55:50 PST
The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
Attachments
Patch (259.40 KB, patch)
2011-12-18 18:02 PST, Adam Barth
no flags
incorporated review comments (259.21 KB, patch)
2011-12-18 23:15 PST, Adam Barth
no flags
Adam Barth
Comment 1 2011-12-18 18:02:47 PST
Darin Adler
Comment 2 2011-12-18 18:22:19 PST
Comment on attachment 119792 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119792&action=review > Source/WebCore/ChangeLog:10 > + elements that can be in the list of active formating elements. I'm not Typo: formating instead of formatting > Source/WebCore/ChangeLog:11 > + entirely sure that enforcing this condition is worth the complexity, Besides complexity, there is the performance cost. Did you measure that? I hope it’s negligible with the optimized code path you added here. > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:43 > +inline size_t attributeCount(Element* element) Since this is in a cpp file it should have static so it gets internal linkage. > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:134 > +void HTMLFormattingElementList::tryToEnsureNoahsArkConditionQuickly(Element* newElement, Vector<Element*>& remainingCandiates) Typo: remainingCandiates rather than remainingCandidates > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:138 > + // Initially, use a vector with inline capacity to avoid a malloc in the The use of the word “initially” here is confusing. I know it refers to this case vs. the non-quickly function, but it is not obvious. > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:142 > + size_t newElementAttributeCount = attributeCount(newElement); Is it possibly worth checking m_entries.size() for zero first, so we don’t have to fetch the attribute count at all in that case? > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:145 > + for (size_t i = 0; i < m_entries.size(); ++i) { > + Entry& entry = m_entries[m_entries.size() - i - 1]; I would write the loop like this: for (size_t i = m_entries.size(); i; ) { --i; Entry& entry = m_entries[i]; It’s my new favorite way to write backwards loops. I think it’s easier to read this way. > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:152 > + if (!newElement->hasTagName(candidate->tagQName())) > + continue; I believe hasTagName here is incorrect and the correct check is even faster and stricter, element->tagQName() != candidate->tagQName(). Is that right? > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:192 > + // FIXME: Technically we shouldn't read this information back from > + // the DOM. Instead, the parser should keep a copy of the information. Could we be more specific about why reading back from the DOM is an issue? Maybe the reason is that a script that changes the DOM should have no effect on the Noah's Ark algorithm. Or maybe there’s some other reason that you know of and I do not. > Source/WebCore/html/parser/HTMLFormattingElementList.cpp:206 > + // formatting lement list gets permuted. Typo: lement list > Source/WebCore/html/parser/HTMLFormattingElementList.h:130 > + // http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#list-of-active-formatting-elements I think a human-readable description of what this is for could be valuable too. My comment would say something like this: // The "Noah's Ark" algorithm, which removes redundant mis-nested elements.
Adam Barth
Comment 3 2011-12-18 23:15:10 PST
Created attachment 119823 [details] incorporated review comments
Adam Barth
Comment 4 2011-12-18 23:31:26 PST
> Besides complexity, there is the performance cost. Did you measure that? I hope it’s negligible with the optimized code path you added here. Looks like about 1.5% on the HTMLParser benchmark. That's slightly more than I would have expected but still pretty small.
WebKit Review Bot
Comment 5 2011-12-19 00:28:27 PST
Comment on attachment 119823 [details] incorporated review comments Clearing flags on attachment: 119823 Committed r103223: <http://trac.webkit.org/changeset/103223>
WebKit Review Bot
Comment 6 2011-12-19 00:28:31 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.