WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
74828
The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
https://bugs.webkit.org/show_bug.cgi?id=74828
Summary
The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
Adam Barth
Reported
2011-12-18 17:55:50 PST
The HTML parser doesn't enforce the "Noah's Ark condition" from the HTML5 spec
Attachments
Patch
(259.40 KB, patch)
2011-12-18 18:02 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
incorporated review comments
(259.21 KB, patch)
2011-12-18 23:15 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2011-12-18 18:02:47 PST
Created
attachment 119792
[details]
Patch
Darin Adler
Comment 2
2011-12-18 18:22:19 PST
Comment on
attachment 119792
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=119792&action=review
> Source/WebCore/ChangeLog:10 > + elements that can be in the list of active formating elements. I'm not
Typo: formating instead of formatting
> Source/WebCore/ChangeLog:11 > + entirely sure that enforcing this condition is worth the complexity,
Besides complexity, there is the performance cost. Did you measure that? I hope it’s negligible with the optimized code path you added here.
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:43 > +inline size_t attributeCount(Element* element)
Since this is in a cpp file it should have static so it gets internal linkage.
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:134 > +void HTMLFormattingElementList::tryToEnsureNoahsArkConditionQuickly(Element* newElement, Vector<Element*>& remainingCandiates)
Typo: remainingCandiates rather than remainingCandidates
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:138 > + // Initially, use a vector with inline capacity to avoid a malloc in the
The use of the word “initially” here is confusing. I know it refers to this case vs. the non-quickly function, but it is not obvious.
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:142 > + size_t newElementAttributeCount = attributeCount(newElement);
Is it possibly worth checking m_entries.size() for zero first, so we don’t have to fetch the attribute count at all in that case?
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:145 > + for (size_t i = 0; i < m_entries.size(); ++i) { > + Entry& entry = m_entries[m_entries.size() - i - 1];
I would write the loop like this: for (size_t i = m_entries.size(); i; ) { --i; Entry& entry = m_entries[i]; It’s my new favorite way to write backwards loops. I think it’s easier to read this way.
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:152 > + if (!newElement->hasTagName(candidate->tagQName())) > + continue;
I believe hasTagName here is incorrect and the correct check is even faster and stricter, element->tagQName() != candidate->tagQName(). Is that right?
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:192 > + // FIXME: Technically we shouldn't read this information back from > + // the DOM. Instead, the parser should keep a copy of the information.
Could we be more specific about why reading back from the DOM is an issue? Maybe the reason is that a script that changes the DOM should have no effect on the Noah's Ark algorithm. Or maybe there’s some other reason that you know of and I do not.
> Source/WebCore/html/parser/HTMLFormattingElementList.cpp:206 > + // formatting lement list gets permuted.
Typo: lement list
> Source/WebCore/html/parser/HTMLFormattingElementList.h:130 > + //
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#list-of-active-formatting-elements
I think a human-readable description of what this is for could be valuable too. My comment would say something like this: // The "Noah's Ark" algorithm, which removes redundant mis-nested elements.
Adam Barth
Comment 3
2011-12-18 23:15:10 PST
Created
attachment 119823
[details]
incorporated review comments
Adam Barth
Comment 4
2011-12-18 23:31:26 PST
> Besides complexity, there is the performance cost. Did you measure that? I hope it’s negligible with the optimized code path you added here.
Looks like about 1.5% on the HTMLParser benchmark. That's slightly more than I would have expected but still pretty small.
WebKit Review Bot
Comment 5
2011-12-19 00:28:27 PST
Comment on
attachment 119823
[details]
incorporated review comments Clearing flags on attachment: 119823 Committed
r103223
: <
http://trac.webkit.org/changeset/103223
>
WebKit Review Bot
Comment 6
2011-12-19 00:28:31 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug