Bug 63082 - HTML parser should limit element depth of produced tree
Summary: HTML parser should limit element depth of produced tree
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Sam Weinig
URL:
Keywords:
Depends on:
Blocks: 63686
  Show dependency treegraph
 
Reported: 2011-06-21 11:26 PDT by Sam Weinig
Modified: 2011-06-29 23:26 PDT (History)
7 users (show)

See Also:


Attachments
Patch (15.88 KB, patch)
2011-06-21 18:53 PDT, Sam Weinig
no flags Details | Formatted Diff | Diff
Patch (15.88 KB, patch)
2011-06-21 18:55 PDT, Sam Weinig
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.39 MB, application/zip)
2011-06-21 19:18 PDT, WebKit Review Bot
no flags Details
Patch (17.91 KB, patch)
2011-06-21 20:14 PDT, Sam Weinig
no flags Details | Formatted Diff | Diff
Patch (18.21 KB, patch)
2011-06-21 22:34 PDT, Sam Weinig
abarth: review+
abarth: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Weinig 2011-06-21 11:26:36 PDT
The old HTML parser (and the current XML parser) constrained the depth of the tree it produced, in part to avoid running into stack overflow when laying out (due to the recursive nature of WebCore rendering).  It would be great to get this feature back.
Comment 1 Sam Weinig 2011-06-21 18:53:53 PDT
Created attachment 98097 [details]
Patch
Comment 2 Sam Weinig 2011-06-21 18:55:03 PDT
Created attachment 98098 [details]
Patch
Comment 3 WebKit Review Bot 2011-06-21 18:57:01 PDT
Attachment 98097 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/html/parser/HTMLTreeBuilder.h:119:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/html/parser/HTMLTreeBuilder.h:120:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 WebKit Review Bot 2011-06-21 18:58:08 PDT
Attachment 98098 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/html/parser/HTMLTreeBuilder.h:119:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/html/parser/HTMLTreeBuilder.h:120:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Joseph Pecoraro 2011-06-21 19:00:30 PDT
No test? We already lost this feature once, maybe because
it didn't have a test. =)
Comment 6 Sam Weinig 2011-06-21 19:08:31 PDT
(In reply to comment #5)
> No test? We already lost this feature once, maybe because
> it didn't have a test. =)

The best way to test this would be to be able to set the limit quite low, parse a bunch of nested <div>s and check the depth of the deepest one.  I will try and whip that up.
Comment 7 WebKit Review Bot 2011-06-21 19:18:41 PDT
Comment on attachment 98098 [details]
Patch

Attachment 98098 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8919190

New failing tests:
fast/parser/block-nesting-cap.html
fast/parser/element-nesting-cap.html
Comment 8 WebKit Review Bot 2011-06-21 19:18:46 PDT
Created attachment 98101 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 9 Sam Weinig 2011-06-21 19:30:05 PDT
(In reply to comment #7)
> (From update of attachment 98098 [details])
> Attachment 98098 [details] did not pass chromium-ews (chromium-xvfb):
> Output: http://queues.webkit.org/results/8919190
> 
> New failing tests:
> fast/parser/block-nesting-cap.html
> fast/parser/element-nesting-cap.html

Look at that, there already were tests, they were just failing :)
Comment 10 Sam Weinig 2011-06-21 19:42:49 PDT
There is some wrongness in this patch.  I will have a new one shortly.
Comment 11 Sam Weinig 2011-06-21 20:14:42 PDT
Created attachment 98109 [details]
Patch
Comment 12 Sam Weinig 2011-06-21 20:16:13 PDT
(In reply to comment #11)
> Created an attachment (id=98109) [details]
> Patch

This new patch also updates the tests results. The failures in fast/parser/element-nesting-cap.html come from the fact that the old parser had a different limit for different types of elements.
Comment 13 WebKit Review Bot 2011-06-21 20:17:21 PDT
Attachment 98109 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/WebCore/html/parser/HTMLTreeBuilder.h:119:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/html/parser/HTMLTreeBuilder.h:120:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 mitz 2011-06-21 20:45:37 PDT
Comment on attachment 98109 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98109&action=review

> Source/WebCore/ChangeLog:16
> +        Limit the DOM tree depth by turning attempts to add a child passed a certain

Typo: passed

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:575
> +    ASSERT(document);
> +    if (!document->settings())

Since you are immediately dereferencing document, the assertion has very limited usefulness.

> Source/WebCore/html/parser/HTMLElementStack.h:78
> +    size_t size() const { return m_size; }

Why are you using size_t for the size of the stack but unsigned for the depth of the DOM tree?

> Source/WebCore/page/Settings.cpp:103
> +    , m_maximumDOMTreeDepth(4096)

Please turn this into a named constant, and expose it via a static member in Settings so that HTMLDocumentParser::maximumDOMTreeDepth() can refer to it.

> Source/WebCore/page/Settings.h:392
> +        void setMaximumDOMTreeDepth(unsigned maximumDOMTreeDepth) { m_maximumDOMTreeDepth = maximumDOMTreeDepth; }
> +        unsigned maximumDOMTreeDepth() const { return m_maximumDOMTreeDepth; }

This name is misleading. The name should indicate that this limit only applies to the parser, not to the DOM API.
Comment 15 Alexey Proskuryakov 2011-06-21 21:22:40 PDT
-PASS s2.parentNode === s1 is true
-FAIL s3.previousSibling === s2 should be true. Was false.
+FAIL s2.parentNode === s1 should be true. Was false.
+PASS s3.previousSibling === s2 is true

Is this test wrong?
Comment 16 Sam Weinig 2011-06-21 21:45:06 PDT
(In reply to comment #15)
> -PASS s2.parentNode === s1 is true
> -FAIL s3.previousSibling === s2 should be true. Was false.
> +FAIL s2.parentNode === s1 should be true. Was false.
> +PASS s3.previousSibling === s2 is true
> 
> Is this test wrong?

Define wrong.  We used to have a different/larger limit for inlines, something about a test of hyatt's. Mitz might know more.
Comment 17 Alexey Proskuryakov 2011-06-21 22:00:33 PDT
> Define wrong.

Is it something that needs to be fixed, or should the test be fixed to say PASS?
Comment 18 Sam Weinig 2011-06-21 22:07:35 PDT
(In reply to comment #17)
> > Define wrong.
> 
> Is it something that needs to be fixed, or should the test be fixed to say PASS?

Ah, I don't know the answer to that.
Comment 19 Adam Barth 2011-06-21 22:29:24 PDT
Comment on attachment 98109 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98109&action=review

I agree with mitz's comments.  Eric should verify that we're keeping track of m_size properly.

> Source/WebCore/html/parser/HTMLConstructionSite.cpp:103
> +        parent->parserAddChild(child);
> +    } else
> +        parent->parserAddChild(child);

Should these lines go outside the if-block?
Comment 20 Sam Weinig 2011-06-21 22:34:20 PDT
Created attachment 98119 [details]
Patch
Comment 21 WebKit Review Bot 2011-06-21 22:36:03 PDT
Attachment 98119 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/WebCore/html/parser/HTMLTreeBuilder.h:119:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/html/parser/HTMLTreeBuilder.h:120:  The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 Adam Barth 2011-06-22 10:22:32 PDT
Comment on attachment 98119 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98119&action=review

This looks good.  I've now checked the HTMLElementStack accounting in detail.  I think the only remaining issue is whether the foster-parenting code path correct enforces the depth limit.  I'm marking r+/cq-, so you should feel free to either land this patch with the nits fixed and deal with foster parenting in a followup patch or post an updated version of the patch with the foster parenting issues addressed/tested.

> Source/WebCore/html/parser/HTMLConstructionSite.cpp:103
> +        parent->parserAddChild(child);
> +    } else
> +        parent->parserAddChild(child);

Looks like you didn't move this code outside the if-block.

Do we need a similar check in HTMLConstructionSite::attachAtSite (or in the code that chooses the site for attachAtSite?

>> Source/WebCore/html/parser/HTMLTreeBuilder.h:119
>> +    HTMLTreeBuilder(HTMLDocumentParser* parser, HTMLDocument*, bool reportErrors, bool usePreHTML5ParserQuirks, unsigned maximumDOMTreeDepth);
> 
> The parameter name "parser" adds no information, so it should be removed.  [readability/parameter_name] [5]

You could fix these style nits if you were so inclined.
Comment 23 Eric Seidel (no email) 2011-06-22 11:02:46 PDT
Do we know why we need this?  Does this crash iPhone safari?  Does this need to be a security bug?
Comment 24 Eric Seidel (no email) 2011-06-22 11:09:47 PDT
Comment on attachment 98119 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98119&action=review

> LayoutTests/fast/parser/element-nesting-cap-expected.txt:7
> +FAIL s2.parentNode === s1 should be true. Was false.

We should either adjust the constant in this test from 5100 to 4096 (or whatever it was decided on), or remove the test (since I"m not sure how much it adds over the block-nesting-cap.html
Comment 25 Eric Seidel (no email) 2011-06-22 11:11:27 PDT
You may have already done so, but just in case: you should run the HTML parser perf tests:
http://trac.webkit.org/browser/trunk/PerformanceTests/Parser

They're quick and easy, and will make sure that we don't regress other public benchmarks that rely on html parsing.

This is unlikely to be a hit, as we've found that per-token branches don't really affect perf (but per char ones definitely do!)
Comment 26 Eric Seidel (no email) 2011-06-22 11:12:00 PDT
It would also be nice to add some commentary in the tests or code as to why this is needed. :)
Comment 27 Sam Weinig 2011-06-22 11:26:09 PDT
Committed r89453: <http://trac.webkit.org/changeset/89453>
Comment 28 Sam Weinig 2011-06-22 11:37:54 PDT
Eric, sorry I missed your comments before landing. I don't think this needs to be a security bug as the result of the crashes that can result have not been exploitable.  

In general, this code is needed because rendering can become pathological for largely nested code.  It was also present in the old parser, and I find it objectionable that is was not added to the new one, even though there were tests that had to be turned into failures.  In general, we should not be removing these type of safe guards without understanding why it was added in the first place.

Regarding the test that still fails.  I want to talk to hyatt about why he wanted that behavior in the first place before changing it, but alas, he is on vacation.
Comment 29 Sam Weinig 2011-06-22 11:38:30 PDT
(In reply to comment #25)
> You may have already done so, but just in case: you should run the HTML parser perf tests:
> http://trac.webkit.org/browser/trunk/PerformanceTests/Parser
> 
> They're quick and easy, and will make sure that we don't regress other public benchmarks that rely on html parsing.
> 
> This is unlikely to be a hit, as we've found that per-token branches don't really affect perf (but per char ones definitely do!)

I ran the perf tests and saw no change.