Bug 43423 - HTML5 parser may cause onload not to fire
Summary: HTML5 parser may cause onload not to fire
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Ben Murdoch
URL:
Keywords:
Depends on:
Blocks: 41115
  Show dependency treegraph
 
Reported: 2010-08-03 08:43 PDT by Ben Murdoch
Modified: 2010-08-04 02:31 PDT (History)
2 users (show)

See Also:


Attachments
Fix and potential layout test ideas (1.64 KB, patch)
2010-08-03 11:50 PDT, Ben Murdoch
no flags Details | Formatted Diff | Diff
Patch. (1.80 KB, patch)
2010-08-03 13:14 PDT, Ben Murdoch
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Murdoch 2010-08-03 08:43:24 PDT
If a sufficiently complex page causes the HTML5 parser to yield (i.e. > 4096 tokens and > 0.5 parsing time), when we return to parsing and complete the document, onload is not fired.

Patch to follow.
Comment 1 Ben Murdoch 2010-08-03 11:48:51 PDT
Attaching the code change that I propose to fix the bug.

I haven't set r? yet or created a ChangeLog as I've been trying to write a layout test for this. It's tricky as we need to make the parser stall (without creating a new PumpSession) so that it triggers the timeout on HTMLParserScheduler.h:59, i.e. elapsedTime needs to be greater than 0.5 seconds. Thanks to Adam's advice on trying to write a PHP script that sleeps/flushes, I managed to write a layout test that works on Windows, but natch when moving it to my Mac to try it doesn't trigger the bug. I fear this is going to be very tricky to reliably test due to the timing based nature of what we are testing. If for example the parser speeds up due to a hardware change we might not trigger the timeout anymore and this test would become useless.

Any ideas folks? I can attach the layout test that worked on Windows if it might help.
Comment 2 Ben Murdoch 2010-08-03 11:50:37 PDT
Created attachment 63362 [details]
Fix and potential layout test ideas

The Windows layout test is actually attached to the patch as well.
Comment 3 Eric Seidel (no email) 2010-08-03 12:04:48 PDT
Comment on attachment 63362 [details]
Fix and potential layout test ideas

You may be fighting the layout timer.

The two constants you're trying to trip in your test are:

// defaultParserChunkSize is used to define how many tokens the parser will
// process before checking against parserTimeLimit and possibly yielding.
// This is a performance optimization to prevent checking after every token.
static const int defaultParserChunkSize = 4096;

// defaultParserTimeLimit is the seconds the parser will run in one write() call
// before yielding.  Inline <script> execution can cause it to excede the limit.
// FIXME: We would like this value to be 0.2.
static const double defaultParserTimeLimit = 0.500;

Why do you ahve this section:
+<!--
+<?php
+print str_repeat("A", 2048);
+?>
+-->

To get around some buffering delay at the network layer?

What do you mean by "start a new PumpSession"?

Isn't the parser in full control until it yields (by being blocked by a script).

I guess you have to be careful that the final pumping will be done inside resumeParsingAfterYield, so you need to send more than 4096 tokens, but fewer than 8192, right?

Ideally you'd want the tokens as small as possible.

You might try sending <a></a> instead (two tokens).  You could send <a>a</a>a to get 4 tokens per bunch instead.

Of course you have to send the 4096th token after 0.5 seconds to hav it yield.  And then you need to have sent all of the rest of the document before it ends up resuming from its yield.
Comment 4 Eric Seidel (no email) 2010-08-03 12:05:11 PDT
I agree, this is complicated to test and may not be reliably testable.
Comment 5 Ben Murdoch 2010-08-03 12:09:17 PDT
(In reply to comment #3)

Thanks for the ideas Eric!

> Why do you ahve this section:
> +<!--
> +<?php
> +print str_repeat("A", 2048);
> +?>
> +-->
> 
> To get around some buffering delay at the network layer?

Exactly.

> 
> What do you mean by "start a new PumpSession"?

Well, it seemed to me that the startTime that we compare with when we decide to yield or not is set in the PumpSession constructor. I'm not exactly sure what constitutes a PumpSession though?

>
> Isn't the parser in full control until it yields (by being blocked by a script).
> 
> I guess you have to be careful that the final pumping will be done inside resumeParsingAfterYield, so you need to send more than 4096 tokens, but fewer than 8192, right?
> 
> Ideally you'd want the tokens as small as possible.
> 
> You might try sending <a></a> instead (two tokens).  You could send <a>a</a>a to get 4 tokens per bunch instead.
> 
> Of course you have to send the 4096th token after 0.5 seconds to hav it yield.  And then you need to have sent all of the rest of the document before it ends up resuming from its yield.

Good suggestions, I'll give it a shot.

Thanks!
Comment 6 Ben Murdoch 2010-08-03 12:46:08 PDT
Still no luck.

I think the problem is that as soon as the pump loop in HTMLDocumentParser doesn't see a token, we stop pumping (and also presumably then the PumpingSession?). Trying to stall by sleeping in the php script causes the tokenizer to not get a token, so we break out of the loop and so when the PHP resumes, we need to send another 4096 tokens, etc...

Test aside, does this look like a sane fix?
Comment 7 Eric Seidel (no email) 2010-08-03 12:55:53 PDT
The fix looks great.

I think you should upload a patch which explains that this seems impossible to test.

And then I'll happily r+ it.
Comment 8 Ben Murdoch 2010-08-03 13:14:54 PDT
Created attachment 63371 [details]
Patch.

Awesome, thanks Eric!
Comment 9 Darin Adler 2010-08-03 13:26:31 PDT
Comment on attachment 63371 [details]
Patch.

Typo: "dependant"
Comment 10 Ben Murdoch 2010-08-04 02:31:58 PDT
Fixed Changelog typo.

Sending        WebCore/ChangeLog
Sending        WebCore/html/HTMLDocumentParser.cpp
Transmitting file data ..
Committed revision 64638.