WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
55011
Fragment parsing does not need to use HTMLSourceTracker
https://bugs.webkit.org/show_bug.cgi?id=55011
Summary
Fragment parsing does not need to use HTMLSourceTracker
Eric Seidel (no email)
Reported
2011-02-22 17:38:50 PST
Fragment parsing does not need to use HTMLSourceTracker Fragment parsing still calls m_sourceTracker.start() and end() even though the data isn't needed. This is most noticable due to the malloc it causes. However that malloc will go away if
bug 55005
is fixed.
Attachments
Patch
(3.79 KB, patch)
2011-02-22 19:35 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch
(4.01 KB, patch)
2011-02-22 19:50 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch
(5.36 KB, patch)
2011-02-23 03:13 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Eric Seidel (no email)
Comment 1
2011-02-22 19:31:06 PST
I'm not sure I believe it, but this patch seems to make our tiny-innerHTML benchmark twice as fast: Before patch: avg 5586.1 median 5594 stdev 41.295157101045135 min 5425 max 5633 After Patch: avg 2603.9 median 2609.5 stdev 32.500615378789355 min 2475 max 2649 I suspect (if those numbers are to be believed!) that's all due to avoiding
bug 55005
.
Eric Seidel (no email)
Comment 2
2011-02-22 19:35:15 PST
Created
attachment 83429
[details]
Patch
Adam Barth
Comment 3
2011-02-22 19:42:30 PST
Comment on
attachment 83429
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=83429&action=review
> Source/WebCore/html/parser/HTMLDocumentParser.h:155 > + bool m_isParsingFragment;
Can this class ask someone else whether we're parsing a fragment?
Eric Seidel (no email)
Comment 4
2011-02-22 19:46:26 PST
The reason why we don't see the same thing with PerformanceTests/Parser/html-parser.html, is that html-parser and tiny-innerHTML are totally different benchmarks. tiny-innerHTML doesn't really even make sense as a benchmark, and was only created to model Peacekeeper (which I believe to be a broken benchmark!). tiny-innerHTML (and peacekeeper) are just taking very tiny numbers (the time it takes to innerHTML a single text node) and multiplying them by very very large numbers. In the real world, we don't care if innerHTML of a single text string takes .5 miliseconds or .2 miliseconds. :) Both are well beyond what any website uses. html-parser.html on the other hand is a very useful benchmark.
Eric Seidel (no email)
Comment 5
2011-02-22 19:50:03 PST
Created
attachment 83436
[details]
Patch
Adam Barth
Comment 6
2011-02-22 19:50:55 PST
Comment on
attachment 83436
[details]
Patch Yay. This seems like a much less invasive way to get this performance.
Eric Seidel (no email)
Comment 7
2011-02-22 19:57:13 PST
I've looked at the next hottest section and it is again caused by
bug 55005
.
Andy Estes
Comment 8
2011-02-22 21:34:39 PST
(In reply to
comment #1
)
> I'm not sure I believe it, but this patch seems to make our tiny-innerHTML benchmark twice as fast: > > Before patch: > avg 5586.1 > median 5594 > stdev 41.295157101045135 > min 5425 > max 5633 > > After Patch: > avg 2603.9 > median 2609.5 > stdev 32.500615378789355 > min 2475 > max 2649 > > I suspect (if those numbers are to be believed!) that's all due to avoiding
bug 55005
.
I'm not able to reproduce as much of a speedup as you're seeing. Here are the numbers I'm getting on tiny-innerHTML: Before: avg 2990.85 median 2967 stdev 68.55382921471272 min 2907 max 3198 After: avg 2562 median 2569 stdev 33.406586176980134 min 2420 max 2582 Is it possible that you were testing with debug builds? That would explain why your before numbers are so high since malloc() is much more expensive in debug.
Andy Estes
Comment 9
2011-02-22 21:37:22 PST
Here are the peacekeeper numbers: Before: domDynamicCreationCreateElement 21691.973969631235 ops domDynamicCreationInnerHTML 13333.333333333334 ops domJQueryAttributeFilters 5047.955577990913 ops domJQueryBasics 4191.5 ops domJQueryContentFilters 1056 ops domJQueryHierarchy 1692.5 ops After: domDynamicCreationCreateElement 21739.130434782608 ops domDynamicCreationInnerHTML 14164.3059490085 ops domJQueryAttributeFilters 4954.5 ops domJQueryBasics 4182.5 ops domJQueryContentFilters 1080 ops domJQueryHierarchy 1713.5 ops
Andy Estes
Comment 10
2011-02-22 22:43:05 PST
(In reply to
comment #4
)
> The reason why we don't see the same thing with PerformanceTests/Parser/html-parser.html, is that html-parser and tiny-innerHTML are totally different benchmarks. > > tiny-innerHTML doesn't really even make sense as a benchmark, and was only created to model Peacekeeper (which I believe to be a broken benchmark!). tiny-innerHTML (and peacekeeper) are just taking very tiny numbers (the time it takes to innerHTML a single text node) and multiplying them by very very large numbers. In the real world, we don't care if innerHTML of a single text string takes .5 miliseconds or .2 miliseconds. :) Both are well beyond what any website uses. > > html-parser.html on the other hand is a very useful benchmark.
I don't think we should discount the prevalence of repeated calls to innerHTML as a content generation technique. For instance, here's a benchmark from quirksmode comparing DOM API to innerHTML for creating a 50x50 table:
http://www.quirksmode.org/dom/innerhtml.html
One interesting thing this benchmark demonstrates is that, especially in IE, innerHTML is significantly faster than using DOM API. It seems likely that at least some content authors have used this as a rule of thumb and designed pages that rely on innerHTML to generate dynamic content. That convinces me that improving fragment parsing performance is an important task regardless of our opinions of Peacekeeper as a benchmark.
WebKit Commit Bot
Comment 11
2011-02-23 02:07:50 PST
Comment on
attachment 83436
[details]
Patch Rejecting
attachment 83436
[details]
from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'build-..." exit_code: 2 Last 500 characters of output: URL ........................... http/tests/security/listener ....... http/tests/security/mixedContent ................ http/tests/security/originHeader ..... http/tests/security/postMessage ....... http/tests/security/xssAuditor ................ http/tests/security/xssAuditor/dom-write-innerHTML.html -> failed Exiting early after 1 failures. 22353 tests run. 618.34s total testing time 22352 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 14 test cases (<1%) had stderr output Full output:
http://queues.webkit.org/results/7982076
Eric Seidel (no email)
Comment 12
2011-02-23 02:17:43 PST
(In reply to
comment #8
)
> Is it possible that you were testing with debug builds? That would explain why your before numbers are so high since malloc() is much more expensive in debug.
It's not possible to run the benchmark in a Debug build (slow script dialog), so no. :) I suspect that this benchmark is so tight that we're hitting hardware caching differences, and that the improvements aren't necessarily real. It's also possible malloc is more expensive (relatively speaking) on my machine (due to locking) because it's a 16 core machine instead of whatever you're testing on. Given that Peacekeeper didn't impove as much as we saw tiny-innerHTML, perhaps the reduced benchmark is too reduced.
Eric Seidel (no email)
Comment 13
2011-02-23 03:13:03 PST
@abarth: I've removed support for XSSFiltering innerHTML content as per our discussion. Please re-confirm that this is OK.
Eric Seidel (no email)
Comment 14
2011-02-23 03:13:44 PST
Created
attachment 83463
[details]
Patch
Eric Seidel (no email)
Comment 15
2011-02-23 03:28:26 PST
(In reply to
comment #10
)
> (In reply to
comment #4
) > > The reason why we don't see the same thing with PerformanceTests/Parser/html-parser.html, is that html-parser and tiny-innerHTML are totally different benchmarks. > > > > tiny-innerHTML doesn't really even make sense as a benchmark, and was only created to model Peacekeeper (which I believe to be a broken benchmark!). tiny-innerHTML (and peacekeeper) are just taking very tiny numbers (the time it takes to innerHTML a single text node) and multiplying them by very very large numbers. In the real world, we don't care if innerHTML of a single text string takes .5 miliseconds or .2 miliseconds. :) Both are well beyond what any website uses. > > > > html-parser.html on the other hand is a very useful benchmark. > > I don't think we should discount the prevalence of repeated calls to innerHTML as a content generation technique. For instance, here's a benchmark from quirksmode comparing DOM API to innerHTML for creating a 50x50 table:
>
>
http://www.quirksmode.org/dom/innerhtml.html
Sure. It's just that the bechmark does 100k innerHTMLs, each of which takes only 0.056 milliseconds! The fact that we're fretting about reducing each to 0.026 milliseconds is silly IMO. :) Shaving 0.03ms per innerHTML does not seem worth our time (which is why I continue to believe the benchmark is crap). It's more interesting that we can parse an entire 5MB HTML file in 800ms, and continuing to make that number smaller seems useful.
> One interesting thing this benchmark demonstrates is that, especially in IE, innerHTML is significantly faster than using DOM API. It seems likely that at least some content authors have used this as a rule of thumb and designed pages that rely on innerHTML to generate dynamic content.
Cool. I'll take a look at
http://www.quirksmode.org/dom/innerhtml.html
and see if there is anything useful there.
> That convinces me that improving fragment parsing performance is an important task regardless of our opinions of Peacekeeper as a benchmark.
Yes, but peacekeeper (or at lest tiny-innerHTML) does not actually test fragment parsing performance since total time to parse the example fragment is counted in microseconds. :) We could continue to improve our longer-running parser benchmarks and probably see further improvements to the PLT score (which is also basically useless due to being too small a set these days) and other "more real" pages.
Eric Seidel (no email)
Comment 16
2011-02-23 03:29:53 PST
I think we mostly agree, btw. And are just re-stating the same things back and forth. :) I'm interested in improving perf here and will spend another day looking at this tomorrow.
WebKit Commit Bot
Comment 17
2011-02-24 06:34:26 PST
Comment on
attachment 83463
[details]
Patch Clearing flags on attachment: 83463 Committed
r79554
: <
http://trac.webkit.org/changeset/79554
>
WebKit Commit Bot
Comment 18
2011-02-24 06:34:30 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug