WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
97270
DOM-in-JavaScript performance experiments
https://bugs.webkit.org/show_bug.cgi?id=97270
Summary
DOM-in-JavaScript performance experiments
Adam Barth
Reported
2012-09-20 16:09:28 PDT
This bug is just a holder for DOM-in-JavaScript performance experiment patches. If you don't care about DOM-in-JavaScript performance, you can safely ignore it.
Attachments
Patch to measure the cost of pimpl for Node (-3% on Dromaeo/dom-traverse; -5% on Dromaeo/dom-modify)
(3.15 KB, patch)
2012-09-20 16:10 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch to Dromaeo/dom-traverse to compare CPP DOM and JS DOM (29.32% better with JS DOM)
(2.81 KB, patch)
2012-09-20 16:27 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Version in which JS outperforms CPP in JavaScriptCore as well
(3.06 KB, patch)
2012-09-26 16:22 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2012-09-20 16:10:49 PDT
Created
attachment 165006
[details]
Patch to measure the cost of pimpl for Node (-3% on Dromaeo/dom-traverse; -5% on Dromaeo/dom-modify)
Adam Barth
Comment 2
2012-09-20 16:27:34 PDT
Created
attachment 165007
[details]
Patch to Dromaeo/dom-traverse to compare CPP DOM and JS DOM (29.32% better with JS DOM)
Kentaro Hara
Comment 3
2012-09-20 16:30:18 PDT
ccing arv@, who has been also prototyping DOM-in-JavaScript.
Adam Barth
Comment 4
2012-09-20 17:04:03 PDT
Looks like arv has been running some experiments too:
https://github.com/arv/JS-DOM-Test
Erik Arvidsson
Comment 5
2012-09-21 06:52:53 PDT
Let me see if I understand this correctly. The pure cpp impact is -5% but we get almost 30% faster from JS?
Adam Barth
Comment 6
2012-09-21 08:28:22 PDT
We don't have enough data to draw those conclusions, but so far that's about the scale of the performance changes. These patches are just rough approximations. Haraken has also suggested some more experiments, which I'd like to run.
Sam Weinig
Comment 7
2012-09-22 23:49:59 PDT
Can you explain a bit more about what this experiment is looking at?
Adam Barth
Comment 8
2012-09-23 12:10:05 PDT
> Can you explain a bit more about what this experiment is looking at?
Currently all of the DOM structure is stored outside the JavaScript VM. We're experimenting with the idea of uploading some amount of this state into the JavaScript VM. At least from these simple experiments, there does appear to a significant amount of performance on the table. I certainly don't have a complete design in mind. At the moment, I'm leaning towards some sort of state representation that can be shared by the JavaScript VM and the CPP implementation. In particular, to get the full advantage of the JavaScript VM's memory management, we'd likely want this state to be understandable (and compactable) by the garbage collector. We're just exploring the design space. The performance characteristics of JavaScript VMs have changed radically since the design of the current bindings. It seems likely that there are different performance considerations now. If you're interested in this topic, I'm happy to involve you in more of the discussions.
Sam Weinig
Comment 9
2012-09-23 16:14:10 PDT
(In reply to
comment #8
)
> > Can you explain a bit more about what this experiment is looking at? > > Currently all of the DOM structure is stored outside the JavaScript VM. We're experimenting with the idea of uploading some amount of this state into the JavaScript VM. At least from these simple experiments, there does appear to a significant amount of performance on the table. > > I certainly don't have a complete design in mind. At the moment, I'm leaning towards some sort of state representation that can be shared by the JavaScript VM and the CPP implementation. In particular, to get the full advantage of the JavaScript VM's memory management, we'd likely want this state to be understandable (and compactable) by the garbage collector. > > We're just exploring the design space. The performance characteristics of JavaScript VMs have changed radically since the design of the current bindings. It seems likely that there are different performance considerations now. If you're interested in this topic, I'm happy to involve you in more of the discussions.
Please do. And also please involve the folks working on JavaScriptCore.
Adam Barth
Comment 10
2012-09-23 19:19:24 PDT
Anyone in particular?
Sam Weinig
Comment 11
2012-09-23 20:06:04 PDT
(In reply to
comment #10
)
> Anyone in particular?
At the very least Geoff Garen.
Maciej Stachowiak
Comment 12
2012-09-26 14:13:16 PDT
Adam, I am intrigued by the possibility of 30% speedups. How do I interpret the results of the patched Dromaeo/dom-traverse? I get output like this, how do I interpret it? Which are the C++ DOM numbers and which are the JS DOM numbers? -------- 233.2742048370791 228.85231409663135 1419 1451.8 250.69890229530947 Time: avg 3583.6254212290196 runs/s median 0 runs/s stdev 106.78560737165455 runs/s min 3374.3917685085466 runs/s max 3653.791186857055 runs/s
Adam Barth
Comment 13
2012-09-26 14:21:37 PDT
> Adam, I am intrigued by the possibility of 30% speedups.
Me too! I should warn you that these are just experiments. There isn't a real design here. Just some evidence that we might want to look into moving move of the DOM into JavaScript.
> How do I interpret the results of the patched Dromaeo/dom-traverse? I get output like this, how do I interpret it? Which are the C++ DOM numbers and which are the JS DOM numbers?
If you look in the dom-traverse patch, you'll see the following two lines: + document.body.cppFirstChild = document.body.firstChild; + document.body.jsFirstChild = mirror(document.body.firstChild); What this means is that document.body.cppFirstChild points to the normal implementation of the DOM: namely backed by CPP code in WebCore. On the other hand, document.body.jsFirstChild is a JavaScript-only mirror of the DOM structure, created just by wiring up firstChild/lastChild/nextSibling/previousSibling properties up on empty JavaScript objects. (I haven't tested the mirror function, so it might have bugs.) If you look in the nextSibling test, you'll see the following line of code: + var cur = document.body.jsFirstChild; // Change to document.body.cppFirstChild That means the test is running with the JavaScript version of the DOM tree. If you change jsFirstChild to cppFirstChild, you'll be running with the CPP version of the DOM tree. You can run test once with jsFirstChild and once with cppFirstChild to compare the performance differences between the two implementations of the tree. The way I've been doing that with with the run-perf-tests script, which creates nice graphs for comparing the output: $ ./Tools/Scripts/run-perf-tests PerformanceTests/Dromaeo/dom-traverse.html You can also just compare the avg runs/s numerically.
Adam Barth
Comment 14
2012-09-26 14:23:10 PDT
(I should note that I changed both the nextSibling and the previousSibling tests at the same time. You can also delete one or the other since they're very similar tests.)
Maciej Stachowiak
Comment 15
2012-09-26 14:38:26 PDT
Thanks for the tips! Here are some results I got: JS - Safari TOT avg 253710.8 runs/s CPP - Safari TOT avg 499435.80000000005 runs/s That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was? I wanted to try the test in Chrome Canary but it doesn't seem to run there.
Adam Barth
Comment 16
2012-09-26 15:03:19 PDT
> Here are some results I got:
Fascinating.
> That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was?
Here are the numbers I got (five minutes ago) with DumpRenderTree in the chromium port: $ ./out/Release/DumpRenderTree file:///src/abarth-webkit/PerformanceTests/Dromaeo/dom-traverse.html == js == avg 326188 runs/s median 0 runs/s stdev 1218.1536848854498 runs/s min 325071 runs/s max 328243 runs/s == cpp == avg 266485.3966033966 runs/s median 0 runs/s stdev 671.5778183390692 runs/s min 265506 runs/s max 267288.48851148854 runs/s That shows the JS implementation running faster than the CPP implementation. I need re-run these tests on a Mac to compare the absolute numbers between JSC and V8. We might be getting different outcomes here because there are two moving pieces: 1) The performance of the JSC and the V8 bindings code. 2) The performance of the underlying JavaScript engine.
Maciej Stachowiak
Comment 17
2012-09-26 16:02:26 PDT
(In reply to
comment #16
)
> > Here are some results I got: > > Fascinating. > > > That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was? > > Here are the numbers I got (five minutes ago) with DumpRenderTree in the chromium port: > > $ ./out/Release/DumpRenderTree file:///src/abarth-webkit/PerformanceTests/Dromaeo/dom-traverse.html > > == js == > > avg 326188 runs/s > median 0 runs/s > stdev 1218.1536848854498 runs/s > min 325071 runs/s > max 328243 runs/s > > == cpp == > > avg 266485.3966033966 runs/s > median 0 runs/s > stdev 671.5778183390692 runs/s > min 265506 runs/s > max 267288.48851148854 runs/s > > That shows the JS implementation running faster than the CPP implementation. I need re-run these tests on a Mac to compare the absolute numbers between JSC and V8. > > We might be getting different outcomes here because there are two moving pieces: > > 1) The performance of the JSC and the V8 bindings code. > 2) The performance of the underlying JavaScript engine.
I will try to rearrange the test to be a single file and print both results, so it can easily be loaded in different browsers. Because I'm also curious what the results are in different versions. And that is probably easier for me than building chromium webkit.
Adam Barth
Comment 18
2012-09-26 16:12:15 PDT
Ok, I just ran WebKit@129710 in a Mac Pro in both chromium-mac and apple-mac: == js == chromium-mac: avg 190979.59999999998 runs/s apple-mac: avg 178240.40000000002 runs/s == cpp == chromium-mac: avg 148605.44592533214 runs/s apple-mac: avg 388969 runs/s The performance ordering appears to be: apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp
Adam Barth
Comment 19
2012-09-26 16:18:53 PDT
With a small tweak to how we build the JS objets, I can make the JS version outperform the CPP version in JSC as well: chromium-mac-js2: 191981.8 runs/s apple-mac-js2: 472447.4 runs/s The difference is that js2 uses createMirrorDOMNode() to add the properties in the same order for every JavaScript "DOM node." (Attaching updated perf test shortly.)
Adam Barth
Comment 20
2012-09-26 16:22:18 PDT
Created
attachment 165894
[details]
Version in which JS outperforms CPP in JavaScriptCore as well
Adam Barth
Comment 21
2012-09-26 16:31:53 PDT
Recoding my command lines for copy-pasta later: $ DYLD_FRAMEWORK_PATH=/Users/abarth/svn/webkit/WebKitBuild/Release ./WebKitBuild/Release/DumpRenderTree file:///Users/abarth/svn/webkit/PerformanceTests/Dromaeo/dom-traverse.html $ ./out/Release/DumpRenderTree.app/Contents/MacOS/DumpRenderTree file:///Users/abarth/svn/webkit/PerformanceTests/Dromaeo/dom-traverse.html
Maciej Stachowiak
Comment 22
2012-09-26 18:14:59 PDT
(In reply to
comment #18
)
> Ok, I just ran WebKit@129710 in a Mac Pro in both chromium-mac and apple-mac: > > == js == > chromium-mac: avg 190979.59999999998 runs/s > apple-mac: avg 178240.40000000002 runs/s > > == cpp == > chromium-mac: avg 148605.44592533214 runs/s > apple-mac: avg 388969 runs/s > > The performance ordering appears to be: > > apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp
Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings. (I will look into the version that seems to show a benefit in JSC as well.)
Kentaro Hara
Comment 23
2012-09-26 18:24:35 PDT
> > apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp
>
> Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.
V8 DOM bindings are still very slower than JSC DOM bindings, that's true to some extent. (I've been optimizing the performance for months.) That being said, conceptually, I think that apple-mac-js should be faster than apple-mac-cpp. In apple-mac-cpp, .nextSibling needs to call back JSC binding, do some value conversion from JS to C++, do something in WebCore, do some value conversion from C++ to JS, and return back to JSC. On the other hand, in apple-mac-js, .nextSibling doesn't need to call back JSC binding. Everything completes without leaving the JIT code. In that sense, in the world where we optimize things appropriately, apple-mac-js will become faster than apple-mac-cpp.
Adam Barth
Comment 24
2012-09-26 19:49:35 PDT
> Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.
That may well be true (and is something we can work on in another bug), but
Comment #19
seems to indiciate that the first JS implementation was not being fully optimized by JSC. In the second JS implementation, the JS version appears to be significantly faster than the CPP version, even in JSC.
Maciej Stachowiak
Comment 25
2012-09-26 20:26:52 PDT
(In reply to
comment #24
)
> > Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings. > > That may well be true (and is something we can work on in another bug), but
Comment #19
seems to indiciate that the first JS implementation was not being fully optimized by JSC. In the second JS implementation, the JS version appears to be significantly faster than the CPP version, even in JSC.
As mentioned, I'm definitely interested in that and will look into it.
Geoffrey Garen
Comment 26
2012-09-26 23:38:20 PDT
From <
http://trac.webkit.org/wiki/DOMInJavaScript?version=3
>, and the dom-traverse test case, it looks like the main goal here is to optimize DOM traversal. Are there other goals?
Kentaro Hara
Comment 27
2012-09-26 23:50:57 PDT
(In reply to
comment #26
)
> From <
http://trac.webkit.org/wiki/DOMInJavaScript?version=3
>, and the dom-traverse test case, it looks like the main goal here is to optimize DOM traversal. Are there other goals?
FYI, this slide explains other motivations:
https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p
Adam Barth
Comment 28
2012-09-26 23:57:35 PDT
Thanks for noticing the wiki page I'm writing up. I think I've got it into a more sane shape now. Here's a strawman design for we might actually be able to realize these speedups:
https://trac.webkit.org/wiki/DOMInJavaScript
> it looks like the main goal here is to optimize DOM traversal. Are there other goals?
Speeding up DOM traversal is certainly near the top of my list because it's so widely used. If we do a good job with DOM traversal, we might be able to re-use the mechanism in other places where it shows a speedup. For example, haraken has some data showing that nodeType and className are the hotest DOM properties. There might also be opportunities for optimizing these properties by uploading them into the JavaScript engine. Another benefit of this approach is that it somewhat simplifies garbage collection (e.g., it might help fix
Bug 88834
). The idea here is that by uploading the DOM tree into the JavaScript engine, the JavaScript engine's garbage collector has a better understanding of which objects are keeping which other objects alive. The usefulness of this information depends on how the garbage collector works. For example, a generational garbage collector might need only examine DOM edges in the new space rather than needing to crawl the entire DOM tree (i.e., the portion in the old space).
Adam Barth
Comment 29
2012-09-27 00:01:22 PDT
> FYI, this slide explains other motivations:
https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p
I don't agree with some of the conclusions in this slide deck. In particular, I'm hopeful that DOM-in-JavaScript will actually be faster, especially for JS-heavy workloads. That's why I created this bug initially: to experiment with performance.
Kentaro Hara
Comment 30
2012-09-27 07:41:25 PDT
>
https://trac.webkit.org/wiki/DOMInJavaScript
Thank you very much for the write up! The approach looks good. One question: Are you intending to create wrapper objects for all DOM nodes? Or are you intending to create wrapper objects only for DOM nodes that are touched by JavaScript (as is done in the current JSC/V8+WebKit)? The latter idea would be feasible in a following way (and might be better/worse in terms of performance). As a fast path, in a case where a wrapper object exists, .nextSibling is realized by normal JavaScript property access. As a slow path, in a case where a wrapper object does not exist, .nextSibling is realized by Node::nextSibling() as is done in the current WebKit.
Adam Barth
Comment 31
2012-09-27 07:51:17 PDT
> One question: Are you intending to create wrapper objects for all DOM nodes? Or are you intending to create wrapper objects only for DOM nodes that are touched by JavaScript (as is done in the current JSC/V8+WebKit)?
The proposal in the wiki page is to create wrappers for all DOM nodes. Before doing that, of course, we'll need to look at the impact on memory and performance.
> The latter idea would be feasible in a following way (and might be better/worse in terms of performance). As a fast path, in a case where a wrapper object exists, .nextSibling is realized by normal JavaScript property access. As a slow path, in a case where a wrapper object does not exist, .nextSibling is realized by Node::nextSibling() as is done in the current WebKit.
The trade-off is in that approach need to reserve space both in Node and in its wrapper for nextSibling. Depending on the wrapper rate, this can either be a win or a loss. We'll probably need to try a few designs before we find the best one. Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?
Adam Barth
Comment 32
2012-09-27 07:55:37 PDT
I should also say that if you have storage for nextSibling in Node, another approach is to keep both copies up to date. In that case, JS would read the data from the JS object and CPP would read the data from the CPP object. The write operation (e.g., setNextSibling) would write to both locations. For purposes of discussion, we might want to call this the "write through" approach (and we should probably add it to the wiki as well).
Kentaro Hara
Comment 33
2012-09-27 07:57:45 PDT
(In reply to
comment #31
)
> We'll probably need to try a few designs before we find the best one. Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?
Sure! Another disadvantage of the optional-wrapper approach is that it won't contribute to simplifying the GC. Either way I'll write them up tomorrow.
Maciej Stachowiak
Comment 34
2012-09-27 16:35:51 PDT
Adding Gavin Barraclough to Cc, I had a good conversation with him recently about plans to speed up DOM property access for JavaScriptCore. He said that JSC folks have discussed letting the JIT have sufficient understanding of some DOM accessors to make them efficient without having to modify the layout of DOM nodes or implement anything in JS. From what I can gather of the approach outlined in this bug, it seems that it would likely have a memory cost, and a speed cost for DOM traversal and mutation from C++. I am concerned about these potential costs, because it seems they would affect speed of core operations like HTML parsing and style resolution. Since JSC+WebKit, as far as I know, already has the fastest DOM traversal of any browser engine even with no changes, it would be a tough call whether a further ~1.2x speedup to DOM traversal would outweigh such costs. Maybe there is a plan to mitigate these downsides. If so, I am interested in learning more about that. Anyway - though the approach Gavin described to me sounds like it may more technically challenging (you'd have to understand and modify JIT code), it seems much less likely to hurt memory use or access speed from C++ code. So I found it to be a very intriguing idea. Since the JavaScriptCore team has a track record of success in making really wicked fast DOM bindings, I think it is worth considering their input in this area. It might also be useful to have some conversations about this by email or in person, and not just in bug comments.
Maciej Stachowiak
Comment 35
2012-09-27 16:39:10 PDT
(In reply to
comment #29
)
> > FYI, this slide explains other motivations:
https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p
> > I don't agree with some of the conclusions in this slide deck. In particular, I'm hopeful that DOM-in-JavaScript will actually be faster, especially for JS-heavy workloads. That's why I created this bug initially: to experiment with performance.
One experiment that would help us learn about likely perf characteristics of a DOM implemented completely in JavaScript would be to try reimplementing some more complex DOM methods in JS. Seems like a trivial accessor like firstChild is the best possible case.
Adam Barth
Comment 36
2012-09-27 17:09:06 PDT
Interesting. I hadn't considered the approach of teaching the JIT more about the structure of DOM objects. As haraken writes in
Comment #23
, you're still going to run up against the limitation of needing to follow a pointer from the JSObject to the Node and then back to the other JSObject. As long as there are two objects and only one representation of the tree, one of the two kinds of objects is going to need to transit the graph by way of the other. I guess the logical conclusion of that thought process is that we should try to merge the wrapper object and the wrapped object so that neither needs to consult the other in order to transit the graph.
> One experiment that would help us learn about likely perf characteristics of a DOM implemented completely in JavaScript would be to try reimplementing some more complex DOM methods in JS. Seems like a trivial accessor like firstChild is the best possible case.
I wasn't suggesting implementing the entire DOM in JavaScript. The proposal in <
http://trac.webkit.org/wiki/DOMInJavaScript
> just suggests moving the tree representation. In any case, all of this is very much open for discussion and I welcome everyone's input, especially JSC hackers. :)
Kentaro Hara
Comment 37
2012-09-28 07:18:57 PDT
(In reply to
comment #31
)
> We'll probably need to try a few designs before we find the best one. Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?
Done. Please edit it freely.
Sam Weinig
Comment 38
2012-09-28 12:15:48 PDT
Can you state the goal of these experiments? What benchmarks/scenarios are you trying to improve on? It seems a bit odd to me to start with a solution.
Adam Barth
Comment 39
2012-09-28 12:41:05 PDT
> Can you state the goal of these experiments? What benchmarks/scenarios are you trying to improve on?
See
Comment #28
.
> It seems a bit odd to me to start with a solution.
This bug is about experimenting with an alternative approach for implement some DOM bindings. The goal is to learn about the performance characteristics of the approach by comparing its scores on various benchmarks. The approach might turn out to be a better or worse performance trade-off than the current approach. That's what I'm hoping to learn from these experiments. I could have chosen to perform these experiments in a secret branch of WebKit, but I decided to perform them here on this bug so that other folks in the community can participate in the discussion if they so choose. As I wrote in
Comment #0
, if you're not interested in the performance characteristics of this approach, you can safely ignore this bug thread.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug