Bug 97270

Summary: DOM-in-JavaScript performance experiments
Product: WebKit Reporter: Adam Barth <abarth>
Component: WebCore JavaScriptAssignee: Adam Barth <abarth>
Status: NEW ---    
Severity: Normal CC: alex, ap, arv, barraclough, dmikurube, dominicc, eric, ggaren, haraken, mjs, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch to measure the cost of pimpl for Node (-3% on Dromaeo/dom-traverse; -5% on Dromaeo/dom-modify)
none
Patch to Dromaeo/dom-traverse to compare CPP DOM and JS DOM (29.32% better with JS DOM)
none
Version in which JS outperforms CPP in JavaScriptCore as well none

Description Adam Barth 2012-09-20 16:09:28 PDT
This bug is just a holder for DOM-in-JavaScript performance experiment patches.  If you don't care about DOM-in-JavaScript performance, you can safely ignore it.
Comment 1 Adam Barth 2012-09-20 16:10:49 PDT
Created attachment 165006 [details]
Patch to measure the cost of pimpl for Node (-3% on Dromaeo/dom-traverse; -5% on Dromaeo/dom-modify)
Comment 2 Adam Barth 2012-09-20 16:27:34 PDT
Created attachment 165007 [details]
Patch to Dromaeo/dom-traverse to compare CPP DOM and JS DOM (29.32% better with JS DOM)
Comment 3 Kentaro Hara 2012-09-20 16:30:18 PDT
ccing arv@, who has been also prototyping DOM-in-JavaScript.
Comment 4 Adam Barth 2012-09-20 17:04:03 PDT
Looks like arv has been running some experiments too:

https://github.com/arv/JS-DOM-Test
Comment 5 Erik Arvidsson 2012-09-21 06:52:53 PDT
Let me see if I understand this correctly. The pure cpp impact is -5% but we get almost 30% faster from JS?
Comment 6 Adam Barth 2012-09-21 08:28:22 PDT
We don't have enough data to draw those conclusions, but so far that's about the scale of the performance changes.  These patches are just rough approximations.  Haraken has also suggested some more experiments, which I'd like to run.
Comment 7 Sam Weinig 2012-09-22 23:49:59 PDT
Can you explain a bit more about what this experiment is looking at?
Comment 8 Adam Barth 2012-09-23 12:10:05 PDT
> Can you explain a bit more about what this experiment is looking at?

Currently all of the DOM structure is stored outside the JavaScript VM.  We're experimenting with the idea of uploading some amount of this state into the JavaScript VM.  At least from these simple experiments, there does appear to a significant amount of performance on the table.

I certainly don't have a complete design in mind.  At the moment, I'm leaning towards some sort of state representation that can be shared by the JavaScript VM and the CPP implementation.  In particular, to get the full advantage of the JavaScript VM's memory management, we'd likely want this state to be understandable (and compactable) by the garbage collector.

We're just exploring the design space.  The performance characteristics of JavaScript VMs have changed radically since the design of the current bindings.  It seems likely that there are different performance considerations now.  If you're interested in this topic, I'm happy to involve you in more of the discussions.
Comment 9 Sam Weinig 2012-09-23 16:14:10 PDT
(In reply to comment #8)
> > Can you explain a bit more about what this experiment is looking at?
> 
> Currently all of the DOM structure is stored outside the JavaScript VM.  We're experimenting with the idea of uploading some amount of this state into the JavaScript VM.  At least from these simple experiments, there does appear to a significant amount of performance on the table.
> 
> I certainly don't have a complete design in mind.  At the moment, I'm leaning towards some sort of state representation that can be shared by the JavaScript VM and the CPP implementation.  In particular, to get the full advantage of the JavaScript VM's memory management, we'd likely want this state to be understandable (and compactable) by the garbage collector.
> 
> We're just exploring the design space.  The performance characteristics of JavaScript VMs have changed radically since the design of the current bindings.  It seems likely that there are different performance considerations now.  If you're interested in this topic, I'm happy to involve you in more of the discussions.

Please do.  And also please involve the folks working on JavaScriptCore.
Comment 10 Adam Barth 2012-09-23 19:19:24 PDT
Anyone in particular?
Comment 11 Sam Weinig 2012-09-23 20:06:04 PDT
(In reply to comment #10)
> Anyone in particular?

At the very least Geoff Garen.
Comment 12 Maciej Stachowiak 2012-09-26 14:13:16 PDT
Adam, I am intrigued by the possibility of 30% speedups. How do I interpret the results of the patched Dromaeo/dom-traverse? I get output like this, how do I interpret it? Which are the C++ DOM numbers and which are the JS DOM numbers?

--------
233.2742048370791
228.85231409663135
1419
1451.8
250.69890229530947

Time:
avg 3583.6254212290196 runs/s
median 0 runs/s
stdev 106.78560737165455 runs/s
min 3374.3917685085466 runs/s
max 3653.791186857055 runs/s
Comment 13 Adam Barth 2012-09-26 14:21:37 PDT
> Adam, I am intrigued by the possibility of 30% speedups.

Me too!  I should warn you that these are just experiments.  There isn't a real design here.  Just some evidence that we might want to look into moving move of the DOM into JavaScript.

> How do I interpret the results of the patched Dromaeo/dom-traverse? I get output like this, how do I interpret it? Which are the C++ DOM numbers and which are the JS DOM numbers?

If you look in the dom-traverse patch, you'll see the following two lines:

+ document.body.cppFirstChild = document.body.firstChild;
+ document.body.jsFirstChild = mirror(document.body.firstChild);

What this means is that document.body.cppFirstChild points to the normal implementation of the DOM: namely backed by CPP code in WebCore.  On the other hand, document.body.jsFirstChild is a JavaScript-only mirror of the DOM structure, created just by wiring up firstChild/lastChild/nextSibling/previousSibling properties up on empty JavaScript objects.  (I haven't tested the mirror function, so it might have bugs.)

If you look in the nextSibling test, you'll see the following line of code:

+ var cur = document.body.jsFirstChild; // Change to document.body.cppFirstChild

That means the test is running with the JavaScript version of the DOM tree.  If you change jsFirstChild to cppFirstChild, you'll be running with the CPP version of the DOM tree.  You can run test once with jsFirstChild and once with cppFirstChild to compare the performance differences between the two implementations of the tree.

The way I've been doing that with with the run-perf-tests script, which creates nice graphs for comparing the output:

$ ./Tools/Scripts/run-perf-tests PerformanceTests/Dromaeo/dom-traverse.html

You can also just compare the avg runs/s numerically.
Comment 14 Adam Barth 2012-09-26 14:23:10 PDT
(I should note that I changed both the nextSibling and the previousSibling tests at the same time.  You can also delete one or the other since they're very similar tests.)
Comment 15 Maciej Stachowiak 2012-09-26 14:38:26 PDT
Thanks for the tips! Here are some results I got:

JS - Safari TOT
    avg 253710.8 runs/s

CPP - Safari TOT
    avg 499435.80000000005 runs/s

That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was?

I wanted to try the test in Chrome Canary but it doesn't seem to run there.
Comment 16 Adam Barth 2012-09-26 15:03:19 PDT
> Here are some results I got:

Fascinating.

> That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was?

Here are the numbers I got (five minutes ago) with DumpRenderTree in the chromium port:

$ ./out/Release/DumpRenderTree file:///src/abarth-webkit/PerformanceTests/Dromaeo/dom-traverse.html

== js ==

avg 326188 runs/s
median 0 runs/s
stdev 1218.1536848854498 runs/s
min 325071 runs/s
max 328243 runs/s

== cpp ==

avg 266485.3966033966 runs/s
median 0 runs/s
stdev 671.5778183390692 runs/s
min 265506 runs/s
max 267288.48851148854 runs/s

That shows the JS implementation running faster than the CPP implementation.  I need re-run these tests on a Mac to compare the absolute numbers between JSC and V8.

We might be getting different outcomes here because there are two moving pieces:

1) The performance of the JSC and the V8 bindings code.
2) The performance of the underlying JavaScript engine.
Comment 17 Maciej Stachowiak 2012-09-26 16:02:26 PDT
(In reply to comment #16)
> > Here are some results I got:
> 
> Fascinating.
> 
> > That looks like the JS-based version is about 50% slower, not 30% faster. Am I misunderstanding the test or what the claim was?
> 
> Here are the numbers I got (five minutes ago) with DumpRenderTree in the chromium port:
> 
> $ ./out/Release/DumpRenderTree file:///src/abarth-webkit/PerformanceTests/Dromaeo/dom-traverse.html
> 
> == js ==
> 
> avg 326188 runs/s
> median 0 runs/s
> stdev 1218.1536848854498 runs/s
> min 325071 runs/s
> max 328243 runs/s
> 
> == cpp ==
> 
> avg 266485.3966033966 runs/s
> median 0 runs/s
> stdev 671.5778183390692 runs/s
> min 265506 runs/s
> max 267288.48851148854 runs/s
> 
> That shows the JS implementation running faster than the CPP implementation.  I need re-run these tests on a Mac to compare the absolute numbers between JSC and V8.
> 
> We might be getting different outcomes here because there are two moving pieces:
> 
> 1) The performance of the JSC and the V8 bindings code.
> 2) The performance of the underlying JavaScript engine.

I will try to rearrange the test to be a single file and print both results, so it can easily be loaded in different browsers. Because I'm also curious what the results are in different versions. And that is probably easier for me than building chromium webkit.
Comment 18 Adam Barth 2012-09-26 16:12:15 PDT
Ok, I just ran WebKit@129710 in a Mac Pro in both chromium-mac and apple-mac:

== js ==
chromium-mac: avg 190979.59999999998 runs/s
apple-mac: avg 178240.40000000002 runs/s

== cpp ==
chromium-mac: avg 148605.44592533214 runs/s
apple-mac: avg 388969 runs/s

The performance ordering appears to be:

apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp
Comment 19 Adam Barth 2012-09-26 16:18:53 PDT
With a small tweak to how we build the JS objets, I can make the JS version outperform the CPP version in JSC as well:

chromium-mac-js2: 191981.8 runs/s
apple-mac-js2: 472447.4 runs/s

The difference is that js2 uses createMirrorDOMNode() to add the properties in the same order for every JavaScript "DOM node."  (Attaching updated perf test shortly.)
Comment 20 Adam Barth 2012-09-26 16:22:18 PDT
Created attachment 165894 [details]
Version in which JS outperforms CPP in JavaScriptCore as well
Comment 21 Adam Barth 2012-09-26 16:31:53 PDT
Recoding my command lines for copy-pasta later:

$ DYLD_FRAMEWORK_PATH=/Users/abarth/svn/webkit/WebKitBuild/Release ./WebKitBuild/Release/DumpRenderTree file:///Users/abarth/svn/webkit/PerformanceTests/Dromaeo/dom-traverse.html
$ ./out/Release/DumpRenderTree.app/Contents/MacOS/DumpRenderTree file:///Users/abarth/svn/webkit/PerformanceTests/Dromaeo/dom-traverse.html
Comment 22 Maciej Stachowiak 2012-09-26 18:14:59 PDT
(In reply to comment #18)
> Ok, I just ran WebKit@129710 in a Mac Pro in both chromium-mac and apple-mac:
> 
> == js ==
> chromium-mac: avg 190979.59999999998 runs/s
> apple-mac: avg 178240.40000000002 runs/s
> 
> == cpp ==
> chromium-mac: avg 148605.44592533214 runs/s
> apple-mac: avg 388969 runs/s
> 
> The performance ordering appears to be:
> 
> apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp

Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.

(I will look into the version that seems to show a benefit in JSC as well.)
Comment 23 Kentaro Hara 2012-09-26 18:24:35 PDT
> > apple-mac-cpp > chromium-mac-js > apple-mac-js > chromium-mac-cpp
>
> Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.

V8 DOM bindings are still very slower than JSC DOM bindings, that's true to some extent. (I've been optimizing the performance for months.)

That being said, conceptually, I think that apple-mac-js should be faster than apple-mac-cpp. In apple-mac-cpp, .nextSibling needs to call back JSC binding, do some value conversion from JS to C++, do something in WebCore, do some value conversion from C++ to JS, and return back to JSC. On the other hand, in apple-mac-js, .nextSibling doesn't need to call back JSC binding. Everything completes without leaving the JIT code. In that sense, in the world where we optimize things appropriately, apple-mac-js will become faster than apple-mac-cpp.
Comment 24 Adam Barth 2012-09-26 19:49:35 PDT
> Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.

That may well be true (and is something we can work on in another bug), but Comment #19 seems to indiciate that the first JS implementation was not being fully optimized by JSC.  In the second JS implementation, the JS version appears to be significantly faster than the CPP version, even in JSC.
Comment 25 Maciej Stachowiak 2012-09-26 20:26:52 PDT
(In reply to comment #24)
> > Seems the main issue identified here is that V8 DOM bindings are very slow compared to JSC DOM bindings.
> 
> That may well be true (and is something we can work on in another bug), but Comment #19 seems to indiciate that the first JS implementation was not being fully optimized by JSC.  In the second JS implementation, the JS version appears to be significantly faster than the CPP version, even in JSC.

As mentioned, I'm definitely interested in that and will look into it.
Comment 26 Geoffrey Garen 2012-09-26 23:38:20 PDT
From <http://trac.webkit.org/wiki/DOMInJavaScript?version=3>, and the dom-traverse test case, it looks like the main goal here is to optimize DOM traversal. Are there other goals?
Comment 27 Kentaro Hara 2012-09-26 23:50:57 PDT
(In reply to comment #26)
> From <http://trac.webkit.org/wiki/DOMInJavaScript?version=3>, and the dom-traverse test case, it looks like the main goal here is to optimize DOM traversal. Are there other goals?

FYI, this slide explains other motivations:  https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p
Comment 28 Adam Barth 2012-09-26 23:57:35 PDT
Thanks for noticing the wiki page I'm writing up.  I think I've got it into a more sane shape now.  Here's a strawman design for we might actually be able to realize these speedups:

https://trac.webkit.org/wiki/DOMInJavaScript

> it looks like the main goal here is to optimize DOM traversal. Are there other goals?

Speeding up DOM traversal is certainly near the top of my list because it's so widely used.  If we do a good job with DOM traversal, we might be able to re-use the mechanism in other places where it shows a speedup.  For example, haraken has some data showing that nodeType and className are the hotest DOM properties.  There might also be opportunities for optimizing these properties by uploading them into the JavaScript engine.

Another benefit of this approach is that it somewhat simplifies garbage collection (e.g., it might help fix Bug 88834).  The idea here is that by uploading the DOM tree into the JavaScript engine, the JavaScript engine's garbage collector has a better understanding of which objects are keeping which other objects alive.  The usefulness of this information depends on how the garbage collector works.  For example, a generational garbage collector might need only examine DOM edges in the new space rather than needing to crawl the entire DOM tree (i.e., the portion in the old space).
Comment 29 Adam Barth 2012-09-27 00:01:22 PDT
> FYI, this slide explains other motivations:  https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p

I don't agree with some of the conclusions in this slide deck.  In particular, I'm hopeful that DOM-in-JavaScript will actually be faster, especially for JS-heavy workloads.  That's why I created this bug initially: to experiment with performance.
Comment 30 Kentaro Hara 2012-09-27 07:41:25 PDT
> https://trac.webkit.org/wiki/DOMInJavaScript

Thank you very much for the write up! The approach looks good.

One question: Are you intending to create wrapper objects for all DOM nodes? Or are you intending to create wrapper objects only for DOM nodes that are touched by JavaScript (as is done in the current JSC/V8+WebKit)?

The latter idea would be feasible in a following way (and might be better/worse in terms of performance). As a fast path, in a case where a wrapper object exists, .nextSibling is realized by normal JavaScript property access. As a slow path, in a case where a wrapper object does not exist, .nextSibling is realized by Node::nextSibling() as is done in the current WebKit.
Comment 31 Adam Barth 2012-09-27 07:51:17 PDT
> One question: Are you intending to create wrapper objects for all DOM nodes? Or are you intending to create wrapper objects only for DOM nodes that are touched by JavaScript (as is done in the current JSC/V8+WebKit)?

The proposal in the wiki page is to create wrappers for all DOM nodes.  Before doing that, of course, we'll need to look at the impact on memory and performance.

> The latter idea would be feasible in a following way (and might be better/worse in terms of performance). As a fast path, in a case where a wrapper object exists, .nextSibling is realized by normal JavaScript property access. As a slow path, in a case where a wrapper object does not exist, .nextSibling is realized by Node::nextSibling() as is done in the current WebKit.

The trade-off is in that approach need to reserve space both in Node and in its wrapper for nextSibling.  Depending on the wrapper rate, this can either be a win or a loss.

We'll probably need to try a few designs before we find the best one.  Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?
Comment 32 Adam Barth 2012-09-27 07:55:37 PDT
I should also say that if you have storage for nextSibling in Node, another approach is to keep both copies up to date.  In that case, JS would read the data from the JS object and CPP would read the data from the CPP object.  The write operation (e.g., setNextSibling) would write to both locations.  For purposes of discussion, we might want to call this the "write through" approach (and we should probably add it to the wiki as well).
Comment 33 Kentaro Hara 2012-09-27 07:57:45 PDT
(In reply to comment #31)
> We'll probably need to try a few designs before we find the best one.  Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?

Sure! Another disadvantage of the optional-wrapper approach is that it won't contribute to simplifying the GC. Either way I'll write them up tomorrow.
Comment 34 Maciej Stachowiak 2012-09-27 16:35:51 PDT
Adding Gavin Barraclough to Cc, I had a good conversation with him recently about plans to speed up DOM property access for JavaScriptCore.

He said that JSC folks have discussed letting the JIT have sufficient understanding of some DOM accessors to make them efficient without having to modify the layout of DOM nodes or implement anything in JS.


From what I can gather of the approach outlined in this bug, it seems that it would likely have a memory cost, and a speed cost for DOM traversal and mutation from C++. I am concerned about these potential costs, because it seems they would affect speed of core operations like HTML parsing and style resolution. Since JSC+WebKit, as far as I know, already has the fastest DOM traversal of any browser engine even with no changes, it would be a tough call whether a further ~1.2x speedup to DOM traversal would outweigh such costs. Maybe there is a plan to mitigate these downsides. If so, I am interested in learning more about that.


Anyway - though the approach Gavin described to me sounds like it may more technically challenging (you'd have to understand and modify JIT code), it seems much less likely to hurt memory use or access speed from C++ code. So I found it to be a very intriguing idea.

Since the JavaScriptCore team has a track record of success in making really wicked fast DOM bindings, I think it is worth considering their input in this area.

It might also be useful to have some conversations about this by email or in person, and not just in bug comments.
Comment 35 Maciej Stachowiak 2012-09-27 16:39:10 PDT
(In reply to comment #29)
> > FYI, this slide explains other motivations:  https://docs.google.com/a/chromium.org/presentation/d/1X3hExrBtjXIUVqmDArAdEh9JYQudhCPvgSguilK54iE/edit?pli=1#slide=id.p
> 
> I don't agree with some of the conclusions in this slide deck.  In particular, I'm hopeful that DOM-in-JavaScript will actually be faster, especially for JS-heavy workloads.  That's why I created this bug initially: to experiment with performance.

One experiment that would help us learn about likely perf characteristics of a DOM implemented completely in JavaScript would be to try reimplementing some more complex DOM methods in JS. Seems like a trivial accessor like firstChild is the best possible case.
Comment 36 Adam Barth 2012-09-27 17:09:06 PDT
Interesting. I hadn't considered the approach of teaching the JIT more about the structure of DOM objects.

As haraken writes in Comment #23, you're still going to run up against the limitation of needing to follow a pointer from the JSObject to the Node and then back to the other JSObject. As long as there are two objects and only one representation of the tree, one of the two kinds of objects is going to need to transit the graph by way of the other.

I guess the logical conclusion of that thought process is that we should try to merge the wrapper object and the wrapped object so that neither needs to consult the other in order to transit the graph.

> One experiment that would help us learn about likely perf characteristics of a DOM implemented completely in JavaScript would be to try reimplementing some more complex DOM methods in JS. Seems like a trivial accessor like firstChild is the best possible case.

I wasn't suggesting implementing the entire DOM in JavaScript. The proposal in <http://trac.webkit.org/wiki/DOMInJavaScript> just suggests moving the tree representation.

In any case, all of this is very much open for discussion and I welcome everyone's input, especially JSC hackers.  :)
Comment 37 Kentaro Hara 2012-09-28 07:18:57 PDT
(In reply to comment #31)
> We'll probably need to try a few designs before we find the best one.  Would you mind adding a section comparing the optional-wrapper and mandatory-wrapper approaches to the wiki page?

Done. Please edit it freely.
Comment 38 Sam Weinig 2012-09-28 12:15:48 PDT
Can you state the goal of these experiments?  What benchmarks/scenarios are you trying to improve on? It seems a bit odd to me to start with a solution.
Comment 39 Adam Barth 2012-09-28 12:41:05 PDT
> Can you state the goal of these experiments?  What benchmarks/scenarios are you trying to improve on?

See Comment #28.

> It seems a bit odd to me to start with a solution.

This bug is about experimenting with an alternative approach for implement some DOM bindings. The goal is to learn about the performance characteristics of the approach by comparing its scores on various benchmarks. The approach might turn out to be a better or worse performance trade-off than the current approach. That's what I'm hoping to learn from these experiments.

I could have chosen to perform these experiments in a secret branch of WebKit, but I decided to perform them here on this bug so that other folks in the community can participate in the discussion if they so choose.  As I wrote in Comment #0, if you're not interested in the performance characteristics of this approach, you can safely ignore this bug thread.