RESOLVED FIXED Bug 32430
Add a pointer field to C++ Node class (for v8)
https://bugs.webkit.org/show_bug.cgi?id=32430
Summary Add a pointer field to C++ Node class (for v8)
anton muhin
Reported 2009-12-11 06:48:17 PST
We're planning to use this pointer to make mapping between C++ Nodes and V8 JavaScript wrappers faster. The speed up we got is quite notable, so we're willing to pay some bigger memory consumption. Another option is to add an int, but we'd prefer to have a pointer.
Attachments
first take (1.23 KB, patch)
2009-12-11 06:48 PST, anton muhin
no flags
Forgotten ChangeLog update (1.98 KB, patch)
2009-12-11 06:53 PST, anton muhin
no flags
Introducing ScriptWrappable (11.27 KB, patch)
2009-12-15 12:25 PST, anton muhin
no flags
Minor ChangeLog adjustments (11.27 KB, patch)
2009-12-15 12:27 PST, anton muhin
no flags
Minor ChangeLog adjustments (this time for real) (11.26 KB, patch)
2009-12-15 12:29 PST, anton muhin
no flags
Rebaselining (11.19 KB, patch)
2010-01-19 12:19 PST, anton muhin
no flags
Style issue fixed (11.18 KB, patch)
2010-01-19 12:26 PST, anton muhin
no flags
Patch (7.05 KB, patch)
2010-01-20 03:23 PST, anton muhin
no flags
Sorry, learning to use webkit-patch and git (11.32 KB, patch)
2010-01-20 03:34 PST, anton muhin
no flags
Setting proper role to allow exporing of the header (11.36 KB, patch)
2010-01-20 06:32 PST, anton muhin
no flags
Rebaselining WebCore/WebCore.gypi (11.36 KB, patch)
2010-01-21 04:01 PST, anton muhin
no flags
Fixed a typo in NULL pointer check (11.36 KB, patch)
2010-01-21 06:52 PST, anton muhin
no flags
Same patch, rerunning through EWS (11.36 KB, patch)
2010-01-21 08:36 PST, anton muhin
no flags
anton muhin
Comment 1 2009-12-11 06:48:55 PST
Created attachment 44684 [details] first take
anton muhin
Comment 2 2009-12-11 06:53:14 PST
Created attachment 44687 [details] Forgotten ChangeLog update
WebKit Review Bot
Comment 3 2009-12-11 06:54:28 PST
style-queue ran check-webkit-style on attachment 44687 [details] without any errors.
anton muhin
Comment 4 2009-12-11 08:41:26 PST
Comment on attachment 44687 [details] Forgotten ChangeLog update Dmitry Glazkov suggested to rework in a ways more maintainable fashion. My current plan is to introduce ScriptWrappable class and inherit Node from it (in the vain of ScriptValue). JSC implementation would be just an empty class, while V8 implementation would hold a pointer to proper type. If anyone has any additional concerns I should address before asking for review, please, update the bug.
Adam Barth
Comment 5 2009-12-11 08:54:34 PST
Empty base class optimization > #ifdefs
Dimitri Glazkov (Google)
Comment 6 2009-12-11 09:35:36 PST
I think I just suggested a ScriptObjectHandle member of Node. In either case, it's better than void* :)
Sam Weinig
Comment 7 2009-12-11 11:24:59 PST
This bug is quite devoid of any actual details of the speedup you got. What was the win? What did you test on? Why is this the right tradeoff?
anton muhin
Comment 8 2009-12-15 12:25:05 PST
Created attachment 44897 [details] Introducing ScriptWrappable
anton muhin
Comment 9 2009-12-15 12:27:09 PST
Created attachment 44898 [details] Minor ChangeLog adjustments
anton muhin
Comment 10 2009-12-15 12:29:15 PST
Created attachment 44899 [details] Minor ChangeLog adjustments (this time for real)
anton muhin
Comment 11 2009-12-15 12:31:12 PST
Comment on attachment 44899 [details] Minor ChangeLog adjustments (this time for real) A note: I'm drafting how ScriptWrappable for v8 bindings could look like. However it might be better deferred until actual commit into v8 bindings which would use this new information. Still keeping it for time being to illustrate how it's supposed to be used.
anton muhin
Comment 12 2009-12-15 12:32:22 PST
Sorry for delay. New version, using SrciptWrappable and empty class optimization for JSC bindings is uploaded. May you have another look?
anton muhin
Comment 13 2009-12-15 12:33:29 PST
(In reply to comment #7) > This bug is quite devoid of any actual details of the speedup you got. What > was the win? What did you test on? Why is this the right tradeoff? Sorry, Sam, didn't want to overload with details. The speedup is ~10% (tested on Windows and Linux) for overall score for DOM core of Dromaeo.
Sam Weinig
Comment 14 2009-12-15 21:44:21 PST
(In reply to comment #13) > (In reply to comment #7) > > This bug is quite devoid of any actual details of the speedup you got. What > > was the win? What did you test on? Why is this the right tradeoff? > > Sorry, Sam, didn't want to overload with details. > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > core of Dromaeo. At what cost? What was the change in memory use?
Sam Weinig
Comment 15 2009-12-15 21:47:47 PST
(In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #7) > > > This bug is quite devoid of any actual details of the speedup you got. What > > > was the win? What did you test on? Why is this the right tradeoff? > > > > Sorry, Sam, didn't want to overload with details. > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > core of Dromaeo. > > At what cost? What was the change in memory use? Another question: Is putting this pointer on Node really the right choice? We have discussed specializing Element or even HTMLElement to avoid the cost for every Node (Text nodes for instance). Did you measure which type of Nodes this would be the biggest win for?
anton muhin
Comment 16 2009-12-16 09:56:47 PST
(In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #7) > > > This bug is quite devoid of any actual details of the speedup you got. What > > > was the win? What did you test on? Why is this the right tradeoff? > > > > Sorry, Sam, didn't want to overload with details. > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > core of Dromaeo. > > At what cost? What was the change in memory use? First answer: I don't know, but hopefully we have enough infrastructure to see the impact on Chromium (and I hope Safari won't be affected at all). Some more details. I carried on completely unscientific experiment: opened big GMail account in Chromium and got three snapshots. Numbers are (first number is total number of Nodes as reported by liveNodeSet (see Node.cpp) vs. number of wrapped Nodes: 1) 22945/3103 2) 35055/6911 3) 37223/7012. Now embedding a pointer into Node we pay 4 bytes per each Node. However, this pointer allows to use less memory for HashTables (in CL to be send for review). My estimates: 1) Currently we use 8 bytes per entry in the HashMap (4 bytes for Node pointer and 4 bytes for a handle), however HashMap allocated blocks which are from twice to six times bigger than necessary, let's average that as 4 times, so we allocate 32 bytes per each wrapped node. 2) If all CLs in, we would have only 4 bytes overhead per wrapped node (plus some constant overhead for the rest of unused chunk). So roughly we actual memory usage would be less, if we wrap every ~7th node. Numbers for my GMail run shows something like one wrapped per 5-7 nodes. Definitely that's not a prove, but kind of evidence that idea at least has a chance. Taking it from other side 37223 * 4 gives me 145.5K of overhead, once again, looks reasonable for me (but I am biased).
anton muhin
Comment 17 2009-12-16 09:57:39 PST
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #13) > > > (In reply to comment #7) > > > > This bug is quite devoid of any actual details of the speedup you got. What > > > > was the win? What did you test on? Why is this the right tradeoff? > > > > > > Sorry, Sam, didn't want to overload with details. > > > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > > core of Dromaeo. > > > > At what cost? What was the change in memory use? > > Another question: Is putting this pointer on Node really the right choice? We > have discussed specializing Element or even HTMLElement to avoid the cost for > every Node (Text nodes for instance). Did you measure which type of Nodes this > would be the biggest win for? Sorry, could you elaborate? Was it discussed in public? If yes, could you give me a pointer to the thread?
anton muhin
Comment 18 2009-12-18 12:20:20 PST
ping
Sam Weinig
Comment 19 2009-12-18 17:41:34 PST
> > At what cost? What was the change in memory use? > > First answer: I don't know, but hopefully we have enough infrastructure to see > the impact on Chromium (and I hope Safari won't be affected at all). > > Some more details. I carried on completely unscientific experiment: opened big > GMail account in Chromium and got three snapshots. Numbers are (first number > is total number of Nodes as reported by liveNodeSet (see Node.cpp) vs. number > of wrapped Nodes: > > 1) 22945/3103 > 2) 35055/6911 > 3) 37223/7012. > > Now embedding a pointer into Node we pay 4 bytes per each Node. However, this > pointer allows to use less memory for HashTables (in CL to be send for review). > My estimates: > > 1) Currently we use 8 bytes per entry in the HashMap (4 bytes for Node pointer > and 4 bytes for a handle), however HashMap allocated blocks which are from > twice to six times bigger than necessary, let's average that as 4 times, so we > allocate 32 bytes per each wrapped node. > > 2) If all CLs in, we would have only 4 bytes overhead per wrapped node (plus > some constant overhead for the rest of unused chunk). > > So roughly we actual memory usage would be less, if we wrap every ~7th node. > Numbers for my GMail run shows something like one wrapped per 5-7 nodes. > > Definitely that's not a prove, but kind of evidence that idea at least has a > chance. > > Taking it from other side 37223 * 4 gives me 145.5K of overhead, once again, > looks reasonable for me (but I am biased). These are interesting numbers, but I think a more scientific approach would better. I would recommend running the membuster test (http://random.pavlov.net/membuster/index.html) and using the membuster memory harness to test the memory usage during Dromeo before and after your change. I don't plan on trying to block your change, but these numbers would be very valuable for trying to decide if this is the right tradeoff to make. > > Another question: Is putting this pointer on Node really the right choice? We > > have discussed specializing Element or even HTMLElement to avoid the cost for > > every Node (Text nodes for instance). Did you measure which type of Nodes this > > would be the biggest win for? > > Sorry, could you elaborate? Was it discussed in public? If yes, could you > give me a pointer to the thread? I don't have a pointer, but this was discussed at some amount of length with Dimitri Glazkov when the v8 stuff first landed, and at that time, we decided that we would not use an intrusive pointer without additional memory testing. The specializing for Element idea was to just put an intrusive pointer only on Elements (instead of all Nodes), based on the theory that Elements are the most likely to have a wrapper and that all Nodes should not have to pay the price.
anton muhin
Comment 20 2009-12-24 07:47:14 PST
Sam, all, first of all, sorry for long delay and merry Christmas. (In reply to comment #19) > These are interesting numbers, but I think a more scientific approach would > better. I would recommend running the membuster test > (http://random.pavlov.net/membuster/index.html) and using the membuster memory > harness to test the memory usage during Dromeo before and after your change. I > don't plan on trying to block your change, but these numbers would be very > valuable for trying to decide if this is the right tradeoff to make. That's a good idea. What I did. I logged (every 100th time) number of nodes (liveNodeSet.size()) and number of wrapped nodes (actually getDOMNodeMap().size()) on 4 events: Node creation/destruction, node wrapper addition and removal (if anyone is interested, I'd post a patch). Then I processed this file offline for various sites. To estimate current overhead I used three measures: original_min = 2 * 8 * wrapped nodes (2 due to two times bigger underlying arrays size), original_max = 6 * 8 * wrapped nodes (accounting for shrink strategy) and just original = 4 * 8 * wrapped nodes (kind of average). Please note that this doesn't account for deleted entries (which live as tombstones until rehashed/shrinked/expanded), so those numbers are underestimates. To account for overhead by this patch + new Node map implementation I used the following formula: 4 * nodes + 4 * wrapped + 2 * 1024 (2K due to the current chunk size which is half of page). Results: === membuster avg nodes: 59339.0, avg wrapped: 9968.6 (16.799%) measure: original_min max overhead: 197892 bytes (193.3K) for 76867 9302 max win: -440 bytes (-0.4K) for 3086 1236 average: -117.0K measure: original_max max overhead: 13936 bytes (13.6K) for 3005 3 max win: -439620 bytes (-429.3K) for 69224 16331 average: 194.5K measure: original max overhead: 49756 bytes (48.6K) for 76712 9255 max win: -178324 bytes (-174.1K) for 69224 16331 average: 38.8K === Dromaeo, DOM core avg nodes: 167563.1, avg wrapped: 62046.0 (37.028%) measure: original_min max overhead: 1267224 bytes (1237.5K) for 679459 121055 max win: -2096996 bytes (-2047.8K) for 290936 271899 average: 70.6K measure: original_max max overhead: 84992 bytes (83.0K) for 28557 711 max win: -10797764 bytes (-10544.7K) for 290936 271899 average: 2009.5K measure: original max overhead: 96368 bytes (94.1K) for 28557 711 max win: -6447380 bytes (-6296.3K) for 290936 271899 average: 1040.0K === Dromaeo, JS libs avg nodes: 210501.8, avg wrapped: 74075.6 (35.190%) measure: original_min max overhead: 276512 bytes (270.0K) for 70872 752 max win: -738036 bytes (-720.7K) for 528847 237956 average: 43.8K measure: original_max max overhead: 252448 bytes (246.5K) for 70872 752 max win: -8352628 bytes (-8156.9K) for 528847 237956 average: 2358.7K measure: original max overhead: 264480 bytes (258.3K) for 70872 752 max win: -4545332 bytes (-4438.8K) for 528847 237956 average: 1201.2K === Apple.com and then click store avg nodes: 1151.9, avg wrapped: 239.3 (20.776%) measure: original_min max overhead: 7812 bytes (7.6K) for 2233 264 max win: 0 bytes (0.0K) for -1 -1 average: -3.7K measure: original_max max overhead: 380 bytes (0.4K) for 408 75 max win: -12748 bytes (-12.4K) for 2351 550 average: 3.8K measure: original max overhead: 3588 bytes (3.5K) for 2233 264 max win: -3948 bytes (-3.9K) for 2351 550 average: 0.0K === Google.ru + search for Chromium avg nodes: 399.9, avg wrapped: 146.2 (36.549%) measure: original_min max overhead: 4152 bytes (4.1K) for 802 92 max win: 0 bytes (0.0K) for -1 -1 average: -1.8K measure: original_max max overhead: 1648 bytes (1.6K) for 175 25 max win: -9664 bytes (-9.4K) for 449 307 average: 2.7K measure: original max overhead: 2680 bytes (2.6K) for 802 92 max win: -4752 bytes (-4.6K) for 449 307 average: 0.4K === http://www.w3.org/ + Reload avg nodes: 780.0, avg wrapped: 101.7 (13.040%) measure: original_min max overhead: 6392 bytes (6.2K) for 1086 0 max win: 0 bytes (0.0K) for -1 -1 average: -3.9K measure: original_max max overhead: 6392 bytes (6.2K) for 1086 0 max win: -15640 bytes (-15.3K) for 1199 511 average: -0.7K measure: original max overhead: 6392 bytes (6.2K) for 1086 0 max win: -7464 bytes (-7.3K) for 1199 511 average: -2.3K I'd say those numbers show that actually Chromium could even reduce its memory usage. > > > Another question: Is putting this pointer on Node really the right choice? We > > > have discussed specializing Element or even HTMLElement to avoid the cost for > > > every Node (Text nodes for instance). Did you measure which type of Nodes this > > > would be the biggest win for? > > > > Sorry, could you elaborate? Was it discussed in public? If yes, could you > > give me a pointer to the thread? > > I don't have a pointer, but this was discussed at some amount of length with > Dimitri Glazkov when the v8 stuff first landed, and at that time, we decided > that we would not use an intrusive pointer without additional memory testing. > The specializing for Element idea was to just put an intrusive pointer only on > Elements (instead of all Nodes), based on the theory that Elements are the most > likely to have a wrapper and that all Nodes should not have to pay the price. Thanks a lot for clarifications. That sounds like an interesting idea, however I'd postpone it as it adds some notable complications into the code (need to treat nodes and elements separately). YMMW, of course. And just in case, I'd be off from Jan, 1 to Jan, 11.
anton muhin
Comment 21 2010-01-11 08:36:30 PST
ping
Sam Weinig
Comment 22 2010-01-15 15:47:14 PST
I am still very interested in seeing the changes to Membuster.
anton muhin
Comment 23 2010-01-18 11:28:26 PST
(In reply to comment #22) > I am still very interested in seeing the changes to Membuster. Sam, I gave numbers for membuster in comment#20 (https://bugs.webkit.org/show_bug.cgi?id=32430#c20) Just if you're curious I ran membuster in baseline and patched version of chromium. The results for two runs are: baseline (working set after the finish, max working set): 1st run: 166.316K 192.884K 2nd run: 161.576K 184.749K patched: 1st run: 163.432K 185.828K 2nd run: 162.076K 186.812K best to worst ratio for max WS: 1.01. There probably should exist some better mechanism to estimate memory usage with membuster, e.g., after finish WS doesn't appear quite telling to me.
Adam Barth
Comment 24 2010-01-18 14:28:20 PST
These numbers make this patch look like a win from a number of different angles. Would you be willing to up a version up for review? We can iterate on Node/Element and other fine details in a future bug.
Maciej Stachowiak
Comment 25 2010-01-18 19:57:20 PST
(In reply to comment #24) > These numbers make this patch look like a win from a number of different > angles. Would you be willing to up a version up for review? We can iterate on > Node/Element and other fine details in a future bug. So it's like a 1% (or less) overall memory use regression for a 10% DOM core speedup. I guess we'll want to implement this for JSC bindings too but it might be at the level where we want to give ports for memory-constrained environments a way to opt out.
anton muhin
Comment 26 2010-01-19 12:19:12 PST
Created attachment 46933 [details] Rebaselining
WebKit Review Bot
Comment 27 2010-01-19 12:23:58 PST
Attachment 46933 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/bindings/v8/ScriptWrappable.h:49: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 If any of these errors are false positives, please file a bug against check-webkit-style.
anton muhin
Comment 28 2010-01-19 12:26:04 PST
Created attachment 46936 [details] Style issue fixed
Eric Seidel (no email)
Comment 29 2010-01-19 12:28:56 PST
Eric Seidel (no email)
Comment 30 2010-01-19 12:40:59 PST
Eric Seidel (no email)
Comment 31 2010-01-19 15:44:14 PST
Another way to implement this would be to have two implementations of this class. One which knows how to pull from the current wrapper caches, and one which knows how to do the lookup directly. Basically instead of JSC having an empty implementation for this, it could use an implementation which called getDOMObjectWrapper. I don't understand how this works with isolated worlds. But then again, I don't know how isolated worlds work for V8. When grabbing a wrapper in JSc you have to pass the global object.
anton muhin
Comment 32 2010-01-20 03:23:19 PST
Adam Barth
Comment 33 2010-01-20 03:30:56 PST
Comment on attachment 47002 [details] Patch Missing ScriptWrappable.h
anton muhin
Comment 34 2010-01-20 03:34:56 PST
Created attachment 47003 [details] Sorry, learning to use webkit-patch and git
anton muhin
Comment 35 2010-01-20 03:35:58 PST
(In reply to comment #33) > (From update of attachment 47002 [details]) > Missing ScriptWrappable.h Thanks a lot for spotting this, new patch uploaded. BTW, I hope previous mac build failure was spurious. And is there a way to see the build logs?
anton muhin
Comment 36 2010-01-20 06:32:46 PST
Created attachment 47018 [details] Setting proper role to allow exporing of the header
anton muhin
Comment 37 2010-01-20 08:22:09 PST
(In reply to comment #36) > Created an attachment (id=47018) [details] > Setting proper role to allow exporing of the header mac build is now fixed, but for some reason qt ews fails to start. I could repost the patch if necessary. Is there some other way to force ews?
Adam Barth
Comment 38 2010-01-20 18:31:13 PST
> mac build is now fixed, but for some reason qt ews fails to start. I could > repost the patch if necessary. Is there some other way to force ews? Looks like the EWS is happy. I might need to give the bots more horsepower to reduce latency.
Adam Barth
Comment 39 2010-01-20 18:33:59 PST
Comment on attachment 47018 [details] Setting proper role to allow exporing of the header Given the discussion above, this looks good. Presumably we need to actually use this functionality for something to realize the speedups. :) Also, you might want to file a bug to track the JSC implementation of this feature in case that shows similar performance gains.
anton muhin
Comment 40 2010-01-21 03:31:38 PST
(In reply to comment #39) Thanks a lot to everyone for review, cq+'ing. The v8-bindings patch is on the way (I'd post a reference here if you're interested). > (From update of attachment 47018 [details]) > Given the discussion above, this looks good. Presumably we need to actually > use this functionality for something to realize the speedups. :) > > Also, you might want to file a bug to track the JSC implementation of this > feature in case that shows similar performance gains. Ok, I'll file the bug for JSC bindings as well.
WebKit Commit Bot
Comment 41 2010-01-21 03:43:12 PST
Comment on attachment 47018 [details] Setting proper role to allow exporing of the header Rejecting patch 47018 from commit-queue. Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Adam Barth', '--force']" exit_code: 1 Last 500 characters of output: ded at 1028 (offset 1 line). patching file WebCore/WebCore.vcproj/WebCore.vcproj Hunk #1 succeeded at 40516 (offset 64 lines). patching file WebCore/WebCore.xcodeproj/project.pbxproj Hunk #1 succeeded at 557 (offset -2 lines). Hunk #2 succeeded at 5848 (offset 9 lines). Hunk #3 succeeded at 14597 (offset 27 lines). Hunk #4 succeeded at 18421 (offset 26 lines). patching file WebCore/bindings/js/ScriptWrappable.h patching file WebCore/bindings/v8/ScriptWrappable.h patching file WebCore/dom/Node.h Full output: http://webkit-commit-queue.appspot.com/results/203058
anton muhin
Comment 42 2010-01-21 04:01:14 PST
Created attachment 47108 [details] Rebaselining WebCore/WebCore.gypi
anton muhin
Comment 43 2010-01-21 06:52:50 PST
Created attachment 47116 [details] Fixed a typo in NULL pointer check
WebKit Review Bot
Comment 44 2010-01-21 06:54:49 PST
anton muhin
Comment 45 2010-01-21 08:36:35 PST
Created attachment 47124 [details] Same patch, rerunning through EWS
anton muhin
Comment 46 2010-01-21 08:38:06 PST
(In reply to comment #44) > Attachment 47116 [details] did not build on qt: > Build output: http://webkit-commit-queue.appspot.com/results/203155 Looks suspicious---I only removed ! in ASSERT. Reposting the same patch to rerun EWSes.
anton muhin
Comment 47 2010-01-21 09:14:24 PST
https://bugs.webkit.org/show_bug.cgi?id=33957 filed with the patch which actually implements intrusive map for Nodes using ScriptWrappables.
David Levin
Comment 48 2010-01-21 09:17:49 PST
Changed title since this has a potentially significant effect on non-v8 code.
Eric Seidel (no email)
Comment 49 2010-01-21 15:39:40 PST
Comment on attachment 47124 [details] Same patch, rerunning through EWS Looks OK. I think we'll want this baseclass to hide how we access the wrapper eventually.
WebKit Commit Bot
Comment 50 2010-01-22 02:26:10 PST
Comment on attachment 47124 [details] Same patch, rerunning through EWS Clearing flags on attachment: 47124 Committed r53690: <http://trac.webkit.org/changeset/53690>
WebKit Commit Bot
Comment 51 2010-01-22 02:26:21 PST
All reviewed patches have been landed. Closing bug.
anton muhin
Comment 52 2010-01-22 02:38:55 PST
(In reply to comment #31) > Another way to implement this would be to have two implementations of this > class. One which knows how to pull from the current wrapper caches, and one > which knows how to do the lookup directly. > > Basically instead of JSC having an empty implementation for this, it could use > an implementation which called getDOMObjectWrapper. > > I don't understand how this works with isolated worlds. But then again, I > don't know how isolated worlds work for V8. When grabbing a wrapper in JSc you > have to pass the global object. Sounds interesting, thanks. Let me think a bit about it, but I'd be glad to have this change in first and improve it later.
Note You need to log in before you can comment on or make changes to this bug.