Summary: | Add a pointer field to C++ Node class (for v8) | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | anton muhin <antonm> | ||||||||||||||||||||||||||||
Component: | WebCore Misc. | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||
Severity: | Normal | CC: | abarth, ager, christian.plesner.hansen, commit-queue, dglazkov, eric, levin, mjs, sam, vitalyr, webkit.review.bot | ||||||||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||||||||||
Attachments: |
|
Description
anton muhin
2009-12-11 06:48:17 PST
Created attachment 44684 [details]
first take
Created attachment 44687 [details]
Forgotten ChangeLog update
style-queue ran check-webkit-style on attachment 44687 [details] without any errors.
Comment on attachment 44687 [details]
Forgotten ChangeLog update
Dmitry Glazkov suggested to rework in a ways more maintainable fashion.
My current plan is to introduce ScriptWrappable class and inherit Node from it (in the vain of ScriptValue). JSC implementation would be just an empty class, while V8 implementation would hold a pointer to proper type.
If anyone has any additional concerns I should address before asking for review, please, update the bug.
Empty base class optimization > #ifdefs I think I just suggested a ScriptObjectHandle member of Node. In either case, it's better than void* :) This bug is quite devoid of any actual details of the speedup you got. What was the win? What did you test on? Why is this the right tradeoff? Created attachment 44897 [details]
Introducing ScriptWrappable
Created attachment 44898 [details]
Minor ChangeLog adjustments
Created attachment 44899 [details]
Minor ChangeLog adjustments (this time for real)
Comment on attachment 44899 [details]
Minor ChangeLog adjustments (this time for real)
A note: I'm drafting how ScriptWrappable for v8 bindings could look like. However it might be better deferred until actual commit into v8 bindings which would use this new information. Still keeping it for time being to illustrate how it's supposed to be used.
Sorry for delay. New version, using SrciptWrappable and empty class optimization for JSC bindings is uploaded. May you have another look? (In reply to comment #7) > This bug is quite devoid of any actual details of the speedup you got. What > was the win? What did you test on? Why is this the right tradeoff? Sorry, Sam, didn't want to overload with details. The speedup is ~10% (tested on Windows and Linux) for overall score for DOM core of Dromaeo. (In reply to comment #13) > (In reply to comment #7) > > This bug is quite devoid of any actual details of the speedup you got. What > > was the win? What did you test on? Why is this the right tradeoff? > > Sorry, Sam, didn't want to overload with details. > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > core of Dromaeo. At what cost? What was the change in memory use? (In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #7) > > > This bug is quite devoid of any actual details of the speedup you got. What > > > was the win? What did you test on? Why is this the right tradeoff? > > > > Sorry, Sam, didn't want to overload with details. > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > core of Dromaeo. > > At what cost? What was the change in memory use? Another question: Is putting this pointer on Node really the right choice? We have discussed specializing Element or even HTMLElement to avoid the cost for every Node (Text nodes for instance). Did you measure which type of Nodes this would be the biggest win for? (In reply to comment #14) > (In reply to comment #13) > > (In reply to comment #7) > > > This bug is quite devoid of any actual details of the speedup you got. What > > > was the win? What did you test on? Why is this the right tradeoff? > > > > Sorry, Sam, didn't want to overload with details. > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > core of Dromaeo. > > At what cost? What was the change in memory use? First answer: I don't know, but hopefully we have enough infrastructure to see the impact on Chromium (and I hope Safari won't be affected at all). Some more details. I carried on completely unscientific experiment: opened big GMail account in Chromium and got three snapshots. Numbers are (first number is total number of Nodes as reported by liveNodeSet (see Node.cpp) vs. number of wrapped Nodes: 1) 22945/3103 2) 35055/6911 3) 37223/7012. Now embedding a pointer into Node we pay 4 bytes per each Node. However, this pointer allows to use less memory for HashTables (in CL to be send for review). My estimates: 1) Currently we use 8 bytes per entry in the HashMap (4 bytes for Node pointer and 4 bytes for a handle), however HashMap allocated blocks which are from twice to six times bigger than necessary, let's average that as 4 times, so we allocate 32 bytes per each wrapped node. 2) If all CLs in, we would have only 4 bytes overhead per wrapped node (plus some constant overhead for the rest of unused chunk). So roughly we actual memory usage would be less, if we wrap every ~7th node. Numbers for my GMail run shows something like one wrapped per 5-7 nodes. Definitely that's not a prove, but kind of evidence that idea at least has a chance. Taking it from other side 37223 * 4 gives me 145.5K of overhead, once again, looks reasonable for me (but I am biased). (In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #13) > > > (In reply to comment #7) > > > > This bug is quite devoid of any actual details of the speedup you got. What > > > > was the win? What did you test on? Why is this the right tradeoff? > > > > > > Sorry, Sam, didn't want to overload with details. > > > > > > The speedup is ~10% (tested on Windows and Linux) for overall score for DOM > > > core of Dromaeo. > > > > At what cost? What was the change in memory use? > > Another question: Is putting this pointer on Node really the right choice? We > have discussed specializing Element or even HTMLElement to avoid the cost for > every Node (Text nodes for instance). Did you measure which type of Nodes this > would be the biggest win for? Sorry, could you elaborate? Was it discussed in public? If yes, could you give me a pointer to the thread? ping > > At what cost? What was the change in memory use? > > First answer: I don't know, but hopefully we have enough infrastructure to see > the impact on Chromium (and I hope Safari won't be affected at all). > > Some more details. I carried on completely unscientific experiment: opened big > GMail account in Chromium and got three snapshots. Numbers are (first number > is total number of Nodes as reported by liveNodeSet (see Node.cpp) vs. number > of wrapped Nodes: > > 1) 22945/3103 > 2) 35055/6911 > 3) 37223/7012. > > Now embedding a pointer into Node we pay 4 bytes per each Node. However, this > pointer allows to use less memory for HashTables (in CL to be send for review). > My estimates: > > 1) Currently we use 8 bytes per entry in the HashMap (4 bytes for Node pointer > and 4 bytes for a handle), however HashMap allocated blocks which are from > twice to six times bigger than necessary, let's average that as 4 times, so we > allocate 32 bytes per each wrapped node. > > 2) If all CLs in, we would have only 4 bytes overhead per wrapped node (plus > some constant overhead for the rest of unused chunk). > > So roughly we actual memory usage would be less, if we wrap every ~7th node. > Numbers for my GMail run shows something like one wrapped per 5-7 nodes. > > Definitely that's not a prove, but kind of evidence that idea at least has a > chance. > > Taking it from other side 37223 * 4 gives me 145.5K of overhead, once again, > looks reasonable for me (but I am biased). These are interesting numbers, but I think a more scientific approach would better. I would recommend running the membuster test (http://random.pavlov.net/membuster/index.html) and using the membuster memory harness to test the memory usage during Dromeo before and after your change. I don't plan on trying to block your change, but these numbers would be very valuable for trying to decide if this is the right tradeoff to make. > > Another question: Is putting this pointer on Node really the right choice? We > > have discussed specializing Element or even HTMLElement to avoid the cost for > > every Node (Text nodes for instance). Did you measure which type of Nodes this > > would be the biggest win for? > > Sorry, could you elaborate? Was it discussed in public? If yes, could you > give me a pointer to the thread? I don't have a pointer, but this was discussed at some amount of length with Dimitri Glazkov when the v8 stuff first landed, and at that time, we decided that we would not use an intrusive pointer without additional memory testing. The specializing for Element idea was to just put an intrusive pointer only on Elements (instead of all Nodes), based on the theory that Elements are the most likely to have a wrapper and that all Nodes should not have to pay the price. Sam, all, first of all, sorry for long delay and merry Christmas. (In reply to comment #19) > These are interesting numbers, but I think a more scientific approach would > better. I would recommend running the membuster test > (http://random.pavlov.net/membuster/index.html) and using the membuster memory > harness to test the memory usage during Dromeo before and after your change. I > don't plan on trying to block your change, but these numbers would be very > valuable for trying to decide if this is the right tradeoff to make. That's a good idea. What I did. I logged (every 100th time) number of nodes (liveNodeSet.size()) and number of wrapped nodes (actually getDOMNodeMap().size()) on 4 events: Node creation/destruction, node wrapper addition and removal (if anyone is interested, I'd post a patch). Then I processed this file offline for various sites. To estimate current overhead I used three measures: original_min = 2 * 8 * wrapped nodes (2 due to two times bigger underlying arrays size), original_max = 6 * 8 * wrapped nodes (accounting for shrink strategy) and just original = 4 * 8 * wrapped nodes (kind of average). Please note that this doesn't account for deleted entries (which live as tombstones until rehashed/shrinked/expanded), so those numbers are underestimates. To account for overhead by this patch + new Node map implementation I used the following formula: 4 * nodes + 4 * wrapped + 2 * 1024 (2K due to the current chunk size which is half of page). Results: === membuster avg nodes: 59339.0, avg wrapped: 9968.6 (16.799%) measure: original_min max overhead: 197892 bytes (193.3K) for 76867 9302 max win: -440 bytes (-0.4K) for 3086 1236 average: -117.0K measure: original_max max overhead: 13936 bytes (13.6K) for 3005 3 max win: -439620 bytes (-429.3K) for 69224 16331 average: 194.5K measure: original max overhead: 49756 bytes (48.6K) for 76712 9255 max win: -178324 bytes (-174.1K) for 69224 16331 average: 38.8K === Dromaeo, DOM core avg nodes: 167563.1, avg wrapped: 62046.0 (37.028%) measure: original_min max overhead: 1267224 bytes (1237.5K) for 679459 121055 max win: -2096996 bytes (-2047.8K) for 290936 271899 average: 70.6K measure: original_max max overhead: 84992 bytes (83.0K) for 28557 711 max win: -10797764 bytes (-10544.7K) for 290936 271899 average: 2009.5K measure: original max overhead: 96368 bytes (94.1K) for 28557 711 max win: -6447380 bytes (-6296.3K) for 290936 271899 average: 1040.0K === Dromaeo, JS libs avg nodes: 210501.8, avg wrapped: 74075.6 (35.190%) measure: original_min max overhead: 276512 bytes (270.0K) for 70872 752 max win: -738036 bytes (-720.7K) for 528847 237956 average: 43.8K measure: original_max max overhead: 252448 bytes (246.5K) for 70872 752 max win: -8352628 bytes (-8156.9K) for 528847 237956 average: 2358.7K measure: original max overhead: 264480 bytes (258.3K) for 70872 752 max win: -4545332 bytes (-4438.8K) for 528847 237956 average: 1201.2K === Apple.com and then click store avg nodes: 1151.9, avg wrapped: 239.3 (20.776%) measure: original_min max overhead: 7812 bytes (7.6K) for 2233 264 max win: 0 bytes (0.0K) for -1 -1 average: -3.7K measure: original_max max overhead: 380 bytes (0.4K) for 408 75 max win: -12748 bytes (-12.4K) for 2351 550 average: 3.8K measure: original max overhead: 3588 bytes (3.5K) for 2233 264 max win: -3948 bytes (-3.9K) for 2351 550 average: 0.0K === Google.ru + search for Chromium avg nodes: 399.9, avg wrapped: 146.2 (36.549%) measure: original_min max overhead: 4152 bytes (4.1K) for 802 92 max win: 0 bytes (0.0K) for -1 -1 average: -1.8K measure: original_max max overhead: 1648 bytes (1.6K) for 175 25 max win: -9664 bytes (-9.4K) for 449 307 average: 2.7K measure: original max overhead: 2680 bytes (2.6K) for 802 92 max win: -4752 bytes (-4.6K) for 449 307 average: 0.4K === http://www.w3.org/ + Reload avg nodes: 780.0, avg wrapped: 101.7 (13.040%) measure: original_min max overhead: 6392 bytes (6.2K) for 1086 0 max win: 0 bytes (0.0K) for -1 -1 average: -3.9K measure: original_max max overhead: 6392 bytes (6.2K) for 1086 0 max win: -15640 bytes (-15.3K) for 1199 511 average: -0.7K measure: original max overhead: 6392 bytes (6.2K) for 1086 0 max win: -7464 bytes (-7.3K) for 1199 511 average: -2.3K I'd say those numbers show that actually Chromium could even reduce its memory usage. > > > Another question: Is putting this pointer on Node really the right choice? We > > > have discussed specializing Element or even HTMLElement to avoid the cost for > > > every Node (Text nodes for instance). Did you measure which type of Nodes this > > > would be the biggest win for? > > > > Sorry, could you elaborate? Was it discussed in public? If yes, could you > > give me a pointer to the thread? > > I don't have a pointer, but this was discussed at some amount of length with > Dimitri Glazkov when the v8 stuff first landed, and at that time, we decided > that we would not use an intrusive pointer without additional memory testing. > The specializing for Element idea was to just put an intrusive pointer only on > Elements (instead of all Nodes), based on the theory that Elements are the most > likely to have a wrapper and that all Nodes should not have to pay the price. Thanks a lot for clarifications. That sounds like an interesting idea, however I'd postpone it as it adds some notable complications into the code (need to treat nodes and elements separately). YMMW, of course. And just in case, I'd be off from Jan, 1 to Jan, 11. ping I am still very interested in seeing the changes to Membuster. (In reply to comment #22) > I am still very interested in seeing the changes to Membuster. Sam, I gave numbers for membuster in comment#20 (https://bugs.webkit.org/show_bug.cgi?id=32430#c20) Just if you're curious I ran membuster in baseline and patched version of chromium. The results for two runs are: baseline (working set after the finish, max working set): 1st run: 166.316K 192.884K 2nd run: 161.576K 184.749K patched: 1st run: 163.432K 185.828K 2nd run: 162.076K 186.812K best to worst ratio for max WS: 1.01. There probably should exist some better mechanism to estimate memory usage with membuster, e.g., after finish WS doesn't appear quite telling to me. These numbers make this patch look like a win from a number of different angles. Would you be willing to up a version up for review? We can iterate on Node/Element and other fine details in a future bug. (In reply to comment #24) > These numbers make this patch look like a win from a number of different > angles. Would you be willing to up a version up for review? We can iterate on > Node/Element and other fine details in a future bug. So it's like a 1% (or less) overall memory use regression for a 10% DOM core speedup. I guess we'll want to implement this for JSC bindings too but it might be at the level where we want to give ports for memory-constrained environments a way to opt out. Created attachment 46933 [details]
Rebaselining
Attachment 46933 [details] did not pass style-queue:
Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/bindings/v8/ScriptWrappable.h:49: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5]
Total errors found: 1
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 46936 [details]
Style issue fixed
Attachment 46933 [details] did not build on mac: Build output: http://webkit-commit-queue.appspot.com/results/198862 Attachment 46936 [details] did not build on mac: Build output: http://webkit-commit-queue.appspot.com/results/200647 Another way to implement this would be to have two implementations of this class. One which knows how to pull from the current wrapper caches, and one which knows how to do the lookup directly. Basically instead of JSC having an empty implementation for this, it could use an implementation which called getDOMObjectWrapper. I don't understand how this works with isolated worlds. But then again, I don't know how isolated worlds work for V8. When grabbing a wrapper in JSc you have to pass the global object. Created attachment 47002 [details]
Patch
Comment on attachment 47002 [details]
Patch
Missing ScriptWrappable.h
Created attachment 47003 [details]
Sorry, learning to use webkit-patch and git
(In reply to comment #33) > (From update of attachment 47002 [details]) > Missing ScriptWrappable.h Thanks a lot for spotting this, new patch uploaded. BTW, I hope previous mac build failure was spurious. And is there a way to see the build logs? Created attachment 47018 [details]
Setting proper role to allow exporing of the header
(In reply to comment #36) > Created an attachment (id=47018) [details] > Setting proper role to allow exporing of the header mac build is now fixed, but for some reason qt ews fails to start. I could repost the patch if necessary. Is there some other way to force ews? > mac build is now fixed, but for some reason qt ews fails to start. I could
> repost the patch if necessary. Is there some other way to force ews?
Looks like the EWS is happy. I might need to give the bots more horsepower to reduce latency.
Comment on attachment 47018 [details]
Setting proper role to allow exporing of the header
Given the discussion above, this looks good. Presumably we need to actually use this functionality for something to realize the speedups. :)
Also, you might want to file a bug to track the JSC implementation of this feature in case that shows similar performance gains.
(In reply to comment #39) Thanks a lot to everyone for review, cq+'ing. The v8-bindings patch is on the way (I'd post a reference here if you're interested). > (From update of attachment 47018 [details]) > Given the discussion above, this looks good. Presumably we need to actually > use this functionality for something to realize the speedups. :) > > Also, you might want to file a bug to track the JSC implementation of this > feature in case that shows similar performance gains. Ok, I'll file the bug for JSC bindings as well. Comment on attachment 47018 [details] Setting proper role to allow exporing of the header Rejecting patch 47018 from commit-queue. Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Adam Barth', '--force']" exit_code: 1 Last 500 characters of output: ded at 1028 (offset 1 line). patching file WebCore/WebCore.vcproj/WebCore.vcproj Hunk #1 succeeded at 40516 (offset 64 lines). patching file WebCore/WebCore.xcodeproj/project.pbxproj Hunk #1 succeeded at 557 (offset -2 lines). Hunk #2 succeeded at 5848 (offset 9 lines). Hunk #3 succeeded at 14597 (offset 27 lines). Hunk #4 succeeded at 18421 (offset 26 lines). patching file WebCore/bindings/js/ScriptWrappable.h patching file WebCore/bindings/v8/ScriptWrappable.h patching file WebCore/dom/Node.h Full output: http://webkit-commit-queue.appspot.com/results/203058 Created attachment 47108 [details]
Rebaselining WebCore/WebCore.gypi
Created attachment 47116 [details]
Fixed a typo in NULL pointer check
Attachment 47116 [details] did not build on qt: Build output: http://webkit-commit-queue.appspot.com/results/203155 Created attachment 47124 [details]
Same patch, rerunning through EWS
(In reply to comment #44) > Attachment 47116 [details] did not build on qt: > Build output: http://webkit-commit-queue.appspot.com/results/203155 Looks suspicious---I only removed ! in ASSERT. Reposting the same patch to rerun EWSes. https://bugs.webkit.org/show_bug.cgi?id=33957 filed with the patch which actually implements intrusive map for Nodes using ScriptWrappables. Changed title since this has a potentially significant effect on non-v8 code. Comment on attachment 47124 [details]
Same patch, rerunning through EWS
Looks OK. I think we'll want this baseclass to hide how we access the wrapper eventually.
Comment on attachment 47124 [details] Same patch, rerunning through EWS Clearing flags on attachment: 47124 Committed r53690: <http://trac.webkit.org/changeset/53690> All reviewed patches have been landed. Closing bug. (In reply to comment #31) > Another way to implement this would be to have two implementations of this > class. One which knows how to pull from the current wrapper caches, and one > which knows how to do the lookup directly. > > Basically instead of JSC having an empty implementation for this, it could use > an implementation which called getDOMObjectWrapper. > > I don't understand how this works with isolated worlds. But then again, I > don't know how isolated worlds work for V8. When grabbing a wrapper in JSc you > have to pass the global object. Sounds interesting, thanks. Let me think a bit about it, but I'd be glad to have this change in first and improve it later. |