RESOLVED FIXED100790
Improve performance of MaskPtr
https://bugs.webkit.org/show_bug.cgi?id=100790
Summary Improve performance of MaskPtr
Chris Evans
Reported 2012-10-30 14:45:05 PDT
Eric noted that MaskPtr could be faster.
Attachments
Patch (4.77 KB, patch)
2012-10-30 15:05 PDT, Chris Evans
no flags
Patch (5.01 KB, patch)
2012-10-30 16:16 PDT, Chris Evans
no flags
Chris Evans
Comment 1 2012-10-30 14:52:58 PDT
The problem is that, at least with gcc, the mask value is being calculated every time. This is a series of serialized operations on the same register, which is not ideal. Conceptually, we can do it in one instruction -- it's just an xor. We don't simply put the value in a static, as referencing statics in an ASLR world is expensive on 32-bit and even uses a weird / noncompact instuction on 64-bit (rip-relative). Instead we just store the mask as a member of RenderArena, which causes very compact instruction usage. Whilst looking at the assembly of RenderArena::allocate(), I also noted another problem on the hot path: a serialized register extension due to mismatch of int vs. size_t in the code. This is also easy to take care of in the same CL, so I will do so.
Eric Seidel (no email)
Comment 2 2012-10-30 14:55:01 PDT
OMG. run-perf-tests --profile finds its first victim! :) Thanks for the follow-up.
Chris Evans
Comment 3 2012-10-30 15:05:28 PDT
Eric Seidel (no email)
Comment 4 2012-10-30 15:07:21 PDT
Comment on attachment 171525 [details] Patch Since this is a perf patch, seems we should run one of the tests which this showed up in and see if there is a change?
Ojan Vafai
Comment 5 2012-10-30 15:08:46 PDT
Comment on attachment 171525 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171525&action=review > Source/WebCore/rendering/RenderArena.cpp:120 > + if (LIKELY(size < gMaxRecycledSize)) { Did you actually find these LIKELY macros to make a measurable improvement? It's unclear from your ChangeLog description.
Chris Evans
Comment 6 2012-10-30 15:14:23 PDT
@ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it.
Ojan Vafai
Comment 7 2012-10-30 15:16:53 PDT
(In reply to comment #6) > @ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it. As a rule, we (i.e. WebKit) only put LIKELY/UNLIKELY in cases where we've seen measurable improvement. In practice, it almost never matters.
Chris Evans
Comment 8 2012-10-30 15:17:44 PDT
Ok, I'll remove them.
Eric Seidel (no email)
Comment 9 2012-10-30 15:19:24 PDT
To test, I might do something like this: git stash (save your patch) Tools/Scripts/build-webkit --chromium (or however you do a build for your checkout) Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html (since that's one of the ones which showed MaskPtr as hot) git stash apply Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html update the bug/ChangeLog with your results. I would expect you will see a small progression, or no change at all. There are many other tests in PerformanceTests you could run. You can look at my comment on bug 97494 for other tests which showed MaskPtr as hot. The profiles in bug 97494 came from an experimental --profile flag which I'm adding to run-perf-tests as part of bug 99517. If you want you can apply the patch from there, but given that it's only ever been tested on my machine, it's highly likely it won't work for you out of the box (Also requires apt-get install google-perftools or prodaccess to the shared pprof binary). In any case, just run a couple run-perf-tests tests, and comment about the change or lack there of. Just showing that you did some minor perf testing, is all I really care about. The bots will catch if we did anything horribly wrong. :)
Eric Seidel (no email)
Comment 10 2012-10-30 15:20:19 PDT
Your instinct to fix the assembler is good, but I think we generally lean on the profile/benchmark data first (which is I think what Ojan is trying to say)
Eric Seidel (no email)
Comment 11 2012-10-30 15:20:51 PDT
I forgot to list a second build step in my steps I gave you above, but I'm sure you'll figure it out. :)
Chris Evans
Comment 12 2012-10-30 16:16:48 PDT
Eric Seidel (no email)
Comment 13 2012-10-30 16:18:04 PDT
Comment on attachment 171543 [details] Patch Thanks. This sort of ChangeLog information is really useful!
Chris Evans
Comment 14 2012-10-30 16:41:30 PDT
Well, Eric and I are vaguely embarrassed because we now think that MaskPtr in the profiles in fact refers to tcmalloc's MaskPtr in the anonymous namespace. If this were the RenderArena MaskPtr, we'd expect to have seen WebCore::MaskPtr. Still, if MaskPtr is hot in tcmalloc, there's probably another benchmark we've fixed that would have been hot in RenderArena.
WebKit Review Bot
Comment 15 2012-10-30 18:10:32 PDT
Comment on attachment 171543 [details] Patch Clearing flags on attachment: 171543 Committed r132970: <http://trac.webkit.org/changeset/132970>
WebKit Review Bot
Comment 16 2012-10-30 18:10:36 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.