Bug 100790 - Improve performance of MaskPtr
: Improve performance of MaskPtr
Product: WebKit
Classification: Unclassified
Component: Platform
: 528+ (Nightly build)
: Unspecified Unspecified
: P2 Normal
Assigned To: Chris Evans
Depends on: 97494
  Show dependency treegraph
Reported: 2012-10-30 14:45 PDT by Chris Evans
Modified: 2012-10-30 18:10 PDT (History)
6 users (show)

See Also:

Patch (4.77 KB, patch)
2012-10-30 15:05 PDT, Chris Evans
no flags Details | Formatted Diff | Diff
Patch (5.01 KB, patch)
2012-10-30 16:16 PDT, Chris Evans
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Evans 2012-10-30 14:45:05 PDT
Eric noted that MaskPtr could be faster.
Comment 1 Chris Evans 2012-10-30 14:52:58 PDT
The problem is that, at least with gcc, the mask value is being calculated every time. This is a series of serialized operations on the same register, which is not ideal.

Conceptually, we can do it in one instruction -- it's just an xor.

We don't simply put the value in a static, as referencing statics in an ASLR world is expensive on 32-bit and even uses a weird / noncompact instuction on 64-bit (rip-relative). Instead we just store the mask as a member of RenderArena, which causes very compact instruction usage.

Whilst looking at the assembly of RenderArena::allocate(), I also noted another problem on the hot path: a serialized register extension due to mismatch of int vs. size_t in the code. This is also easy to take care of in the same CL, so I will do so.
Comment 2 Eric Seidel 2012-10-30 14:55:01 PDT
OMG.  run-perf-tests --profile finds its first victim! :)

Thanks for the follow-up.
Comment 3 Chris Evans 2012-10-30 15:05:28 PDT
Created attachment 171525 [details]
Comment 4 Eric Seidel 2012-10-30 15:07:21 PDT
Comment on attachment 171525 [details]

Since this is a perf patch, seems we should run one of the tests which this showed up in and see if there is a change?
Comment 5 Ojan Vafai 2012-10-30 15:08:46 PDT
Comment on attachment 171525 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=171525&action=review

> Source/WebCore/rendering/RenderArena.cpp:120
> +    if (LIKELY(size < gMaxRecycledSize)) {

Did you actually find these LIKELY macros to make a measurable improvement? It's unclear from your ChangeLog description.
Comment 6 Chris Evans 2012-10-30 15:14:23 PDT
@ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it.
Comment 7 Ojan Vafai 2012-10-30 15:16:53 PDT
(In reply to comment #6)
> @ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it.

As a rule, we (i.e. WebKit) only put LIKELY/UNLIKELY in cases where we've seen measurable improvement. In practice, it almost never matters.
Comment 8 Chris Evans 2012-10-30 15:17:44 PDT
Ok, I'll remove them.
Comment 9 Eric Seidel 2012-10-30 15:19:24 PDT
To test, I might do something like this:

git stash (save your patch)
Tools/Scripts/build-webkit --chromium (or however you do a build for your checkout)
Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html (since that's one of the ones which showed MaskPtr as hot)
git stash apply
Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html

update the bug/ChangeLog with your results.  I would expect you will see a small progression, or no change at all.

There are many other tests in PerformanceTests you could run.  You can look at my comment on bug 97494 for other tests which showed MaskPtr as hot.

The profiles in bug 97494 came from an experimental --profile flag which I'm adding to run-perf-tests as part of bug 99517.  If you want you can apply the patch from there, but given that it's only ever been tested on my machine, it's highly likely it won't work for you out of the box (Also requires apt-get install google-perftools or prodaccess to the shared pprof binary).

In any case, just run a couple run-perf-tests tests, and comment about the change or lack there of.  Just showing that you did some minor perf testing, is all I really care about.  The bots will catch if we did anything horribly wrong. :)
Comment 10 Eric Seidel 2012-10-30 15:20:19 PDT
Your instinct to fix the assembler is good, but I think we generally lean on the profile/benchmark data first (which is I think what Ojan is trying to say)
Comment 11 Eric Seidel 2012-10-30 15:20:51 PDT
I forgot to list a second build step in my steps I gave you above, but I'm sure you'll figure it out. :)
Comment 12 Chris Evans 2012-10-30 16:16:48 PDT
Created attachment 171543 [details]
Comment 13 Eric Seidel 2012-10-30 16:18:04 PDT
Comment on attachment 171543 [details]

Thanks.  This sort of ChangeLog information is really useful!
Comment 14 Chris Evans 2012-10-30 16:41:30 PDT
Well, Eric and I are vaguely embarrassed because we now think that MaskPtr in the profiles in fact refers to tcmalloc's MaskPtr in the anonymous namespace. If this were the RenderArena MaskPtr, we'd expect to have seen WebCore::MaskPtr.

Still, if MaskPtr is hot in tcmalloc, there's probably another benchmark we've fixed that would have been hot in RenderArena.
Comment 15 WebKit Review Bot 2012-10-30 18:10:32 PDT
Comment on attachment 171543 [details]

Clearing flags on attachment: 171543

Committed r132970: <http://trac.webkit.org/changeset/132970>
Comment 16 WebKit Review Bot 2012-10-30 18:10:36 PDT
All reviewed patches have been landed.  Closing bug.