Bug 100790 - Improve performance of MaskPtr
: Improve performance of MaskPtr
Status: RESOLVED FIXED
: WebKit
Platform
: 528+ (Nightly build)
: Unspecified Unspecified
: P2 Normal
Assigned To:
:
:
: 97494
:
  Show dependency treegraph
 
Reported: 2012-10-30 14:45 PST by
Modified: 2012-10-30 18:10 PST (History)


Attachments
Patch (4.77 KB, patch)
2012-10-30 15:05 PST, Chris Evans
no flags Review Patch | Details | Formatted Diff | Diff
Patch (5.01 KB, patch)
2012-10-30 16:16 PST, Chris Evans
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2012-10-30 14:45:05 PST
Eric noted that MaskPtr could be faster.
------- Comment #1 From 2012-10-30 14:52:58 PST -------
The problem is that, at least with gcc, the mask value is being calculated every time. This is a series of serialized operations on the same register, which is not ideal.

Conceptually, we can do it in one instruction -- it's just an xor.

We don't simply put the value in a static, as referencing statics in an ASLR world is expensive on 32-bit and even uses a weird / noncompact instuction on 64-bit (rip-relative). Instead we just store the mask as a member of RenderArena, which causes very compact instruction usage.

Whilst looking at the assembly of RenderArena::allocate(), I also noted another problem on the hot path: a serialized register extension due to mismatch of int vs. size_t in the code. This is also easy to take care of in the same CL, so I will do so.
------- Comment #2 From 2012-10-30 14:55:01 PST -------
OMG.  run-perf-tests --profile finds its first victim! :)

Thanks for the follow-up.
------- Comment #3 From 2012-10-30 15:05:28 PST -------
Created an attachment (id=171525) [details]
Patch
------- Comment #4 From 2012-10-30 15:07:21 PST -------
(From update of attachment 171525 [details])
Since this is a perf patch, seems we should run one of the tests which this showed up in and see if there is a change?
------- Comment #5 From 2012-10-30 15:08:46 PST -------
(From update of attachment 171525 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=171525&action=review

> Source/WebCore/rendering/RenderArena.cpp:120
> +    if (LIKELY(size < gMaxRecycledSize)) {

Did you actually find these LIKELY macros to make a measurable improvement? It's unclear from your ChangeLog description.
------- Comment #6 From 2012-10-30 15:14:23 PST -------
@ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it.
------- Comment #7 From 2012-10-30 15:16:53 PST -------
(In reply to comment #6)
> @ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it.

As a rule, we (i.e. WebKit) only put LIKELY/UNLIKELY in cases where we've seen measurable improvement. In practice, it almost never matters.
------- Comment #8 From 2012-10-30 15:17:44 PST -------
Ok, I'll remove them.
------- Comment #9 From 2012-10-30 15:19:24 PST -------
To test, I might do something like this:

git stash (save your patch)
Tools/Scripts/build-webkit --chromium (or however you do a build for your checkout)
Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html (since that's one of the ones which showed MaskPtr as hot)
git stash apply
Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html

update the bug/ChangeLog with your results.  I would expect you will see a small progression, or no change at all.

There are many other tests in PerformanceTests you could run.  You can look at my comment on bug 97494 for other tests which showed MaskPtr as hot.

The profiles in bug 97494 came from an experimental --profile flag which I'm adding to run-perf-tests as part of bug 99517.  If you want you can apply the patch from there, but given that it's only ever been tested on my machine, it's highly likely it won't work for you out of the box (Also requires apt-get install google-perftools or prodaccess to the shared pprof binary).

In any case, just run a couple run-perf-tests tests, and comment about the change or lack there of.  Just showing that you did some minor perf testing, is all I really care about.  The bots will catch if we did anything horribly wrong. :)
------- Comment #10 From 2012-10-30 15:20:19 PST -------
Your instinct to fix the assembler is good, but I think we generally lean on the profile/benchmark data first (which is I think what Ojan is trying to say)
------- Comment #11 From 2012-10-30 15:20:51 PST -------
I forgot to list a second build step in my steps I gave you above, but I'm sure you'll figure it out. :)
------- Comment #12 From 2012-10-30 16:16:48 PST -------
Created an attachment (id=171543) [details]
Patch
------- Comment #13 From 2012-10-30 16:18:04 PST -------
(From update of attachment 171543 [details])
Thanks.  This sort of ChangeLog information is really useful!
------- Comment #14 From 2012-10-30 16:41:30 PST -------
Well, Eric and I are vaguely embarrassed because we now think that MaskPtr in the profiles in fact refers to tcmalloc's MaskPtr in the anonymous namespace. If this were the RenderArena MaskPtr, we'd expect to have seen WebCore::MaskPtr.

Still, if MaskPtr is hot in tcmalloc, there's probably another benchmark we've fixed that would have been hot in RenderArena.
------- Comment #15 From 2012-10-30 18:10:32 PST -------
(From update of attachment 171543 [details])
Clearing flags on attachment: 171543

Committed r132970: <http://trac.webkit.org/changeset/132970>
------- Comment #16 From 2012-10-30 18:10:36 PST -------
All reviewed patches have been landed.  Closing bug.