Summary: | Improve performance of MaskPtr | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Chris Evans <cevans> | ||||||
Component: | Platform | Assignee: | Chris Evans <cevans> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | abarth, eric, inferno, jschuh, ojan, webkit.review.bot | ||||||
Priority: | P2 | ||||||||
Version: | 528+ (Nightly build) | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Bug Depends on: | 97494 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
Chris Evans
2012-10-30 14:45:05 PDT
The problem is that, at least with gcc, the mask value is being calculated every time. This is a series of serialized operations on the same register, which is not ideal. Conceptually, we can do it in one instruction -- it's just an xor. We don't simply put the value in a static, as referencing statics in an ASLR world is expensive on 32-bit and even uses a weird / noncompact instuction on 64-bit (rip-relative). Instead we just store the mask as a member of RenderArena, which causes very compact instruction usage. Whilst looking at the assembly of RenderArena::allocate(), I also noted another problem on the hot path: a serialized register extension due to mismatch of int vs. size_t in the code. This is also easy to take care of in the same CL, so I will do so. OMG. run-perf-tests --profile finds its first victim! :) Thanks for the follow-up. Created attachment 171525 [details]
Patch
Comment on attachment 171525 [details]
Patch
Since this is a perf patch, seems we should run one of the tests which this showed up in and see if there is a change?
Comment on attachment 171525 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171525&action=review > Source/WebCore/rendering/RenderArena.cpp:120 > + if (LIKELY(size < gMaxRecycledSize)) { Did you actually find these LIKELY macros to make a measurable improvement? It's unclear from your ChangeLog description. @ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it. (In reply to comment #6) > @ojan: for gcc at least, the branch structure isn't changed by the new LIKELY()s. Since Eric has identified this function as very performance hot, though, I thought there's no harm in being explicit about it. As a rule, we (i.e. WebKit) only put LIKELY/UNLIKELY in cases where we've seen measurable improvement. In practice, it almost never matters. Ok, I'll remove them. To test, I might do something like this: git stash (save your patch) Tools/Scripts/build-webkit --chromium (or however you do a build for your checkout) Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html (since that's one of the ones which showed MaskPtr as hot) git stash apply Tools/Scripts/run-perf-tests Parser/tiny-innerHTML.html update the bug/ChangeLog with your results. I would expect you will see a small progression, or no change at all. There are many other tests in PerformanceTests you could run. You can look at my comment on bug 97494 for other tests which showed MaskPtr as hot. The profiles in bug 97494 came from an experimental --profile flag which I'm adding to run-perf-tests as part of bug 99517. If you want you can apply the patch from there, but given that it's only ever been tested on my machine, it's highly likely it won't work for you out of the box (Also requires apt-get install google-perftools or prodaccess to the shared pprof binary). In any case, just run a couple run-perf-tests tests, and comment about the change or lack there of. Just showing that you did some minor perf testing, is all I really care about. The bots will catch if we did anything horribly wrong. :) Your instinct to fix the assembler is good, but I think we generally lean on the profile/benchmark data first (which is I think what Ojan is trying to say) I forgot to list a second build step in my steps I gave you above, but I'm sure you'll figure it out. :) Created attachment 171543 [details]
Patch
Comment on attachment 171543 [details]
Patch
Thanks. This sort of ChangeLog information is really useful!
Well, Eric and I are vaguely embarrassed because we now think that MaskPtr in the profiles in fact refers to tcmalloc's MaskPtr in the anonymous namespace. If this were the RenderArena MaskPtr, we'd expect to have seen WebCore::MaskPtr. Still, if MaskPtr is hot in tcmalloc, there's probably another benchmark we've fixed that would have been hot in RenderArena. Comment on attachment 171543 [details] Patch Clearing flags on attachment: 171543 Committed r132970: <http://trac.webkit.org/changeset/132970> All reviewed patches have been landed. Closing bug. |