Bug 47230

Summary: Avoid inlining large and/or virtual functions in widely included header files
Product: WebKit Reporter: James Robinson <jamesr>
Component: New BugsAssignee: James Robinson <jamesr>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, commit-queue, darin, dglazkov, eric, mjs
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
none
applies to ToT
none
include WebCore.exp.in changes so this builds on mac
none
Patch none

James Robinson
Reported 2010-10-05 16:58:37 PDT
Avoid inlining large and/or virtual functions in widely included header files
Attachments
Patch (66.79 KB, patch)
2010-10-05 17:07 PDT, James Robinson
no flags
applies to ToT (64.51 KB, patch)
2010-10-05 17:36 PDT, James Robinson
no flags
include WebCore.exp.in changes so this builds on mac (65.57 KB, patch)
2010-10-06 18:20 PDT, James Robinson
no flags
Patch (66.04 KB, patch)
2010-10-08 18:17 PDT, James Robinson
no flags
James Robinson
Comment 1 2010-10-05 17:07:54 PDT
James Robinson
Comment 2 2010-10-05 17:12:12 PDT
Based on some work initially done by Elliot Glaysher from the Chrome team. One particularly nasty anti-pattern I've seen a lot in our codebase is this: Foo.h: class Foo { public: virtual ~Foo(); }; Bar.h: class Bar : public Foo { HashSet<String> m_set; }; With this sort of construct, the implicitly defined destructor for Foo is placed in the header and inlines the very large HashSet destructor into every object file that #includes Bar.h. Since Bar::~Bar is virtual the compiler can't inline this code even if it wanted to. Thanks to this we end up with 800+ copies of common destructors spread across object files across the system. I'd like it if we had some sort of automated system to detect this antipattern, if possible.
James Robinson
Comment 3 2010-10-05 17:26:59 PDT
Comment on attachment 69868 [details] Patch Patch doesn't apply, will resolve and reupload.
Adam Barth
Comment 4 2010-10-05 17:32:01 PDT
Nice.
James Robinson
Comment 5 2010-10-05 17:36:20 PDT
Created attachment 69869 [details] applies to ToT
Adam Barth
Comment 6 2010-10-05 17:47:52 PDT
Comment on attachment 69869 [details] applies to ToT View in context: https://bugs.webkit.org/attachment.cgi?id=69869&action=review <3 > WebCore/dom/QualifiedName.h:63 > - ~QualifiedName() { deref(); } > + ~QualifiedName(); You sure there's no perf hit from making this non-inline? > WebCore/platform/network/HTTPHeaderMap.cpp:80 > + Extra blank lines?
James Robinson
Comment 7 2010-10-05 18:09:12 PDT
(In reply to comment #6) > (From update of attachment 69869 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=69869&action=review > > <3 > > > WebCore/dom/QualifiedName.h:63 > > - ~QualifiedName() { deref(); } > > + ~QualifiedName(); > > You sure there's no perf hit from making this non-inline? QualifiedName::deref is not defined inline (either with this patch or without it) so the compiler can't do any useful inlining here anyway. > > > WebCore/platform/network/HTTPHeaderMap.cpp:80 > > + > > Extra blank lines? Whoops!
Adam Barth
Comment 8 2010-10-05 18:12:58 PDT
(In reply to comment #7) > (In reply to comment #6) > > (From update of attachment 69869 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=69869&action=review > > > > <3 > > > > > WebCore/dom/QualifiedName.h:63 > > > - ~QualifiedName() { deref(); } > > > + ~QualifiedName(); > > > > You sure there's no perf hit from making this non-inline? > > QualifiedName::deref is not defined inline (either with this patch or without it) so the compiler can't do any useful inlining here anyway. This doesn't add an extra jmp? QualifiedNames get created and destroyed very, very often.
James Robinson
Comment 9 2010-10-05 18:22:52 PDT
The compiler is free to inline the body of QN::deref() into the destructor in QualifiedName.o since they are in the same translation unit. External callers (i.e. those in different translation units) will still do one jump, either to ~QualifiedName or to deref. I'll double-check this change in isolation on the html parser benchmark to be sure.
Eric Seidel (no email)
Comment 10 2010-10-05 18:48:23 PDT
Darin Adler
Comment 11 2010-10-06 08:26:23 PDT
(In reply to comment #9) > The compiler is free to inline the body of QN::deref() into the destructor in QualifiedName.o since they are in the same translation unit. External callers (i.e. those in different translation units) will still do one jump, either to ~QualifiedName or to deref. I'll double-check this change in isolation on the html parser benchmark to be sure. I’m pretty sure we inlined that because of a measurable speedup.
Darin Adler
Comment 12 2010-10-06 08:27:20 PDT
Comment on attachment 69869 [details] applies to ToT The added explicit keywords are great, but unrelated. I think this goes overboard. Some of these inlined functions are more efficient than if they were compiled separately. Measuring the benefit by looking at object file sizes and not at end-user phenomena such as the total executable size or the speed of WebKit means you are seeing only part of the picture. There’s a good chance this hurt performance, and I don’t see any evidence you measured performance. It’s also inelegant to declare lots of empty destructors when the compiler will already do the right thing by default. There needs to be a significant benefit, because the explicit destructors make the classes more wordy, which hurts readability.
Darin Adler
Comment 13 2010-10-06 08:30:23 PDT
(In reply to comment #2) > One particularly nasty anti-pattern I've seen a lot in our codebase is this: > > Foo.h: > class Foo { > public: > virtual ~Foo(); > }; > > Bar.h: > class Bar : public Foo { > HashSet<String> m_set; > }; > > > With this sort of construct, the implicitly defined destructor for Foo is placed in the header and inlines the very large HashSet destructor into every object file that #includes Bar.h. Since Bar::~Bar is virtual the compiler can't inline this code even if it wanted to. Thanks to this we end up with 800+ copies of common destructors spread across object files across the system. I'd like it if we had some sort of automated system to detect this antipattern, if possible. But this is not an anti-pattern. It’s normal elegant C++ usage. However, if the compilers deal particularly poorly with it, then it might be OK to forbid it; that is a new development though, and not a well known anti-pattern. Further, if Bar is a base class and others derive from it, then compiler *can* inline the code for the destructor in the destructors for those derived classes. So it’s wrong to say that “it’s virtual therefore can never be inlined”. But leaving an explicit destructor out is not a request to inline. It’s just a normal use of the language. It’s unfortunate that the development tools are penalizing us for using the more-elegant, more-normal syntax for declaring a class that requires not additional special destruction code. Most classes do not require special destruction code, and it’s an anti-pattern to explicitly define empty constructors and destructors in all those cases.
Darin Adler
Comment 14 2010-10-06 10:48:12 PDT
Comment on attachment 69869 [details] applies to ToT View in context: https://bugs.webkit.org/attachment.cgi?id=69869&action=review >>>> WebCore/dom/QualifiedName.h:63 >>>> + ~QualifiedName(); >>> >>> You sure there's no perf hit from making this non-inline? >> >> QualifiedName::deref is not defined inline (either with this patch or without it) so the compiler can't do any useful inlining here anyway. > > This doesn't add an extra jmp? QualifiedNames get created and destroyed very, very often. Yes, this does add an extra level function call and return, which has some performance cost.
Darin Adler
Comment 15 2010-10-06 10:49:40 PDT
(In reply to comment #9) > The compiler is free to inline the body of QN::deref() into the destructor in QualifiedName.o since they are in the same translation unit. External callers (i.e. those in different translation units) will still do one jump, either to ~QualifiedName or to deref. I'll double-check this change in isolation on the html parser benchmark to be sure. Some compilers do this sort of optimization and others don’t. I am pretty sure that the Mac OS X version of WebKit does not invoke gcc at the optimization level where it inlines functions that are not marked inline. So for this particular change there may be no performance difference for Chromium but there may be a slowdown on other platforms because of tool chain differences.
James Robinson
Comment 16 2010-10-06 15:13:24 PDT
(In reply to comment #13) > (In reply to comment #2) > > One particularly nasty anti-pattern I've seen a lot in our codebase is this: > > > > Foo.h: > > class Foo { > > public: > > virtual ~Foo(); > > }; > > > > Bar.h: > > class Bar : public Foo { > > HashSet<String> m_set; > > }; > > > > > > With this sort of construct, the implicitly defined destructor for Foo is placed in the header and inlines the very large HashSet destructor into every object file that #includes Bar.h. Since Bar::~Bar is virtual the compiler can't inline this code even if it wanted to. Thanks to this we end up with 800+ copies of common destructors spread across object files across the system. I'd like it if we had some sort of automated system to detect this antipattern, if possible. > > But this is not an anti-pattern. It’s normal elegant C++ usage. However, if the compilers deal particularly poorly with it, then it might be OK to forbid it; that is a new development though, and not a well known anti-pattern. > > Further, if Bar is a base class and others derive from it, then compiler *can* inline the code for the destructor in the destructors for those derived classes. So it’s wrong to say that “it’s virtual therefore can never be inlined”. In WebKit our type hierarchies tend to be very wide but not very deep. For example, there are lots of subclasses of Event but relatively few subclasses of subclasses of Event. > > But leaving an explicit destructor out is not a request to inline. It’s just a normal use of the language. It’s unfortunate that the development tools are penalizing us for using the more-elegant, more-normal syntax for declaring a class that requires not additional special destruction code. Most classes do not require special destruction code, and it’s an anti-pattern to explicitly define empty constructors and destructors in all those cases. The issue is where the definition of the destructor ends up (in the header or in the cpp). We happen to have lots of very large destructors (thanks mostly to our WTF types having very large destructors defined in header files) and we tend to include our headers in many translation units. This means the default behavior in this case is to put the definition of these large destructors into hundreds of object files. It could be that this is fine in most cases (in fact the linker is good at discarding duplicate copies of the same function) but for WebKit it's causing us trouble since we can't actually link at all in some 32-bit toolchains.
James Robinson
Comment 17 2010-10-06 15:14:51 PDT
(In reply to comment #15) > (In reply to comment #9) > > The compiler is free to inline the body of QN::deref() into the destructor in QualifiedName.o since they are in the same translation unit. External callers (i.e. those in different translation units) will still do one jump, either to ~QualifiedName or to deref. I'll double-check this change in isolation on the html parser benchmark to be sure. > > Some compilers do this sort of optimization and others don’t. I am pretty sure that the Mac OS X version of WebKit does not invoke gcc at the optimization level where it inlines functions that are not marked inline. > > So for this particular change there may be no performance difference for Chromium but there may be a slowdown on other platforms because of tool chain differences. I'm running performance tests on the Mac platform as well for this particular change to see if there's any difference (although I'm doing them in Release instead of Production as I can't figure out how to make the Production version work). If there is a concern about ~QualifiedName in particular I'm happy to leave that change out, as it's a relatively small impact compared to the other changes
James Robinson
Comment 18 2010-10-06 15:26:25 PDT
(In reply to comment #12) > (From update of attachment 69869 [details]) > The added explicit keywords are great, but unrelated. > > I think this goes overboard. Some of these inlined functions are more efficient than if they were compiled separately. Measuring the benefit by looking at object file sizes and not at end-user phenomena such as the total executable size or the speed of WebKit means you are seeing only part of the picture. > For context we are having serious issues in Chromium trying to successfully link on systems with a 32-bit toolchain without running out of virtual address space. The biggest issue with linking is object file size and the largest source of object file bloat in the chromium link is WebCore. We gave up on trying to link on Leopard a few months back since Snow Leopard provides a 64-bit toolchain, but there is no 64-bit toolchain available on windows and it's not clear when or if there ever will be. Other ports may not be hitting hard limits yet, but the memory and time requirements are not fun for anyone. I'm specifically making changes that have little or no impact on the final binary, only on the resource requirements in order to compile. These are easy wins. > There’s a good chance this hurt performance, and I don’t see any evidence you measured performance. I've been looking at benchmarks as well as disassembly of the object files in question to verify that I'm not regressing performance. If there was something to report I'd note it in the bug but I haven't seen anything change yet (nor would I expect to). > > It’s also inelegant to declare lots of empty destructors when the compiler will already do the right thing by default. There needs to be a significant benefit, because the explicit destructors make the classes more wordy, which hurts readability. Well, it's C++ - rarely accused of being overly elegant.
Darin Adler
Comment 19 2010-10-06 16:01:08 PDT
(In reply to comment #17) > If there is a concern about ~QualifiedName in particular I'm happy to leave that change out, as it's a relatively small impact compared to the other changes Which are the changes with the biggest impact? Can we do these changes in multiple smaller patches? That will be helpful if later we find they cause performance regressions, so we can pinpoint which one had that effect.
James Robinson
Comment 20 2010-10-06 17:01:47 PDT
(In reply to comment #19) > (In reply to comment #17) > > If there is a concern about ~QualifiedName in particular I'm happy to leave that change out, as it's a relatively small impact compared to the other changes > > Which are the changes with the biggest impact? Can we do these changes in multiple smaller patches? That will be helpful if later we find they cause performance regressions, so we can pinpoint which one had that effect. The most impactful changes are those to EventTarget and Element - moving those four virtual event handler functions from being defined in the header to being declared in the header and defined in the .cpp is a ~4.5mb improvement in debug object size. I compared the performance with and without this patch on Safari with a release (not production) WebKit build on my slightly crappy core 2 duo Leopard MacBook Pro. The results are actually a little surprising: with patch: avg 4016.6 stdev 65.28506720529589 avg 4280.1 stdev 228.23165862780738 avg 4021.25 stdev 60.233607728576246 avg 4025.05 stdev 70.40488264318037 avg 4015.05 stdev 71.23445444446108 avg 4007.95 stdev 63.84158127740885 overall avg (excluding outlier): 4017.7 without patch: avg 4144.9 stdev 66.55741882014357 avg 4131.95 stdev 71.22041491033312 avg 4128.5 stdev 56.01383757608472 avg 4145.15 stdev 72.71126116359144 avg 4132.2 stdev 68.45845455456907 overall avg: 4136.54 diff: 188, % diff = 2.87 As you can see excluding one outlier with the patch (notice the stdev, something hiccuped on my box on that run) this patch is a nearly 3% speedup. The speedup is quite consistent and well within the error range of this test, so I believe it is real. I have to admit I don't fully understand why this is. My best guess it that the linker on this system is not very good at dealing with the duplicate symbols and is either failing to discard the duplicates or (more likely) that the duplicates are hurting code locality. I'll try to investigate further as I have time as it's pretty interesting that we can get such a speedup. Maybe we can get more. I don't know for certain if the production build settings will produce a similar speedup or not and I'm not sure how to test that locally (or even if I can). Either way, though, I think it's pretty safe to say that this won't be a perf regression.
James Robinson
Comment 21 2010-10-06 18:20:45 PDT
Created attachment 70022 [details] include WebCore.exp.in changes so this builds on mac
Adam Barth
Comment 22 2010-10-06 22:24:45 PDT
Comment on attachment 70022 [details] include WebCore.exp.in changes so this builds on mac I'm still in favor of this change. It's hard to argue with a 3% speedup that also reduces code size and makes it possible to build the project on a 32 bit machine.
WebKit Commit Bot
Comment 23 2010-10-07 14:20:58 PDT
Comment on attachment 70022 [details] include WebCore.exp.in changes so this builds on mac Rejecting patch 70022 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', 'apply-attachment', '--force-clean', '--non-interactive', '--quiet', 70022]" exit_code: 2 Cleaning working directory Updating working directory Logging in as commit-queue@webkit.org... Fetching: https://bugs.webkit.org/attachment.cgi?id=70022&action=edit Fetching: https://bugs.webkit.org/show_bug.cgi?id=47230&ctype=xml Processing 1 patch from 1 bug. Processing patch 70022 from bug 47230. Failed to run "[u'/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Adam Barth', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/4216136
James Robinson
Comment 24 2010-10-08 18:17:31 PDT
James Robinson
Comment 25 2010-10-08 18:18:14 PDT
Comment on attachment 70327 [details] Patch Merged up to ToT, gonna give the commit queue one more shot.
Darin Adler
Comment 26 2010-10-08 18:21:50 PDT
If I was making this change and had measured a speedup when most of the patch should be entirely performance-neutral, I would want to separate the part that definitely has no effect on performance from any part that might be speeding things up. If we find out what is making things faster, then perhaps we can take advantage of it elsewhere. There’s also the chance that part of the patch makes things faster and part makes it slower, in which case we want to keep the faster part.
WebKit Commit Bot
Comment 27 2010-10-08 19:00:18 PDT
Comment on attachment 70327 [details] Patch Clearing flags on attachment: 70327 Committed r69437: <http://trac.webkit.org/changeset/69437>
WebKit Commit Bot
Comment 28 2010-10-08 19:00:25 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.