NEW 59348
Find large header files
https://bugs.webkit.org/show_bug.cgi?id=59348
Nico Weber
Reported 2011-04-25 14:48:20 PDT
Approach:
0.) Get a list of all interesting header files in WebKit.
1.) Port http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc to WebKit (this gives the "effective" file size of a header file).
2.) Have some simple grep command that counts how often a given header file is included.
3.) Run 1 & 2 for every file in 0, sort by file_size * num_includes.
4.) Look at the files at the top of this list, make them smaller / cut dependencies.
Attachments
Script that estimates effective header size (6.45 KB, text/x-python-script)
2011-04-25 15:39 PDT, Nico Weber
no flags
slightly better script (6.51 KB, text/x-python-script)
2011-04-25 15:41 PDT, Nico Weber
no flags
h file sizes (276.04 KB, text/plain)
2011-04-25 15:57 PDT, Nico Weber
no flags
Patch (8.81 KB, patch)
2011-04-25 16:17 PDT, Eric Seidel (no email)
no flags
find-includes.sh (141 bytes, application/octet-stream)
2011-04-25 16:18 PDT, Nico Weber
no flags
Patch (9.02 KB, patch)
2011-04-25 16:19 PDT, Eric Seidel (no email)
ojan: review-
ojan: commit-queue-
join script (281 bytes, text/x-python-script)
2011-04-25 16:19 PDT, Nico Weber
no flags
include-tracer with -c flag (6.71 KB, text/x-python-script)
2011-04-25 20:10 PDT, Nico Weber
no flags
h include counts (360.55 KB, text/plain)
2011-04-25 20:10 PDT, Nico Weber
no flags
new join script (382 bytes, text/x-python-script)
2011-04-25 20:11 PDT, Nico Weber
no flags
header files with file size and number of translation units (311.62 KB, text/plain)
2011-04-25 20:12 PDT, Nico Weber
no flags
sizes of translation units (243.07 KB, text/plain)
2011-04-25 20:13 PDT, Nico Weber
no flags
Nico Weber
Comment 1 2011-04-25 14:55:30 PDT
Step 0: git ls-files --full-name *.h
Nico Weber
Comment 2 2011-04-25 15:29:16 PDT
Step 2:

#!/bin/bash
f=$(basename $1 | sed -e 's:\.:\\.:')
git grep -n -e "^#include \"${f}\"$" -- "*.cpp" "*.h" | wc -l

Takes ~6 seconds per .h file though :-/ We have 4832 .h files, so it'd take 8h to get the includes. There's probably some better way.
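One possible "better way", sketched here as an assumption rather than anything attached to this bug: read every .cpp/.h file once and tally quoted includes as you go, instead of running git grep once per header. The name `count_includes` and the in-memory `sources` mapping are hypothetical; in practice the mapping would be filled from the files that git ls-files reports. Like the grep above, it only matches lines of the exact form #include "Name.h", and it counts include lines, not translation units.

```python
import re
from collections import Counter

# Same shape as the grep pattern: the whole line is #include "Name".
INCLUDE_RE = re.compile(r'^#include "([^"]+)"$')


def count_includes(sources):
    """Tally quoted includes across all files in a single pass.

    `sources` maps a file path to its text (a hypothetical input format).
    """
    counts = Counter()
    for text in sources.values():
        for line in text.splitlines():
            match = INCLUDE_RE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Tiny made-up tree, only to show the call.
sources = {
    "a/Foo.cpp": '#include "config.h"\n#include "Foo.h"\n',
    "a/Foo.h": '#include "Bar.h"\n',
    "b/Baz.cpp": '#include "config.h"\n#include "Bar.h"\n#include <vector>\n',
}
counts = count_includes(sources)
print(counts["Bar.h"], counts["config.h"], counts["Foo.h"])
```

A lookup per header is then a dictionary access, so the cost is one read of the tree rather than one grep per .h file.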
Nico Weber
Comment 3 2011-04-25 15:39:10 PDT
Created attachment 90962 [details]
Script that estimates effective header size

Step 1.
Nico Weber
Comment 4 2011-04-25 15:41:43 PDT
Created attachment 90965 [details]
slightly better script

First half of step 3:

for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done
Nico Weber
Comment 5 2011-04-25 15:57:02 PDT
Created attachment 90974 [details]
h file sizes

Output of

(for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done) | tee h-sizes.txt

Top 10:

thakis$ sort -r -k 2 -n h-sizes.txt | head
Source/WebCore/rendering/RenderView.h: 1.538 MiB
Source/WebKit/qt/WebCoreSupport/PageClientQt.h: 1.447 MiB
Source/WebCore/page/FrameView.h: 1.390 MiB
Source/WebKit/gtk/WebCoreSupport/EditorClientGtk.h: 1.300 MiB
Source/WebCore/loader/EmptyClients.h: 1.291 MiB
Source/WebCore/accessibility/AccessibilityMediaControls.h: 1.192 MiB
Source/WebCore/rendering/RenderLayerCompositor.h: 1.171 MiB
Source/WebCore/platform/efl/RenderThemeEfl.h: 1.153 MiB
Source/WebCore/rendering/RenderMediaControlsChromium.h: 1.122 MiB
Source/WebCore/rendering/RenderMediaControls.h: 1.122 MiB

Turns out there are 89 .h files that evaluate to more than 1MB!
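For readers without the attachment, the idea behind an "effective" size can be sketched as follows. This is an illustration under assumptions, not the attached include-tracer.py: the name `effective_size`, the in-memory `files` mapping, and resolving includes by exact name are hypothetical simplifications (a real tracer resolves names against include paths). Each file is counted at most once, which roughly mimics include guards.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def effective_size(name, files, seen=None):
    """Return the bytes of `name` plus everything it transitively includes.

    `files` maps an include name to file text (hypothetical input format).
    Unknown names, e.g. system headers, contribute 0 bytes.
    """
    if seen is None:
        seen = set()
    if name in seen or name not in files:
        return 0
    seen.add(name)
    text = files[name]
    total = len(text.encode("utf-8"))
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            total += effective_size(match.group(1), files, seen)
    return total


# Made-up headers: A.h pulls in B.h and C.h, and B.h pulls in C.h again.
files = {
    "A.h": '#include "B.h"\n#include "C.h"\n',
    "B.h": '#include "C.h"\nclass B;\n',
    "C.h": "class C;\n",
}
print(effective_size("A.h", files))
```

Because of the `seen` set, C.h is only charged once for A.h even though it is reachable along two paths.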
Eric Seidel (no email)
Comment 6 2011-04-25 16:17:22 PDT
Nico Weber
Comment 7 2011-04-25 16:18:30 PDT
Created attachment 90991 [details]
find-includes.sh

I'm currently running

(for f in $(git ls-files --full-name *.h); do ./find-includes.sh $f; done) | tee h-counts.txt

which will take a few hours to complete. find-includes.sh is the 2-line script that counts how often a .h is included, which I mentioned above; it is also attached.
Eric Seidel (no email)
Comment 8 2011-04-25 16:19:00 PDT
Nico Weber
Comment 9 2011-04-25 16:19:30 PDT
Created attachment 90994 [details]
join script

Once that other command is completed, this script can be used to combine the two outputs.
Mihai Parparita
Comment 10 2011-04-25 16:35:31 PDT
See also bug 52451 where Tony looked at header files included everywhere and the benefits of forward declaration.
Nico Weber
Comment 11 2011-04-25 20:09:43 PDT
The command completed, but I realized it's not very helpful as it double-counts header files. We really only want to know how many different translation units include a header, since including a .h twice in a translation unit costs the same as including it once.

To do this, I added a -c option to the include-tracer script that just prints all header files that the script finds, and then used this to find how many translation units include each .h file like this:

time (rm cpp-include-counts.txt
time (for f in $(git ls-files --full-name *.cpp); do Tools/Scripts/include-tracer.py -c $f; done) | tee -a cpp-include-counts.txt)
sort cpp-include-counts.txt | uniq -c > cpp-include-counts-processed.txt

I then used a modified join script to combine this with the header file sizes file. The top files are:

thakis$ python join-cpp.py | sort -k 4 -r -n | head -20
Source/WebCore/rendering/RenderObject.h: 0.883 559 493.597000
Source/WebCore/page/Frame.h: 0.746 653 487.138000
Source/WebCore/bindings/js/ScriptValue.h: 0.578 829 479.162000
Source/WebCore/rendering/style/RenderStyle.h: 0.646 597 385.662000
Source/WebCore/page/FrameView.h: 1.390 237 329.430000
Source/WebCore/bindings/js/ScriptSourceCode.h: 0.378 842 318.276000
Source/WebCore/dom/Element.h: 0.292 1084 316.528000
Source/WebCore/bindings/v8/V8Proxy.h: 0.362 841 304.442000
Source/WebCore/bindings/v8/ScriptValue.h: 0.366 829 303.414000
Source/WebCore/rendering/RenderBoxModelObject.h: 0.894 331 295.914000
Source/WebCore/loader/FrameLoader.h: 0.443 659 291.937000
Source/WebCore/dom/Document.h: 0.220 1307 287.540000
Source/WebCore/rendering/RenderBox.h: 0.924 301 278.124000
Source/WebCore/bindings/v8/ScriptController.h: 0.377 663 249.951000
Source/WebCore/rendering/RenderText.h: 0.891 260 231.660000
Source/WebCore/dom/StyledElement.h: 0.296 772 228.512000
Source/WebCore/rendering/InlineBox.h: 0.917 247 226.499000
Source/WebCore/rendering/RenderBR.h: 0.894 249 222.606000
Source/WebCore/bindings/js/ScriptController.h: 0.327 663 216.801000
Source/WebCore/rendering/RenderBlock.h: 1.016 206 209.296000

(Columns: .h filename, size of the .h with all its includes resolved in MiB, number of translation units including that .h, product of the previous 2 numbers.)

I also ran the include tracer on all cpp files, to find the biggest cpp files:

thakis$ sort cpp-sizes.txt -k 2 -r -n | head -20
Source/WebCore/rendering/RenderingAllInOne.cpp: 4.851 MiB
Source/WebCore/dom/DOMAllInOne.cpp: 3.898 MiB
Source/WebCore/bindings/js/JSBindingsAllInOne.cpp: 3.624 MiB
Source/WebCore/html/HTMLElementsAllInOne.cpp: 3.396 MiB
Source/WebCore/svg/SVGAllInOne.cpp: 3.287 MiB
Source/WebKit/chromium/src/WebViewImpl.cpp: 2.868 MiB
Source/WebKit/chromium/src/WebFrameImpl.cpp: 2.687 MiB
Source/WebCore/rendering/svg/RenderSVGAllInOne.cpp: 2.616 MiB
Source/WebCore/dom/Document.cpp: 2.549 MiB
Source/WebCore/accessibility/AccessibilityAllInOne.cpp: 2.357 MiB
Source/WebKit/qt/Api/qwebpage.cpp: 2.331 MiB
Source/WebKit/chromium/src/FrameLoaderClientImpl.cpp: 2.299 MiB
Source/WebKit/chromium/src/ChromeClientImpl.cpp: 2.264 MiB
Source/WebKit/chromium/src/WebMediaPlayerClientImpl.cpp: 2.260 MiB
Source/WebKit/qt/WebCoreSupport/DumpRenderTreeSupportQt.cpp: 2.235 MiB
Source/WebCore/page/EventHandler.cpp: 2.227 MiB
Source/WebKit/chromium/src/PlatformBridge.cpp: 2.191 MiB
Source/WebKit/chromium/src/WebPluginContainerImpl.cpp: 2.186 MiB
Source/WebKit/chromium/src/ContextMenuClientImpl.cpp: 2.179 MiB
Source/WebCore/page/Frame.cpp: 2.167 MiB

I will attach all the scripts I used and all data files I produced.
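The join step described in this comment can be sketched roughly like this. It is not the attached join script, only an assumed reconstruction from the formats shown in this bug: size lines look like "path: 1.538 MiB", and uniq -c lines are a count followed by a path. The names `parse_sizes`, `parse_counts`, and `rank` are hypothetical, and the sketch assumes both files spell each header path identically.

```python
def parse_sizes(lines):
    """Parse 'path: 1.538 MiB' lines into {path: size_in_mib}."""
    sizes = {}
    for line in lines:
        path, _, rest = line.strip().rpartition(": ")
        if path:
            sizes[path] = float(rest.split()[0])
    return sizes


def parse_counts(lines):
    """Parse 'uniq -c' style lines ('  559 path') into {path: count}."""
    counts = {}
    for line in lines:
        fields = line.split(None, 1)
        if len(fields) == 2:
            counts[fields[1].strip()] = int(fields[0])
    return counts


def rank(sizes, counts):
    """Return (path, size, count, size * count) rows, largest product first."""
    rows = [(path, size, counts[path], size * counts[path])
            for path, size in sizes.items() if path in counts]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Two rows taken from the listing in this comment.
size_lines = [
    "Source/WebCore/dom/Element.h: 0.292 MiB",
    "Source/WebCore/page/FrameView.h: 1.390 MiB",
]
count_lines = [
    "1084 Source/WebCore/dom/Element.h",
    " 237 Source/WebCore/page/FrameView.h",
]
rows = rank(parse_sizes(size_lines), parse_counts(count_lines))
for path, size, count, product in rows:
    print("%s: %.3f %d %f" % (path, size, count, product))
```

Sorting inside `rank` replaces the external sort -k 4 -r -n step; headers that appear in only one of the two inputs are dropped.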
Nico Weber
Comment 12 2011-04-25 20:10:17 PDT
Created attachment 91040 [details] include-tracer with -c flag
Nico Weber
Comment 13 2011-04-25 20:10:51 PDT
Created attachment 91042 [details] h include counts
Nico Weber
Comment 14 2011-04-25 20:11:14 PDT
Created attachment 91043 [details] new join script
Nico Weber
Comment 15 2011-04-25 20:12:38 PDT
Created attachment 91044 [details] header files with file size and number of translation units
Nico Weber
Comment 16 2011-04-25 20:13:08 PDT
Created attachment 91045 [details] sizes of translation units
Adam Barth
Comment 17 2011-04-26 13:51:53 PDT
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

> Tools/Scripts/include-tracer:30
> +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)

Which is available under BSD? It would be nice to have a link to the license here.

> Tools/Scripts/include-tracer:42
> +# FIXME: This should be a per-port list. Right now this is Apple-Mac only.
> +INCLUDE_PATHS = [

This is pretty lame.

> Tools/Scripts/include-tracer:150
> + return total_bytes

Otherwise known as 0?

> Tools/Scripts/include-tracer:159
> + # Skip system includes.
> + if filename[0] == '<':
> +     return total_bytes

This will also skip wtf, right?

> Tools/Scripts/include-tracer:170
> + lines = open(resolved_filename).readlines()

We should use "with" with open to make sure we don't leak.

> Tools/Scripts/include-tracer:183
> + if line.startswith('#include "'):
> +     total_bytes += self._walk(seen, line.split('"')[1], resolved_filename, indent + 2)
> + elif line.startswith('#include '):
> +     include = '<' + line.split('<')[1].split('>')[0] + '>'
> +     total_bytes += self._walk(seen, include, resolved_filename, indent + 2)

Would this be clearer with a regular expression?
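As a rough illustration of the last two suggestions (and not the patch under review), the include parsing could be done with one regular expression and the file read inside a "with" block. The helper names `parse_include` and `read_includes` are hypothetical; the return convention copies the quoted snippet, where quoted includes are passed on bare and angle includes keep their brackets.

```python
import re

# Group 1 is set for #include "name", group 2 for #include <name>.
INCLUDE_RE = re.compile(r'^#include\s+(?:"([^"]+)"|<([^>]+)>)')


def parse_include(line):
    """Return 'Foo.h' for quoted includes, '<foo>' for angle includes, else None."""
    match = INCLUDE_RE.match(line)
    if not match:
        return None
    quoted, angled = match.groups()
    return quoted if quoted is not None else "<" + angled + ">"


def read_includes(path):
    """Collect the includes of one file; 'with' closes the handle for us."""
    with open(path) as source:
        return [inc for inc in map(parse_include, source) if inc is not None]


print(parse_include('#include "Frame.h"'))
print(parse_include('#include <wtf/Vector.h>'))
print(parse_include('int x;'))
```

One regular expression also makes the two include forms visible side by side, which may help with the question of whether angle-bracket wtf headers get skipped.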
David Levin
Comment 18 2011-04-26 13:56:34 PDT
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

Looks like Adam got it but here's a comment anyway :).

> Tools/Scripts/include-tracer:192
> + self._be_quiet = True

Seems like quiet would be the default (and verbose would be an option).
Ojan Vafai
Comment 19 2011-04-26 17:05:22 PDT
Comment on attachment 90992 [details]
Patch

R- per abarth's comments
Eric Seidel (no email)
Comment 20 2011-04-26 18:08:17 PDT
(In reply to comment #17)
> (From update of attachment 90992 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
>
> > Tools/Scripts/include-tracer:30
> > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
>
> Which is available under BSD? It would be nice to have a link to the license here.

Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)
Nico Weber
Comment 21 2011-04-26 18:34:10 PDT
(In reply to comment #20)
> (In reply to comment #17)
> > (From update of attachment 90992 [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
> >
> > > Tools/Scripts/include-tracer:30
> > > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
> >
> > Which is available under BSD? It would be nice to have a link to the license here.
>
> Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)

This script is based on a similar script from the chromium tree that has chromium's license (i.e. BSD-style): http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc