Approach:

0.) Get a list of all interesting header files in webkit
1.) Port http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc to webkit (this gives the "effective" file size of a header file)
2.) Have some simple grep command that counts how often a given header file is included
3.) Run 1 & 2 for every file in 0, sort by file_size * num_includes
4.) Look at the files at the top of this list, make them smaller / cut dependencies
Step 0: git ls-files --full-name *.h
Step 2:

#!/bin/bash
f=$(basename $1 | sed -e 's:\.:\\.:')
git grep -n -e "^#include \"${f}\"$" -- "*.cpp" "*.h" | wc -l

Takes ~6 seconds per .h file though :-/ We have 4832 .h files, so it'd take 8h to get the includes. There's probably some better way.
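One possible "better way", sketched here as an assumption rather than anything that was actually run for this bug: read every source file once and tally all quoted #include lines in a single pass, instead of running one git grep per header. The function name count_includes and the dict-of-file-contents input are hypothetical, chosen only for illustration. The regex mirrors the grep pattern above (quoted includes only, nothing else on the line).

```python
import re
from collections import Counter

# Same shape as the grep pattern above: a quoted include that fills the
# whole line. Angle-bracket includes are deliberately not matched.
INCLUDE_RE = re.compile(r'^#include "([^"]+)"$', re.MULTILINE)


def count_includes(sources):
    """Count how many include lines name each header.

    `sources` is a hypothetical mapping from a file path to that file's
    text. Keys of the result are the strings found between the quotes.
    """
    counts = Counter()
    for text in sources.values():
        counts.update(INCLUDE_RE.findall(text))
    return counts
```

To feed it real data, one could read each path printed by git ls-files for "*.cpp" and "*.h" into the mapping. Like the grep version, this counts include lines rather than translation units, so a header pulled in indirectly is not counted.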
Created attachment 90962 [details] Script that estimates effective header size step 1
Created attachment 90965 [details]
slightly better script

First half of step 3:

for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done
Created attachment 90974 [details]
h file sizes

Output of

(for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done) | tee h-sizes.txt

Top 10:

thakis$ sort -r -k 2 -n h-sizes.txt | head
Source/WebCore/rendering/RenderView.h: 1.538 MiB
Source/WebKit/qt/WebCoreSupport/PageClientQt.h: 1.447 MiB
Source/WebCore/page/FrameView.h: 1.390 MiB
Source/WebKit/gtk/WebCoreSupport/EditorClientGtk.h: 1.300 MiB
Source/WebCore/loader/EmptyClients.h: 1.291 MiB
Source/WebCore/accessibility/AccessibilityMediaControls.h: 1.192 MiB
Source/WebCore/rendering/RenderLayerCompositor.h: 1.171 MiB
Source/WebCore/platform/efl/RenderThemeEfl.h: 1.153 MiB
Source/WebCore/rendering/RenderMediaControlsChromium.h: 1.122 MiB
Source/WebCore/rendering/RenderMediaControls.h: 1.122 MiB

Turns out there are 89 .h files that evaluate to more than 1MB!
Created attachment 90990 [details] Patch
Created attachment 90991 [details]
find-includes.sh

I'm currently running

(for f in $(git ls-files --full-name *.h); do ./find-includes.sh $f; done) | tee h-counts.txt

which will take a few hours to complete. find-includes.sh is the 2-line script from step 2 above that counts how often a .h is included; it is also attached.
Created attachment 90992 [details] Patch
Created attachment 90994 [details] join script Once that other command is completed, this script can be used to combine the two outputs.
See also bug 52451 where Tony looked at header files included everywhere and the benefits of forward declaration.
The command completed, but I realized it's not very helpful as it double-counts header files. We really only want to know how many different translation units include a header, since including a .h twice in a translation unit costs the same as including it once.

To do this, I added a -c option to the include-tracer script that just prints all header files that the script finds, and then used this to find how many translation units include each .h file like this:

time (rm cpp-include-counts.txt
time (for f in $(git ls-files --full-name *.cpp); do Tools/Scripts/include-tracer.py -c $f; done) | tee -a cpp-include-counts.txt)
sort cpp-include-counts.txt | uniq -c > cpp-include-counts-processed.txt

I then used a modified join script to combine this with the header file sizes file. The top files are:

thakis$ python join-cpp.py | sort -k 4 -r -n | head -20
Source/WebCore/rendering/RenderObject.h: 0.883 559 493.597000
Source/WebCore/page/Frame.h: 0.746 653 487.138000
Source/WebCore/bindings/js/ScriptValue.h: 0.578 829 479.162000
Source/WebCore/rendering/style/RenderStyle.h: 0.646 597 385.662000
Source/WebCore/page/FrameView.h: 1.390 237 329.430000
Source/WebCore/bindings/js/ScriptSourceCode.h: 0.378 842 318.276000
Source/WebCore/dom/Element.h: 0.292 1084 316.528000
Source/WebCore/bindings/v8/V8Proxy.h: 0.362 841 304.442000
Source/WebCore/bindings/v8/ScriptValue.h: 0.366 829 303.414000
Source/WebCore/rendering/RenderBoxModelObject.h: 0.894 331 295.914000
Source/WebCore/loader/FrameLoader.h: 0.443 659 291.937000
Source/WebCore/dom/Document.h: 0.220 1307 287.540000
Source/WebCore/rendering/RenderBox.h: 0.924 301 278.124000
Source/WebCore/bindings/v8/ScriptController.h: 0.377 663 249.951000
Source/WebCore/rendering/RenderText.h: 0.891 260 231.660000
Source/WebCore/dom/StyledElement.h: 0.296 772 228.512000
Source/WebCore/rendering/InlineBox.h: 0.917 247 226.499000
Source/WebCore/rendering/RenderBR.h: 0.894 249 222.606000
Source/WebCore/bindings/js/ScriptController.h: 0.327 663 216.801000
Source/WebCore/rendering/RenderBlock.h: 1.016 206 209.296000

(Columns: .h filename, size of the .h with all its includes resolved in MiB, number of translation units including that .h, product of the previous 2 numbers.)

I also ran the include tracer on all cpp files, to find the biggest cpp files:

thakis$ sort cpp-sizes.txt -k 2 -r -n | head -20
Source/WebCore/rendering/RenderingAllInOne.cpp: 4.851 MiB
Source/WebCore/dom/DOMAllInOne.cpp: 3.898 MiB
Source/WebCore/bindings/js/JSBindingsAllInOne.cpp: 3.624 MiB
Source/WebCore/html/HTMLElementsAllInOne.cpp: 3.396 MiB
Source/WebCore/svg/SVGAllInOne.cpp: 3.287 MiB
Source/WebKit/chromium/src/WebViewImpl.cpp: 2.868 MiB
Source/WebKit/chromium/src/WebFrameImpl.cpp: 2.687 MiB
Source/WebCore/rendering/svg/RenderSVGAllInOne.cpp: 2.616 MiB
Source/WebCore/dom/Document.cpp: 2.549 MiB
Source/WebCore/accessibility/AccessibilityAllInOne.cpp: 2.357 MiB
Source/WebKit/qt/Api/qwebpage.cpp: 2.331 MiB
Source/WebKit/chromium/src/FrameLoaderClientImpl.cpp: 2.299 MiB
Source/WebKit/chromium/src/ChromeClientImpl.cpp: 2.264 MiB
Source/WebKit/chromium/src/WebMediaPlayerClientImpl.cpp: 2.260 MiB
Source/WebKit/qt/WebCoreSupport/DumpRenderTreeSupportQt.cpp: 2.235 MiB
Source/WebCore/page/EventHandler.cpp: 2.227 MiB
Source/WebKit/chromium/src/PlatformBridge.cpp: 2.191 MiB
Source/WebKit/chromium/src/WebPluginContainerImpl.cpp: 2.186 MiB
Source/WebKit/chromium/src/ContextMenuClientImpl.cpp: 2.179 MiB
Source/WebCore/page/Frame.cpp: 2.167 MiB

I will attach all the scripts I used and all data files I produced.
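For readers who want the ranking logic without digging through the attachments, here is a rough Python sketch of the idea: count each header at most once per translation unit, then sort by size times count. It is not the attached join script; the function name rank_headers and both input shapes are assumptions made for this illustration.

```python
from collections import Counter


def rank_headers(sizes_mib, headers_per_tu):
    """Rank headers by effective size times number of including TUs.

    sizes_mib: hypothetical dict, header path -> effective size in MiB.
    headers_per_tu: hypothetical iterable with one collection of header
        paths per translation unit (what a per-.cpp trace would list).
    Returns (header, size, tu_count, cost) tuples, largest cost first.
    """
    tu_counts = Counter()
    for headers in headers_per_tu:
        # set() makes sure a header counts only once per translation unit,
        # however many times it is reached inside that unit.
        tu_counts.update(set(headers))
    rows = [
        (header, size, tu_counts[header], size * tu_counts[header])
        for header, size in sizes_mib.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

The four tuple fields correspond to the four columns in the header listing above.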
Created attachment 91040 [details] include-tracer with -c flag
Created attachment 91042 [details] h include counts
Created attachment 91043 [details] new join script
Created attachment 91044 [details] header files with file size and number of translation units
Created attachment 91045 [details] sizes of translation units
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

> Tools/Scripts/include-tracer:30
> +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)

Which is available under BSD? It would be nice to have a link to the license here.

> Tools/Scripts/include-tracer:42
> +# FIXME: This should be a per-port list. Right now this is Apple-Mac only.
> +INCLUDE_PATHS = [

This is pretty lame.

> Tools/Scripts/include-tracer:150
> +    return total_bytes

Otherwise known as 0?

> Tools/Scripts/include-tracer:159
> +    # Skip system includes.
> +    if filename[0] == '<':
> +        return total_bytes

This will also skip wtf, right?

> Tools/Scripts/include-tracer:170
> +    lines = open(resolved_filename).readlines()

We should use "with" with open to make sure we don't leak.

> Tools/Scripts/include-tracer:183
> +    if line.startswith('#include "'):
> +        total_bytes += self._walk(seen, line.split('"')[1], resolved_filename, indent + 2)
> +    elif line.startswith('#include '):
> +        include = '<' + line.split('<')[1].split('>')[0] + '>'
> +        total_bytes += self._walk(seen, include, resolved_filename, indent + 2)

Would this be clearer with a regular expression?
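To make the last two review points concrete, here is one way they might look, as a sketch only and not the revised patch: a single regular expression that handles both include forms, and a "with" block so the file handle is closed. The names parse_include and read_includes are hypothetical. Unlike the startswith checks in the patch, this regex also tolerates leading whitespace and "# include", which is an assumption about what is wanted.

```python
import re

# Group 1 captures a quoted include path, group 2 an angle-bracket one.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')


def parse_include(line):
    """Return the include target of `line`, or None if it is not an include.

    Quoted includes come back bare and system includes come back wrapped
    in <>, matching how the patch passes names to _walk.
    """
    match = INCLUDE_RE.match(line)
    if match is None:
        return None
    quoted, system = match.groups()
    return quoted if quoted is not None else '<' + system + '>'


def read_includes(filename):
    """List the include targets in `filename`, closing the file afterwards."""
    with open(filename) as source:
        targets = (parse_include(line) for line in source)
        return [target for target in targets if target is not None]
```

Keeping the <> wrapper on system includes means the existing filename[0] == '<' check in the patch would keep working unchanged.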
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

Looks like Adam got it but here's a comment anyway :).

> Tools/Scripts/include-tracer:192
> +    self._be_quiet = True

Seems like quiet would be the default (and verbose would be an option).
Comment on attachment 90992 [details] Patch R- per abarth's comments
(In reply to comment #17)
> (From update of attachment 90992 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
>
> > Tools/Scripts/include-tracer:30
> > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
>
> Which is available under BSD? It would be nice to have a link to the license here.

Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)
(In reply to comment #20)
> (In reply to comment #17)
> > (From update of attachment 90992 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
> >
> > > Tools/Scripts/include-tracer:30
> > > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
> >
> > Which is available under BSD? It would be nice to have a link to the license here.
>
> Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)

This script is based on a similar script from the chromium tree that has chromium's license (i.e. BSD-style):
http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc