Bug 59348 - Find large header files
Summary: Find large header files
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
Importance: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-25 14:48 PDT by Nico Weber
Modified: 2022-06-14 12:55 PDT
CC List: 11 users

See Also:


Attachments
Script that estimates effective header size (6.45 KB, text/x-python-script)
2011-04-25 15:39 PDT, Nico Weber
no flags Details
slightly better script (6.51 KB, text/x-python-script)
2011-04-25 15:41 PDT, Nico Weber
no flags Details
h file sizes (276.04 KB, text/plain)
2011-04-25 15:57 PDT, Nico Weber
no flags Details
Patch (8.81 KB, patch)
2011-04-25 16:17 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
find-includes.sh (141 bytes, application/octet-stream)
2011-04-25 16:18 PDT, Nico Weber
no flags Details
Patch (9.02 KB, patch)
2011-04-25 16:19 PDT, Eric Seidel (no email)
ojan: review-
ojan: commit-queue-
Details | Formatted Diff | Diff
join script (281 bytes, text/x-python-script)
2011-04-25 16:19 PDT, Nico Weber
no flags Details
include-tracer with -c flag (6.71 KB, text/x-python-script)
2011-04-25 20:10 PDT, Nico Weber
no flags Details
h include counts (360.55 KB, text/plain)
2011-04-25 20:10 PDT, Nico Weber
no flags Details
new join script (382 bytes, text/x-python-script)
2011-04-25 20:11 PDT, Nico Weber
no flags Details
header files with file size and number of translation units (311.62 KB, text/plain)
2011-04-25 20:12 PDT, Nico Weber
no flags Details
sizes of translation units (243.07 KB, text/plain)
2011-04-25 20:13 PDT, Nico Weber
no flags Details

Description Nico Weber 2011-04-25 14:48:20 PDT
Approach:

0.) Get a list of all interesting header files in webkit
1.) Port http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc to webkit (this gives the "effective" file size of a header file)
2.) Have some simple grep command that counts how often a given header file is included
3.) Run 1 & 2 for every file in 0, sort by file_size * num_includes
4.) Look at the files at the top of this list, make them smaller / cut dependencies
Comment 1 Nico Weber 2011-04-25 14:55:30 PDT
Step 0:  git ls-files --full-name '*.h'
Comment 2 Nico Weber 2011-04-25 15:29:16 PDT
Step 2:

#!/bin/bash
f=$(basename "$1" | sed -e 's:\.:\\.:g')
git grep -n -e "^#include \"${f}\"$" -- "*.cpp" "*.h" | wc -l

Takes ~6 seconds per .h file though :-/ We have 4832 .h files, so it'd take 8h to get the includes. There's probably some better way.
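
One possible faster way, sketched here and not taken from the attachments: instead of one grep per header, scan every source file once and tally all quoted includes in a single pass. The function name count_includes and the input shape (a dict of path to file text) are assumptions made for illustration. Like the grep above, it only counts includes written exactly as the header's basename.

```python
import collections
import re

# Same shape as the grep pattern above: the whole line must be
# #include "Name.h" with nothing before or after it.
INCLUDE_RE = re.compile(r'^#include "([^"]+)"$', re.MULTILINE)


def count_includes(sources):
    """Tally quoted #include lines across all files in one pass.

    `sources` is a hypothetical mapping of file path -> file text.
    Returns a Counter keyed by the include string as written.
    """
    counts = collections.Counter()
    for text in sources.values():
        counts.update(INCLUDE_RE.findall(text))
    return counts
```

With this shape, each file is read once, so the cost grows with the number of source files and not with the number of headers times the number of source files.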
Comment 3 Nico Weber 2011-04-25 15:39:10 PDT
Created attachment 90962 [details]
Script that estimates effective header size

step 1
Comment 4 Nico Weber 2011-04-25 15:41:43 PDT
Created attachment 90965 [details]
slightly better script

First half of step 3:

  for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done
Comment 5 Nico Weber 2011-04-25 15:57:02 PDT
Created attachment 90974 [details]
h file sizes

Output of

   (for f in $(git ls-files --full-name *.h); do Tools/Scripts/include-tracer.py -q $f; done) | tee h-sizes.txt


Top 10:


thakis$ sort -r -k 2 -n h-sizes.txt | head 
Source/WebCore/rendering/RenderView.h: 1.538 MiB
Source/WebKit/qt/WebCoreSupport/PageClientQt.h: 1.447 MiB
Source/WebCore/page/FrameView.h: 1.390 MiB
Source/WebKit/gtk/WebCoreSupport/EditorClientGtk.h: 1.300 MiB
Source/WebCore/loader/EmptyClients.h: 1.291 MiB
Source/WebCore/accessibility/AccessibilityMediaControls.h: 1.192 MiB
Source/WebCore/rendering/RenderLayerCompositor.h: 1.171 MiB
Source/WebCore/platform/efl/RenderThemeEfl.h: 1.153 MiB
Source/WebCore/rendering/RenderMediaControlsChromium.h: 1.122 MiB
Source/WebCore/rendering/RenderMediaControls.h: 1.122 MiB


Turns out there are 89 .h files that evaluate to more than 1MB!
Comment 6 Eric Seidel (no email) 2011-04-25 16:17:22 PDT
Created attachment 90990 [details]
Patch
Comment 7 Nico Weber 2011-04-25 16:18:30 PDT
Created attachment 90991 [details]
find-includes.sh

I'm currently running

  (for f in $(git ls-files --full-name *.h); do ./find-includes.sh $f; done) | tee h-counts.txt

which will take a few hours to complete. find-includes.sh is the 2-line script from comment 2 that counts how often a .h is included; it is also attached.
Comment 8 Eric Seidel (no email) 2011-04-25 16:19:00 PDT
Created attachment 90992 [details]
Patch
Comment 9 Nico Weber 2011-04-25 16:19:30 PDT
Created attachment 90994 [details]
join script

Once that other command is completed, this script can be used to combine the two outputs.
Comment 10 Mihai Parparita 2011-04-25 16:35:31 PDT
See also bug 52451 where Tony looked at header files included everywhere and the benefits of forward declaration.
Comment 11 Nico Weber 2011-04-25 20:09:43 PDT
The command completed, but I realized it's not very helpful as it double-counts header files. We really only want to know how many different translation units include a header, since including a .h twice in a translation unit costs the same as including it once.

To do this, I added a -c option to the include-tracer script that just prints all header files that the script finds, and then used this to find how many translation units include each .h file, like this:

  time (rm cpp-include-counts.txt
  time (for f in $(git ls-files --full-name *.cpp); do Tools/Scripts/include-tracer.py -c $f; done) | tee -a cpp-include-counts.txt)
   sort cpp-include-counts.txt | uniq -c > cpp-include-counts-processed.txt
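
The counting idea behind this pipeline can be sketched in Python as follows. This is only an illustration under assumptions: count_translation_units and the dict-of-lists input are hypothetical names, and the sketch assumes nothing about the real output format of include-tracer -c. Each header is counted at most once per translation unit, however often it is reached.

```python
import collections


def count_translation_units(headers_by_tu):
    """Count how many distinct translation units pull in each header.

    `headers_by_tu` is a hypothetical mapping of .cpp path -> list of
    headers reached from it (possibly with repeats).
    """
    counts = collections.Counter()
    for headers in headers_by_tu.values():
        # set() collapses repeats, so a header included twice in one
        # translation unit still contributes 1.
        counts.update(set(headers))
    return counts
```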

I then used a modified join script to combine this with the header file sizes file. The top files are:



thakis$ python join-cpp.py | sort -k 4 -r -n | head -20
Source/WebCore/rendering/RenderObject.h: 0.883 559 493.597000
Source/WebCore/page/Frame.h: 0.746 653 487.138000
Source/WebCore/bindings/js/ScriptValue.h: 0.578 829 479.162000
Source/WebCore/rendering/style/RenderStyle.h: 0.646 597 385.662000
Source/WebCore/page/FrameView.h: 1.390 237 329.430000
Source/WebCore/bindings/js/ScriptSourceCode.h: 0.378 842 318.276000
Source/WebCore/dom/Element.h: 0.292 1084 316.528000
Source/WebCore/bindings/v8/V8Proxy.h: 0.362 841 304.442000
Source/WebCore/bindings/v8/ScriptValue.h: 0.366 829 303.414000
Source/WebCore/rendering/RenderBoxModelObject.h: 0.894 331 295.914000
Source/WebCore/loader/FrameLoader.h: 0.443 659 291.937000
Source/WebCore/dom/Document.h: 0.220 1307 287.540000
Source/WebCore/rendering/RenderBox.h: 0.924 301 278.124000
Source/WebCore/bindings/v8/ScriptController.h: 0.377 663 249.951000
Source/WebCore/rendering/RenderText.h: 0.891 260 231.660000
Source/WebCore/dom/StyledElement.h: 0.296 772 228.512000
Source/WebCore/rendering/InlineBox.h: 0.917 247 226.499000
Source/WebCore/rendering/RenderBR.h: 0.894 249 222.606000
Source/WebCore/bindings/js/ScriptController.h: 0.327 663 216.801000
Source/WebCore/rendering/RenderBlock.h: 1.016 206 209.296000


(.h filename, size of the .h with all its includes resolved in MiB, number of translation units including that .h, product of the previous 2 numbers.)
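
A rough sketch of what such a join might look like, not the attached join script itself. It assumes size lines shaped like the h-sizes.txt output above ("path: 1.538 MiB") and count lines shaped like uniq -c output ("  559 path"); the function names are made up for illustration.

```python
def parse_sizes(lines):
    """Parse lines like 'Source/Foo.h: 1.538 MiB' into {path: MiB}."""
    sizes = {}
    for line in lines:
        path, _, rest = line.strip().rpartition(': ')
        if path and rest:
            sizes[path] = float(rest.split()[0])
    return sizes


def parse_counts(lines):
    """Parse uniq -c style lines like '  559 Source/Foo.h' into {path: count}."""
    counts = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2:
            counts[parts[1].strip()] = int(parts[0])
    return counts


def rank(sizes, counts):
    """Return (path, MiB, translation units, MiB * units), largest product first."""
    rows = [(path, size, counts[path], size * counts[path])
            for path, size in sizes.items() if path in counts]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Headers that appear in only one of the two inputs are dropped by this sketch, which may or may not match what the attached script does.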

I also ran the include tracer on all cpp files, to find the biggest cpp files:


 thakis$  sort cpp-sizes.txt -k 2 -r -n | head -20
Source/WebCore/rendering/RenderingAllInOne.cpp: 4.851 MiB
Source/WebCore/dom/DOMAllInOne.cpp: 3.898 MiB
Source/WebCore/bindings/js/JSBindingsAllInOne.cpp: 3.624 MiB
Source/WebCore/html/HTMLElementsAllInOne.cpp: 3.396 MiB
Source/WebCore/svg/SVGAllInOne.cpp: 3.287 MiB
Source/WebKit/chromium/src/WebViewImpl.cpp: 2.868 MiB
Source/WebKit/chromium/src/WebFrameImpl.cpp: 2.687 MiB
Source/WebCore/rendering/svg/RenderSVGAllInOne.cpp: 2.616 MiB
Source/WebCore/dom/Document.cpp: 2.549 MiB
Source/WebCore/accessibility/AccessibilityAllInOne.cpp: 2.357 MiB
Source/WebKit/qt/Api/qwebpage.cpp: 2.331 MiB
Source/WebKit/chromium/src/FrameLoaderClientImpl.cpp: 2.299 MiB
Source/WebKit/chromium/src/ChromeClientImpl.cpp: 2.264 MiB
Source/WebKit/chromium/src/WebMediaPlayerClientImpl.cpp: 2.260 MiB
Source/WebKit/qt/WebCoreSupport/DumpRenderTreeSupportQt.cpp: 2.235 MiB
Source/WebCore/page/EventHandler.cpp: 2.227 MiB
Source/WebKit/chromium/src/PlatformBridge.cpp: 2.191 MiB
Source/WebKit/chromium/src/WebPluginContainerImpl.cpp: 2.186 MiB
Source/WebKit/chromium/src/ContextMenuClientImpl.cpp: 2.179 MiB
Source/WebCore/page/Frame.cpp: 2.167 MiB


I will attach all the scripts I used and all data files I produced.
Comment 12 Nico Weber 2011-04-25 20:10:17 PDT
Created attachment 91040 [details]
include-tracer with -c flag
Comment 13 Nico Weber 2011-04-25 20:10:51 PDT
Created attachment 91042 [details]
h include counts
Comment 14 Nico Weber 2011-04-25 20:11:14 PDT
Created attachment 91043 [details]
new join script
Comment 15 Nico Weber 2011-04-25 20:12:38 PDT
Created attachment 91044 [details]
header files with file size and number of translation units
Comment 16 Nico Weber 2011-04-25 20:13:08 PDT
Created attachment 91045 [details]
sizes of translation units
Comment 17 Adam Barth 2011-04-26 13:51:53 PDT
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

> Tools/Scripts/include-tracer:30
> +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)

Which is available under BSD?  It would be nice to have a link to the license here.

> Tools/Scripts/include-tracer:42
> +# FIXME: This should be a per-port list.  Right now this is Apple-Mac only.
> +INCLUDE_PATHS = [

This is pretty lame.

> Tools/Scripts/include-tracer:150
> +            return total_bytes

Otherwise known as 0?

> Tools/Scripts/include-tracer:159
> +        # Skip system includes.
> +        if filename[0] == '<':
> +            return total_bytes

This will also skip wtf, right?

> Tools/Scripts/include-tracer:170
> +            lines = open(resolved_filename).readlines()

We should use "with" with open to make sure we don't leak.

> Tools/Scripts/include-tracer:183
> +            if line.startswith('#include "'):
> +                total_bytes += self._walk(seen, line.split('"')[1], resolved_filename, indent + 2)
> +            elif line.startswith('#include '):
> +                include = '<' + line.split('<')[1].split('>')[0] + '>'
> +                total_bytes += self._walk(seen, include, resolved_filename, indent + 2)

Would this be clearer with a regular expression?
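
For illustration, the last two suggestions (using "with" and a regular expression) might look roughly like the sketch below. This is not the patch's code: parse_include and read_includes are hypothetical names, and the pattern is slightly more lenient than the startswith checks, since it also accepts leading whitespace and "# include".

```python
import re

# One pattern covers both #include "Foo.h" and #include <Foo.h>.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')


def parse_include(line):
    """Return ('quote' or 'angle', name) for an #include line, else None."""
    match = INCLUDE_RE.match(line)
    if not match:
        return None
    quoted, angled = match.groups()
    if quoted is not None:
        return ('quote', quoted)
    return ('angle', angled)


def read_includes(path):
    """Collect the includes of one file; 'with' closes the handle for us."""
    with open(path) as source:
        return [inc for inc in map(parse_include, source) if inc]
```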
Comment 18 David Levin 2011-04-26 13:56:34 PDT
Comment on attachment 90992 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review

Looks like Adam got it, but here's a comment anyway :).

> Tools/Scripts/include-tracer:192
> +            self._be_quiet = True

Seems like quiet would be the default (and verbose would be an option).
Comment 19 Ojan Vafai 2011-04-26 17:05:22 PDT
Comment on attachment 90992 [details]
Patch

R- per abarth's comments
Comment 20 Eric Seidel (no email) 2011-04-26 18:08:17 PDT
(In reply to comment #17)
> (From update of attachment 90992 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
> 
> > Tools/Scripts/include-tracer:30
> > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
> 
> Which is available under BSD?  It would be nice to have a link to the license here.

Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)
Comment 21 Nico Weber 2011-04-26 18:34:10 PDT
(In reply to comment #20)
> (In reply to comment #17)
> > (From update of attachment 90992 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=90992&action=review
> > 
> > > Tools/Scripts/include-tracer:30
> > > +# Based on an almost identical script by: jyrki@google.com (Jyrki Alakuijala)
> > 
> > Which is available under BSD?  It would be nice to have a link to the license here.
> 
> Which was previously google internal and covered under google copyright, thus as a google submitter licensed however we'd like here. :)

This script is based on a similar script from the chromium tree that has chromium's license (i.e. BSD-style): http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/tools/include_tracer.py&q=include_tracer.py&exact_package=chromium&sa=N&cd=1&ct=rc