Bug 192391 - Use directory local sequential numbers for Unified Sources filenames instead of global sequential numbers for CMake
Summary: Use directory local sequential numbers for Unified Sources filenames instead ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Fujii Hironori
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-12-04 19:45 PST by Carlos Eduardo Ramalho
Modified: 2019-01-09 18:26 PST (History)
10 users (show)

See Also:


Attachments
Patch for the experiment (2.13 KB, patch)
2018-12-04 19:45 PST, Carlos Eduardo Ramalho
no flags Details | Formatted Diff | Diff
Test the bots (2.22 KB, patch)
2018-12-04 19:56 PST, Carlos Eduardo Ramalho
no flags Details | Formatted Diff | Diff
Test Xcode bots after adding --use-hash-filenames option (5.12 KB, patch)
2018-12-05 14:30 PST, Carlos Eduardo Ramalho
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews124 for ios-simulator-wk2 (2.42 MB, application/zip)
2018-12-05 22:39 PST, Build Bot
no flags Details
Patch (6.82 KB, patch)
2018-12-25 20:43 PST, Fujii Hironori
no flags Details | Formatted Diff | Diff
Patch (5.20 KB, patch)
2018-12-26 18:51 PST, Fujii Hironori
no flags Details | Formatted Diff | Diff
Patch (6.57 KB, patch)
2019-01-07 22:38 PST, Fujii Hironori
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Eduardo Ramalho 2018-12-04 19:45:45 PST
Created attachment 356571 [details]
Patch for the experiment

During a git-bisect session, I noticed that building times were taking too long even though I was:
- Using a compiler cache (CCache in case)
- Doing many WebKit builds in sequence

So I noticed that the building is stalled during UnifiedSources because the following may happen easily:
1) a new bundle (up to 8 source files) or more is added
2) all generated Unified Sources filenames (UnifiedSourceXYZ.cpp) after this bundle are shifted XYZ+N (N=number of new bundles)

This implies ccache having lots of cache misses.

To confirm that was the case, I did a little experiment (patch for the experiment attached):
- Cleared CCache and compiled WebKit from clean build directory

Time for a clean build (for comparison with build times below): 14973.56s user 1947.89s system 2514% cpu 11:12.97 total

- After, added an source file containing only `#include "config.h"` in a separate folder in JavaScriptCore, WebCore and WebKit e.g TestUnified/TestUnified.cpp inside these folders
- Added it to the beginning of each corresponding Sources.txt
  - It will cause a new bundle to be added because it only bundles together files in the same directory
  - The shifting will then occur
- Built WebKit again without cleaning the build directory.

Result:
- Time to build: 7315.90s user 727.49s system 2456% cpu 5:27.38 total

Half the time of clean-build with only 3 new "empty" source files.

To improve this build time, I changed the naming of UnifiedSources to UnifiedSources-{hash}.cpp where the hash is calculated based on the UnifiedSource text content (only the #include lines with the cpp files).

I then repeated the experiment:
- Time to build: 27.14s user 5.44s system 226% cpu 14.405 total

Just like building without changes.

Advantages:
- The Unified Source filename is based on its content rather than an arbitrary integer that may change frequently
- When new bundles are added, only that new bundle and the ones with sources modified are recompiled.

I think it should improve build time in the bots (if they do use compiler caching) and eventual git-bisect sessions in the future.
Comment 1 Carlos Eduardo Ramalho 2018-12-04 19:56:52 PST
Created attachment 356572 [details]
Test the bots
Comment 2 Carlos Eduardo Ramalho 2018-12-05 05:07:49 PST
It looks like the bot building with Xcode are not happy :(

Looking quickly at the build scripts, it seems like for the Xcode build it needs the ascending integers because the UnifiedSource filenames are not reported back after generation. The build just loops until the given number of bundles via the generate-unified-sources.sh (correct me if I'm wrong).

I'll add an option '--use-hash-filename' to generate-unified-source-bundles.rb and use only in CMake builds to check if the bots are happy then, because I'm not familiar with the Xcode build to fix it there.
Comment 3 Carlos Eduardo Ramalho 2018-12-05 05:08:10 PST
(In reply to Carlos Eduardo Ramalho from comment #2)
> It looks like the bot building with Xcode are not happy :(
the bots*
Comment 4 Carlos Eduardo Ramalho 2018-12-05 07:43:41 PST
This change is actually relevant even when not using ccache. Shitfting the UnifiedSources will make CMake to detect the sources to rebuild already.

Without this change, in the experiment CMake rebuilds about 690 UnifiedSources. After the change, only the unified sources with TestUnified.cpp are compiled.
Comment 5 Keith Miller 2018-12-05 09:08:28 PST
Comment on attachment 356572 [details]
Test the bots

View in context: https://bugs.webkit.org/attachment.cgi?id=356572&action=review

> Source/WTF/Scripts/generate-unified-source-bundles.rb:207
> +        digest = Digest::SHA1.hexdigest(@currentBundleText)[0..12]

Can we still use the number too, preferably before the hash for sorting reasons? I like having the number as it helps me know how far into the build I am. CMake doesn't really track the completion percentage all that well.

So it would be "UnifiedSource-#{number}-#{digest}.#{extension}".
Comment 6 Carlos Eduardo Ramalho 2018-12-05 09:24:33 PST
(In reply to Keith Miller from comment #5)
> Can we still use the number too, preferably before the hash for sorting
> reasons? I like having the number as it helps me know how far into the build
> I am. CMake doesn't really track the completion percentage all that well.
> 
> So it would be "UnifiedSource-#{number}-#{digest}.#{extension}".

It would be nice indeed. But wouldn't the problem we're trying to solve happen again? I mean, with the number the shifting occurs and all unified sources are renamed regardless of the hash.

Unless I am not seeing something. WDYT?
Comment 7 Michael Catanzaro 2018-12-05 09:37:01 PST
Wow, good idea.

(In reply to Carlos Eduardo Ramalho from comment #6)
> It would be nice indeed. But wouldn't the problem we're trying to solve
> happen again? I mean, with the number the shifting occurs and all unified
> sources are renamed regardless of the hash.

Yeah... then again, adding a new file anywhere could also cause the contents of many subsequent unified source bundles to change....

(In reply to Keith Miller from comment #5)
> CMake doesn't really track the completion percentage all that well.

This problem is caused by using a separate build target for every single forwarding header. I think the build progress would run more naturally if that were not the case.
Comment 8 Carlos Eduardo Ramalho 2018-12-05 14:30:27 PST
Created attachment 356658 [details]
Test Xcode bots after adding --use-hash-filenames option
Comment 9 Carlos Eduardo Ramalho 2018-12-05 16:36:43 PST
(In reply to Carlos Eduardo Ramalho from comment #8)
> Created attachment 356658 [details]
> Test Xcode bots after adding --use-hash-filenames option

Bots are happy now.
Comment 10 Build Bot 2018-12-05 22:39:39 PST
Comment on attachment 356658 [details]
Test Xcode bots after adding --use-hash-filenames option

Attachment 356658 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/10288269

New failing tests:
imported/w3c/web-platform-tests/service-workers/service-worker/register-closed-window.https.html
Comment 11 Build Bot 2018-12-05 22:39:41 PST
Created attachment 356715 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.6
Comment 12 Michael Catanzaro 2018-12-06 12:20:25 PST
Comment on attachment 356658 [details]
Test Xcode bots after adding --use-hash-filenames option

This LGTM but I'm not going to give cq+ because I want to wait and see what Keith thinks.
Comment 13 Fujii Hironori 2018-12-07 16:29:09 PST
IIUC, this change leaves stale generated unified sources and its object files.
IIUC, source files are grouped by directories. What about using filenames UnifiedSource-{hash-of-dir-name}-{seq-num-within-the-dir}.cpp?
Comment 14 Carlos Eduardo Ramalho 2018-12-12 16:01:47 PST
(In reply to Fujii Hironori from comment #13)
> IIUC, this change leaves stale generated unified sources and its object
> files.
> IIUC, source files are grouped by directories. What about using filenames
> UnifiedSource-{hash-of-dir-name}-{seq-num-within-the-dir}.cpp?

Well noted! I'll try to do as per you recommendation and see.

Although in this new way it'll be sub-optimal compared to the original strategy because we'll have a sequence of unified sources for a given folder which may be all changed after adding a new source. But it should still speed up recompilation a lot compared to the worst case scenario as described in the example.
Comment 15 Fujii Hironori 2018-12-25 20:43:04 PST
Created attachment 358073 [details]
Patch
Comment 16 Michael Catanzaro 2018-12-26 10:02:47 PST
Comment on attachment 358073 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=358073&action=review

> Source/WTF/Scripts/generate-unified-source-bundles.rb:53
> +    puts "--use-hash-filenames                 Use a hash in unified source filenames instead of ascending integers"

This description should be updated now
Comment 17 Fujii Hironori 2018-12-26 18:51:26 PST
Created attachment 358091 [details]
Patch
Comment 18 Carlos Eduardo Ramalho 2018-12-26 19:00:57 PST
(In reply to Fujii Hironori from comment #17)
> Created attachment 358091 [details]
> Patch

Great! You found out that a new option is not needed after all! :)
Comment 19 Fujii Hironori 2019-01-06 19:48:47 PST
Review, please.
Comment 20 Fujii Hironori 2019-01-07 17:08:20 PST
Do you have a chance to review this patch, Keith?
Comment 21 Stephan Szabo 2019-01-07 17:32:28 PST
Comment on attachment 358091 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=358091&action=review

> Source/WTF/Scripts/generate-unified-source-bundles.rb:214
> +                "-#{hash}-#{@bundleCountInDirectory}"

It feels a bit strange that the bundleFileName(number) function by its signature would seem to be something that would generate a distinct filename when called with different numbers, but won't actually do so when maxCount isn't set without external help (like changing bundleCountInDirectory). It's not a big deal, but could be confusing.
Comment 22 Fujii Hironori 2019-01-07 22:27:57 PST
Comment on attachment 358091 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=358091&action=review

>> Source/WTF/Scripts/generate-unified-source-bundles.rb:214
>> +                "-#{hash}-#{@bundleCountInDirectory}"
> 
> It feels a bit strange that the bundleFileName(number) function by its signature would seem to be something that would generate a distinct filename when called with different numbers, but won't actually do so when maxCount isn't set without external help (like changing bundleCountInDirectory). It's not a big deal, but could be confusing.

Good point. Will fix.
Comment 23 Fujii Hironori 2019-01-07 22:38:12 PST
Created attachment 358576 [details]
Patch
Comment 24 Don Olmstead 2019-01-08 08:55:35 PST
Comment on attachment 358576 [details]
Patch

r=me

I think this is fine. I'm not sure if Keith is out of office but I'd probably prefer that he cq+ it.
Comment 25 Fujii Hironori 2019-01-09 18:21:24 PST
Comment on attachment 358576 [details]
Patch

Clearing flags on attachment: 358576

Committed r239813: <https://trac.webkit.org/changeset/239813>
Comment 26 Fujii Hironori 2019-01-09 18:21:27 PST
All reviewed patches have been landed.  Closing bug.
Comment 27 Radar WebKit Bug Importer 2019-01-09 18:26:16 PST
<rdar://problem/47166490>