RESOLVED FIXED 184062
MSVC __forceinline slows down JSC release build fivefold after r229391
https://bugs.webkit.org/show_bug.cgi?id=184062
Summary MSVC __forceinline slows down JSC release build fivefold after r229391
Ross Kirsling
Reported 2018-03-27 17:14:41 PDT
After the TMP refactor in r229391 (https://bugs.webkit.org/show_bug.cgi?id=183263), the WinCairo Release build takes 50 minutes to compile JSC alone, compared to only 10 minutes prior to the patch. This is an MSVC-specific issue -- their optimizer has trouble dealing with "too liberal a use of __forceinline". (https://blogs.msdn.microsoft.com/vcblog/2018/01/04/visual-studio-2017-throughput-improvements-and-advice/) We have reported the issue to Microsoft, but in the meantime it would seem wise to define ALWAYS_INLINE as simply "inline" on Windows, which brings the compile time back down to 10 minutes.
Attachments
Patch (1.20 KB, patch)
2018-03-27 17:17 PDT, Ross Kirsling
no flags
Patch (4.84 KB, patch)
2018-03-28 16:20 PDT, Ross Kirsling
no flags
Patch for landing (4.74 KB, patch)
2018-03-29 17:27 PDT, Ross Kirsling
no flags
Ross Kirsling
Comment 1 2018-03-27 17:17:34 PDT
JF Bastien
Comment 2 2018-03-27 21:32:05 PDT
That seems like a pretty wide net to cast to avoid the force-inline problem with the TMP code. Could you instead have a TMP_ALWAYS_INLINE or something like that which only applies to that code? On the other hand, I don't know how much tuning has been done for the Windows build and inlining, so the heuristics we've applied may be totally wrong. Do we have developers who measure whether this changes pessimizes performance?
Ross Kirsling
Comment 3 2018-03-28 09:45:24 PDT
(In reply to JF Bastien from comment #2) > That seems like a pretty wide net to cast to avoid the force-inline problem > with the TMP code. Could you instead have a TMP_ALWAYS_INLINE or something > like that which only applies to that code? That might be a possibility. I guess there's a similar thing going on here: https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/graphics/FormatConverter.cpp#L39-L43 (...actually I wonder if this one's still necessary in VS2017, hehe.) > On the other hand, I don't know how much tuning has been done for the > Windows build and inlining, so the heuristics we've applied may be totally > wrong. Do we have developers who measure whether this changes pessimizes > performance? That would be great to know, especially as this should apply equally to AppleWin. I could look into performance test comparisons over here as well.
JF Bastien
Comment 4 2018-03-28 10:47:44 PDT
(In reply to Ross Kirsling from comment #3) > (In reply to JF Bastien from comment #2) > > That seems like a pretty wide net to cast to avoid the force-inline problem > > with the TMP code. Could you instead have a TMP_ALWAYS_INLINE or something > > like that which only applies to that code? > > That might be a possibility. I guess there's a similar thing going on here: > https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/ > graphics/FormatConverter.cpp#L39-L43 > (...actually I wonder if this one's still necessary in VS2017, hehe.) > > > On the other hand, I don't know how much tuning has been done for the > > Windows build and inlining, so the heuristics we've applied may be totally > > wrong. Do we have developers who measure whether this changes pessimizes > > performance? > > That would be great to know, especially as this should apply equally to > AppleWin. I could look into performance test comparisons over here as well. Alex / Per / Brent are more informed about WebKit on Windows than I am, so I'll defer to them.
Alex Christensen
Comment 5 2018-03-28 10:52:29 PDT
Comment on attachment 336631 [details] Patch I agree with JF that this is too drastic a change. I think we should instead do something like I needed to do in https://bugs.webkit.org/show_bug.cgi?id=125808 and only remove the __forceinline on only the problematic inlines. There are a lot of instances of ALWAYS_INLINE that we probably want to keep.
Ross Kirsling
Comment 6 2018-03-28 11:01:08 PDT
(In reply to Alex Christensen from comment #5) > Comment on attachment 336631 [details] > Patch > > I agree with JF that this is too drastic a change. I think we should > instead do something like I needed to do in > https://bugs.webkit.org/show_bug.cgi?id=125808 and only remove the > __forceinline on only the problematic inlines. There are a lot of instances > of ALWAYS_INLINE that we probably want to keep. Sounds good. Just thought I'd kick off the conversation from the naive approach. :)
Ross Kirsling
Comment 7 2018-03-28 14:01:35 PDT
It appears that the issue is indeed quite localized and must have to do with MSVC getting overwhelmed by recursive templates in particular. I've confirmed that: 1. Using non-forced inline for every `setupArgumentsImpl` overload results in a 9.5-minute build. https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/jit/CCallHelpers.h#L318-L464 2. Using non-forced inline for `marshallArgumentRegister` alone results in an 11-minute build. https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/jit/CCallHelpers.h#L299 These are both potentially acceptable -- does anyone have a clear preference? Perhaps a recommendation of which test(s) would be most useful for perf impact?
Alex Christensen
Comment 8 2018-03-28 15:09:44 PDT
The least invasive change would be to make marshallArgumentRegister non-forced-inline on MSVC only.
Ross Kirsling
Comment 9 2018-03-28 15:31:59 PDT
(In reply to Alex Christensen from comment #8) > The least invasive change would be to make marshallArgumentRegister > non-forced-inline on MSVC only. Sounds good. FWIW, I also checked Sunspider benchmarks, which (unsurprisingly) support the same conclusion: Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. marshallArgumentRegister setupArgumentsImpl 3d-cube 17.1135+-1.2752 ? 17.4554+-1.8218 ? might be 1.0200x slower 3d-morph 12.1973+-0.8931 ? 13.6648+-2.0364 ? might be 1.1203x slower 3d-raytrace 23.5151+-2.4159 ? 25.1138+-7.1721 ? might be 1.0680x slower access-binary-trees 8.3303+-0.5779 ? 8.7787+-2.4330 ? might be 1.0538x slower access-fannkuch 14.8218+-2.2912 ? 15.4262+-7.4340 ? might be 1.0408x slower access-nbody 11.0309+-1.1147 10.1383+-0.2915 might be 1.0880x faster access-nsieve 6.6365+-0.8363 6.3742+-0.4838 might be 1.0411x faster bitops-3bit-bits-in-byte 5.1899+-0.1658 ? 5.8102+-1.2525 ? might be 1.1195x slower bitops-bits-in-byte 7.3887+-2.3269 6.6924+-0.1452 might be 1.1040x faster bitops-bitwise-and 9.1475+-1.1505 ? 9.3076+-1.3356 ? might be 1.0175x slower bitops-nsieve-bits 7.7549+-1.3521 7.7289+-0.6407 controlflow-recursive 7.2060+-0.4298 ! 8.4953+-0.8057 ! definitely 1.1789x slower crypto-aes 19.2986+-8.2190 18.0129+-3.0947 might be 1.0714x faster crypto-md5 8.2644+-1.2072 7.7612+-1.1438 might be 1.0648x faster crypto-sha1 8.1423+-1.5990 7.9152+-0.6251 might be 1.0287x faster date-format-tofte 35.1650+-0.6483 ? 36.1490+-2.2741 ? might be 1.0280x slower date-format-xparb 21.6351+-2.0519 21.2189+-0.1581 might be 1.0196x faster math-cordic 11.2579+-2.4418 9.3053+-0.4511 might be 1.2098x faster math-partial-sums 13.2822+-1.8823 13.0983+-2.1215 might be 1.0140x faster math-spectral-norm 7.7226+-0.4115 7.6494+-0.2857 regexp-dna 11.4542+-0.2199 ? 12.5857+-3.0776 ? might be 1.0988x slower string-base64 13.4223+-1.0573 ? 15.1614+-2.0637 ? might be 1.1296x slower string-fasta 20.6008+-2.9576 ? 21.6987+-6.0305 ? might be 1.0533x slower string-tagcloud 32.4288+-0.5497 ? 34.3539+-3.1244 ? might be 1.0594x slower string-unpack-code 84.0179+-8.3223 ? 90.4980+-27.9551 ? might be 1.0771x slower string-validate-input 23.8311+-1.8061 ? 24.0583+-1.7425 ? <arithmetic> 16.9560+-0.5638 ? 17.4789+-0.6725 ? might be 1.0308x slower
Ross Kirsling
Comment 10 2018-03-28 16:20:42 PDT
Ross Kirsling
Comment 11 2018-03-28 16:48:42 PDT
Additionally, it looks like that "C1063 Fatal Error" no longer occurs for FormatConverter.cpp, so we should be able to restore the ALWAYS_INLINE there in a follow-up patch.
Alex Christensen
Comment 12 2018-03-28 17:10:25 PDT
(In reply to Ross Kirsling from comment #11) > Additionally, it looks like that "C1063 Fatal Error" no longer occurs for > FormatConverter.cpp, so we should be able to restore the ALWAYS_INLINE there > in a follow-up patch. I've been meaning to look into whether we should just remove the ALWAYS_INLINE there. It could make the binary significantly smaller while de-optiizing the outer loop, which would probably be a performance gain.
WebKit Commit Bot
Comment 13 2018-03-28 17:48:01 PDT
Comment on attachment 336731 [details] Patch Clearing flags on attachment: 336731 Committed r230062: <https://trac.webkit.org/changeset/230062>
WebKit Commit Bot
Comment 14 2018-03-28 17:48:03 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 15 2018-03-28 17:49:24 PDT
Ryosuke Niwa
Comment 16 2018-03-29 00:33:08 PDT
Somehow this patch caused Mac port to break. We can't load apple.com now :(
WebKit Commit Bot
Comment 17 2018-03-29 00:33:14 PDT
Re-opened since this is blocked by bug 184128
Alex Christensen
Comment 18 2018-03-29 10:39:45 PDT
(In reply to Ryosuke Niwa from comment #16) > Somehow this patch caused Mac port to break. We can't load apple.com now :( This change makes no change on Mac. You have almost certainly misdiagnosed
Ryosuke Niwa
Comment 19 2018-03-29 13:48:55 PDT
(In reply to Alex Christensen from comment #18) > (In reply to Ryosuke Niwa from comment #16) > > Somehow this patch caused Mac port to break. We can't load apple.com now :( > > This change makes no change on Mac. You have almost certainly misdiagnosed You say that but Safari builds before this patch loads web pages file, and Safari builds after this patch cause web content process to instantly crash. The crash reproduces at r230069 but doesn't after the rollout in r230070. So your assertion that this change makes no change on Mac is obviously wrong. It certainly affects Mac even though the code change doesn't seem like it should.
Ross Kirsling
Comment 20 2018-03-29 14:10:09 PDT
At r230075 (after the rollout, running Xcode 9.2 on Sierra), I can confirm that: - `run-minibrowser` succeeds with or without the patch (both release and debug) - the "play" button in Xcode fails with or without the patch (WebProcess crash on launch) I'm not sure what's causing the issue in Xcode, but it's not this patch; I will plan to reland accordingly.
Alex Christensen
Comment 21 2018-03-29 14:10:55 PDT
Woah, let Ryosuke double check before relanding.
Alex Christensen
Comment 22 2018-03-29 14:12:29 PDT
(In reply to Ross Kirsling from comment #20) > At r230075 (after the rollout, running Xcode 9.2 on Sierra), I can confirm > that: What about r230069 before the rollout?
Ross Kirsling
Comment 23 2018-03-29 14:15:50 PDT
(In reply to Alex Christensen from comment #21) > Woah, let Ryosuke double check before relanding. Whoops, okay. Standing by. (In reply to Alex Christensen from comment #22) > What about r230069 before the rollout? Not sure. Builds are a half-hour apiece so I've been working on this since 9:15 AM.
Ross Kirsling
Comment 24 2018-03-29 14:24:11 PDT
(In reply to Ross Kirsling from comment #20) > - `run-minibrowser` succeeds with or without the patch (both release and > debug) Er, to be more clear, there's no reproduction with `build-webkit [--debug]` followed by `run-minibrowser [--debug]` or `run-safari [--debug]` on the command line. I don't really understand what differs between this and the build-and-run from within Xcode.
Alex Christensen
Comment 25 2018-03-29 14:26:05 PDT
The definition of ALWAYS_INLINE depends on NDEBUG, so you'll need to do a release build.
Ross Kirsling
Comment 26 2018-03-29 14:27:50 PDT
(In reply to Alex Christensen from comment #25) > The definition of ALWAYS_INLINE depends on NDEBUG, so you'll need to do a > release build. Sure, I did both across the board to be extra careful.
Ross Kirsling
Comment 27 2018-03-29 15:28:51 PDT
Alright, one more scenario just to make things perfectly unambiguous. :D 1. git checkout dd7a49d554b8583f61cfda3f0a42ecdf3ed5b6de # i.e. r230061, the revision immediately before this patch landed 2. git clean -ffdx # because why not 3. Open Xcode, Product > Clean 4. Hit the Play button => WebContent process crashes immediately.
Ross Kirsling
Comment 28 2018-03-29 16:12:35 PDT
(In reply to Ross Kirsling from comment #27) > Alright, one more scenario just to make things perfectly unambiguous. :D > > 1. git checkout dd7a49d554b8583f61cfda3f0a42ecdf3ed5b6de # i.e. r230061, the > revision immediately before this patch landed > > 2. git clean -ffdx # because why not > > 3. Open Xcode, Product > Clean > > 4. Hit the Play button > > => WebContent process crashes immediately. ...or apparently this doesn't count as a thorough clean, from Xcode's perspective. Evidently there's a file: /Users/.../Library/Developer/Xcode/DerivedData/WebKit-.../Build/Products/Debug/WebKit.framework/Resources/com.apple.WebProcess.sb To which my comment from this patch was added: > // Sometimes needed due to MSVC optimizer sensitivities. And that's syntactically invalid amid s-expressions. This leaves me with many questions, but I'm guessing that there's a reason why only block comments are used in WTF/Compiler.h. :(
Alex Christensen
Comment 29 2018-03-29 16:14:23 PDT
Aha! Change the comment style and resubmit!
Alex Christensen
Comment 30 2018-03-29 16:14:39 PDT
Actually, just remove the comment.
Ross Kirsling
Comment 31 2018-03-29 17:27:33 PDT
Created attachment 336823 [details] Patch for landing
Alex Christensen
Comment 32 2018-03-29 17:33:55 PDT
(In reply to Alex Christensen from comment #18) > (In reply to Ryosuke Niwa from comment #16) > This change makes no change on Mac. You have almost certainly misdiagnosed For the record, you did not misdiagnose, Ryosuke.
Ross Kirsling
Comment 33 2018-03-29 17:48:48 PDT
(In reply to Alex Christensen from comment #32) > (In reply to Alex Christensen from comment #18) > > (In reply to Ryosuke Niwa from comment #16) > > This change makes no change on Mac. You have almost certainly misdiagnosed > For the record, you did not misdiagnose, Ryosuke. Indeed, what a trip. I've created https://bugs.webkit.org/show_bug.cgi?id=184166 to track the real bug here. :)
WebKit Commit Bot
Comment 34 2018-03-29 19:47:38 PDT
Comment on attachment 336823 [details] Patch for landing Clearing flags on attachment: 336823 Committed r230091: <https://trac.webkit.org/changeset/230091>
WebKit Commit Bot
Comment 35 2018-03-29 19:47:40 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.