The GNU toolchain maintainers say in https://bugzilla.redhat.com/show_bug.cgi?id=2043178#c8: "Note, ld.gold is abandoned/barely supported in binutils, switching to ld.bfd is what I'd strongly recommend in any case." I've been hearing this for a year or two now, so it's probably time to stop overriding the default linker and assume the system default is best. On most systems, this would be ld.bfd. Forcing ld.gold is especially uncool on systems that use ld.lld, or future systems that might use ld.mold.
Created attachment 449732 [details] Patch
Comment on attachment 449732 [details] Patch Er wait, this patch is broken, I can see another use of USE_LD_GOLD in the diff itself. <_< It is required for DEBUG_FISSION. Sigh....
Let's abandon this for now. We cannot give up debug fission as that makes debug builds take impractically long.
(In reply to Michael Catanzaro from comment #2) > Comment on attachment 449732 [details] > Patch > > Er wait, this patch is broken, I can see another use of USE_LD_GOLD in the > diff itself. <_< It is required for DEBUG_FISSION. Sigh.... This shouldn't be the case, ld.bfd has supported split-debug for a good while. You can try yourself: % cat hello.cc #include <cstdio> extern const char* whom(); int main() { std::printf("Hello, %s!\n", whom()); } % cat other.cc const char* whom() { return "Pete"; } % g++ -gsplit-dwarf -ggdb -c hello.cc % g++ -gsplit-dwarf -ggdb -c other.cc % g++ -gsplit-dwarf -ggdb -fuse-ld=bfd -o hello hello.o other.o % dwarfdump hello | grep DW_AT_dwo_name | sort -u DW_AT_dwo_name hello.dwo DW_AT_dwo_name other.dwo % So I think it is still a good idea to use ld.bfd by default, and do not make the DEBUG_FISSION option depend on USE_LD_GOLD. As a matter of fact, split-debug is *also* supported by lld and mold. GCC is annoying in that it does not allow passing -fuse-ld=mold, for example, but you can do the linking command replacing s/g++/clang++/ and the “clang” compiler driver will use whatever linker you throw at it.
Let's do one step at a time and make DEBUG_FISSION not depend on USE_LD_GOLD first, TBD in bug #235737
So a tricky thing here: even if we do not set USE_LD_GOLD=ON by default, the system compiler can have been built to use Gold as the default, or chosen by something like Debian's update-alternatives, or an user could run “ln -nf /usr/bin/{ld.gold,ld}”... This means the build might still be using Gold even with USE_LD_GOLD=OFF. I think the (most) correct way to go about this is trying to match the output “${CXX} -Wl,--version” and match the output to determine what's the default linker. Then, if no USE_LD_{GOLD,LLD} were set, and the linker flags do not include a “-fuse-ld=...” flag, arrange to pass “-fuse-ld=bfd” as the default. Or should we just keep it simple and set USE_LD_GOLD=OFF by default? We know that in the Flatpak SDK both the GCC and Clang drivers use ld.bfd by default so in that case the correct choice would be used, then packagers would be responsible to not use by default a linker (Gold in this case) that could break arbitrarily. Opinions?
Created attachment 450218 [details] Patch
(In reply to Adrian Perez from comment #7) > Created attachment 450218 [details] > Patch This version of the patch uses my last suggested approach: do not try to disable gold if it's the default and leave that choice to packagers. Checking which linker is actually in use was needed anyway to determine whether --gdb-index can be passed to it.
I have been looking a bit at the red EWS bubbles and finally have been able to reproduce it here: if there is a previous build done with ld.gold, and the dirty build directory is used after applying this patch, *and* at least one static (.a) library does not get rebuilt, then ld.bfd will fail linking. The difference I can see is that when choosing ld.bfd in a new build directory then the generated static libraries are regular archives; but the ones left from a previous run with ld.gold are thin archives instead. Could it be that ld.bfd does not like thin archives? I will try and figure that out.
(In reply to Adrian Perez from comment #9) > The difference I can see is that when choosing ld.bfd in a new build > directory then the generated static libraries are regular archives; > but the ones left from a previous run with ld.gold are thin archives > instead. Could it be that ld.bfd does not like thin archives? I will > try and figure that out. Possibly https://sourceware.org/bugzilla/show_bug.cgi?id=28138 - I'll see if applying that patch to Binutils solves the problem. Other than that, thin archives should work just fine with ld.bfd :|
(In reply to Adrian Perez from comment #6) > So a tricky thing here: even if we do not set USE_LD_GOLD=ON by > default, the system compiler can have been built to use Gold as the > default, or chosen by something like Debian's update-alternatives, > or an user could run “ln -nf /usr/bin/{ld.gold,ld}”... This means > the build might still be using Gold even with USE_LD_GOLD=OFF. Of course, but that's fine. This flag is an override "use ld.gold no matter what if ON." It doesn't mean "avoid ld.gold no matter what if OFF." > I think the (most) correct way to go about this is trying to match > the output “${CXX} -Wl,--version” and match the output to determine > what's the default linker. Then, if no USE_LD_{GOLD,LLD} were set, > and the linker flags do not include a “-fuse-ld=...” flag, arrange > to pass “-fuse-ld=bfd” as the default. Overkill. > Or should we just keep it simple and set USE_LD_GOLD=OFF by default? > We know that in the Flatpak SDK both the GCC and Clang drivers use > ld.bfd by default so in that case the correct choice would be used, > then packagers would be responsible to not use by default a linker > (Gold in this case) that could break arbitrarily. I would just remove the setting. If we want it OFF by default, what's the point of having it at all? People know how to use -fuse-ld=gold if they really want it, but nobody is going to want it. (In reply to Adrian Perez from comment #8) > This version of the patch uses my last suggested approach: do not > try to disable gold if it's the default and leave that choice to > packagers. Checking which linker is actually in use was needed > anyway to determine whether --gdb-index can be passed to it. A little web searching indicates that --gdb-index makes gdb faster. I wonder why this flag only exists for certain linkers. Real shame that it has to be different. :/ (In reply to Adrian Perez from comment #10) > Possibly https://sourceware.org/bugzilla/show_bug.cgi?id=28138 - I'll > see if applying that patch to Binutils solves the problem. Other than > that, thin archives should work just fine with ld.bfd :| Oh wow... so I don't know what to do now. Try to get the fixes into Debian? Sounds better than installing a custom binutils on our EWS bots?
(In reply to Michael Catanzaro from comment #11) > (In reply to Adrian Perez from comment #6) > > So a tricky thing here: even if we do not set USE_LD_GOLD=ON by > > default, the system compiler can have been built to use Gold as the > > default, or chosen by something like Debian's update-alternatives, > > or an user could run “ln -nf /usr/bin/{ld.gold,ld}”... This means > > the build might still be using Gold even with USE_LD_GOLD=OFF. > > Of course, but that's fine. This flag is an override "use ld.gold no matter > what if ON." It doesn't mean "avoid ld.gold no matter what if OFF." Good, let's go ahed with the simpler ways :) > > [...] > > I would just remove the setting. If we want it OFF by default, what's the > point of having it at all? People know how to use -fuse-ld=gold if they > really want it, but nobody is going to want it. I like this 💣️ > (In reply to Adrian Perez from comment #8) > > This version of the patch uses my last suggested approach: do not > > try to disable gold if it's the default and leave that choice to > > packagers. Checking which linker is actually in use was needed > > anyway to determine whether --gdb-index can be passed to it. > > A little web searching indicates that --gdb-index makes gdb faster. I wonder > why this flag only exists for certain linkers. Real shame that it has to be > different. :/ Yeah, bit of a pity. Also I don't like much having to match the output of “-Wl,--version” to determine whether we can pass “--gdb-index”, I am looking into changing this to actually run the linker to see whether it will gladly accept the command line flag—stay tuned for v2 of the patch. > (In reply to Adrian Perez from comment #10) > > Possibly https://sourceware.org/bugzilla/show_bug.cgi?id=28138 - I'll > > see if applying that patch to Binutils solves the problem. Other than > > that, thin archives should work just fine with ld.bfd :| > > Oh wow... so I don't know what to do now. Try to get the fixes into Debian? > Sounds better than installing a custom binutils on our EWS bots? I am also looking into determining automatically whether thin archives are supported by the linker in use. Again, doing feature checks is nicer than knowing which linker is in use and assuming the options we pass will always work—a linker from an unpatched binutils will fail the feature test and thin archives would be disabled (at the cost of more disk space, of course) but at least the build would not fail.
Created attachment 450409 [details] Patch v2
(In reply to Adrian Perez from comment #13) > Created attachment 450409 [details] > Patch v2 Note that we may still need to clean the CMake cache files for incremental builds to succeed, because the value of the USE_THIN_ARCHIVES option gets stored there.
Created attachment 450410 [details] Patch v3 Fixes usage of CMAKE_<foo>_{SUFFIX,EXTENSION} variables
Comment on attachment 450410 [details] Patch v3 View in context: https://bugs.webkit.org/attachment.cgi?id=450410&action=review > Source/cmake/OptionsCommon.cmake:134 > + # Check wether the linker supports --gdb-index. Use try_compile() directly whether > Source/cmake/OptionsCommon.cmake:138 > + "int main(int argc, char** argv) { (void) argc; (void) argv; return 0; }") You can simplify this by writing "int main(void)" > ChangeLog:17 > + While at it, also feature check whether the linker supports thin archives, in order > + to enable them only when actually usable. This avoids situations in which thin archives > + can be created by the archiver but later on the linker will not be able to consume them. This is really far removed from ld.gold -> ld.bfd. Please split it out into a separate bug. r=me applies to the ld.gold changes. I haven't devoted enough brainpower to think about the thin archive stuff yet, just enough to see that you added more code than removed, which is not ideal.
My understanding is that gold is faster when linking than bfd. I'm not sure if that is still the case. But if it is, then maybe we should leave gold as default for developer builds in order to speed up the build?
(In reply to Carlos Alberto Lopez Perez from comment #17) > My understanding is that gold is faster when linking than bfd. I'm not sure > if that is still the case. I think that may be true, and also it uses significantly less RAM than ld.bfd. > But if it is, then maybe we should leave gold as default for developer > builds in order to speed up the build? That's certainly an option. Ultimately, I think it's more important to use a linker that is well-maintained and expected to work reliably.
(In reply to Michael Catanzaro from comment #16) > Comment on attachment 450410 [details] > Patch v3 > > View in context: > https://bugs.webkit.org/attachment.cgi?id=450410&action=review > > > Source/cmake/OptionsCommon.cmake:134 > > + # Check wether the linker supports --gdb-index. Use try_compile() directly > > whether Aye, will fix. > > Source/cmake/OptionsCommon.cmake:138 > > + "int main(int argc, char** argv) { (void) argc; (void) argv; return 0; }") > > You can simplify this by writing "int main(void)" Good point. I thought the C compiler would be more strict with the definition of main() but after trying a few different -std=… options your version works even with -Wall. Let's simplify this. > > ChangeLog:17 > > + While at it, also feature check whether the linker supports thin archives, in order > > + to enable them only when actually usable. This avoids situations in which thin archives > > + can be created by the archiver but later on the linker will not be able to consume them. > > This is really far removed from ld.gold -> ld.bfd. Please split it out into > a separate bug. r=me applies to the ld.gold changes. I haven't devoted > enough brainpower to think about the thin archive stuff yet, just enough to > see that you added more code than removed, which is not ideal. If I split this into a separate bug/patch, we'll need this one to depend on the new one, because currently thin archives are enabled by default if running “ar -V” prints “GNU” — the ability of the linker to *consume* them is not taken into account! This is why switching to ld.bfd results in builds failing due to bug https://sourceware.org/bugzilla/show_bug.cgi?id=28138 What the new CHECK_THIN_ARCHIVE_SUPPORT() function does is: - Compile test-thin-archive.c → test-thin-archive.o → libtest-thin-archive.a - Compile test-thin-archive-main.c → test-thin-archive-main.o - Link test-thin-archive-main.o + libtest-thin-archive.a → test-thin-archive If any of the above steps fails, it returns OFF/FALSE; if all the steps succeed, then it returns ON/TRUE. Or, in other words: it tries to create and use a thin archive — that's all. Hopefully the explanation helps out with reviewing… or would you still prefer this part as a separate patch?
(In reply to Michael Catanzaro from comment #18) > (In reply to Carlos Alberto Lopez Perez from comment #17) > > My understanding is that gold is faster when linking than bfd. I'm not sure > > if that is still the case. > > I think that may be true, and also it uses significantly less RAM than > ld.bfd. Yet lld and mold are faster than gold anyway. So I would rather default to using lld if available. We have lld already in the Flatpak SDK, which is what most developers and build bots use anyway. If we combine this with a follow-up patch to default to lld, we get both *even faster* builds and stop using a poorly maintainer program. > > But if it is, then maybe we should leave gold as default for developer > > builds in order to speed up the build? > > That's certainly an option. > > Ultimately, I think it's more important to use a linker that is > well-maintained and expected to work reliably. Reliability and correctness are the key here. I would rather default to lld for developer builds than to gold.
(In reply to Adrian Perez from comment #19) > If any of the above steps fails, it returns OFF/FALSE; if all the steps > succeed, then it returns ON/TRUE. Or, in other words: it tries to create > and use a thin archive — that's all. Hopefully the explanation helps out > with reviewing… or would you still prefer this part as a separate patch? Well since it is directly related, I think it's fine in one patch. Real shame that we need to add so much test code, though. :( And all just to handle an ld.bfd bug that is already fixed. I have a counterproposal: instead of adding this test code to the CMakeLists.txt, let's configure any affected EWS to build with -fuse-ld=gold, with a pointer to this issue so that we can remove that in the future when they upgrade to a newer Debian with fixed ld.bfd. What do you think? I am trying to avoid the need to bloat the file so much as a workaround for an old bug. Audacious alternative: we could request that the Debian binutils maintainer backport the patch that fixed this.
(In reply to Michael Catanzaro from comment #21) > (In reply to Adrian Perez from comment #19) > > If any of the above steps fails, it returns OFF/FALSE; if all the steps > > succeed, then it returns ON/TRUE. Or, in other words: it tries to create > > and use a thin archive — that's all. Hopefully the explanation helps out > > with reviewing… or would you still prefer this part as a separate patch? > > Well since it is directly related, I think it's fine in one patch. Real > shame that we need to add so much test code, though. :( And all just to > handle an ld.bfd bug that is already fixed. It does not “just” to handle the ld.bfd bug. Let me rephrase something I wrote above: - Currently thin archives are enabled if “ar” can create them. - Currently we do NOT test that the linker (whichever is is use) reads thin archives. Therefore: - If the linker (any linker, not just ld.bfd!) cannot read thin archives, the build will fail. For example, building on MacOS “llvm-ar” would be used as the archiver, and it supports thin archives, but the default “ld64” linker does NOT support reading them… resulting in a failed build. The only 100% reliable solution is: - Test that “ar” can create thin archives, and - Test that the linker (whichever is in use) can read thin archives. > I have a counterproposal: instead of adding this test code to the > CMakeLists.txt, let's configure any affected EWS to build with > -fuse-ld=gold, with a pointer to this issue so that we can remove that in > the future when they upgrade to a newer Debian with fixed ld.bfd. What do > you think? I am trying to avoid the need to bloat the file so much as a > workaround for an old bug. That would be a workaround, and as explained above would not cover setups that can be reasonably found in the wild. > Audacious alternative: we could request that the Debian binutils maintainer > backport the patch that fixed this. We can ask about that in parallel, but we will still need to check whether the linker currently being used can ingest thin archives.
Created attachment 450476 [details] Patch v4 Applies edits suggested by Michael, simplifies things
Could we just use thin archives only on Linux? Sure it would be less accurate, but as a practical matter, thin archives are expected to work on Linux unless you are using a seriously oddball toolchain (sans bugs). I'm not going to r- this. If you really want these checks, well, OK. But I'm not enthused about adding so many new tests and enlarging this file, as it doesn't really seem needed.
(In reply to Michael Catanzaro from comment #24) > Could we just use thin archives only on Linux? Sure it would be less > accurate, but as a practical matter, thin archives are expected to work on > Linux unless you are using a seriously oddball toolchain (sans bugs). We can go back to matching the output from “${CC} -Wl,--version” and matching those linkers we know support thin archives. Maybe that is better as it implies “this is the list of linkers actually tested to be working”, dunno 🤔️ > I'm not going to r- this. If you really want these checks, well, OK. But I'm > not enthused about adding so many new tests and enlarging this file, as it > doesn't really seem needed. Well, I'm not going to cq+ it either because I just found a case in which the checks are failing to disable thin archives ~_~ It seems I will be submitting a fifth iteration of the patch in the end.
(In reply to Adrian Perez from comment #25) > Well, I'm not going to cq+ it either because I just found a case in which > the checks are failing to disable thin archives ~_~ It seems the EWS has also found this case. I see red. :(
Created attachment 450570 [details] Patch v5 Go back to explicitly listing which features known linkers and archivers support, which covers the ones I have successfully tested. Note that we need the patch from bug #235975 and an updated Flatpak SDK deployed before the EWS will be happy.
Comment on attachment 450570 [details] Patch v5 Meh. This isn't great, but I can't think of anything better. r=me if you can make EWS happy.
(In reply to Michael Catanzaro from comment #28) > Comment on attachment 450570 [details] > Patch v5 > > Meh. This isn't great, but I can't think of anything better. > > r=me if you can make EWS happy. The EWS will eventually be happy, once we deploy the Flatpak SDK with the binutils patch. Also I now realized looking at your first patch that we should do something like: if (ARM_TRADITIONAL_DETECTED AND LD_VARIANT STREQUAL GOLD) message(WARNING "Using ld.gold when targeting traditional ARM may result in errors") endif () Or should we just go ahead and use message(FATAL_ERROR …) instead?
(In reply to Adrian Perez from comment #29) > if (ARM_TRADITIONAL_DETECTED AND LD_VARIANT STREQUAL GOLD) > message(WARNING "Using ld.gold when targeting traditional ARM may > result in errors") > endif () > > Or should we just go ahead and use message(FATAL_ERROR …) instead? I would just remove it. We no longer force use of ld.gold, so it's not really our problem. If somebody tries it and finds that it doesn't work, they'll probably switch back to the default linker and move on.
In case it's not clear, I have a strong bias for removing as much CMake goo as possible, except for the goo that we really need to do useful stuff. Most build systems do not care about which linker is being used. We probably really need most of this goo, for debug fission and also because thin archives are probably important with our massive static libs(?). But if there's any question whether it's really needed, I'd say probably not.
Created attachment 450627 [details] Patch v6 Hopefuly "patch for landing" :-)
(In reply to Michael Catanzaro from comment #31) > In case it's not clear, I have a strong bias for removing as much CMake goo > as possible, except for the goo that we really need to do useful stuff. Most > build systems do not care about which linker is being used. We probably > really need most of this goo, for debug fission and also because thin > archives are probably important with our massive static libs(?). But if > there's any question whether it's really needed, I'd say probably not. I have removed the ARM_TRADITIONAL_DETECTED leftover in the end, and now the EWS builders are greed after deploying an updated Flatpak SDK with the binutils fix for thin archives applied 🙌️
(In reply to Adrian Perez from comment #20) > (In reply to Michael Catanzaro from comment #18) > > (In reply to Carlos Alberto Lopez Perez from comment #17) > > > My understanding is that gold is faster when linking than bfd. I'm not sure > > > if that is still the case. > > > > I think that may be true, and also it uses significantly less RAM than > > ld.bfd. > > Yet lld and mold are faster than gold anyway. So I would rather default > to using lld if available. We have lld already in the Flatpak SDK, which > is what most developers and build bots use anyway. > > If we combine this with a follow-up patch to default to lld, we get > both *even faster* builds and stop using a poorly maintainer program. That will be bug #235979 — it will take me a few days to get to it, so if someone else wants to try making the patch in the meantime after this one lands I can find a moment to help with reviews.
Committed r288994 (246713@main): <https://commits.webkit.org/246713@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 450627 [details].