Summary: | REGRESSION(r251875): Crash in JSC::StructureIDTable::get on ppc64le: gcSafeMemcpy broken on JSVALUE64 platforms other than x86_64 and aarch64 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Trung LE <trung.le> | ||||||||
Component: | WebCore Misc. | Assignee: | Nobody <webkit-unassigned> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Critical | CC: | bugs-noreply, ews-watchlist, keith_miller, mark.lam, mcatanzaro, msaboff, q66, saam, trung.le, tzagallo, webkit-bug-importer | ||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||
Version: | Other | ||||||||||
Hardware: | PC | ||||||||||
OS: | Linux | ||||||||||
See Also: |
https://bugzilla.redhat.com/show_bug.cgi?id=1823031 https://bugs.webkit.org/show_bug.cgi?id=209236 https://bugs.webkit.org/show_bug.cgi?id=211588 |
||||||||||
Attachments: |
|
Description
Trung LE
2020-04-18 05:43:46 PDT
*** Bug 210686 has been marked as a duplicate of this bug. *** The only practical way to debug this is to bisect to find the commit that broke it. Any chance you're able to try your own build from git and see if you can reproduce the crash there? If so, then it's bisectable. $ git clone git://git.webkit.org/WebKit-https.git WebKit Once you've cloned (it will take a while, but don't do a shallow clone because we need the history), follow the instructions "Building WebKitGTK from a release tarball" at [1], NOT the instructions "Building WebKitGTK from git" below that. Except do use git, not a release tarball. I know that's confusing, but it will be fine. ;) If you have trouble, then I can try to bisect it for you with the access to your machine that you offered, but that will go onto my TODO list with no ETA, so quickest way is to try yourself. My guess is that a bisect would require about 10 builds, probably can be done in a long morning or afternoon on your hardware. [1] https://trac.webkit.org/wiki/BuildingGtk#BuildingWebKitGTKfromareleasetarball I forgot to mention a couple things. First, before you do anything else: $ sudo dnf builddep webkit2gtk3 $ sudo dnf install 'pkgconfig(libsystemd)' And when you run cmake, explicitly enable MiniBrowser so that you can test the result in MiniBrowser. That will be easier than trying to test your build in Epiphany: $ cmake -DPORT=GTK -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_MINIBROWSER=ON -GNinja Well I also forgot about bug #1823031. That's going to cause some trouble for bisecting. When bisecting, you have to be very careful to check which crash you're getting and make sure it's not bug #1823031. You'll inevitably hit that crash probably for large portions of the bisect, in which case you'll need to manually apply this patch here and then un-apply before going on to the next revision: https://src.fedoraproject.org/rpms/webkit2gtk3/raw/d2086487babfb8d1e2c4cff00f7b4123262abf89/f/fix-ppc64le.patch That should work for most revisions that you test; there will be a couple weeks where that patch won't apply cleanly, but if you're unlucky and hit one of those revisions it should be pretty easy to adapt it to apply. OK, I know that's a bit more complex that I was hoping for, but it should still be doable... good luck, let me know if you have questions. Sure, let me try. I will come back to you asap. (In reply to Michael Catanzaro from comment #4) > Well I also forgot about bug #1823031. That's going to cause some trouble > for bisecting. When bisecting, you have to be very careful to check which > crash you're getting and make sure it's not bug #1823031. You'll inevitably > hit that crash probably for large portions of the bisect, in which case > you'll need to manually apply this patch here and then un-apply before going > on to the next revision: > > https://src.fedoraproject.org/rpms/webkit2gtk3/raw/ > d2086487babfb8d1e2c4cff00f7b4123262abf89/f/fix-ppc64le.patch > > That should work for most revisions that you test; there will be a couple > weeks where that patch won't apply cleanly, but if you're unlucky and hit > one of those revisions it should be pretty easy to adapt it to apply. > > OK, I know that's a bit more complex that I was hoping for, but it should > still be doable... good luck, let me know if you have questions. No luck so far for me. I am trying to find one version that actually works with FC31 before starting the git bisecting process. I am now at 2.27.4 (no luck) and continuing to go further back (let's hope it would work with 2.26.4). Btw, I think this issue is also related to other components that are specific to FC32. I could verify that webkit2gtk3-2.28.0 + epiphany 3.34 runs perfectly on FC31. EDIT: s/that actually works with FC31/that actually works with FC32 (In reply to Trung LE from comment #6) > Btw, I think this issue is also related to other components that are > specific to FC32. I could verify that webkit2gtk3-2.28.0 + epiphany 3.34 > runs perfectly on FC31. You need to look at the Fedora package revision as well if you're using the packaged version. 2.28.0 is guaranteed to crash on ppc64le due to bug #209236. 2.28.0-7 has my first attempt to fix it (which you had reported broken, but which might not have been since you might have been seeing the second crash instead) and 2.28.0-8 has my second attempt. > You need to look at the Fedora package revision as well if you're using the packaged version. 2.28.0 is guaranteed to crash on ppc64le due to bug #209236. 2.28.0-7 has my first attempt to fix it (which you had reported broken, but which might not have been since you might have been seeing the second crash instead) and 2.28.0-8 has my second attempt. I fetch the repo with `git clone https://src.fedoraproject.org/rpms/webkit2gtk3.git` then switch to f31 branch with `git checkout f31` and checkout the 2.26.4 commit with `git reset a9158c8285624fe6b1a41691bb079ea963c1b04f && git reset --hard && git clean -df` Next I build the RPM with `fedpkg --release f32 local` Once the RPMs are generated, I then install them with `sudo rpm -i ppc64le/*.rpm`. I open the Minibrowser by `/user/libexec/webkit2gtk-4.0/MiniBrowser`. The app report following error: ``` /usr/libexec/webkit2gtk-4.0/WebKitWebProcess: symbol lookup error: /usr/libexec/webkit2gtk-4.0/WebKitWebProcess: undefined symbol: WebProcessMainUnix ``` Running `/usr/libexec/webkit2gtk-4.0/WebKitWebProcess` directly also yield the same error. Wondering if you could suggest how to work-around this issue. My objective is to confirm that FC32 does not have issue with 2.26.4 Minibrowser. Hm, I don't know why your build is broken, but I didn't mean to suggest trying to build old packages on new Fedora. It would make more sense to just go back in time and try to figure out: when was the last time this worked? Does WebKitGTK work in F31? Does it work in F30? Does it work in F29? Eventually if you go back far enough, it ought to work, and we'll have a starting point. And if not, then we know this isn't a regression. (In reply to Michael Catanzaro from comment #10) > Hm, I don't know why your build is broken, but I didn't mean to suggest > trying to build old packages on new Fedora. It would make more sense to just > go back in time and try to figure out: when was the last time this worked? > Does WebKitGTK work in F31? Does it work in F30? Does it work in F29? > Eventually if you go back far enough, it ought to work, and we'll have a > starting point. And if not, then we know this isn't a regression. Sorry for late reply. Here is my finding: On Fedora 31 webkit2gtk3 2.26.0 to 2.28.2 ALL WORK :) On Fedora 32 webkit2gtk3 2.26.4 to 2.28.2 NONE WORK :( I could __not__ find a working starting revision to start the git bisection. (In reply to Trung LE from comment #11) > On Fedora 31 webkit2gtk3 2.26.0 to 2.28.2 ALL WORK :) > > On Fedora 32 webkit2gtk3 2.26.4 to 2.28.2 NONE WORK :( > > I could __not__ find a working starting revision to start the git bisection. OK, that's weird. So that's proof that this is not a regression in WebKit, but a regression somewhere else. I have absolutely no clue what component to suspect. (In reply to Michael Catanzaro from comment #12) > (In reply to Trung LE from comment #11) > > On Fedora 31 webkit2gtk3 2.26.0 to 2.28.2 ALL WORK :) > > > > On Fedora 32 webkit2gtk3 2.26.4 to 2.28.2 NONE WORK :( > > > > I could __not__ find a working starting revision to start the git bisection. > > OK, that's weird. So that's proof that this is not a regression in WebKit, > but a regression somewhere else. > > I have absolutely no clue what component to suspect. Again on Fedora 32, I was hopeful that 2.26.4 would run correctly but I ran into the /usr/libexec/webkit2gtk-4.0/WebKitWebProcess: symbol lookup error: /usr/libexec/webkit2gtk-4.0/WebKitWebProcess: undefined symbol: WebProcessMainUnix error which is totally different to the reported JSC one. Is there any chance you could trigger a custom-built RPM of the Fc32 2.26.4 on koji build system? There's no point. You've already proven that 2.28.2 works fine in F31. I don't run Fedora, but I can confirm that on Void Linux ppc64le this issue happens when updating to 2.28.2 but not in 2.26.4 (I'm the package maintainer, currently working on updating it). I get that exact StructureIDTable backtrace. Also, this does not crash always. Simple sites work (even with some JS), only complicated javascript (e.g. on twitter) crashes it. Anyway, I have the whole Git history cloned, built, managed to reproduce the issue, currently going back in history to catch the exact revision. I narrowed the crash down to a single commit: http://git.webkit.org/?p=WebKit.git;a=commit;h=2c0c47246d650167ff6c551d5c346f194538e0da This will not build on its own (see the #errors about unknown architecture) so it needs to be combined with a fix that came later: http://git.webkit.org/?p=WebKit.git;a=commit;h=bc5d7c9715a510503a0155b9fdbecbf5c551f559 After that, the crash is reproducible. The commit right before the first change, i.e. http://git.webkit.org/?p=WebKit.git;a=commit;h=2664ebec35d469bd909c9df8c5465c74a6cd49d0 builds and works perfectly fine; for anything after that, I get the StructureIDTable crash. I will try reverting this on top of master later to verify, as well as try to figure out why exactly this is crashing. It's late here though, so that's for later; if you have a hunch or something else, I'd appreciate that, too. Great work, thanks Daniel. (In reply to Michael Catanzaro from comment #14) > There's no point. You've already proven that 2.28.2 works fine in F31. Maybe a false negative, since Daniel says the crash does not always occur? I built WebKit master, which crashes in the same way. I then went on to modify GCMemoryOperations.h and replaced its functions with simply the following: ``` template <typename T> ALWAYS_INLINE void gcSafeMemcpy(T* dst, T* src, size_t bytes) { static_assert(sizeof(T) == sizeof(JSValue)); RELEASE_ASSERT(bytes % 8 == 0); memcpy(dst, src, bytes); } template <typename T> ALWAYS_INLINE void gcSafeMemmove(T* dst, T* src, size_t bytes) { static_assert(sizeof(T) == sizeof(JSValue)); RELEASE_ASSERT(bytes % 8 == 0); memmove(dst, src, bytes); } template <typename T> ALWAYS_INLINE void gcSafeZeroMemory(T* dst, size_t bytes) { static_assert(sizeof(T) == sizeof(JSValue)); RELEASE_ASSERT(bytes % 8 == 0); memset(reinterpret_cast<char*>(dst), 0, bytes); } ``` It no longer crashes when I do that (it effectively reverts the SVN revision, except it makes sure the problem is not in the way these functions are called). So it's definitely that specific rev that's the problem. I believe I have fixed the problem. I'm currently doing a rebuild to verify that. Will attach a patch once I've confirmed it Created attachment 398740 [details]
Patch to fix the regression
To quote the commit message:
The problem at hand here is that the control flow is wrong. As it was, we'd do something like:
```
if (bytes <= smallCutoff) {
slow path
} else if (aarch64 || bytes <= mediumCutoff) {
either x86_64 path, aarch64 path or slow path
} else {
assert(x86_64)
do x86_64 path, or nothing on other archs
}
```
That means everything on non-x86_64/aarch64 that tried to memcpy more than mediumCutoff would end up doing nothing.
Fix the code so that slow path is taken automatically always if running non-x86_64/aarch64 architectures. Remove the #else in the mediumCutoff branch as that is now never taken.
(In reply to Daniel Kolesa from comment #23) > Created attachment 398740 [details] > Patch to fix the regression > > To quote the commit message: > > The problem at hand here is that the control flow is wrong. As it was, we'd > do something like: > > ``` > if (bytes <= smallCutoff) { > slow path > } else if (aarch64 || bytes <= mediumCutoff) { > either x86_64 path, aarch64 path or slow path > } else { > assert(x86_64) > do x86_64 path, or nothing on other archs > } > ``` > > That means everything on non-x86_64/aarch64 that tried to memcpy more than > mediumCutoff would end up doing nothing. > > Fix the code so that slow path is taken automatically always if running > non-x86_64/aarch64 architectures. Remove the #else in the mediumCutoff > branch as that is now never taken. Good catch. I can confirm that patch addresses the regression. Nice. (In reply to Daniel Kolesa from comment #23) > That means everything on non-x86_64/aarch64 that tried to memcpy more than > mediumCutoff would end up doing nothing. Well it must be hitting the RELEASE_ASSERT(isX86_64()) there. Oops! Normally a RELEASE_ASSERT() is a smoking gun, but because the function used ALWAYS_INLINE, it didn't show up in the backtrace, which made this a *lot* harder than it should have been. :/ Makes sense. Either way, webkit master now builds and functions properly, which should take care of webkitgtk for a while. I think I'll submit my 32-bit big endian llint patch too while at it... Comment on attachment 398740 [details] Patch to fix the regression View in context: https://bugs.webkit.org/attachment.cgi?id=398740&action=review If you have a git checkout, please run Tools/Scripts/prepare-ChangeLog -b 210685 to prepare the patch for Bugzilla. If you're working from tarball, probably easiest for me to do that for you. > Source/JavaScriptCore/heap/GCMemoryOperations.h:57 > + if (bytes <= smallCutoff || (!isARM64() && !isX86_64())) Probably best to use build guards here rather than runtime guards. What I would do in your patch is: not touch this line, keep the call to slowPathForwardMemcpy() belong, remove the RELEASE_ASSERT(isX86_64()), and then add another #else that also calls slowPathForwardMemcpy() when this second #if CPU(X86_64) branch is not taken. Sound good? That results in a confusing mess, but importantly it would be parallel to the confusing mess in the other two functions, below, both of which duplicate the fallback path first for non-x86_64/aarch64 and then again for non-GCC/clang. Then I can follow up with a patch to change the guards so that we don't need to write the fallback case twice in a row at the bottom of the function. (In reply to Michael Catanzaro from comment #28) > What I would do in your patch is: not touch this line, keep the call to > slowPathForwardMemcpy() belong, I meant: "below" Doing that results in needless branches being taken based on smallCutoff/mediumCutoff, which just falling back to slow path early prevents... runtime guards shouldn't matter too much, these checks are constexpr, and the compiler will fold them away, plus it's shorter and less messy (In reply to Daniel Kolesa from comment #27) > Makes sense. Either way, webkit master now builds and functions properly, > which should take care of webkitgtk for a while. I think I'll submit my > 32-bit big endian llint patch too while at it... Sounds like you are using a git checkout. If you're bored, you could try Tools/run-jsc-stress-tests and see if it *really* works. :P On internal CI, I'm currently seeing 150 failures on ppc64le and s390x that weren't there in 2.26, but it's entirely plausible you just fixed them. ;) Created attachment 398760 [details]
patch with changelog entry
added changelog entry
I can run the stress tests, how do I do that from a built git tree? I've just built it using cmake as usual (In reply to Daniel Kolesa from comment #30) > Doing that results in needless branches being taken based on > smallCutoff/mediumCutoff, which just falling back to slow path early > prevents... OK, you're right. That would be avoided by my proposed follow-up patch, though. If you prefer, we can do it all in one patch. In WebKit we have an annoying rule that you only attach one patch per bug report, so we have a tendency to solve multiple slightly-related things in the same patch that might be expected to be two separate patches in other projects. We can fix it with build guards and also clean up the guards all in one. Here's what I was thinking: ``` #if USE(JSVALUE64) #if COMPILER(GCC_COMPATIBLE) && (CPU(X86_64) || CPU(ARM64)) if (bytes <= smallCutoff) { slow path } else if (aarch64 || bytes <= mediumCutoff) { #if CPU(X86_64) x86_64 fast path #elif CPU(ARM64) aarch64 fast path #endif } else { assert(x86_64) #if (X86_64) x86_64 fast path #endif } #else // non-x86_64/aarch64 or non-GCC-compatible JSVALUE64 path slow path #else // JSVALUE32 path vanilla memcpy #endif ``` I think that's the simplest we can get it down to (for some value of "simple"). Then we'd make the same change in the other two functions as well, adding && (CPU(X86_64) || CPU(ARM64)) guards at the top of the chain so that we can compress the non-GCC compat and non-x86_64/aarch64 fallbacks together. Do you agree that makes sense? I can upload a patch if you want, but you get your name in the commit message if you try. ;) We can also go with your simple patch here, but I would wind up basically undoing the change in that follow-up. yeah that looks good to me, I can update the patch (In reply to Daniel Kolesa from comment #33) > I can run the stress tests, how do I do that from a built git tree? I've > just built it using cmake as usual You probably need to use Tools/Scripts/update-webkitgtk-libs and then Tools/Scripts/build-webkit --gtk first. Come to think of it, that's probably not worth the effort; you're as likely as not to spend the rest of the day debugging the development scripts. I'll find out whether our ppc64le and s390x bots are happy with this change soon enough anyway. Alright, I have an updated patch (which actually makes it *more* consistent with the other functions, the logic was overall kinda broken) without changing any more than necessary, just doing one more build to check if I haven't messed up and things still work. Created attachment 398767 [details]
updated to use compile-time guards
This updates the patch to use compile-time guards. The second USE(JSVALUE64) was not necessary, since the whle thing is wrapped in USE(JSVALUE64) above that, so instead I replaced it with arch guards, which effectively caused the slow path to be taken for other JSVALUE64 archs always (plain memcpy fallback for non-JSVALUE64 works as before).
Comment on attachment 398767 [details]
updated to use compile-time guards
Exactly, thanks. I'll follow-up to change this in the other two functions as well, since they can benefit from the same change.
Also confirmed functionality on ppc64 (big endian), with webkitgtk 2.28.2. Committed r261326: <https://trac.webkit.org/changeset/261326> All reviewed patches have been landed. Closing bug and clearing flags on attachment 398767 [details]. (In reply to Michael Catanzaro from comment #36) > I'll find out whether our ppc64le and > s390x bots are happy with this change soon enough anyway. You fixed 149 out of 150 test failures. Thanks. ;) |