RESOLVED DUPLICATE of bug 150936 Bug 149061
[ARM] REGRESSION(r189575): It made 2860 tests fail/crash on AArch64 Linux
https://bugs.webkit.org/show_bug.cgi?id=149061
Csaba Osztrogonác
Reported 2015-09-11 04:32:12 PDT
Unfortunately I can't add the details to the bug report, because https://build.webkit.org/waterfall is out of order again and again. :-/
Attachments
Patch used for X86-64 Callee Saves debugging (3.74 KB, patch)
2015-09-11 09:50 PDT, Michael Saboff
Csaba Osztrogonác
Comment 1 2015-09-11 04:37:37 PDT
Ah, build.webkit.org works again, so here is the link about this regression: https://build.webkit.org/builders/EFL%20Linux%20AArch64%20Release/builds/3270

I tested manually: everything works fine on r189574, but there are 2860 failures/crashes on r189575 (with its build fix, r189588). I'm going to investigate this issue and try to provide debug build logs and/or other useful information.
Csaba Osztrogonác
Comment 2 2015-09-11 06:39:20 PDT
Unfortunately I can't reproduce this bug in debug mode. :( I will try to reproduce it on a release build with debug symbols.
Michael Saboff
Comment 3 2015-09-11 09:48:29 PDT
While debugging the callee saves work, I would run into failures on release builds that wouldn't reproduce with debug builds. Typically this was due to the optimizer making use of callee saves registers in the compiled C++ code. If JSC inadvertently stepped on one of those registers, it would only cause a problem on release builds.

The first place I would look is in the FTL code. For example, I didn't test any of the changes to the Linux-specific code in FTLUnwindInfo.cpp. See if the failing tests work when the FTL is turned off.

One technique that I used to track down these kinds of problems was to add back the saving and restoring of callee saves to the pushCalleeSaves() / popCalleeSaves() macros in LowLevelInterpreter.asm, and then, in LowLevelInterpreter64.asm:doVMEntry, write sentinel numeric values to the callee saves registers, e.g. 0x1019 to x19, 0x1020 to x20, ... After "makeCall()" in doVMEntry and at the beginning of _handleUncaughtException, compare the values and hit a breakpoint on a mismatch. I made a macro to do the testing. That did two things. First, it allowed building with debug. But probably more useful was that at any point while executing in the JavaScript VM, I could look at the registers to see that they had the sentinel values where they should. I could also check in the CallFrames that we saved the sentinel values where appropriate. I'll post a patch with this technique that I used for X86-64 debugging.
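The sentinel check described in comment 3 can be illustrated with a small simulation. This is a hedged sketch in Python, not the offlineasm macro from the attached patch: registers are modelled as a dict rather than touched directly, and `sentinel_for`, `fill_sentinels`, `check_sentinels`, `register_for` and `buggy_js_call` are hypothetical names. It assumes the AArch64 general-purpose callee-saved registers x19 to x28 (AAPCS64 also requires x29, the frame pointer, to be preserved; it is left out here).

```python
from __future__ import annotations

# Hypothetical simulation: AArch64 callee-saved x19..x28 (x29/FP omitted).
FIRST_CSR, LAST_CSR = 19, 28


def sentinel_for(reg: int) -> int:
    """Sentinel whose hex digits spell the register number: x19 -> 0x1019."""
    return 0x1000 + ((reg // 10) << 4) + (reg % 10)


def fill_sentinels() -> dict[int, int]:
    """Like the VM entry code writing 0x1019 to x19, 0x1020 to x20, ..."""
    return {reg: sentinel_for(reg) for reg in range(FIRST_CSR, LAST_CSR + 1)}


def check_sentinels(regs: dict[int, int]) -> list[int]:
    """Return clobbered register numbers (the real check would break instead)."""
    return [reg for reg in range(FIRST_CSR, LAST_CSR + 1)
            if regs[reg] != sentinel_for(reg)]


def register_for(value: int) -> int | None:
    """Map a value seen in a register or CallFrame slot back to its register."""
    for reg in range(FIRST_CSR, LAST_CSR + 1):
        if sentinel_for(reg) == value:
            return reg
    return None


def buggy_js_call(regs: dict[int, int]) -> dict[int, int]:
    """Hypothetical stand-in for JIT code that uses x21 without restoring it."""
    after = dict(regs)
    after[21] = 0xDEADBEEF
    return after


if __name__ == "__main__":
    regs = fill_sentinels()
    after = buggy_js_call(regs)  # plays the role of makeCall()
    print([f"x{r}" for r in check_sentinels(after)])  # ['x21']
    print(register_for(0x1024))  # 24
```

Encoding the register number in the sentinel's hex digits is what makes the technique useful beyond a pass/fail check: a stray 0x1024 found in the wrong register or CallFrame slot tells you directly that x24's value ended up there.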
Michael Saboff
Comment 4 2015-09-11 09:50:48 PDT
Created attachment 261007
Patch used for X86-64 Callee Saves debugging
Csaba Osztrogonác
Comment 5 2015-09-15 02:59:46 PDT
(In reply to comment #3)
Thanks for the ideas and the patch for debugging.

I haven't checked the FTL code yet, because the FTL is disabled by default on Linux. I don't know if it works at all; I haven't checked it in the last 4-5 months. But it seems the bug is somewhere in the DFG tier, because the tests pass with the DFG disabled at build time (except ~20 tests). And I already managed to catch register mismatches with the idea you suggested. I'll continue debugging in the near future.
Csaba Osztrogonác
Comment 6 2015-11-12 01:58:01 PST
https://trac.webkit.org/changeset/192352 already fixed this issue. Sometimes Linux failures point out real but hidden failures on iOS. ;)

before: https://build.webkit.org/builders/EFL%20Linux%20AArch64%20Release/builds/4313 - 3227 failures
after: https://build.webkit.org/builders/EFL%20Linux%20AArch64%20Release/builds/4314 - 52 failures

The remaining failures might be related to this issue or might be a different issue; who knows what else happened in the last 2 months. I'm going to file a new bug report for the remaining failures.

*** This bug has been marked as a duplicate of bug 150936 ***