Summary: | Tail calls are broken on ARM_THUMB2 and MIPS | ||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Caio Lima <ticaiolima> | ||||||||||||||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Caio Lima <ticaiolima> | ||||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||||||
Severity: | Normal | CC: | commit-queue, ews-watchlist, guijemont, guijemont+jsc-armv7-ews, keith_miller, mark.lam, msaboff, rmorisset, saam, tzagallo, webkit-bug-importer, ysuzuki | ||||||||||||||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||||||||
Bug Depends on: | |||||||||||||||||||||||||||||||||||
Bug Blocks: | 196399 | ||||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Caio Lima
2019-05-10 13:59:06 PDT
Created attachment 369606 [details]
WIP - Patch
WIP
Comment on attachment 369606 [details] WIP - Patch Attachment 369606 [details] did not pass jsc-armv7-ews (jsc-only): Output: https://webkit-queues.webkit.org/results/12157526 New failing tests: jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-no-cjit stress/tail-call-with-spilled-registers.js.--useConcurrentJIT=false stress/call-link-info-osrexit-repatch.js.ftl-eager jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-no-llint apiTests Comment on attachment 369606 [details] WIP - Patch Attachment 369606 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12161417 New failing tests: security/contentSecurityPolicy/video-with-file-url-allowed-by-media-src-star-with-AllowContentSecurityPolicySourceStarToMatchAnyProtocol-enabled.html Created attachment 369650 [details]
Archive of layout-test-results from ews215 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews215 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Created attachment 369895 [details]
WIP - Patch
Comment on attachment 369895 [details] WIP - Patch Attachment 369895 [details] did not pass jsc-armv7-ews (jsc-only): Output: https://webkit-queues.webkit.org/results/12191921 New failing tests: jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-dfg-eager-no-cjit es6.yaml/es6/proper_tail_calls_tail_call_optimisation_mutual_recursion.js.default stress/shadow-chicken-enabled.js.shadow-chicken jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-no-llint stress/mutual-tail-call-no-stack-overflow.js.dfg-eager stress/dfg-tail-calls.js.no-llint jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout stress/mutual-tail-call-no-stack-overflow.js.no-llint stress/dfg-tail-calls.js.default stress/mutual-tail-call-no-stack-overflow.js.no-cjit-collect-continuously stress/dfg-tail-calls.js.no-cjit-validate-phases stress/mutual-tail-call-no-stack-overflow.js.dfg-maximal-flush-validate-no-cjit jsc-layout-tests.yaml/js/script-tests/regress-150580.js.layout-no-cjit stress/dfg-tail-calls.js.dfg-maximal-flush-validate-no-cjit stress/mutual-tail-call-no-stack-overflow.js.no-cjit-validate-phases stress/dfg-tail-calls.js.dfg-eager-no-cjit-validate stress/mutual-tail-call-no-stack-overflow.js.dfg-eager-no-cjit-validate stress/dfg-tail-calls.js.dfg-eager stress/mutual-tail-call-no-stack-overflow.js.default stress/dfg-tail-calls.js.no-cjit-collect-continuously stress/call-link-info-osrexit-repatch.js.ftl-eager apiTests Created attachment 369981 [details]
WIP - Patch
Test to verify EWS.
Created attachment 370019 [details]
WIP - Patch
I want to test EWS.
Attachment 370019 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/runtime/StackAlignment.h:35: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 370021 [details]
WIP - Patch
Testing EWS.
Attachment 370021 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/runtime/StackAlignment.h:35: Place brace on its own line for function definitions. [whitespace/braces] [4]
Total errors found: 1 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 370650 [details]
WIP - Patch
Created attachment 370651 [details]
ARMv7 Benchmark results
This change seems to be neutral on ARMv7.
Comment on attachment 370650 [details] WIP - Patch Attachment 370650 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12296122 New failing tests: svg/text/textpath-reference-update.html storage/indexeddb/modern/gc-closes-database.html Created attachment 370661 [details]
Archive of layout-test-results from ews211 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews211 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
I don't think this is the right fix. In x64 and ARM64, super actively maintained JIT architectures, we are strongly assuming this requirement. I think the right fix is pushing something or doing something, and adjusting the misaligned stack pointer to the aligned one in these architectures. (In reply to Yusuke Suzuki from comment #16) > I don't think this is the right fix. In x64 and ARM64, super actively > maintained JIT architectures, we are strongly assuming this requirement. I > think the right fix is pushing something or doing something, and adjusting > the misaligned stack pointer to the aligned one in these architectures. Thx for the comment! I'm going to resuse part of the text I was adding to the ChangeLog. Let's consider the broken case with the following script: ``` let o = { get x() { return tail_call(); } } let foo = (o) => { ... return o.x == o.y; } ``` Also, lets suppose that we compiled `foo` into DFG and we generated IC code for `o.x` and `o.y`. The register `r1` is assigned to `o`. Since the IC code of `o.x` is a getter case, this code will spill `r1` to the stack to proper setup the JS call and restore its value when the getter returns from execution (see third line of stack below). The getter IC code allocates 32 bytes to call the JS getter, and during the getter's prologue, `lr` and `cfr` are pushed to the stack, ending up with a new frame pointed by `cfr1`. During tail call preparation, we then calculate the top of stack using the following operations: ``` muli SlotSize, argc # Slot size is 8 # StackAlignment = 16 # CallFrameHeaderSize = 32 addi StackAlignment - 1 + CallFrameHeaderSize, argc, argcInBytes andi ~StackAlignmentMask, argcInBytes move cfr1, newCFR addp argcInBytes, newCFR ``` Cosidering `argc = 1` in the operation above, the result is then `argcInBytes = 48`. Adding 48 bytes to `newCFR` will make the top of the stack off by 8 bytes, since current stack frame is 40 bytes (32 from caller allocation + 4 of `lr` + 4 of `cfr`). In the end, when we start the copy operation of the stack, the address where `r1` is spilled will be cloberred, generating unexpected results. +----------------+ <-- Stack of program execution. Each cell is 4 bytes. | ... | +----------------+ <-- newCFR (where the calculation of top of stack will result) | ... | +----------------+ | r1 | +----------------+ <-- Start of stack allocated to call getter | ... | | +----------------+ | 32 bytes | ... | | +----------------+ <-- End of stack allocated to call getter | lr | +----------------+ | cfr | +----------------+ <-- cfr1 Now that I gave more context about the issue, lets move to possible solutions. The first solution, as you mentioned, would be to adjust `cfr` to be 16-byte aligned and calculate top of stack properly. The problem of this approach is that every time we execute `prepareForTailCall`, we will be increasing the stack 8-bytes to adjust `cfr`, causing Stack Overflow at some point. That's what caused the errors on https://bugs.webkit.org/attachment.cgi?id=369606. Pushing things to the stack to align it is pretty much the same as first approach. The problem triggered specifically when there is an IC getter and tail calls involved, but I have the feeling that this stack alignment of 16 bytes can cause crashes on other cases. The problem is that they are very hard to reproduce. Changing the stack alignment for those architectures seems to be a better fix IMHO, because it will reflect the real alignment of their stack. Also, it seems to be performance neutral on ARMv7 (I'm collecting numbers from MIPS now). Maybe I'm missing some context of the rest of the code, but I don't understand what would be the problem of changing this requirement. If it makes the code harder to maintain or cause any kind of regression, I agree we should look for another solution, but I don't know if that is the case. Created attachment 370784 [details]
WIP - Patch
Created attachment 370785 [details]
MIPS performance benchmarks
This patch also seems to be performance neutral (to slightly better) on MIPS.
Created attachment 371567 [details]
Patch
Comment on attachment 371567 [details] Patch Attachment 371567 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12403988 New failing tests: imported/blink/fast/canvas/bug382588.html Created attachment 371580 [details]
Archive of layout-test-results from ews211 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews211 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Ping review. Comment on attachment 371580 [details]
Archive of layout-test-results from ews211 for win-future
Failure seems not related with changes.
Ping review. Comment on attachment 371567 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=371567&action=review r- > Source/JavaScriptCore/bytecode/AccessCase.cpp:899 > + // For ARMv7 and MIPS architectures, we need 8 extra bytes to > + // guarantee that stack size have enough space to be reused by a > + // tail call. Since sizeof(CallerFrameAndPC) == 8 for those architectures, > + // we only need to calculate `numberOfBytesForCall = numberOfRegsForCall * sizeof(Register)`. > + unsigned numberOfBytesForCall = numberOfRegsForCall * sizeof(Register); This doesn't seem like a reasonable fix. It "works" by creating extra stack space, but that would suggest that our stack allocation code when making calls is suspect, which I strongly doubt. It is more likely that the call frame shuffler doesn't know that r1 got spilled or is still alive, or there is another call frame shuffler bug. (In reply to Michael Saboff from comment #26) > Comment on attachment 371567 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=371567&action=review > > r- > > > Source/JavaScriptCore/bytecode/AccessCase.cpp:899 > > + // For ARMv7 and MIPS architectures, we need 8 extra bytes to > > + // guarantee that stack size have enough space to be reused by a > > + // tail call. Since sizeof(CallerFrameAndPC) == 8 for those architectures, > > + // we only need to calculate `numberOfBytesForCall = numberOfRegsForCall * sizeof(Register)`. > > + unsigned numberOfBytesForCall = numberOfRegsForCall * sizeof(Register); > > This doesn't seem like a reasonable fix. It "works" by creating extra stack > space, but that would suggest that our stack allocation code when making > calls is suspect, which I strongly doubt. It is more likely that the call > frame shuffler doesn't know that r1 got spilled or is still alive, or there > is another call frame shuffler bug. I think I was not clear enough in my previous explanation, so let me give deeper details and more concrete data. There is no problem with call frame shuffler and spilled registers. See the generated code for the crashing test on ARMv7: ``` Generated JIT code for Access stub for c1#AVpIb9:[0x7289c2a0->0x7289c0e0->0x728ce1c0, DFGFunctionCall, 54 (NeverInline) (StrictMode)] bc#10 with return point CodePtr(executable = 0x73dfe731, dataLocation = 0x73dfe730): Getter:(Generated, offset = 0, structure = 0x728ff110:[Object, {c2:0}, NonArray, Proto:0x728c4000, Leaf], viaProxy = false, additionalSet = (nil), customSlotBase = (nil), callLinkInfo = 0x73ea8400, customAccessor = (nil)): Code at [0x73dfe061, 0x73dfe121): 0x73dfe060: ldr r6, [r0] 0x73dfe062: movw ip, #0xf110 0x73dfe066: movt ip, #0x728f 0x73dfe06a: cmp.w r6, ip 0x73dfe06e: ittt ne 0x73dfe070: nopne 0x73dfe072: nopne.w 0x73dfe076: bne.w #0x73dfe8f8 0x73dfe07a: ldr r3, [r0, #0x10] 0x73dfe07c: sub sp, #0x20 0x73dfe07e: str r0, [sp] <----------------------------- Spilling `r0` 0x73dfe080: str r1, [sp, #8] 0x73dfe082: str r2, [sp, #0x10] 0x73dfe084: mov.w ip, #0 0x73dfe088: str.w ip, [r7, #0x1c] 0x73dfe08c: ldr r3, [r3, #0x10] 0x73dfe08e: tst r3, r3 0x73dfe090: beq #0x73dfe0ec 0x73dfe092: sub sp, #0x20 <----------------------------- Stack allocation to call getter 0x73dfe094: mov.w ip, #1 0x73dfe098: str.w ip, [sp, #0x10] 0x73dfe09c: str r3, [sp, #8] 0x73dfe09e: mvn ip, #4 0x73dfe0a2: str.w ip, [sp, #0xc] 0x73dfe0a6: str r0, [sp, #0x18] 0x73dfe0a8: mvn ip, #4 0x73dfe0ac: str.w ip, [sp, #0x1c] 0x73dfe0b0: movw ip, #0 0x73dfe0b4: movt ip, #0 0x73dfe0b8: cmp.w r3, ip 0x73dfe0bc: bne #0x73dfe0ce 0x73dfe0be: movw ip, #0 0x73dfe0c2: movt ip, #0 0x73dfe0c6: blx ip 0x73dfe0c8: mov r2, r1 0x73dfe0ca: mov r1, r0 0x73dfe0cc: b #0x73dfe0f2 0x73dfe0ce: mov r0, r3 0x73dfe0d0: mvn r1, #4 0x73dfe0d4: movw r2, #0x8400 0x73dfe0d8: movt r2, #0x73ea 0x73dfe0dc: movw ip, #0xf501 0x73dfe0e0: movt ip, #0x72df 0x73dfe0e4: blx ip 0x73dfe0e6: mov r2, r1 0x73dfe0e8: mov r1, r0 0x73dfe0ea: b #0x73dfe0f2 0x73dfe0ec: mvn r2, #3 0x73dfe0f0: movs r1, #0 0x73dfe0f2: mvn ip, #0x87 0x73dfe0f6: add ip, r7 0x73dfe0f8: mov sp, ip 0x73dfe0fa: ldr r0, [sp] <----------------------------- Restoring `r0` 0x73dfe0fc: add sp, #0x20 0x73dfe0fe: nop 0x73dfe100: nop.w 0x73dfe104: b.w #0x73dfe730 0x73dfe108: nop 0x73dfe10a: nop.w 0x73dfe10e: b.w #0x73dfe8f8 0x73dfe112: bkpt #0 0x73dfe114: bkpt #0 0x73dfe116: bkpt #0 0x73dfe118: bkpt #0 0x73dfe11a: bkpt #0 ``` In this example, the corrupted value is in `r0`. We can see on instruction `0x73dfe07e` that this register is being spilled to `[sp]` and being restored on `0x73dfe0fa`. This same generated code shows why we are having corrupted value on `r0` after the execution of getter. Instruction `0x73dfe092` is allocating 32 bytes to call the getter `c2`. The JS code of this getter is the following: ``` function c2() { let b = Math.random(); let c = Math.random(); let d = Math.random(); let e = Math.random(); return c3('test', b, c, d, e); } ``` In this case, this getter contains a tail call and that’s where the problem is. If we look into LLInt operation to calculate the top of stack during `prepareForTailCall`, we can see that the minimum size of a stack frame when there is 1 argument is 48 bytes: ``` loadi PayloadOffset + ArgumentCount[cfr], temp2 loadp CodeBlock[cfr], temp1 loadi CodeBlock::m_numParameters[temp1], temp1 bilteq temp1, temp2, .noArityFixup move temp1, temp2 .noArityFixup: # We assume < 2^28 arguments # At this point, temp2 is 1, since the getter contains only `this` as parameter. # SlotSize is 8. muli SlotSize, temp2 # StackAlignment - 1 here is 15 and CallFrameHeaderSize is 32 on MIPS and ARM. # This will result in temp2 + 47. In our example, we will then have # temp2 = 8 + 47 = 55 addi StackAlignment - 1 + CallFrameHeaderSize, temp2 andi ~StackAlignmentMask, temp2 # After applying the mask temp2 == 48 # Then calculate top of stack move cfr, temp1 addp temp2, temp1 ``` When the program reaches this point, the current call frame size is 40 bytes (32 allocated by the caller plus 4 bytes of `lr` and 4 bytes of `cfr` pushed to stack during prologue). This means that the top of stack is 8 bytes off from its real position and the memory position where `r0` is spilled will be clobbered during copy operation. This is also going to be the case for “prepare for tail call” on Baseline and DFG. In the end, when `r0` is restored from stack, it will then contain garbage and next access to it can potentially crash. This is not a problem on 64-bit architectures, because their stack frame will have at least 48 bytes in this case. Given this analysis, I proposed 2 fixes in mind (and I also uploaded both of them in this bug). One of them is allocating +8 bytes during getter IC call and avoid clobbering on spilled registers (that’s what the current patch is doing). Another solution is to change StackAlignment to 8 bytes on those architectures (https://bugs.webkit.org/attachment.cgi?id=370784&action=diff), since it will then make the calculation of top of stack work properly. IMHO, the later solution sounds better because it will reflect the real stack alignment of ARMv7 and MIPS. Maybe there is something behind the scenes that makes such change not correct (see comment #16), but, TBH, I can’t find any evidence that changing the alignment requirement can cause issues. Also, I’ve already ran performance benchmarks to check if changing the stack alignment will cause any regression, but the results are neutral (they are also uploaded to the bug). Anyway, I’m fine with any of these solutions. The final question is: do these solutions still looks wrong fix for the bug? PS: One more important thing to share is the reason we can’t see such issues on regular calls. In this case, we allocate at least 48 bytes, even if there is no argument for the call. @Yusuke and @Michael Saboff Does your opinions prevail after my comment #27? (In reply to Caio Lima from comment #28) > @Yusuke and @Michael Saboff do your opinions prevail after my comment #27? Ping. (In reply to Caio Lima from comment #28) > @Yusuke and @Michael Saboff Does your opinions prevail after my comment #27? ping Created attachment 392300 [details]
Patch
Ping Comment on attachment 392300 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=392300&action=review r=me > Source/JavaScriptCore/ChangeLog:9 > + `prepareForTailCall` operation expects that header size + parameters > + size is aligned with stack (alignment is 16-bytes for every architecture). Can we debug-assert it? Created attachment 393034 [details]
Patch
Comment on attachment 392300 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=392300&action=review Thank you very much for the review! >> Source/JavaScriptCore/ChangeLog:9 >> + size is aligned with stack (alignment is 16-bytes for every architecture). > > Can we debug-assert it? Done. Comment on attachment 393034 [details] Patch Clearing flags on attachment: 393034 Committed r258143: <https://trac.webkit.org/changeset/258143> All reviewed patches have been landed. Closing bug. |