Summary: | Enhance MacroAssembler::probe() to support an initializeStackFunction callback. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Mark Lam <mark.lam> | ||||||
Component: | JavaScriptCore | Assignee: | Mark Lam <mark.lam> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | fpizlo, jfbastien, keith_miller, msaboff, ossy, pvollan, rmorisset, saam, webkit-bug-importer | ||||||
Priority: | P2 | Keywords: | InRadar | ||||||
Version: | WebKit Nightly Build | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Bug Depends on: | |||||||||
Bug Blocks: | 174645 | ||||||||
Attachments: |
|
Description
Mark Lam
2017-08-15 21:30:33 PDT
Created attachment 318234 [details]
proposed patch.
Let's try this on the EWS.
Created attachment 318278 [details]
proposed patch.
This patch has passed the testmasm test for x86, x86_64, arm64, and armv7s. It's ready for a review.
Comment on attachment 318278 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=318278&action=review r=me On x86-64 I think you should do a follow-up patch to avoid setting eflags with pushf / popf. It's slow, and historically has been the source of some exploits. I have code and benchmark for it here: https://github.com/jfbastien/benchmark-x86-flags I'd use seto+lahf / addb+sahf instead. > Source/JavaScriptCore/ChangeLog:60 > + we're moving the ProbeContext, we're always be moving it to a lower address. "we're always be moving it" *we'll ? > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:358 > + // Initialize ProbeContex::initializeStackFunction to zero. ProbeContext > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:370 > + "add x2, x27, #" STRINGIZE_VALUE_OF(PROBE_SIZE_PLUS_EXTRAS + OUT_SIZE) "\n" // End of ProveContext + buffer. ProbeContext > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:376 > + "bic x1, x1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. IIUC x1 is: RESULT_SP - (PROBE_SIZE_PLUS_EXTRA + OUT_SIZE) Right? So why do you need to align it? It seems like OUT_SIZE is the only thing that could force misalignment, no? Why not assert it's always a value that won't mis-align SP? > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:377 > + "mov sp, x1" "\n" // Set the new sp to protect that memory from interrupts before we copy the ProbeContext. I think you should also set FP here, so that backtraces will look fine. > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:264 > "bic r3, r3, #0xf" "\n" Ditto on mask. > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:265 > + "mov sp, r3" "\n" // Set the sp to protect the ProbeContext from interrupts before we initialize it. Ditto on FP. > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:292 > + "vstmia.64 ip!, { d16-d31 }" "\n" Wait... The previous code was just wrong, lost half the values, overwrote the other half, and kept garbage? > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:296 > + // Initialize ProbeContex::initializeStackFunction to zero. ProbeContext > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:314 > + "bic r1, r1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. Ditto > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:328 > + "str r4, [r6], #4" "\n" You could use load / store pair here, but I'm not sure it's worth sweating the details on ARM32. You could also use load / store multiple to avoid looping entirely and have less code, but it's not necessarily faster. > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.cpp:235 > + // Initialize ProbeContex::initializeStackFunction to zero. ProbeContext > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.cpp:254 > + "bic r1, r1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. Ditto Thanks for the review. (In reply to JF Bastien from comment #4) > On x86-64 I think you should do a follow-up patch to avoid setting eflags > with pushf / popf. It's slow, and historically has been the source of some > exploits. I have code and benchmark for it here: > https://github.com/jfbastien/benchmark-x86-flags I'd use seto+lahf / > addb+sahf instead. I filed https://bugs.webkit.org/show_bug.cgi?id=175636 to address this later. > > Source/JavaScriptCore/ChangeLog:60 > > + we're moving the ProbeContext, we're always be moving it to a lower address. > > "we're always be moving it" > *we'll ? Fixed. > > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:358 > > + // Initialize ProbeContex::initializeStackFunction to zero. > > ProbeContext Fixed. > > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:370 > > + "add x2, x27, #" STRINGIZE_VALUE_OF(PROBE_SIZE_PLUS_EXTRAS + OUT_SIZE) "\n" // End of ProveContext + buffer. > > ProbeContext Fixed. > > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:376 > > + "bic x1, x1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. > > IIUC x1 is: > RESULT_SP - (PROBE_SIZE_PLUS_EXTRA + OUT_SIZE) > Right? RESULT_SP is set by the probe function. Technically, it should never set it to an unaligned value, but if it does due to a bug, I don't what the crash to happen in the trampoline code. > So why do you need to align it? It seems like OUT_SIZE is the only thing > that could force misalignment, no? Why not assert it's always a value that > won't mis-align SP? The probe code is not meant for use in performance hot paths, plus an explicit mask here makes the probe trampoline more robust and impervious to a bug in the probe function setting a misaligned sp. > > Source/JavaScriptCore/assembler/MacroAssemblerARM64.cpp:377 > > + "mov sp, x1" "\n" // Set the new sp to protect that memory from interrupts before we copy the ProbeContext. > > I think you should also set FP here, so that backtraces will look fine. Not strictly needed but is nice to have. I filed https://bugs.webkit.org/show_bug.cgi?id=175637 to address this later. > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:264 > > "bic r3, r3, #0xf" "\n" > > Ditto on mask. Same reason as above: not perf critical, and more robust (does not require PROBE_SIZE and OUT_SIZE to be carefully packed for alignment). > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:265 > > + "mov sp, r3" "\n" // Set the sp to protect the ProbeContext from interrupts before we initialize it. > > Ditto on FP. Will address later in https://bugs.webkit.org/show_bug.cgi?id=175637. > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:292 > > + "vstmia.64 ip!, { d16-d31 }" "\n" > > Wait... The previous code was just wrong, lost half the values, overwrote > the other half, and kept garbage? Yep. This is a bug fix. > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:296 > > + // Initialize ProbeContex::initializeStackFunction to zero. > > ProbeContext Fixed. > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:314 > > + "bic r1, r1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. > > Ditto Same as above. > > Source/JavaScriptCore/assembler/MacroAssemblerARM.cpp:328 > > + "str r4, [r6], #4" "\n" > > You could use load / store pair here, but I'm not sure it's worth sweating > the details on ARM32. You could also use load / store multiple to avoid > looping entirely and have less code, but it's not necessarily faster. Yeah. I'm keeping it simple as is. The ARM_TRADITIONAL folks can choose to optimize if they see a need later. Since this is not hot path code, I don't think there'll be a need. > > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.cpp:235 > > + // Initialize ProbeContex::initializeStackFunction to zero. > > ProbeContext Fixed. > > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.cpp:254 > > + "bic r1, r1, #0xf" "\n" // The ARM EABI specifies that the stack needs to be 16 byte aligned. > > Ditto Same as above. Landed in r220807: <http://trac.webkit.org/r220807>. (In reply to Mark Lam from comment #6) > Landed in r220807: <http://trac.webkit.org/r220807>. It broke the ARMv7 traditional build: standard input}:76: Error: selected processor does not support ARM mode `cbz r2,.LctiMasmProbeTrampolineRestoreRegisters' http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cjaghefc.html It says that cbz is 16 bit thumb1 instruction, there is no cbz 32 bit ARM instruction. (In reply to Csaba Osztrogonác_OOO_until_21st_Aug from comment #7) > It broke the ARMv7 traditional build: > standard input}:76: Error: selected processor does not support ARM mode `cbz > r2,.LctiMasmProbeTrampolineRestoreRegisters' Fixed in r220834: <http://trac.webkit.org/r220834>. Comment on attachment 318278 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=318278&action=review > Source/JavaScriptCore/assembler/testmasm.cpp:608 > + context->initializeStackFunction = fillStack; > + context->initializeStackArg = &data; This is a super weird API. Why is this needed? (In reply to Filip Pizlo from comment #9) > Comment on attachment 318278 [details] > proposed patch. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=318278&action=review > > > Source/JavaScriptCore/assembler/testmasm.cpp:608 > > + context->initializeStackFunction = fillStack; > > + context->initializeStackArg = &data; > > This is a super weird API. Why is this needed? This is needed by the low level implementation because we can't write things to the stack until after we move the stack pointer. That said, this is not the API you're looking for. What you want will be coming in https://bugs.webkit.org/show_bug.cgi?id=175688 soon. |