Bug 190656 - useProbeOSRExit causes failures for Win64 DFG JIT
Summary: useProbeOSRExit causes failures for Win64 DFG JIT
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ross Kirsling
Keywords: InRadar
Depends on:
Reported: 2018-10-16 18:38 PDT by Ross Kirsling
Modified: 2018-11-01 10:42 PDT (History)
10 users (show)

See Also:

Patch (3.79 KB, patch)
2018-10-29 17:44 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ross Kirsling 2018-10-16 18:38:09 PDT
About three dozen JSC tests fail for WinCairo in dfg-eager mode only, but all of these pass if useProbeOSRExit is switched off.

Two-thirds appear to be failing the `!Heap::heap(value) || Heap::heap(value) == Heap::heap(this)` assert in JSObject::putDirectInternal:

The rest are failing isObject() assertions.


Here is a simple test case that reproduces the issue (distilled from stress/sparse-map-non-skip.js):

function checkGetter(object) {
  if (object.foo !== 0)
    throw new Error(`bad value for object.foo! expected 0, found ${object.foo}`);

for (var i = 0; i < 2305; ++i)
  checkGetter({ get foo() { return 0; } });

checkGetter({ get foo() { return 0; } });


With this as input, `jsc --useConcurrentJIT=false --useProbeOSRExit=true --dfgWhitelist=dfg-whitelist.txt test.js` suffices to repro.

> ASSERTION FAILED: getter.isObject() || setter.isObject()
> ...\jit\JITOperations.cpp(1748) : JSC::operationPutGetterSetter
> 1   00007FFB3E4321EA
> 2   00007FFB2FC42F11
> 3   00007FFB30910027
> 4   000002B0398E2AE0

> Exception: Error: bad value for object.foo! expected 0, found undefined
> checkGetter@test.js:3:20
> global code@test.js:10:12
Comment 1 Ross Kirsling 2018-10-19 16:02:49 PDT
I'll try to list up what I know so far:

- FWIW, testmasm has no failures on WinCairo.

- If I dataLog on save and restore in DFGOSREntry.cpp, AssemblyHelpers.h, and DFGOSRExit.cpp (and use printEachOSRExit), it definitely looks like callee-save registers are saved and restored consistently whether or not the probe is used.

- If I use dumpDFGDisassembly and blot out all raw hex values (and normalize Structure identifiers), then the output is consistent with or without the probe. There is a final large codeblock which is only spit out in the non-probe case, but that's also true on Mac, which isn't broken.

- On the last run of operationPutGetterSetter (in JITOperations.cpp), the encodedGetterValue has an entirely different prefix (0x00007FFB... instead of 0x000001...) from the first 2305 times. If we look at the JSCell for the getter, there is one point of potential note...

  All cases without probe, first 2305 cases with probe:
    m_structureID             0x00000104                      unsigned int
    m_indexingTypeAndMisc     0x00 '\0'                       unsigned char
    m_type                    JSFunctionType (0x18 '\x18')    JSC::JSType
    m_flags                   0x0e '\xe'                      unsigned char
    m_cellState               DefinitelyWhite (0x01 '\x1')    JSC::CellState

  Final case, after DFG OSR exit, with probe:
    m_structureID             0x0000000d                      unsigned int
    m_indexingTypeAndMisc     0x18 '\x18'                     unsigned char
    m_type                    CellType (0x00 '\0')            JSC::JSType
    m_flags                   0x00 '\0'                       unsigned char
    m_cellState               PossiblyBlack (0x00 '\0')       JSC::CellState

...namely, that 0x18 seems like it's jumped up a slot. But then, everything after it is zeroes, so this may be a red herring...?

- One way or another, it's hard to imagine what other than Win64 calling convention could be at play here. Now, although we're saving and restoring RBX, RSI, RDI, R12, R13, R14, and R15, we're intentionally skipping over RBP and RSP -- RBP appears to be callee-save for x86_64 in general, so surely that one's a non-issue(?), but RSP is callee-save for Windows (according to MSDN, although it's not indicated as such in RegisterSet or GPRInfo) so perhaps the probe mucks with the stack pointer in a way that Win64 doesn't like?
Comment 2 Stephan Szabo 2018-10-23 15:24:43 PDT
Adding a little bit of information from what was seen today.

The value that appears to come through for operationPutGetterSetter's encodedGetterValue seems to be the same value that was used in the prior call to operationPutByIdStrictOptimize's uid (which is also the fifth argument to that call).

The timeline with probe on seems to be:
operationNewFunction is called and seems to return in rax the value that we would want to use later as encodedGetterValue
ctiMasmProbeTrampoline is started
 - goes into executeProbe and then executeOSRExit and some more before leaving
operationPutByIdNonStrictOptimize is called and a value is placed into memory for the uid value.
operationPutGetterSetter is called and the value on the stack for the fifth argument has the same value as what was passed to the previous call as uid.
Comment 3 Stephan Szabo 2018-10-23 16:55:23 PDT
In the case with probe on, when it goes to set up the arguments for operationPutByIdNonStrictOptimize I seem to see that rsp+20h and rbp-60h [in my test the last 4 digits of rsp were D220 and rbp was D300] are the same location, so setting the argument seems to overwrite the location that the getter was stored in. Later, when the call to operationPutGetterSetter goes to get the value, it reads the overwritten value and then puts that on the stack as the argument value and things fail.

With the probe off, at the point the argument is set up, I seem to have values like rsp ending with D2E0 and rbp ending with D3A0, so the two values are in different places. When the call to operationPutGetterSetter goes to get the value it gets the value from operationNewFunction and puts that on the stack as the argument and things work.
Comment 4 Stephan Szabo 2018-10-23 16:59:06 PDT
(In reply to Stephan Szabo from comment #3)
> In the case with probe on, when it goes to set up the arguments for
> operationPutByIdNonStrictOptimize I seem to see that rsp+20h and rbp-60h [in
> my test the last 4 digits of rsp were D220 and rbp was D300] are the same

Correction D280 and D300
Comment 5 Ross Kirsling 2018-10-26 11:33:44 PDT

This definitely is a stack pointer management problem.

When the probe is off, RSP always has the same value at the first line of operationPutGetterSetter in JITOperations, whether before DFG entry or after DFG exit:

(* Technically, with or without the probe, RSP is raised by 0x10 just on the single run after calling prepareOSREntry but before actually switching out of baseline JIT. Apparently this isn't problematic though.)

When the probe is on, RSP ends up raised by 0x40 after DFG exit.

As such, if we hack in a `exitState.stackPointerOffset -= 0x40;` line right before setting RSP in executeOSRExit, the test case succeeds:
Comment 6 Ross Kirsling 2018-10-26 14:22:30 PDT
(In reply to Ross Kirsling from comment #5)
> As such, if we hack in a `exitState.stackPointerOffset -= 0x40;` line right
> before setting RSP in executeOSRExit, the test case succeeds:
> https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/dfg/
> DFGOSRExit.cpp#L437

To put this last part a bit more clearly, if we add 8 to the variable count returned from the DFG::VariableEventStream::reconstruct call here:

And comment out a sanity-check assert for Probe::Stack::lowWatermark here:

Then we finally have a state in which all the failing dfg-eager tests pass.
Comment 7 Saam Barati 2018-10-26 15:32:25 PDT
Seems like the probe itself is probably making mistakes on windows ABI
Comment 8 Ross Kirsling 2018-10-29 16:40:35 PDT
A bit of progress:

The issue doesn't seem to be with the return value of DFG::VariableEventStream::reconstruct, as this is the same regardless of probe and regardless of platform.
the issue seems to be that this return value isn't appropriate as a stack pointer offset -- after all, it is simply thrown away in the non-probe path.

It appears that if we mimic this line from the non-probe path:

That is, if we replace numVariables with `codeBlock->jitCode()->dfgCommon()->requiredRegisterCountForExit` on this line:

Then all of the relevant test cases pass (even on Mac!).

The catch is that we still need to remove the lowWatermark assert mentioned in my last comment:
Not sure at the moment what correction needs to occur here.
Comment 9 Ross Kirsling 2018-10-29 17:44:57 PDT
Created attachment 353338 [details]
Comment 10 Keith Miller 2018-10-30 11:03:58 PDT
Comment on attachment 353338 [details]

Comment 11 WebKit Commit Bot 2018-10-30 11:32:29 PDT
Comment on attachment 353338 [details]

Clearing flags on attachment: 353338

Committed r237595: <https://trac.webkit.org/changeset/237595>
Comment 12 WebKit Commit Bot 2018-10-30 11:32:31 PDT
All reviewed patches have been landed.  Closing bug.
Comment 13 Radar WebKit Bug Importer 2018-10-30 11:34:02 PDT
Comment 14 Ross Kirsling 2018-11-01 10:42:52 PDT
Ugh, we might not be totally out of the woods yet...

While this patch fixed all dfg-eager failures in Debug, apparently four failures remain in Release only:

And sure enough, disabling the probe makes them pass.