Bug 128743 - [JSC] Crash in LLInt CLoop on S390X
Summary: [JSC] Crash in LLInt CLoop on S390X
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 131449 131495 132330 132333
Blocks:
  Show dependency treegraph
 
Reported: 2014-02-13 07:59 PST by Tomas Popela
Modified: 2014-05-15 20:55 PDT (History)
4 users (show)

See Also:


Attachments
Crash from 2.0.4 (10.74 KB, text/plain)
2014-02-13 08:00 PST, Tomas Popela
no flags Details
Crash from 2.2.4 (161.32 KB, text/plain)
2014-02-13 08:00 PST, Tomas Popela
no flags Details
LLIntAssambly.h for 2.0.4 (553.71 KB, text/plain)
2014-02-13 08:01 PST, Tomas Popela
no flags Details
LLIntAssambly.h for 2.2.4 (494.93 KB, text/plain)
2014-02-13 08:01 PST, Tomas Popela
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tomas Popela 2014-02-13 07:59:59 PST
JSC in GTK port is crashing on S390X in JSC::LLInt::CLoop::execute. The crash occurs when you open Yelp (Gnome help) and it loads the JQuery. It can be reproduced on WebKitGTK+ 2.0.x and 2.2.x, but the crash is different as on 2.0.x it is crashing in getScope macro ( http://trac.webkit.org/browser/releases/WebKitGTK/webkit-2.0.4/Source/JavaScriptCore/llint/LowLevelInterpreter.asm#L588 ) that was removed before 2.2.x. In 2.2.x it is crashing in getById macro ( http://trac.webkit.org/browser/releases/WebKitGTK/webkit-2.2.4/Source/JavaScriptCore/llint/LowLevelInterpreter64.asm#L867 ) - t3 is empty there. The backtraces and LLIntAssambly.h are attached.
Comment 1 Tomas Popela 2014-02-13 08:00:29 PST
Created attachment 224070 [details]
Crash from 2.0.4
Comment 2 Tomas Popela 2014-02-13 08:00:50 PST
Created attachment 224071 [details]
Crash from 2.2.4
Comment 3 Tomas Popela 2014-02-13 08:01:31 PST
Created attachment 224072 [details]
LLIntAssambly.h for 2.0.4
Comment 4 Tomas Popela 2014-02-13 08:01:53 PST
Created attachment 224073 [details]
LLIntAssambly.h for 2.2.4
Comment 5 Mark Lam 2014-02-17 12:06:07 PST
I can't reproduce this issue on Mac x86_64 with the C loop.  yelp.com loads and runs fine.  Looks like this is a S390X specific issue, and needs to be debugged on your side.

Here's a tip for debugging: with the C loop LLINT, you can use a LLINT pseudo instructions "cloopDo".  For example:
    cloopDo // print("I was here\n");

Whatever you put in the // comment after the cloopDo instruction will be inserted into the generated LLINT instruction stream verbatim.  You can use this mechanism to do some debugging.  For example, you can insert printfs to dump the values of relevant CLoopRegisters, and see if they are what you expect.
Comment 6 Filip Pizlo 2014-02-17 12:12:17 PST
(In reply to comment #5)
> I can't reproduce this issue on Mac x86_64 with the C loop.  yelp.com loads and runs fine.  Looks like this is a S390X specific issue, and needs to be debugged on your side.
> 
> Here's a tip for debugging: with the C loop LLINT, you can use a LLINT pseudo instructions "cloopDo".  For example:
>     cloopDo // print("I was here\n");
> 
> Whatever you put in the // comment after the cloopDo instruction will be inserted into the generated LLINT instruction stream verbatim.  You can use this mechanism to do some debugging.  For example, you can insert printfs to dump the values of relevant CLoopRegisters, and see if they are what you expect.

Which way do the endians go on S390X?
Comment 7 Tomas Popela 2014-02-18 00:07:27 PST
(In reply to comment #5)
> I can't reproduce this issue on Mac x86_64 with the C loop.  yelp.com loads and runs fine.  Looks like this is a S390X specific issue, and needs to be debugged on your side.

Sorry Mark I had to write more clearly it's application Yelp - https://wiki.gnome.org/Apps/Yelp not Yelp.com

> 
> Here's a tip for debugging: with the C loop LLINT, you can use a LLINT pseudo instructions "cloopDo".  For example:
>     cloopDo // print("I was here\n");
> 
> Whatever you put in the // comment after the cloopDo instruction will be inserted into the generated LLINT instruction stream verbatim.  You can use this mechanism to do some debugging.  For example, you can insert printfs to dump the values of relevant CLoopRegisters, and see if they are what you expect.

Thank you for the info. I will try it. Clearly there is something wrong with registers, as on 2.2.x the t3 register is empty (and I'm pretty sure it shouldn't be)..
Comment 8 Tomas Popela 2014-02-18 00:11:24 PST
(In reply to comment #6)
> (In reply to comment #5)
> > I can't reproduce this issue on Mac x86_64 with the C loop.  yelp.com loads and runs fine.  Looks like this is a S390X specific issue, and needs to be debugged on your side.
> > 
> > Here's a tip for debugging: with the C loop LLINT, you can use a LLINT pseudo instructions "cloopDo".  For example:
> >     cloopDo // print("I was here\n");
> > 
> > Whatever you put in the // comment after the cloopDo instruction will be inserted into the generated LLINT instruction stream verbatim.  You can use this mechanism to do some debugging.  For example, you can insert printfs to dump the values of relevant CLoopRegisters, and see if they are what you expect.
> 
> Which way do the endians go on S390X?

Hi Filip,
the S390X is 64bit Big Endian - http://en.wikipedia.org/wiki/Z/Architecture
Comment 9 Tomas Popela 2014-02-27 11:51:31 PST
Some news. It's crashing even on PowerPC (64bit) so it looks like big endian issue. Also it's not just crashing in Yelp, but on any site opened in GtkLauncher or on empty site just when invoking the inspector.
Comment 10 Tomas Popela 2014-02-27 11:54:48 PST
Also it's crashing in 2.3.90 that was branched from trunk ~ 3 weeks ago - https://trac.webkit.org/changeset/163300
Comment 11 Filip Pizlo 2014-02-27 11:55:18 PST
(In reply to comment #9)
> Some news. It's crashing even on PowerPC (64bit) so it looks like big endian issue. Also it's not just crashing in Yelp, but on any site opened in GtkLauncher or on empty site just when invoking the inspector.

The LLINT and some other parts of JSC make incorrect assumptions about endianness. This isn't a systemic problem; I think it's just a handful of places that incorrectly assume that the offset of the low bits is zero. These have crept in because JSC is mainly just tested on little endian hardware. 

I can't really fix these issues myself because I don't have access to any big endian hardware. But, if you could devote some time to finding where we're goofing things up, then certainly we would be happy to accept such fixes.
Comment 12 Tomas Popela 2014-02-27 12:11:37 PST
(In reply to comment #11)
> (In reply to comment #9)
> > Some news. It's crashing even on PowerPC (64bit) so it looks like big endian issue. Also it's not just crashing in Yelp, but on any site opened in GtkLauncher or on empty site just when invoking the inspector.
> 
> The LLINT and some other parts of JSC make incorrect assumptions about endianness. This isn't a systemic problem; I think it's just a handful of places that incorrectly assume that the offset of the low bits is zero. These have crept in because JSC is mainly just tested on little endian hardware. 
> 
> I can't really fix these issues myself because I don't have access to any big endian hardware. But, if you could devote some time to finding where we're goofing things up, then certainly we would be happy to accept such fixes.

As I have access to big endian hardware and I definitely could devote time to fix these issues. But I will need some help in beginnings, and I will contact you off the bugzilla..
Comment 13 Mark Lam 2014-04-09 13:25:45 PDT
I've implemented a fix for a ProtoCallFrame endianness issue in https://bugs.webkit.org/show_bug.cgi?id=131449 based on information that Tomas provided offline.  Please re-test to see if there are any outstanding issues.
Comment 14 Tomas Popela 2014-04-10 04:10:10 PDT
Mark thanks for the fix. Now the easy things like 1+1 are working in jsc. But it looks like there are more places like this. Use a=1 and a+1 in jsc and you will get a crash. Wrong value is taken in getProperty() macro on line loadisFromInstruction(6, t0). Changing it to loadpFromInstruction fixes it.
Comment 15 Mark Lam 2014-04-10 10:51:54 PDT
(In reply to comment #14)
> Wrong value is taken in getProperty() macro on line loadisFromInstruction(6, t0). Changing it to loadpFromInstruction fixes it.

Fixed in https://bugs.webkit.org/show_bug.cgi?id=131495.
Comment 16 Tomas Popela 2014-04-14 02:41:18 PDT
Ok, so after https://bugs.webkit.org/show_bug.cgi?id=131495 the jsc doesn't handle anything - it is crashing. The problem is that when it hits the loadisFromInstruction (XXX, register) where XXX != 6 the right data are 4 bytes back than counted offset (thus it is loading bad data). It looks like solution introduced in https://bugs.webkit.org/show_bug.cgi?id=131495 cannot be applied globally.
Comment 17 Mike Gorse 2014-05-15 20:55:15 PDT
For get_from_scope and probably put_to_scope, bytecode/CodeBlock.cpp has code that writes a pointer to the instruction at +6. Sometimes it is an actual pointer, and sometimes it is an integer offset cast to a pointer. But sometimes the offset is added later in llint/LLIntSlowPaths.cpp, and there it the instruction's operand is set. These code paths need to be made to behave consistently; otherwise a big-endian system will sometimes have the offset in the low word and sometimes in the high word.

I have a patch that modifies the code in LLIntSlowPaths.cpp to cast to a pointer and changes the loadisfrominstruction in getProperty and putProperty to loadpfrominstruction. I don't really understand the code and don't know if this is the best way to fix it, and I'm holding off on ataching it because I haven't tried to run the tests yet, although it fixes the crash that I'm seeing, when combined with adjusting the commit size (that's a separate bug; not sure if it is filed here yet or not).