Bug 192496

Summary: Enable DFG on ARM/Linux again
Product: WebKit Reporter: Dominik Inführ <dominik.infuehr>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, commit-queue, dbates, ews, guijemont, keith_miller, mark.lam, msaboff, sbarati, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 193330    
Bug Blocks: 191163, 192983    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Dominik Inführ 2018-12-07 04:23:48 PST
Enable DFG on ARM/Linux again
Comment 1 Dominik Inführ 2018-12-07 04:32:55 PST
Created attachment 356799 [details]
Patch
Comment 2 Dominik Inführ 2018-12-07 04:37:04 PST
Created attachment 356800 [details]
Patch
Comment 3 Guillaume Emont 2018-12-07 08:23:46 PST
Comment on attachment 356800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=356800&action=review

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:2017
>      loadp Callee[cfr], t1
>      andp MarkedBlockMask, t1
>      loadp MarkedBlockFooterOffset + MarkedBlock::Footer::m_vm[t1], t1

Would it make more sense to save t1 earlier than recalculate it (on a callee-saved reg or pushed on the stack)?
Comment 4 Dominik Inführ 2018-12-09 23:59:04 PST
LowLevelInterpreter64.asm also recalculates that reference. So I don't think this code is super hot and I am not sure where we would best store that value.
Comment 5 Guillaume Emont 2018-12-10 01:22:24 PST
(In reply to Dominik Inführ from comment #4)
> LowLevelInterpreter64.asm also recalculates that reference. So I don't think
> this code is super hot and I am not sure where we would best store that
> value.

Fair enough.
Comment 6 Yusuke Suzuki 2018-12-10 01:44:33 PST
Comment on attachment 356800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=356800&action=review

> Source/JavaScriptCore/ChangeLog:10
> +        architectures. Enable DFG now again on ARM/Linux. Do not use register r11
> +        in compiled DFG code.

Please describe why r11 is avoided.

> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:143
> +#endif

Why?

> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:171
> +#endif

Ditto.

> Source/JavaScriptCore/jit/CallFrameShuffler.cpp:58
> +#endif

Why?
Comment 7 Dominik Inführ 2018-12-10 05:26:59 PST
Comment on attachment 356800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=356800&action=review

Thanks for taking a look at the patch! I agree, I should definitely add some more comments for that. I will certainly add them, when you agree with the approaches I took. Again, thanks for your time!

>> Source/JavaScriptCore/ChangeLog:10
>> +        in compiled DFG code.
> 
> Please describe why r11 is avoided.

True, I should definitely document that. I don't use it, since r11 is used as callee-saved register for metadataTable in LLInt.

>> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:143
>> +#endif
> 
> Why?

I avoided to implement this since we don't have callee-saved floating-point registers on 32-bit architectures yet. AFAIU implementing this might be a bit complicated since general-purpose register are 32-bit wide for 32-bit architectures, while floating point registers are 64-bit wide.

>> Source/JavaScriptCore/jit/CallFrameShuffler.cpp:58
>> +#endif
> 
> Why?

RegisterSet::vmCalleeSaveRegisters() used to be the empty set on 32-bit architectures, now on ARM this returns r11 (so the metadataTable register). So the register r11 would be used by CallFrameShuffler and therefore needs to be saved/restored, it seemed the simplest to just avoid using this register in here. RegisterSet::vmCalleeSaveRegisters() used to be the empty set, so there shouldn't be fewer register available than before.
Comment 8 Yusuke Suzuki 2018-12-11 03:15:27 PST
Comment on attachment 356800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=356800&action=review

>>> Source/JavaScriptCore/ChangeLog:10
>>> +        in compiled DFG code.
>> 
>> Please describe why r11 is avoided.
> 
> True, I should definitely document that. I don't use it, since r11 is used as callee-saved register for metadataTable in LLInt.

Make sense, please describe it in ChangeLog.

>>> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:143
>>> +#endif
>> 
>> Why?
> 
> I avoided to implement this since we don't have callee-saved floating-point registers on 32-bit architectures yet. AFAIU implementing this might be a bit complicated since general-purpose register are 32-bit wide for 32-bit architectures, while floating point registers are 64-bit wide.

Could you add FIXME and comments about this here?
And let's put braces for `else`.

>> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:171
>> +#endif
> 
> Ditto.

Ditto.

>>> Source/JavaScriptCore/jit/CallFrameShuffler.cpp:58
>>> +#endif
>> 
>> Why?
> 
> RegisterSet::vmCalleeSaveRegisters() used to be the empty set on 32-bit architectures, now on ARM this returns r11 (so the metadataTable register). So the register r11 would be used by CallFrameShuffler and therefore needs to be saved/restored, it seemed the simplest to just avoid using this register in here. RegisterSet::vmCalleeSaveRegisters() used to be the empty set, so there shouldn't be fewer register available than before.

OK, I see. Please describe the detail here as a comment.
In JSVALUE32_64 environment, we do not have CallFrameShuffleData::setupCalleeSaveRegisters, right?

> Source/JavaScriptCore/jit/GPRInfo.h:551
> +    static const GPRReg regT7 = ARMRegisters::r5;

Let's clean up some code in offline arm.rb too. (Note that ARM traditional JIT is removed now).

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:2017
>>      loadp MarkedBlockFooterOffset + MarkedBlock::Footer::m_vm[t1], t1
> 
> Would it make more sense to save t1 earlier than recalculate it (on a callee-saved reg or pushed on the stack)?

looks ok.
Comment 9 Dominik Inführ 2018-12-13 03:18:21 PST
Created attachment 357221 [details]
Patch
Comment 10 Dominik Inführ 2018-12-14 01:58:53 PST
Hi, I think I've addressed all the comments you had. I've updated some code in offlineasm/arm.rb as well.

> In JSVALUE32_64 environment, we do not have CallFrameShuffleData::setupCalleeSaveRegisters, right?
Right.
Comment 11 Dominik Inführ 2019-01-08 09:20:08 PST
Created attachment 358603 [details]
Patch
Comment 12 Yusuke Suzuki 2019-01-08 11:19:44 PST
Comment on attachment 358603 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=358603&action=review

r=me with nits.

> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:142
> +            // FIXME: support callee-saved floating point registers on 32-bit architectures

Please file a bug with FIXME and add a url here.

> Source/JavaScriptCore/dfg/DFGOSRExit.cpp:172
> +            // FIXME: support callee-saved floating point registers on 32-bit architectures

Ditto.

> Source/JavaScriptCore/jit/GPRInfo.h:529
>  #if CPU(ARM)

I think we should use #if CPU(ARM_THUMB2) here since the following information is not correct for CPU(ARM) from now.
Comment 13 Dominik Inführ 2019-01-09 01:54:25 PST
Created attachment 358683 [details]
Patch
Comment 14 Dominik Inführ 2019-01-09 01:56:47 PST
Thanks for your review! I've created a bug and changed `#if CPU(ARM)` as requested.
Comment 15 WebKit Commit Bot 2019-01-10 03:55:59 PST
Comment on attachment 358683 [details]
Patch

Clearing flags on attachment: 358683

Committed r239825: <https://trac.webkit.org/changeset/239825>
Comment 16 WebKit Commit Bot 2019-01-10 03:56:01 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Radar WebKit Bug Importer 2019-01-10 03:56:32 PST
<rdar://problem/47174867>
Comment 18 WebKit Commit Bot 2019-01-10 12:00:55 PST
Re-opened since this is blocked by bug 193330
Comment 19 Guillaume Emont 2019-01-10 12:06:00 PST
We had 118 test failures on the armv7 bot (and 117 on the armv7 softfpabi bot) after this landed.
Also, running all the tests took about 6h40, whereas IIRC it used to take about 2h15 on the same bot before the bytecode change landed (last time we had DFG for the platform), so we might have a performance issue too.

Link to test run: https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/7209
Comment 20 Guillaume Emont 2019-01-11 08:51:58 PST
The issue was with the buildbot and not this patch: the buildbot needed to do a clean build. After doing one, the revision that landed this patch only yields 1 failure, which is a test that should be skipped on arm and mips as it time outs (that's not new, only the skip was changed to "not $jitTests").

See the test run:
https://build.webkit.org/builders/JSCOnly%20Linux%20ARMv7%20Thumb2%20Release/builds/7217

Also, the slowness was due to compiling with C_LOOP and running all the tests and test configurations as if we had DFG. Now in that last run, the tests ran in under 2h08m.

Please re-land that patch and then ping me so that I can trigger a buildbot clean build.
Comment 21 WebKit Commit Bot 2019-01-11 12:41:49 PST
Comment on attachment 358683 [details]
Patch

Clearing flags on attachment: 358683

Committed r239867: <https://trac.webkit.org/changeset/239867>
Comment 22 WebKit Commit Bot 2019-01-11 12:41:51 PST
All reviewed patches have been landed.  Closing bug.