Bug 99213 - [ARMv7] Neither linkCall() nor linkPointer() should flush code
Summary: [ARMv7] Neither linkCall() nor linkPointer() should flush code
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-12 15:04 PDT by Yong Li
Modified: 2012-11-20 08:33 PST (History)
7 users (show)

See Also:


Attachments
the patch (3.80 KB, patch)
2012-10-13 14:16 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Add the fix to replaceWith* methods (4.92 KB, patch)
2012-10-15 11:10 PDT, Yong Li
no flags Details | Formatted Diff | Diff
again (4.92 KB, patch)
2012-10-17 09:25 PDT, Yong Li
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yong Li 2012-10-12 15:04:31 PDT
LinkBuffer calls linkCall() and linkPointer(), which ends up with code flushing on ARMv7. However, this doesn't look necessary, as the code will be finally flushed anyway.

A simple solution is to pass an argument "flush" to linkCall() and linkPointer(), and suppress code flushing when it is called from LinkBuffer.

This gives "noticeable" boost on sunspider result.

But I'm not sure if this is the best solution.
Comment 1 Yong Li 2012-10-12 15:11:45 PDT
A better plan: For ARMv7 only, make linkCall/linkPointer not flash code by default.  make other calls to linkCall/linkPointer in ARMv7 use explicit flags to flush code...
Comment 2 Yong Li 2012-10-13 14:16:32 PDT
Created attachment 168565 [details]
the patch
Comment 3 Yong Li 2012-10-15 08:38:45 PDT
I also found replaceWithLoad and replaceWithAddressComputation can unnecessarily flush code. going to fix it in another patch
Comment 4 Yong Li 2012-10-15 10:23:38 PDT
Comment on attachment 168565 [details]
the patch

hm... Not only LinkBuffer can call linkCall
Comment 5 Yong Li 2012-10-15 11:10:28 PDT
Created attachment 168744 [details]
Add the fix to replaceWith* methods

actually it was false alarm. Also, linkCall never flushes code in ARM/SHA/MIPS ports. This is pure ARMv7 bug
Comment 6 Geoffrey Garen 2012-10-15 12:21:27 PDT
> A better plan: For ARMv7 only, make linkCall/linkPointer not flash code by default.

Why make this optimization CPU-specific?
Comment 7 Yong Li 2012-10-15 12:23:15 PDT
(In reply to comment #6)
> > A better plan: For ARMv7 only, make linkCall/linkPointer not flash code by default.
> 
> Why make this optimization CPU-specific?

This is only an issue in ARMv7Assembler. Other assemblers don't have this problem. x86/x64 don't need to flush code. ARM-traditional/mips/sh4 don't flush code in linkCall()
Comment 8 Geoffrey Garen 2012-10-15 16:06:49 PDT
> > Why make this optimization CPU-specific?
> 
> This is only an issue in ARMv7Assembler. Other assemblers don't have this problem. x86/x64 don't need to flush code. ARM-traditional/mips/sh4 don't flush code in linkCall()

It seems brittle, and a layering violation, for the ARMv7 assembler to guess that calls like linkCall() and linkPointer() will always be followed by a call to cacheFlush().

Even though you believe, based on reading the implementation details of all our assemblers, that only ARMv7 needs this particular optimization at this time, and that the optimization is correct for the current implementation of LinkBuffer, the goal of our MacroAssembler and LinkBuffer interfaces is to abstract away those details so we can make future changes without thinking about all platforms at once.

Is there a major blocker that forces this code to be ARMv7-specific?
Comment 9 Yong Li 2012-10-16 07:40:28 PDT
(In reply to comment #8)
> > > Why make this optimization CPU-specific?
> > 
> > This is only an issue in ARMv7Assembler. Other assemblers don't have this problem. x86/x64 don't need to flush code. ARM-traditional/mips/sh4 don't flush code in linkCall()
> 
> It seems brittle, and a layering violation, for the ARMv7 assembler to guess that calls like linkCall() and linkPointer() will always be followed by a call to cacheFlush().
> 
> Even though you believe, based on reading the implementation details of all our assemblers, that only ARMv7 needs this particular optimization at this time, and that the optimization is correct for the current implementation of LinkBuffer, the goal of our MacroAssembler and LinkBuffer interfaces is to abstract away those details so we can make future changes without thinking about all platforms at once.
> 
> Is there a major blocker that forces this code to be ARMv7-specific?

Maybe the title is confusing. I thought It was a more generic issue at the very beginning. However after reading more code, I found that it is not LinkBuffer's fault at all. I believe that none of the link*() calls need/should flush code. LinkBuffer will flush the entire assembly at the finalization step. However, those relink*/repatch*/replace* calls must flush code (except for x86 families). None of other assemblers flush code in linkCall() and linkPointer(), and ARMv7Assembler doesn't flush in linkJump(). So I think we can call it a bug in ARMv7Assembler, and it is an accident that linkCall/linkPointer finally flushes code through setInt32.
Comment 10 Yong Li 2012-10-17 09:25:13 PDT
Created attachment 169195 [details]
again
Comment 11 Yong Li 2012-10-18 10:42:08 PDT
Ping reviewers... has anyone tried it on iOS?
Comment 12 Gavin Barraclough 2012-10-26 17:51:46 PDT
Hi Yong,

Nice catch – but I think there's a slightly better fix if you feel like taking it on.

Your patch fixes the problem that we'll flush ints/pointers more than once during the link phase, but there is a related issue that we sometimes flush the same cache line more than once when repatching code.  I'd suggest changing the MacroAssembler::repatch* etc methods in all assemblers to return the range of code modified (start address & end/size as a pair), and have the RepatchBuffer cache all required flushes & only perform them once all modifications have been performed.  When buffering up the flushes, we can round to cache line size, then ditch redundant flushes & coalesce flushes to adjacent cache lines.

(minor detail: JITWriteBarrier & JumpReplacementWatchpoint currently aren't using RepatchBuffer & should be – that will also need to be fixed).

I won't r+ for now, since I'd rather see the fix I've described, but if you don't feel like taking this on we could take your patch as is & revisit – would just be unfortunate since some of this change would be redundant & end up being reverted.
Comment 13 Benjamin Poulain 2012-10-26 19:08:09 PDT
Comment on attachment 169195 [details]
again

View in context: https://bugs.webkit.org/attachment.cgi?id=169195&action=review

I remove the review flag for now.
I think this is a nice patch but I would really like it to be completely generic. ARM is not the only platform without completely consistent caches.

I think we should follow Gavin's advices here. Yong, any opinion?

> Source/JavaScriptCore/ChangeLog:10
> +        LinkBuffer doesn't need to flush code during linking. It will
> +        eventually flush the whole executable. Fixing this gives >%5
> +        sunspider boost (on QNX).

After profiling on hardware, I find it strange you get so much improvement from this. It is pretty far from a hotspot.

It is still a good idea to fix this though.
Comment 14 Yong Li 2012-10-26 21:15:45 PDT
(In reply to comment #12)
> Hi Yong,
> 
> Nice catch – but I think there's a slightly better fix if you feel like taking it on.
> 
> Your patch fixes the problem that we'll flush ints/pointers more than once during the link phase, but there is a related issue that we sometimes flush the same cache line more than once when repatching code.  I'd suggest changing the MacroAssembler::repatch* etc methods in all assemblers to return the range of code modified (start address & end/size as a pair), and have the RepatchBuffer cache all required flushes & only perform them once all modifications have been performed.  When buffering up the flushes, we can round to cache line size, then ditch redundant flushes & coalesce flushes to adjacent cache lines.

Actually I did try a very similar solution with RepatchBuffer. However it didn't show any boost on benchmarks. I could try improving that patch when time allowing. But I think the bug should be separated as this one is only an issue in ARMv7Assembler, and repatching is generic. Even now I had a ready patch for the repatching one, I still think it should be a separate bug.

> 
> (minor detail: JITWriteBarrier & JumpReplacementWatchpoint currently aren't using RepatchBuffer & should be – that will also need to be fixed).

Thanks. Very helpful info :)
Comment 15 Yong Li 2012-10-29 08:04:34 PDT
(In reply to comment #13)
> (From update of attachment 169195 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=169195&action=review
> 
> I remove the review flag for now.
> I think this is a nice patch but I would really like it to be completely generic. ARM is not the only platform without completely consistent caches.
> 
> I think we should follow Gavin's advices here. Yong, any opinion?
> 

I still think it is an ARMV-v7 only bug, as none of other assemblers' linkCall/linkPointer does cache flushing, nor ARMv7's linkJump() does. So it is very clear to me it is just an independent bug rather than a generic optimization. I could modify the changelog if needed.

We can create another bug for the potential duplicate cache flushing during "repatching".

So I'll add back the review request.
Comment 16 Yong Li 2012-11-13 09:18:40 PST
The patch is pending so long here. It seems there is no objection with this patch itself. Although it doesn't give so much boost on other platforms, it is still a proper fix, right?
Comment 17 George Staikos 2012-11-19 20:53:35 PST
Comment on attachment 169195 [details]
again

No complaints, it looks good, and gives improvements on at least one platform.
Comment 18 WebKit Review Bot 2012-11-20 08:33:12 PST
Comment on attachment 169195 [details]
again

Clearing flags on attachment: 169195

Committed r135286: <http://trac.webkit.org/changeset/135286>
Comment 19 WebKit Review Bot 2012-11-20 08:33:17 PST
All reviewed patches have been landed.  Closing bug.