Bug 141548 - Generate incq instead of addq when the immediate value is one
Summary: Generate incq instead of addq when the immediate value is one
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-12 19:51 PST by Benjamin Poulain
Modified: 2015-02-13 13:54 PST (History)
3 users (show)

See Also:


Attachments
Patch (2.34 KB, patch)
2015-02-12 19:56 PST, Benjamin Poulain
barraclough: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2015-02-12 19:51:32 PST
Generate incq instead of addq when the immediate value is one
Comment 1 Benjamin Poulain 2015-02-12 19:56:29 PST
Created attachment 246499 [details]
Patch
Comment 2 Gavin Barraclough 2015-02-12 21:00:45 PST
Comment on attachment 246499 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=246499&action=review

Have you performance tested? – inc used to be slower on some old intel chips (since it only sets 5/6 of the main flags) so add was often preferable – though that may well no longer be the case.

> Source/JavaScriptCore/assembler/X86Assembler.h:566
> +        m_formatter.oneByteOp64(OP_GROUP5_Ev, GROUP1_OP_ADD, base, offset);

Shouldn't really be using 'GROUP1_OP_ADD' here – I think this should use 'GROUP5_OP_INC' (defining it, if it doesn't exist).
Comment 3 Benjamin Poulain 2015-02-13 00:06:51 PST
(In reply to comment #2)
> Comment on attachment 246499 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=246499&action=review
> 
> Have you performance tested? – inc used to be slower on some old intel chips
> (since it only sets 5/6 of the main flags) so add was often preferable –
> though that may well no longer be the case.

I was intrigued by your comment so I looked into the perf tables all the way to pentium. For throughput and latency, Inc was equal or better than Add in all the cases I looked at.

I checked the Intel Perf guideline, it says: "INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags."
-> exactly what you were saying.

I checked what other compilers do. GCC seems to use ADD systematically. Clang uses INC systematically. ICC uses INC systematically.

I'll have to test on our benchmarks...
Comment 4 Benjamin Poulain 2015-02-13 13:54:46 PST
Committed r180075: <http://trac.webkit.org/changeset/180075>