150777 – Consider still matching an address expression even if B3 has already assigned a Tmp to it

RESOLVED FIXED

https://bugs.webkit.org/show_bug.cgi?id=150777

Filip Pizlo

Reported 2015-11-01 11:40:45 PST

If we want to hoist address expressions, we should probably have better heuristics for it.

Attachments
the patch (2.49 KB, patch) 2015-12-10 09:16 PST, Filip Pizlo, ggaren: review+

Filip Pizlo

Comment 1 2015-12-10 09:16:24 PST

Created attachment 267111
the patch

Geoffrey Garen

Comment 2 2015-12-10 15:34:51 PST

Comment on attachment 267111
the patch

I would have guessed 2.

Filip Pizlo

Comment 3 2015-12-10 18:44:29 PST

(In reply to comment #2)
> Comment on attachment 267111
> the patch
>
> I would have guessed 2.

Depends on the architecture and a lot of other things. On Intel, any address expression takes 1 cycle, regardless of its complexity. So if you do this:

    mov (%rax), %rax

then you will spend 1 cycle deducing that the address is simply the thing in %rax. And if you do this:

    mov 42(%rcx,%rdx,4), %rax

then you will also spend 1 cycle deducing what the address is. Therefore, if you had to pick between the following two snippets, you would pick the one with fewer instructions even though it computes the same address twice:

This is faster:

    mov 42(%rcx,%rdx,4), %rax
    mov 46(%rcx,%rdx,4), %rdi

than this:

    lea 42(%rcx,%rdx,4), %rsi
    mov (%rsi), %rax
    mov 4(%rsi), %rdi

This would still be true even if the address expressions were identical rather than the second one being offset by +4. The first version is faster because even though we recompute the same seemingly complex address expression, it's actually free to do so.

Hence, setting the threshold to 2 is probably not a good thing. It's far too low. The last time I wrote a compiler, I set the threshold to +Inf. Later on, I learned through whispers in the wind that you want to have some kind of upper limit, though I never learned the reason. I suspect that the reason is just registers. It's possible that in the "faster" example above, keeping %rcx and %rdx alive causes too much register pressure. You don't know if that's an issue at the time that you do instruction selection, and usually it won't be an issue. But you can see how if you really had a lot of uses of the same address, then the second form may be better because it requires only one register to be alive for the memory access to compute the address.

So, 2 is probably too low because it adds instructions without reducing the amount of work, but +Inf is probably too high because at some point the register pressure of keeping all of the inputs to the address computation alive is a bigger issue than the cost of the "lea" instruction.
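
For readers who want the trade-off above in concrete terms, here is a minimal sketch of a use-count threshold heuristic. It is not the code from the patch and not B3's API: the names AddressStrategy, instructionCount, liveAddressRegisters, and chooseStrategy are hypothetical, and the assumption that the cutoff is "uses >= threshold" is made up for illustration. It only models the two quantities discussed in the comment, namely instructions emitted and registers kept alive to form the address.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; none of these names come from B3 or the patch.
enum class AddressStrategy {
    FoldIntoEachAccess, // repeat offset(base,index,scale) in every mov
    MaterializeOnce     // one lea into a temporary, then (reg)-style accesses
};

// Instructions emitted for `uses` memory accesses that share one address.
std::size_t instructionCount(AddressStrategy strategy, std::size_t uses)
{
    // Folding costs one mov per access; materializing adds one lea up front.
    return strategy == AddressStrategy::FoldIntoEachAccess ? uses : uses + 1;
}

// Registers that must stay live across the accesses just to form the address.
std::size_t liveAddressRegisters(AddressStrategy strategy, std::size_t inputs)
{
    // Folding keeps every input (for example base and index) alive;
    // materializing keeps only the temporary that holds the address.
    return strategy == AddressStrategy::FoldIntoEachAccess ? inputs : 1;
}

// Assumed rule: keep folding until the address has at least `threshold` uses.
AddressStrategy chooseStrategy(std::size_t uses, std::size_t threshold)
{
    assert(threshold >= 1);
    return uses >= threshold
        ? AddressStrategy::MaterializeOnce
        : AddressStrategy::FoldIntoEachAccess;
}
```

With two uses and two address inputs, as in the snippets above, folding emits 2 instructions and keeps 2 registers alive, while materializing emits 3 instructions and keeps 1 alive. Under the assumed rule, a threshold of 2 would already pick the lea form for that case, which is the outcome the comment argues against; a very large threshold approximates +Inf and always folds.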

Filip Pizlo

Comment 4 2015-12-10 19:42:31 PST

Landed in http://trac.webkit.org/changeset/193941

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee Filip Pizlo

Reported 2015-11-01 11:40 PST

Modified 2015-12-10 19:42 PST

CC List 12 users

Blocks 152106 150279