For both opcodes (op_jeq_null, op_jneq_null) the NullTag and UndefinedTag are checked to control the jump. These values differ only in the lowest bit. So, we can use some arithmetic instructions instead of set and compare instruction sequences.
Created attachment 71421 [details] Speed up op_jeq_null and op_jneq_null in JIT An artificial test says this construct is 1.1x faster on x86 and 1.25x faster on ARM. SunSpider says nothing on x86, but it is 1.01x faster on ARM.
Comment on attachment 71421 [details] Speed up op_jeq_null and op_jneq_null in JIT I think we can do better than this. Currently there's enum { Int32Tag = 0xffffffff }; enum { CellTag = 0xfffffffe }; enum { TrueTag = 0xfffffffd }; enum { FalseTag = 0xfffffffc }; enum { NullTag = 0xfffffffb }; enum { UndefinedTag = 0xfffffffa }; enum { EmptyValueTag = 0xfffffff9 }; enum { DeletedValueTag = 0xfffffff8 }; If we reorder these as: enum { NullTag = 0xfffffffb }; enum { UndefinedTag = 0xfffffffe }; enum { Int32Tag = 0xfffffffd }; enum { CellTag = 0xfffffffc }; enum { TrueTag = 0xfffffffb }; enum { FalseTag = 0xfffffffa }; enum { EmptyValueTag = 0xfffffff9 }; enum { DeletedValueTag = 0xfffffff8 }; Then we don't need any bit logic at all, and can replace it with addJump(branch32(GreaterOrEqual, regT1, Imm32(JSValue::UndefinedTag)), target); (or whichever is the unsigned >= condition) Gavin is there any reason this would not work?
> If we reorder these as: > enum { NullTag = 0xfffffffb }; It should be 0xffffffff (not 0xfffffffb) > enum { UndefinedTag = 0xfffffffe }; > enum { Int32Tag = 0xfffffffd }; > enum { CellTag = 0xfffffffc }; > enum { TrueTag = 0xfffffffb }; > enum { FalseTag = 0xfffffffa }; > enum { EmptyValueTag = 0xfffffff9 }; > enum { DeletedValueTag = 0xfffffff8 }; Anyway, it looks good to me, but we have to guard it with an ASSERT as well. eg.: ASSERT(NullTag == 0xffffffff && UndefinedTag = 0xfffffffe); or ASSERT(NullTag + 1 == 0 && UndefinedTag + 1 == NullTag); Well, not a very nice one, but it is just an ASSERT.
> enum { Int32Tag = 0xffffffff }; > enum { CellTag = 0xfffffffe }; If my memory seves my correctly a lot of code depend on these two constants. Perhaps we can move "down" the "undefined" like constants to the end of the list.
You probably want Geoff or Sam to comment on this, not me. I only remember a reliance on the values of true & false – (false | 1) == true, but there may well be other ordering issues.
(In reply to comment #4) > > enum { Int32Tag = 0xffffffff }; > > enum { CellTag = 0xfffffffe }; > > If my memory seves my correctly a lot of code depend on these two constants. > > Perhaps we can move "down" the "undefined" like constants to the end of the list. Adding Sam and Geoff as they know the most about these tags. To my knowledge though there aren't any real constraints on these values so we could look at maybe making use of these bits in more interesting ways
If we do sort this, would be nice to get rid of DeletedValueTag. We don't use a special tag for this in JSVALUE64 – the value just needs to not be normally possible – e.g. tag = CellTag payload = 1 or -1 should be fine.
(In reply to comment #4) > > enum { Int32Tag = 0xffffffff }; > > enum { CellTag = 0xfffffffe }; > > If my memory seves my correctly a lot of code depend on these two constants. > > Perhaps we can move "down" the "undefined" like constants to the end of the list. I was going to say we couldn't do this, and then i forgot why, now i remembered -- the goal i had was to reduce the null + undefined check to a single branch with no additional logic. If we moved them down rather than up we'd need to do a lower+upper bound check as we currently identify doubles simply by doing a < DeletedValueTag check.
> > enum { Int32Tag = 0xffffffff }; I remember an optimization: if ((val1.tag() & val2.tag()) == 0xffffffff) -> both are integer. And CellTag has only one zero bit at the end of the bit stream: ALWAYS_INLINE JIT::Jump JIT::emitJumpIfBothJSCells(RegisterID reg1, RegisterID reg2, RegisterID scratch) { move(reg1, scratch); orPtr(reg2, scratch); return emitJumpIfJSCell(scratch); }
> enum { NullTag = 0xffffffff }; > enum { UndefinedTag = 0xfffffffe }; > > Then we don't need any bit logic at all, and can replace it with > addJump(branch32(GreaterOrEqual, regT1, Imm32(JSValue::UndefinedTag)), target); I took a closer look into emit_op_jneq_null and emit_op_jeq_null and there is no check for non-double values. Both functions check MasqueradesAsUndefined bit and NullTag or UndefinedTag to determine the source is null (or not). So, this suggestion will not work. Although we can insert an unsigned comparison to the LowestTag to check for double, but this case will add new comparisons which will influence the execution speed in different ways for different control flow.
Ohh, the GreaterThanOrEqual comment misled me. No more comparison is needed (ignore my last comment). The patch is coming.
> ALWAYS_INLINE JIT::Jump JIT::emitJumpIfBothJSCells(RegisterID reg1, RegisterID reg2, RegisterID scratch) It was used by JSValue32 which is obsolete now. So, it can be safely removed.
Created attachment 72025 [details] Speed up op_jeq_null and op_jneq_null in JIT
Attachment 72025 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 JavaScriptCore/jit/JITOpcodes32_64.cpp:926: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] JavaScriptCore/jit/JITOpcodes32_64.cpp:950: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 2 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 72026 [details] Speed up op_jeq_null and op_jneq_null in JIT Style issue is fixed.
Comment on attachment 72026 [details] Speed up op_jeq_null and op_jneq_null in JIT Do you have any performance numbers?
> Do you have any performance numbers? Same as above (on ARM): Artificial test for j[n]?eq_null: 1.25x faster SunSpider: 1.01x faster Nonrelated tests are faster, for example: nsieve-bits: 1.027x faster cordic: 1.053x faster No other significant change. There is no significant slowdown. I am pretty sure that the benchmarks do not hit these cases very often.
Comment on attachment 72026 [details] Speed up op_jeq_null and op_jneq_null in JIT can we do similar for x86_64?
Comment on attachment 72026 [details] Speed up op_jeq_null and op_jneq_null in JIT Clearing flags on attachment: 72026 Committed r70699: <http://trac.webkit.org/changeset/70699>
All reviewed patches have been landed. Closing bug.
(JSValue32_64 on x86) The same artificial test: 1.12x as fast SunSpider still says nothing overall. 3d-cude: 1.025x as fast base64: 1.019x as fast fasta: 1.011x as fast tagcloud: 1.005x as fast unpack-code: 1.004x as slow No other significant change.