The patch to be attached adds support for op_get_by_id_proto to the llint.

Before, the v8 benchmark results:

Richards: 783
DeltaBlue: 713
Crypto: 1397
RayTrace: 2415
EarleyBoyer: 3344
RegExp: 389
Splay: 3936
NavierStokes: 2015
----
Score (version 7): 1449

After:

Richards: 835
DeltaBlue: 778
Crypto: 1597
RayTrace: 2516
EarleyBoyer: 3340
RegExp: 398
Splay: 4311
NavierStokes: 2019
----
Score (version 7): 1531
Created attachment 151713 [details] Patch
I note that those benchmark results are --useJIT=no.
Comment on attachment 151713 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=151713&action=review

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2049
> + if (!curInstruction[4].u.structure || Heap::isMarked(curInstruction[4].u.structure.get()))
> +     break;

What if the main structure is marked but the prototype structure isn't? I guess this should be:

ASSERT(!!curInstruction[4].u.structure == !!curInstruction[5].u.structure); // The structures should either be both 0 or both non-0.
if (!curInstruction[4].u.structure)
    break;
if (Heap::isMarked(curInstruction[4].u.structure.get()) && Heap::isMarked(curInstruction[5].u.structure.get()))
    break;

Though, I think it would be even better if the clearing you're doing below flipped this back to op_get_by_id - that's the style I've been using for put_by_id.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:508
> +_llint_op_get_by_id_proto:
> +    traceExecution()
> +    callSlowPath(_llint_slow_path_get_by_id_proto)
> +    dispatch(9)

So you get a speed-up even though this isn't inline? Nifty. We should inline it at some point.
(In reply to comment #3)
> (From update of attachment 151713 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=151713&action=review
>
> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2049
> > + if (!curInstruction[4].u.structure || Heap::isMarked(curInstruction[4].u.structure.get()))
> > +     break;
>
> What if the main structure is marked but the prototype structure isn't? I guess this should be:
>
> ASSERT(!!curInstruction[4].u.structure == !!curInstruction[5].u.structure); // The structures should either be both 0 or both non-0.
> if (!curInstruction[4].u.structure)
>     break;
> if (Heap::isMarked(curInstruction[4].u.structure.get()) && Heap::isMarked(curInstruction[5].u.structure.get()))
>     break;

To clarify, please fix this before landing - otherwise we could get annoying GC crashes.

> Though, I think it would be even better if the clearing you're doing below flipped this back to op_get_by_id - that's the style I've been using for put_by_id.
>
> > Source/JavaScriptCore/llint/LowLevelInterpreter.asm:508
> > +_llint_op_get_by_id_proto:
> > +    traceExecution()
> > +    callSlowPath(_llint_slow_path_get_by_id_proto)
> > +    dispatch(9)
>
> So you get a speed-up even though this isn't inline? Nifty. We should inline it at some point.
(In reply to comment #3)
> (From update of attachment 151713 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=151713&action=review
>
> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2049
> > + if (!curInstruction[4].u.structure || Heap::isMarked(curInstruction[4].u.structure.get()))
> > +     break;
>
> What if the main structure is marked but the prototype structure isn't?

How is that possible? Wouldn't a marked structure imply that its prototype is marked? Or are these dead marks and not live marks? Apologies for the naivete here.

> Though, I think it would be even better if the clearing you're doing below flipped this back to op_get_by_id - that's the style I've been using for put_by_id.

I wonder, though: is there any benefit to flipping back to get_by_id just because the op_get_by_id_proto prediction failed? Consider: you fail your prediction, so you go through slow_path_get_by_id. If it's going to predict something else, it will set the opcode and structure/offset fields appropriately. If not, the cache is not updated.

A future get_by_id_proto cache hit after a miss seems fairly likely, so I was thinking of just leaving the state as get_by_id_proto, given that it dispatches fairly quickly to get_by_id if things are not right.

> > Source/JavaScriptCore/llint/LowLevelInterpreter.asm:508
> > +_llint_op_get_by_id_proto:
> > +    traceExecution()
> > +    callSlowPath(_llint_slow_path_get_by_id_proto)
> > +    dispatch(9)
>
> So you get a speed-up even though this isn't inline? Nifty. We should inline it at some point.

I think the speedup is mostly because we cut the number of dynamic dispatches in half (from ~20% of the time to ~10%). I was playing around with inline vs. not inline for a couple of other opcodes (get_string_length and get_array_length) and didn't see much difference there.
Incidentally, op_get_by_val is unfortunate in some ways because of the pointer-chasing you have to do just to determine that the operand is an array (getting at the jsArrayClassInfo through the global data through the code block). In C++ it is more direct. It would be nice to improve that if possible, but that's another bug, I guess!
Created attachment 151743 [details] Patch
(In reply to comment #5)
> (In reply to comment #3)
> > (From update of attachment 151713 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=151713&action=review
> >
> > > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2049
> > > + if (!curInstruction[4].u.structure || Heap::isMarked(curInstruction[4].u.structure.get()))
> > > +     break;
> >
> > What if the main structure is marked but the prototype structure isn't?
>
> How is that possible? Wouldn't a marked structure imply that its prototype is marked? Or are these dead marks and not live marks? Apologies for the naivete here.

You have two structures: curInstruction[4].u.structure and curInstruction[5].u.structure. Let's call them S and T for short. S points to a prototype P, and that prototype P at some point in time had structure T. But in the future it may have structure T'. That's the whole point of get_by_id_proto caching - to quickly check that P still has structure T and not some other structure T'.

Hence, you're right that S marks P, and P will mark whatever structure it has, but that structure may no longer be T. So if you don't mark T yourself, you'll have a get_by_id_proto cache that is checking whether P has a garbage structure. Then you may find yourself in the following situation:

0) P points to T (starting condition).
1) P changes structure to T'. T is no longer reachable.
2) GC happens. T is freed.
3) P changes structure to T''. T'' is allocated in the memory formerly occupied by T.
4) get_by_id_proto succeeds even though it should have failed.

In particular, T'' may no longer have the property that the get_by_id_proto is trying to read. The T->T' or T'->T'' transition may have deleted the property you were trying to read.

> > Though, I think it would be even better if the clearing you're doing below flipped this back to op_get_by_id - that's the style I've been using for put_by_id.
> I wonder though, is there any benefit to flipping back to get_by_id just because the op_get_by_id_proto failed? Consider: you fail your prediction so you go through the slow_path_get_by_id. If it's going to predict something else, it will set the opcode and structure/offset fields appropriately. If not, the cache is not updated.

There is no performance benefit. There's also no disadvantage. This is a once-per-GC event. It happens rarely.

> It seems that the likelihood of a future get_by_id_proto cache hit after a miss is somewhat likely. I was thinking of just leaving the state as get_by_id_proto, given that it dispatches fairly quickly to get_by_id if things are not right.

This is a matter of style. I'm asking you to stick to the style we have (cleared caches reverting to the bare form of the instruction) rather than introducing a new style.

> > > Source/JavaScriptCore/llint/LowLevelInterpreter.asm:508
> > > +_llint_op_get_by_id_proto:
> > > +    traceExecution()
> > > +    callSlowPath(_llint_slow_path_get_by_id_proto)
> > > +    dispatch(9)
> >
> > So you get a speed-up even though this isn't inline? Nifty. We should inline it at some point.
>
> I think the speedup is mostly because we cut the number of dynamic dispatches in half (~20% of the time to ~10% of the time). I was playing around with inline vs not inline for a couple other opcodes (get_string_length and get_array_length) and didn't see much difference there.
>
> Incidentally, op_get_by_val is unfortunate in some ways because of the pointer-chasing you have to do just to determine that it is an array (getting at the jsArrayClassInfo through the global data through the code block). In C++ it is more direct. Would be nice to improve that if possible. But that's another bug I guess!
Comment on attachment 151743 [details] Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=151743&action=review

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2050
> + ASSERT(Heap::isMarked(curInstruction[5].u.structure.get()));

I believe this assertion is wrong. See my previous comment.
(In reply to comment #8)
> (From update of attachment 151743 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=151743&action=review
>
> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2050
> > + ASSERT(Heap::isMarked(curInstruction[5].u.structure.get()));
>
> I believe this assertion is wrong. See my previous comment.

Indeed, thanks for the review. Will fix.
One more comment while I bake the patch:

(In reply to comment #7)
> > I wonder though, is there any benefit to flipping back to get_by_id just because the op_get_by_id_proto failed? Consider: you fail your prediction so you go through the slow_path_get_by_id. If it's going to predict something else, it will set the opcode and structure/offset fields appropriately. If not, the cache is not updated.
>
> There is no performance benefit. There's also no disadvantage, either. This is a once-per GC event. It happens rarely.

There are other reasons to miss the inline cache, for example passing in an object with a different prototype. If property access on that object won't be cached for whatever reason, there is no need to clear the get_by_id_proto cache. But I won't insist on this point; it is a rare thing.
Created attachment 151747 [details] Patch
I need to head out, but I'm getting a couple of fast/js failures with this patch:

Regressions: Unexpected text diff mismatch : (2)
  fast/js/dictionary-no-cache.html = TEXT
  fast/js/dictionary-prototype-caching.html = TEXT

Clearing r? so it doesn't go through accidentally. Thanks for the review though!