Bug 158083

Summary: LLInt should support other types of prototype GetById caching.
Product: WebKit Reporter: Keith Miller <keith_miller>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED WONTFIX    
Severity: Normal CC: beidson, buildbot, commit-queue, fpizlo, ggaren, mark.lam, msaboff, ossy, rniwa, ryanhaddad, saam, ticaiolima, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 159266    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews103 for mac-yosemite
none
Archive of layout-test-results from ews106 for mac-yosemite-wk2
none
Archive of layout-test-results from ews114 for mac-yosemite
none
Archive of layout-test-results from ews122 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch
none
Base's Benchmark output
none
Cached Accessor's benchmark output
none
Benchmark
none
Patch
none
Patch
none
Archive of layout-test-results from ews100 for mac-yosemite
none
Archive of layout-test-results from ews107 for mac-yosemite-wk2
none
Archive of layout-test-results from ews115 for mac-yosemite
none
Archive of layout-test-results from ews124 for ios-simulator-wk2
none
Patch
none
Archive of layout-test-results from ews101 for mac-yosemite
none
Archive of layout-test-results from ews107 for mac-yosemite-wk2
none
Archive of layout-test-results from ews124 for ios-simulator-wk2
none
Archive of layout-test-results from ews117 for mac-yosemite
none
Patch
none
speedometer run comparison with and without patch applied
none
Archive of layout-test-results from ews103 for mac-yosemite
none
Archive of layout-test-results from ews106 for mac-yosemite-wk2
none
Archive of layout-test-results from ews125 for ios-simulator-wk2
none
Archive of layout-test-results from ews116 for mac-yosemite
none
Patch
none
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews126 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch beidson: review-

Description Keith Miller 2016-05-25 13:22:39 PDT
Currently, the LLInt supports prototype load caching. It might also be profitable to cache prototype misses, accessors, or customs.
Comment 1 Caio Lima 2016-06-02 13:12:23 PDT
Hey Keith, I am interested in work in this bug if it is possible

Cold you clarify some points to me?

1. What are prototype misses?

2. When you think in cache accessors, you mean getters and setters?

3. Also, for customs

It could be a big help if you post some JavaScript code as example.

I am trying to debug the case for accessors and I noticed that the getter is emitted as get_by_id op and I suspect it is already being cached (I didn't check it).
Comment 2 Radar WebKit Bug Importer 2016-06-02 13:12:58 PDT
<rdar://problem/26607441>
Comment 3 Keith Miller 2016-06-02 15:57:19 PDT
(In reply to comment #1)
> Hey Keith, I am interested in work in this bug if it is possible
> 
> Cold you clarify some points to me?
> 
> 1. What are prototype misses?
> 
> 2. When you think in cache accessors, you mean getters and setters?
> 
> 3. Also, for customs
> 
> It could be a big help if you post some JavaScript code as example.
> 
> I am trying to debug the case for accessors and I noticed that the getter is
> emitted as get_by_id op and I suspect it is already being cached (I didn't
> check it).

Hi Caio,

It would be great if you fixed this feature and I would be happy to help. Feel free to message me on irc, my nick is keith_miller, if you have any other questions.

1) prototype miss was the wrong terminology. I should have said that we should cache unset properties. Meaning, `let x = foo.bar` where `"bar" in foo === false`. I actually already handled this case in http://trac.webkit.org/changeset/201456.

2) Yep, that's right. In JavaScriptCore we use the word Accessor to mean a property with a getter and/or a setter. 

For example, if you did the following: 

foo = {};
Object.defineOwnProperty(Object.prototype, "bar", { get: () => { return 1; } });
print(foo.bar);

The bytecodes will look something like:

[   9] resolve_scope     loc10, loc3, print(@id0), <GlobalProperty>, 1, 0x113de7900
[  16] get_from_scope    loc6, loc10, print(@id0), 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 122    predicting None
[  24] resolve_scope     loc11, loc3, foo(@id1), <GlobalProperty>, 1, 0x113de7900
[  31] get_from_scope    loc12, loc11, foo(@id1), 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 191    predicting None
[  39] get_by_id         loc9, loc12, bar(@id2)    predicting None
[  48] call              loc5, loc6, 2, 16 status(Could Take Slow Path)    Original; predicting None

That get_by_id will attempt to lookup the property on foo by calling foo's getOwnPropertySlot function, which will set the passed PropertySlot& to Object.prototype's "bar" GetterSetter object. In this case, the PropertySlot will say that the value is an GetterSetter and we won't cache it.

3) Customs are mostly used as a performance optimization for DOM Accessors. Basically, instead of allocating a JSFunction object for all accessor properties in the DOM we use a function pointer instead. You can see the typedef, GetValueFunc, in PropertySlot.h. Customs are also used in a small number of other places for special properties that are values but change might change, although that is quite rare.

I would recommend for a first pass that you implement a new opcode, I guess called op_get_by_id_proto_accessor, that just calls to a slow path. In the slow path it can load the GetterSetter object and call the getter returning the result. If you are feeling very ambitious, you could inline the code for the call in the interpreter assembly. Let me know if you plan on trying to inline the call since you will need to know the JavaScriptCore calling convention to implement it.
Comment 4 Caio Lima 2016-06-03 02:57:17 PDT
Created attachment 280434 [details]
Patch
Comment 5 Caio Lima 2016-06-03 03:02:13 PDT
(In reply to comment #3)
> (In reply to comment #1)
> > Hey Keith, I am interested in work in this bug if it is possible
> > 
> > Cold you clarify some points to me?
> > 
> > 1. What are prototype misses?
> > 
> > 2. When you think in cache accessors, you mean getters and setters?
> > 
> > 3. Also, for customs
> > 
> > It could be a big help if you post some JavaScript code as example.
> > 
> > I am trying to debug the case for accessors and I noticed that the getter is
> > emitted as get_by_id op and I suspect it is already being cached (I didn't
> > check it).
> 
> Hi Caio,
> 
> It would be great if you fixed this feature and I would be happy to help.
> Feel free to message me on irc, my nick is keith_miller, if you have any
> other questions.
> 
> 1) prototype miss was the wrong terminology. I should have said that we
> should cache unset properties. Meaning, `let x = foo.bar` where `"bar" in
> foo === false`. I actually already handled this case in
> http://trac.webkit.org/changeset/201456.
> 
> 2) Yep, that's right. In JavaScriptCore we use the word Accessor to mean a
> property with a getter and/or a setter. 
> 
> For example, if you did the following: 
> 
> foo = {};
> Object.defineOwnProperty(Object.prototype, "bar", { get: () => { return 1; }
> });
> print(foo.bar);
> 
> The bytecodes will look something like:
> 
> [   9] resolve_scope     loc10, loc3, print(@id0), <GlobalProperty>, 1,
> 0x113de7900
> [  16] get_from_scope    loc6, loc10, print(@id0),
> 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 122    predicting
> None
> [  24] resolve_scope     loc11, loc3, foo(@id1), <GlobalProperty>, 1,
> 0x113de7900
> [  31] get_from_scope    loc12, loc11, foo(@id1),
> 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 191    predicting
> None
> [  39] get_by_id         loc9, loc12, bar(@id2)    predicting None
> [  48] call              loc5, loc6, 2, 16 status(Could Take Slow Path)   
> Original; predicting None
> 
> That get_by_id will attempt to lookup the property on foo by calling foo's
> getOwnPropertySlot function, which will set the passed PropertySlot& to
> Object.prototype's "bar" GetterSetter object. In this case, the PropertySlot
> will say that the value is an GetterSetter and we won't cache it.
> 
> 3) Customs are mostly used as a performance optimization for DOM Accessors.
> Basically, instead of allocating a JSFunction object for all accessor
> properties in the DOM we use a function pointer instead. You can see the
> typedef, GetValueFunc, in PropertySlot.h. Customs are also used in a small
> number of other places for special properties that are values but change
> might change, although that is quite rare.
> 
> I would recommend for a first pass that you implement a new opcode, I guess
> called op_get_by_id_proto_accessor, that just calls to a slow path. In the
> slow path it can load the GetterSetter object and call the getter returning
> the result. If you are feeling very ambitious, you could inline the code for
> the call in the interpreter assembly. Let me know if you plan on trying to
> inline the call since you will need to know the JavaScriptCore calling
> convention to implement it.

Thanks for the clarification Keith.

I am sending a prototype of what you described, but I definitely want to learn how can I inline a function call =).
Comment 6 Build Bot 2016-06-03 03:58:17 PDT
Comment on attachment 280434 [details]
Patch

Attachment 280434 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1427950

New failing tests:
webgl/1.0.2/conformance/typedarrays/typed-arrays-in-workers.html
js/regress/string-prototype-search-observable-side-effects.html
webgl/1.0.2/conformance/more/conformance/quickCheckAPI-B4.html
fast/files/workers/worker-read-file-async.html
storage/indexeddb/structured-clone-private.html
fast/canvas/webgl/arraybuffer-transfer-of-control.html
js/regress/string-prototype-split-observable-side-effects.html
webgl/1.0.2/conformance/state/gl-get-calls.html
js/regress/regexp-prototype-split-observable-side-effects.html
fast/files/read-file-async.html
fast/files/workers/worker-read-file-sync.html
storage/indexeddb/structured-clone.html
fast/canvas/webgl/array-message-passing.html
js/regress/regexp-prototype-search-observable-side-effects.html
js/pic/cached-named-property-getter.html
fast/canvas/webgl/typed-arrays-in-workers.html
fast/canvas/webgl/gl-get-calls.html
Comment 7 Build Bot 2016-06-03 03:58:22 PDT
Created attachment 280435 [details]
Archive of layout-test-results from ews103 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 8 Build Bot 2016-06-03 04:01:15 PDT
Comment on attachment 280434 [details]
Patch

Attachment 280434 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1427959

New failing tests:
webgl/1.0.2/conformance/typedarrays/typed-arrays-in-workers.html
js/regress/string-prototype-search-observable-side-effects.html
webgl/1.0.2/conformance/more/conformance/quickCheckAPI-B4.html
fast/canvas/webgl/arraybuffer-transfer-of-control.html
js/regress/string-prototype-split-observable-side-effects.html
webgl/1.0.2/conformance/state/gl-get-calls.html
js/pic/cached-named-property-getter.html
js/regress/regexp-prototype-split-observable-side-effects.html
fast/canvas/webgl/array-message-passing.html
js/regress/regexp-prototype-search-observable-side-effects.html
fast/canvas/webgl/typed-arrays-in-workers.html
fast/canvas/webgl/gl-get-calls.html
Comment 9 Build Bot 2016-06-03 04:01:19 PDT
Created attachment 280436 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 10 Build Bot 2016-06-03 04:04:01 PDT
Comment on attachment 280434 [details]
Patch

Attachment 280434 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1427956

Number of test failures exceeded the failure limit.
Comment 11 Build Bot 2016-06-03 04:04:05 PDT
Created attachment 280437 [details]
Archive of layout-test-results from ews114 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 12 Build Bot 2016-06-03 04:14:51 PDT
Comment on attachment 280434 [details]
Patch

Attachment 280434 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1427981

New failing tests:
js/regress/regexp-prototype-split-observable-side-effects.html
js/regress/string-prototype-search-observable-side-effects.html
js/regress/string-prototype-split-observable-side-effects.html
js/pic/cached-named-property-getter.html
js/arraybuffer-wrappers.html
js/regress/regexp-prototype-search-observable-side-effects.html
Comment 13 Build Bot 2016-06-03 04:14:56 PDT
Created attachment 280438 [details]
Archive of layout-test-results from ews122 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.4
Comment 14 Caio Lima 2016-06-03 09:45:58 PDT
Created attachment 280446 [details]
Patch
Comment 15 Geoffrey Garen 2016-06-03 10:53:46 PDT
Lost of crashes on the mac-debug bot:

  fast/loader/stateobjects/state-attribute-history-getter.html [ Crash ]
  fast/regex/toString.html [ Crash ]
  fast/workers/stress-js-execution.html [ Crash ]
  ietestcenter/Javascript/11.1.5-0-1.html [ Crash ]
  ietestcenter/Javascript/11.1.5-0-2.html [ Crash ]
  ietestcenter/Javascript/15.10.7.1-1.html [ Crash ]
  ietestcenter/Javascript/15.10.7.2-1.html [ Crash ]
  inspector/css/stylesheet-events-multiple-documents.html [ Crash ]
  inspector/debugger/searchInContent-linebreaks.html [ Crash ]
  inspector/debugger/setBreakpoint-actions.html [ Crash ]
  inspector/page/searchInResources.html [ Crash ]
  inspector/runtime/getProperties.html [ Crash ]
  js/Object-create.html [ Crash ]
  js/array-defineOwnProperty.html [ Crash ]
  js/basic-computed-property-name.html [ Crash ]
  js/caller-property.html [ Crash ]
  js/dom/Object-defineProperty.html [ Crash ]
  js/dom/call-base-resolution.html [ Crash ]
  js/dom/exception-sequencing-binops2.html [ Crash ]
  js/kde/StringObject.html [ Crash ]
  js/kde/prototype_proto.html [ Crash ]
  js/pic/cached-getter-setter.html [ Crash ]
  js/pic/cached-named-property-getter.html [ Crash ]
  js/regress/getter-no-activation.html [ Crash ]
  js/regress/getter.html [ Crash ]
  sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.prototype.localeCompare/S15.5.4.9_3.html [ Crash ]
  sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.prototype.localeCompare/S15.5.4.9_A1_T1.html [ Crash ]
  sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.prototype.localeCompare/S15.5.4.9_A1_T2.html [ Crash ]
  streams/brand-checks.html [ Crash ]


You can run these tests locally using run-webkit-tests --debug --dump-render-tree.
Comment 16 Keith Miller 2016-06-03 11:01:04 PDT
> 
> You can run these tests locally using run-webkit-tests --debug
> --dump-render-tree.

I would recommend running "run-javascriptcore-tests --debug [--no-build if you already have a debug build of jsc]" first since that doesn't require building all of WebCore and runs most of these tests.
Comment 17 Caio Lima 2016-06-03 11:07:45 PDT
(In reply to comment #15)
> Lost of crashes on the mac-debug bot:
> 
>   fast/loader/stateobjects/state-attribute-history-getter.html [ Crash ]
>   fast/regex/toString.html [ Crash ]
>   fast/workers/stress-js-execution.html [ Crash ]
>   ietestcenter/Javascript/11.1.5-0-1.html [ Crash ]
>   ietestcenter/Javascript/11.1.5-0-2.html [ Crash ]
>   ietestcenter/Javascript/15.10.7.1-1.html [ Crash ]
>   ietestcenter/Javascript/15.10.7.2-1.html [ Crash ]
>   inspector/css/stylesheet-events-multiple-documents.html [ Crash ]
>   inspector/debugger/searchInContent-linebreaks.html [ Crash ]
>   inspector/debugger/setBreakpoint-actions.html [ Crash ]
>   inspector/page/searchInResources.html [ Crash ]
>   inspector/runtime/getProperties.html [ Crash ]
>   js/Object-create.html [ Crash ]
>   js/array-defineOwnProperty.html [ Crash ]
>   js/basic-computed-property-name.html [ Crash ]
>   js/caller-property.html [ Crash ]
>   js/dom/Object-defineProperty.html [ Crash ]
>   js/dom/call-base-resolution.html [ Crash ]
>   js/dom/exception-sequencing-binops2.html [ Crash ]
>   js/kde/StringObject.html [ Crash ]
>   js/kde/prototype_proto.html [ Crash ]
>   js/pic/cached-getter-setter.html [ Crash ]
>   js/pic/cached-named-property-getter.html [ Crash ]
>   js/regress/getter-no-activation.html [ Crash ]
>   js/regress/getter.html [ Crash ]
>  
> sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.
> prototype.localeCompare/S15.5.4.9_3.html [ Crash ]
>  
> sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.
> prototype.localeCompare/S15.5.4.9_A1_T1.html [ Crash ]
>  
> sputnik/Conformance/15_Native_Objects/15.5_String/15.5.4/15.5.4.9_String.
> prototype.localeCompare/S15.5.4.9_A1_T2.html [ Crash ]
>   streams/brand-checks.html [ Crash ]
> 
> 
> You can run these tests locally using run-webkit-tests --debug
> --dump-render-tree.

(In reply to comment #14)
> Created attachment 280446 [details]


Actually, I updated this patch today morning to avoid the crashes, because my implementation was incorrect. These tests are from my first patch, I am going try to run them locally tonight, but is there one way to retrigger them here?

BTW, this patch is RFC, not the final Version, because I want to inline the call instead of using the slowPath.
Comment 18 Keith Miller 2016-06-03 11:11:40 PDT
Comment on attachment 280446 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280446&action=review

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:703
> +    JSObject* slotBase = (JSObject *) pc[6].u.pointer;

Style point. This should be: 

JSObject* slotBase = JSCast<JSObject*>(pc[6].u.jsCell.get());

JSCast has some builtin error checking on debug builds.
Comment 19 Keith Miller 2016-06-03 11:14:31 PDT
> Actually, I updated this patch today morning to avoid the crashes, because
> my implementation was incorrect. These tests are from my first patch, I am
> going try to run them locally tonight, but is there one way to retrigger
> them here?
> 
> BTW, this patch is RFC, not the final Version, because I want to inline the
> call instead of using the slowPath.


It looks like those crashes were happening on the second patch. You can see by clicking on the mac-debug bubble and looking at the results log. Unfortunately, the only way I know to rerun the tests on the bots is to upload the patch again :/.
Comment 20 Caio Lima 2016-06-03 11:24:19 PDT
(In reply to comment #19)
> > Actually, I updated this patch today morning to avoid the crashes, because
> > my implementation was incorrect. These tests are from my first patch, I am
> > going try to run them locally tonight, but is there one way to retrigger
> > them here?
> > 
> > BTW, this patch is RFC, not the final Version, because I want to inline the
> > call instead of using the slowPath.
> 
> 
> It looks like those crashes were happening on the second patch. You can see
> by clicking on the mac-debug bubble and looking at the results log.
> Unfortunately, the only way I know to rerun the tests on the bots is to
> upload the patch again :/.

No problem. I am going to fix them tonight, because I have no access to my computer right now.
Comment 21 Keith Miller 2016-06-03 18:50:42 PDT
Per your request before on inlining the getter call, I think the simplest way to implement it is to follow how we do it in PolymorphicAccess/operationGetByIdOptimize. PolymorphicAccess has the inline cache repatching code used in the JITs. The assembler in PolymorphicAccess more or less models X86 so it shouldn't be too hard to follow and translate to LLInt assembly. I'll also see if I can find some better information about the calling convention too. Let me know if you have any other questions.
Comment 22 Keith Miller 2016-06-03 19:05:31 PDT
(In reply to comment #21)
> Per your request before on inlining the getter call, I think the simplest
> way to implement it is to follow how we do it in
> PolymorphicAccess/operationGetByIdOptimize. PolymorphicAccess has the inline
> cache repatching code used in the JITs. The assembler in PolymorphicAccess
> more or less models X86 so it shouldn't be too hard to follow and translate
> to LLInt assembly. I'll also see if I can find some better information about
> the calling convention too. Let me know if you have any other questions.

Also, it might be helpful to look at JSStack.h, in particular, the CallFrameHeaderEntry enum since that represents the expected layout of a JS frame at the end of the function prologue.
Comment 23 Caio Lima 2016-06-04 18:54:51 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > Per your request before on inlining the getter call, I think the simplest
> > way to implement it is to follow how we do it in
> > PolymorphicAccess/operationGetByIdOptimize. PolymorphicAccess has the inline
> > cache repatching code used in the JITs. The assembler in PolymorphicAccess
> > more or less models X86 so it shouldn't be too hard to follow and translate
> > to LLInt assembly. I'll also see if I can find some better information about
> > the calling convention too. Let me know if you have any other questions.
> 
> Also, it might be helpful to look at JSStack.h, in particular, the
> CallFrameHeaderEntry enum since that represents the expected layout of a JS
> frame at the end of the function prologue.

Thank you for pointing where I can look. I think I am almost done because I understand and know hot to construct the JS Frame convention for 64bits architecture, however I am missing some very important information.

1) how to perform the call to the Getter function? I searched and found a makeJavascriptCall macro, however I am not sure if it is necessary use it.
Also I would like to understand the offlineassembly code "call".

2) Another point is, How can I get the getter JSFunction, since the offset cached point to a "GetterSetter"? Is it possible use "loadp GetterSetter::m_getter[t3], t4", for example? I noticed that the code in PolymorphicAccess uses GetterSetter::offsetGetter(). I suppose that this value should be placed in calle member of JS Frame Entry.

I didn't find information about the OfflineAssembly, could you point me some?
Comment 24 Caio Lima 2016-06-05 01:17:46 PDT
Created attachment 280545 [details]
Patch
Comment 25 Caio Lima 2016-06-05 01:21:29 PDT
In this patch I fixed the regressions and tried to inline the op_code. I am sending my experimental code commented. However, I am facing problems to get the Getter Function and store it in the Calle member of the JS Stack entry. Could you help me on this?
Comment 26 Keith Miller 2016-06-05 12:05:07 PDT
(In reply to comment #25)
> In this patch I fixed the regressions and tried to inline the op_code. I am
> sending my experimental code commented. However, I am facing problems to get
> the Getter Function and store it in the Calle member of the JS Stack entry.
> Could you help me on this?

Thinking about this further I realized that in order to inline the call we would need to have CallLinkInfo for the getter Callee. That would tell us what address to call when the function we are calling gets tiered up. Unfortunately, as far as I know, we don't have any kind of CallLinkInfo for the LLInt. Without, CallLinkInfo I think you would need to call into C++ to find the call location anyway. Let me look into this further but I think it might not be worthwhile to inline normal getters. We should still be able to  inline custom getters however.
Comment 27 Caio Lima 2016-06-05 13:21:22 PDT
Created attachment 280560 [details]
Patch
Comment 28 Caio Lima 2016-06-05 13:27:43 PDT
Hi Guys.

I cleaned the code and it is my Patch Version for caching the prototype accessor. I would like to know if It is a good idea send the implementation cache of prototype custom functions in this bug. 
Also, could you point me to a code that uses prototype customs?
Comment 29 Saam Barati 2016-06-05 16:31:16 PDT
Comment on attachment 280560 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280560&action=review

Have you run any benchmarks?

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:713
> +    LLINT_OP(1) = result;
> +
> +    pc[OPCODE_LENGTH(op_get_by_id) - 1].u.profile->m_buckets[0] = JSValue::encode(result);
> +    LLINT_END();

I think LLINT_RETURN_PROFILED does this for you
Comment 30 Saam Barati 2016-06-05 16:35:09 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=280560&action=review

Have you run any benchmarks?

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:713
> +    LLINT_OP(1) = result;
> +
> +    pc[OPCODE_LENGTH(op_get_by_id) - 1].u.profile->m_buckets[0] = JSValue::encode(result);
> +    LLINT_END();

I think LLINT_RETURN_PROFILED does this for you
Comment 31 Caio Lima 2016-06-05 19:56:18 PDT
Created attachment 280572 [details]
Base's Benchmark output
Comment 32 Caio Lima 2016-06-05 19:57:01 PDT
Created attachment 280573 [details]
Cached Accessor's benchmark output
Comment 33 Caio Lima 2016-06-05 21:38:39 PDT
Checking the benchmark I noticed that the patch does not improve the results. The cases that I paid attention were:

#JSC with the patch
get-by-id-proto-or-self                        11.6561+-2.5266

#JSC without the patch
get-by-id-proto-or-self                        13.2753+-2.3457    
--------------------------------------------------------------
#JSC with the patch
get-by-id-self-or-proto                        13.0930+-1.7589

#JSC without the patch
get-by-id-self-or-proto                        12.1011+-0.7154    
--------------------------------------------------------------
#JSC with the patch
getter-prototype                                5.8231+-0.6807

#JSC without the patch
getter-prototype                                5.5745+-0.2236    
--------------------------------------------------------------
#JSC with the patch
prototype-access-with-mutating-prototype        6.2859+-2.0017

#JSC without the patch
prototype-access-with-mutating-prototype        5.3830+-0.4957
--------------------------------------------------------------

It is important notice the errors presented by the JSC run with the patch are always higher than the run without patch. IMHO it is because of the prototype cache maintenance and setup overhead. Also, It is important consider that this is implementing just the slow_path to access the prototype accessor functions.
Comment 34 Saam Barati 2016-06-05 23:08:54 PDT
(In reply to comment #32)
> Created attachment 280573 [details]
> Cached Accessor's benchmark output

You can compare two VMs directly using the `run-jsc-benchmarks` script. This will give you nice output showing the differences so you don't have to manually compare the results.
Like so:
`run-jsc-benchmarks baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`

You can run individual benchmarks like (this runs only Kraken):
`run-jsc-benchmarks --kraken baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`

You can make run-jsc-benchmarks run more iterations which is helpful for noisy benchmarks. The default is 4 runs. This will make kraken run for 10 runs:
`run-jsc-benchmarks --kraken --outer 10 baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`

You can also provide a regex to specify sub tests inside a benchmark suite. This will run just ai-astar and audio-fft from kraken.
`run-jsc-benchmarks --kraken --benchmarks "ai-astar|audio-fft"  baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`


Also, you should get yourself an Octane and Kraken. They can be found on github I think. We don't bundle those benchmarks inside WebKit itself, but run-jsc-benchmarks
will look for the following the following JSON file:
`~/.run-jsc-benchmarks`
Which specifies paths to where Octane/Kraken can be found.

My `~/.run-jsc-benchmarks` looks like:
```
{
    "OctanePath": "/Volumes/Data/jsc-benchmarks/octane",
    "KrakenPath": "/Volumes/Data/jsc-benchmarks/kraken/tests/kraken-1.1"
}
```
Comment 35 Caio Lima 2016-06-05 23:14:05 PDT
(In reply to comment #34)
> (In reply to comment #32)
> > Created attachment 280573 [details]
> > Cached Accessor's benchmark output
> 
> You can compare two VMs directly using the `run-jsc-benchmarks` script. This
> will give you nice output showing the differences so you don't have to
> manually compare the results.
> Like so:
> `run-jsc-benchmarks baseline:<pathToBaslineBuild>/jsc 
> changes:<pathToChanges>/jsc`
> 
> You can run individual benchmarks like (this runs only Kraken):
> `run-jsc-benchmarks --kraken baseline:<pathToBaslineBuild>/jsc 
> changes:<pathToChanges>/jsc`
> 
> You can make run-jsc-benchmarks run more iterations which is helpful for
> noisy benchmarks. The default is 4 runs. This will make kraken run for 10
> runs:
> `run-jsc-benchmarks --kraken --outer 10 baseline:<pathToBaslineBuild>/jsc 
> changes:<pathToChanges>/jsc`
> 
> You can also provide a regex to specify sub tests inside a benchmark suite.
> This will run just ai-astar and audio-fft from kraken.
> `run-jsc-benchmarks --kraken --benchmarks "ai-astar|audio-fft" 
> baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`
> 
> 
> Also, you should get yourself an Octane and Kraken. They can be found on
> github I think. We don't bundle those benchmarks inside WebKit itself, but
> run-jsc-benchmarks
> will look for the following the following JSON file:
> `~/.run-jsc-benchmarks`
> Which specifies paths to where Octane/Kraken can be found.
> 
> My `~/.run-jsc-benchmarks` looks like:
> ```
> {
>     "OctanePath": "/Volumes/Data/jsc-benchmarks/octane",
>     "KrakenPath": "/Volumes/Data/jsc-benchmarks/kraken/tests/kraken-1.1"
> }
> ```

It is a nice information. BTW, How can I manage the DYLD_FRAMEWORK_PATH between 2 different builds?
Comment 36 Saam Barati 2016-06-05 23:45:02 PDT
(In reply to comment #35)
> (In reply to comment #34)
> > (In reply to comment #32)
> > > Created attachment 280573 [details]
> > > Cached Accessor's benchmark output
> > 
> > You can compare two VMs directly using the `run-jsc-benchmarks` script. This
> > will give you nice output showing the differences so you don't have to
> > manually compare the results.
> > Like so:
> > `run-jsc-benchmarks baseline:<pathToBaslineBuild>/jsc 
> > changes:<pathToChanges>/jsc`
> > 
> > You can run individual benchmarks like (this runs only Kraken):
> > `run-jsc-benchmarks --kraken baseline:<pathToBaslineBuild>/jsc 
> > changes:<pathToChanges>/jsc`
> > 
> > You can make run-jsc-benchmarks run more iterations which is helpful for
> > noisy benchmarks. The default is 4 runs. This will make kraken run for 10
> > runs:
> > `run-jsc-benchmarks --kraken --outer 10 baseline:<pathToBaslineBuild>/jsc 
> > changes:<pathToChanges>/jsc`
> > 
> > You can also provide a regex to specify sub tests inside a benchmark suite.
> > This will run just ai-astar and audio-fft from kraken.
> > `run-jsc-benchmarks --kraken --benchmarks "ai-astar|audio-fft" 
> > baseline:<pathToBaslineBuild>/jsc  changes:<pathToChanges>/jsc`
> > 
> > 
> > Also, you should get yourself an Octane and Kraken. They can be found on
> > github I think. We don't bundle those benchmarks inside WebKit itself, but
> > run-jsc-benchmarks
> > will look for the following the following JSON file:
> > `~/.run-jsc-benchmarks`
> > Which specifies paths to where Octane/Kraken can be found.
> > 
> > My `~/.run-jsc-benchmarks` looks like:
> > ```
> > {
> >     "OctanePath": "/Volumes/Data/jsc-benchmarks/octane",
> >     "KrakenPath": "/Volumes/Data/jsc-benchmarks/kraken/tests/kraken-1.1"
> > }
> > ```
> 
> It is a nice information. BTW, How can I manage the DYLD_FRAMEWORK_PATH
> between 2 different builds?
run-jsc-benchmarks should do that for you
Comment 37 Caio Lima 2016-06-06 01:47:05 PDT
Created attachment 280590 [details]
Benchmark
Comment 38 Keith Miller 2016-06-08 08:57:33 PDT
Comment on attachment 280560 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280560&action=review

Looks good. I think it's almost ready to go. Saam and I have a few comments before the patch is ready to land however.

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:659
> +        if ((slot.isValue() || slot.isAccessor()) && slot.slotBase() == baseValue) {

Why clear the cache if we have a self accessor? There is no cost to leaving the old value cache in place.
Comment 39 Caio Lima 2016-06-08 09:42:31 PDT
(In reply to comment #38)
> Comment on attachment 280560 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=280560&action=review
> 
> Looks good. I think it's almost ready to go. Saam and I have a few comments
> before the patch is ready to land however.
> 
> > Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:659
> > +        if ((slot.isValue() || slot.isAccessor()) && slot.slotBase() == baseValue) {
> 
> Why clear the cache if we have a self accessor? There is no cost to leaving
> the old value cache in place.

I see your point. Actually, the main reason I implemented it was because I though it was the case where handling a cache miss and clearing the old cache. However, if I am not wrong, It is the case just to self accessors, right?

In our implementation, we just cache the prototype once (when pc[7].u.operand reaches 0) and after that, all cache misses are handled by the _llint_get_by_id_proto_acessor opcode slowPath case. The advantage to clear the cache is to avoid all these misses, however, the original _llint_get_by_id_proto_acessor also check for misses and there is no real improvement clearing the cache, right?

I am going to change it.

BTW, I have already an first implementation of CustomAcessors. What I need to finish is inline the CustomAcessor call to Assembly code. I think it is a good idea send them in the same patch, since the files touched are the same. As you pointed me instructions to try inline the Getter, I already have an idea what I need to do to complete this implementation.
Comment 40 Keith Miller 2016-06-08 09:49:16 PDT
(In reply to comment #39)
> (In reply to comment #38)
> > Comment on attachment 280560 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=280560&action=review
> > 
> > Looks good. I think it's almost ready to go. Saam and I have a few comments
> > before the patch is ready to land however.
> > 
> > > Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:659
> > > +        if ((slot.isValue() || slot.isAccessor()) && slot.slotBase() == baseValue) {
> > 
> > Why clear the cache if we have a self accessor? There is no cost to leaving
> > the old value cache in place.
> 
> I see your point. Actually, the main reason I implemented it was because I
> though it was the case where handling a cache miss and clearing the old
> cache. However, if I am not wrong, It is the case just to self accessors,
> right?
> 
> In our implementation, we just cache the prototype once (when
> pc[7].u.operand reaches 0) and after that, all cache misses are handled by
> the _llint_get_by_id_proto_acessor opcode slowPath case. The advantage to
> clear the cache is to avoid all these misses, however, the original
> _llint_get_by_id_proto_acessor also check for misses and there is no real
> improvement clearing the cache, right?
> 

That's right. Unless we have something else to put in the cache or the cache becomes invalid there is no benefit to clearing it.

> I am going to change it.
> 
> BTW, I have already an first implementation of CustomAcessors. What I need
> to finish is inline the CustomAcessor call to Assembly code. I think it is a
> good idea send them in the same patch, since the files touched are the same.
> As you pointed me instructions to try inline the Getter, I already have an
> idea what I need to do to complete this implementation.

Cool, sounds like a plan. Let me know if you have any other questions.
Comment 41 Caio Lima 2016-06-10 13:32:08 PDT
Hi Guys,

I am trying to inline the customs getter function call but I am facing a weird error.

When I set the cache Up, I put into pc[5] the pointer of the Custom Function "GetValueFunc".

When I try to load it into a register in LowLevelInterpreter64.asm, its content is not the same. I am doing the following operations:

loadisFromInstruction(5, t1)
PrepareForCCall
//load the arguments
cCall4(t0)
//retrun retrieving and dispatch

I debugged the value present into t1 (in my architecture is the rsi) and it is not corresponding to the pointer setted in setup. To test if call was working, I manually set rsi with the value of the function address and the function was called, however some arguments were incorrect the pointers.

I tried load the function pointer using:
loadpFromInstruction(5, t1)

And had no success too.

Is anything I am doing wrong to get the pc[] elements as pointer?
Comment 42 Saam Barati 2016-06-10 22:20:57 PDT
(In reply to comment #41)
> Hi Guys,
> 
> I am trying to inline the customs getter function call but I am facing a
> weird error.
> 
> When I set the cache Up, I put into pc[5] the pointer of the Custom Function
> "GetValueFunc".
> 
> When I try to load it into a register in LowLevelInterpreter64.asm, its
> content is not the same. I am doing the following operations:
> 
> loadisFromInstruction(5, t1)
> PrepareForCCall
> //load the arguments
> cCall4(t0)
> //retrun retrieving and dispatch
> 
> I debugged the value present into t1 (in my architecture is the rsi) and it
> is not corresponding to the pointer setted in setup. To test if call was
> working, I manually set rsi with the value of the function address and the
> function was called, however some arguments were incorrect the pointers.
> 
> I tried load the function pointer using:
> loadpFromInstruction(5, t1)
> 
> And had no success too.
> 
> Is anything I am doing wrong to get the pc[] elements as pointer?
Can you post the patch that you think should work?
What architecture are you on?
Comment 43 Caio Lima 2016-06-10 22:41:19 PDT
(In reply to comment #42)
> (In reply to comment #41)
> > Hi Guys,
> > 
> > I am trying to inline the customs getter function call but I am facing a
> > weird error.
> > 
> > When I set the cache Up, I put into pc[5] the pointer of the Custom Function
> > "GetValueFunc".
> > 
> > When I try to load it into a register in LowLevelInterpreter64.asm, its
> > content is not the same. I am doing the following operations:
> > 
> > loadisFromInstruction(5, t1)
> > PrepareForCCall
> > //load the arguments
> > cCall4(t0)
> > //retrun retrieving and dispatch
> > 
> > I debugged the value present into t1 (in my architecture is the rsi) and it
> > is not corresponding to the pointer setted in setup. To test if call was
> > working, I manually set rsi with the value of the function address and the
> > function was called, however some arguments were incorrect the pointers.
> > 
> > I tried load the function pointer using:
> > loadpFromInstruction(5, t1)
> > 
> > And had no success too.
> > 
> > Is anything I am doing wrong to get the pc[] elements as pointer?
> Can you post the patch that you think should work?
> What architecture are you on?

Hi Saam. I just discovered that some pseudo-register are mapped to the same physical register (e.g t1 = a0 in my architecture). I fixed the register usage and it is working now in my architecture.

Now the problem I am facing is that the "typedef EncodedJSValue (*GetValueFunc)(ExecState*, EncodedJSValue thisValue, PropertyName, JSObject* slotBase)" requires a PropertyName and I don't know how can I pass it in the inlined assembly.

My current assembly is this one:

_llint_op_get_by_id_proto_custom:
    traceExecution()
    loadisFromInstruction(2, t0)
    loadConstantOrVariableCell(t0, t3, .opGetByIdProtoCustomSlow)
    loadi JSCell::m_structureID[t3], t1
    loadisFromInstruction(4, t2)
    bineq t2, t1, .opGetByIdProtoCustomSlow
    loadpFromInstruction(6, a1)
    loadpFromInstruction(5, t0)
    push t2
    push PC
    push PB
    prepareStateForCCall()
    move cfr, a0
    move 0, a2 # I am testing with CustomGetter function and this parameter is ignored
    move a1, a3
    cCall4(t0)
    restoreStateAfterCCall()
    pop PB
    pop PC
    pop t2
    storeq r0, [cfr, t2, 8]
    valueProfile(r0, 8, t1)
    dispatch(9)

.opGetByIdProtoCustomSlow:
    callSlowPath(_llint_slow_path_get_by_id)
    dispatch(9)
Comment 44 Saam Barati 2016-06-11 01:43:01 PDT
A PropertyName is identical to a UniquedStringImpl* so you can just pass that.
A PropertyName is a struct with a single UniquedStringImpl* field.

I believe get_by_id has an index in it's instruction corresponding to its UniquedStringImpl*. You may need to write some code that loads this off
the CodeBlock based on some index. I'm not sure if we have code like that already
in the LLInt but it should be easy to write. 

You should also make sure you don't have any other unintended register
aliasing for all platforms that the LLInt supports.
Comment 45 Caio Lima 2016-06-11 13:37:41 PDT
(In reply to comment #44)
> A PropertyName is identical to a UniquedStringImpl* so you can just pass
> that.
> A PropertyName is a struct with a single UniquedStringImpl* field.
> 
> I believe get_by_id has an index in it's instruction corresponding to its
> UniquedStringImpl*. You may need to write some code that loads this off
> the CodeBlock based on some index. I'm not sure if we have code like that
> already
> in the LLInt but it should be easy to write. 

Actually, I don't think so. Checking the CodeBlock code I found the method used to get the identifier based in the identifier index.

const Identifier& CodeBlock::identifier(int index) const
{
    size_t unlinkedIdentifiers = m_unlinkedCode->numberOfIdentifiers();
    if (static_cast<unsigned>(index) < unlinkedIdentifiers)
        return m_unlinkedCode->identifier(index);
    
}

The m_unlikedCode stores the identifiers into a Vector<Identifier> m_identifiers. The problem is to revive the identifier contained in the Vector buffer. My current implementation is like this

macro loadIdentifier(index, dest)
    loadp CodeBlock[cfr], t2
    loadp CodeBlock::m_unlinkedCode[t2], t1 #t1 is pointing to UnlinkedCodeBlock
    addp UnlinkedCodeBlock::m_identifiers, t1 #Offset to point to m_identifiers
    loadis index, t2
    mulp sizeof Identifier, t2 
    addp t2, t1 # Offset to point to m_identifiers[index]
    addp Identifier::m_string, t1 # m_identifiers[index].m_string
    loadp t1, dest
end 

Any tip?
Comment 46 Caio Lima 2016-06-11 14:38:34 PDT
(In reply to comment #45)
> The m_unlikedCode stores the identifiers into a Vector<Identifier>
> m_identifiers. The problem is to revive the identifier contained in the
> Vector buffer. My current implementation is like this
> 
> macro loadIdentifier(index, dest)
>     loadp CodeBlock[cfr], t2
>     loadp CodeBlock::m_unlinkedCode[t2], t1 #t1 is pointing to
> UnlinkedCodeBlock
>     addp UnlinkedCodeBlock::m_identifiers, t1 #Offset to point to
> m_identifiers
>     loadis index, t2
>     mulp sizeof Identifier, t2 
>     addp t2, t1 # Offset to point to m_identifiers[index]
>     addp Identifier::m_string, t1 # m_identifiers[index].m_string
>     loadp t1, dest
> end 
> 
> Any tip?

Actually, this code working now:

macro loadIdentifier(index, dest)
    loadp CodeBlock[cfr], t2
    loadp CodeBlock::m_unlinkedCode[t2], t1
    loadp UnlinkedCodeBlock::m_identifiers[t1], t2
    move t2, t1
    loadis index, t2
    mulp sizeof Identifier, t2
    addp t2, t1
    loadp Identifier::m_string[t1], dest
end

I would like to know if there is a way to test the changes in all architectures.
Comment 47 Saam Barati 2016-06-11 14:40:25 PDT
(In reply to comment #45)
> (In reply to comment #44)
> > A PropertyName is identical to a UniquedStringImpl* so you can just pass
> > that.
> > A PropertyName is a struct with a single UniquedStringImpl* field.
> > 
> > I believe get_by_id has an index in it's instruction corresponding to its
> > UniquedStringImpl*. You may need to write some code that loads this off
> > the CodeBlock based on some index. I'm not sure if we have code like that
> > already
> > in the LLInt but it should be easy to write. 
> 
> Actually, I don't think so. Checking the CodeBlock code I found the method
> used to get the identifier based in the identifier index.
> 
> const Identifier& CodeBlock::identifier(int index) const
> {
>     size_t unlinkedIdentifiers = m_unlinkedCode->numberOfIdentifiers();
>     if (static_cast<unsigned>(index) < unlinkedIdentifiers)
>         return m_unlinkedCode->identifier(index);
>     
> }
> 
> The m_unlikedCode stores the identifiers into a Vector<Identifier>
> m_identifiers. The problem is to revive the identifier contained in the
> Vector buffer. My current implementation is like this
> 
> macro loadIdentifier(index, dest)
>     loadp CodeBlock[cfr], t2
>     loadp CodeBlock::m_unlinkedCode[t2], t1 #t1 is pointing to
> UnlinkedCodeBlock
>     addp UnlinkedCodeBlock::m_identifiers, t1 #Offset to point to
> m_identifiers
>     loadis index, t2
>     mulp sizeof Identifier, t2 
>     addp t2, t1 # Offset to point to m_identifiers[index]
>     addp Identifier::m_string, t1 # m_identifiers[index].m_string
>     loadp t1, dest
> end 
> 
> Any tip?
This is how we solve such problems in the LLInt and JITs in general. We often
end up inlining particular functions. For the JIT, this is usually easier because it
can compile constants more easily. So, your proposed solution would definitely work,
but it's obviously not ideal. 

But I should have suggested something simpler yesterday.
When you convert your get_by_id into your specialized version, you
can stash the UniquedStringImpl* into the instruction stream itself if
you have an unused instruction offset. Do you have such a free location?
Comment 48 Saam Barati 2016-06-11 14:42:03 PDT
(In reply to comment #46)
> (In reply to comment #45)
> > The m_unlikedCode stores the identifiers into a Vector<Identifier>
> > m_identifiers. The problem is to revive the identifier contained in the
> > Vector buffer. My current implementation is like this
> > 
> > macro loadIdentifier(index, dest)
> >     loadp CodeBlock[cfr], t2
> >     loadp CodeBlock::m_unlinkedCode[t2], t1 #t1 is pointing to
> > UnlinkedCodeBlock
> >     addp UnlinkedCodeBlock::m_identifiers, t1 #Offset to point to
> > m_identifiers
> >     loadis index, t2
> >     mulp sizeof Identifier, t2 
> >     addp t2, t1 # Offset to point to m_identifiers[index]
> >     addp Identifier::m_string, t1 # m_identifiers[index].m_string
> >     loadp t1, dest
> > end 
> > 
> > Any tip?
> 
> Actually, this code working now:
> 
> macro loadIdentifier(index, dest)
>     loadp CodeBlock[cfr], t2
>     loadp CodeBlock::m_unlinkedCode[t2], t1
>     loadp UnlinkedCodeBlock::m_identifiers[t1], t2
>     move t2, t1
>     loadis index, t2
>     mulp sizeof Identifier, t2
>     addp t2, t1
>     loadp Identifier::m_string[t1], dest
> end
> 
> I would like to know if there is a way to test the changes in all
> architectures.
Which architecture are you running locally?
Comment 49 Keith Miller 2016-06-11 14:48:14 PDT
(In reply to comment #47)
> But I should have suggested something simpler yesterday.
> When you convert your get_by_id into your specialized version, you
> can stash the UniquedStringImpl* into the instruction stream itself if
> you have an unused instruction offset. Do you have such a free location?

This is a good idea. We have an invariant that changing a Accessor on an object requires transitioning the structure of that object. This means that you can use, for example, pc[4] to hold the function pointer and pc[5] to point to the UniquedStringImpl* of the property.

(In reply to comment #46)
> I would like to know if there is a way to test the changes in all
> architectures.

It can be pretty tricky. I would recommend having the bot do it since, IIRC, we test all the architectures we care about.
Comment 50 Caio Lima 2016-06-11 15:17:47 PDT
(In reply to comment #49)
> (In reply to comment #47)
> > But I should have suggested something simpler yesterday.
> > When you convert your get_by_id into your specialized version, you
> > can stash the UniquedStringImpl* into the instruction stream itself if
> > you have an unused instruction offset. Do you have such a free location?
> 
> This is a good idea. We have an invariant that changing a Accessor on an
> object requires transitioning the structure of that object. This means that
> you can use, for example, pc[4] to hold the function pointer and pc[5] to
> point to the UniquedStringImpl* of the property.

Actually, my first thinking was in that way. However, pc[4] is used to check if the inline is still valid.

_llint_op_get_by_id_proto_custom:
    traceExecution()
    loadisFromInstruction(2, t0)
    loadConstantOrVariableCell(t0, t3, .opGetByIdProtoCustomSlow)
    loadi JSCell::m_structureID[t3], t1
    loadisFromInstruction(4, t2)
    bineq t2, t1, .opGetByIdProtoCustomSlow

This way, I think it is not possible use pc[4] for that. Could you confirm it? If it is possible, I think it is the best way to implement it.

Another strategy i had in mind was create a C struct to store customGetter pointer and UniquedStringImpl and "pc[5].u.cstruct" point to it. However, it is not efficient, since it is going to double the size of pc[].u. A way to improve it should store a pointer to this struct, however It is not possible to track this pointer and free its memory.

Other solution I was thinking was create an property into slot.slotBase() to store the identifier, however it is going to create a waste of space to other objects.

So, the most viable solution I thought was the last one I commented above.
 
> (In reply to comment #46)
> > I would like to know if there is a way to test the changes in all
> > architectures.
> 
> It can be pretty tricky. I would recommend having the bot do it since, IIRC,
> we test all the architectures we care about.

No problem. I asked because you could have a way to do it without submitting a patch. BTW my current architecture is x86_64 running Mac OS X.
Comment 51 Saam Barati 2016-06-11 15:27:02 PDT
(In reply to comment #49)
> (In reply to comment #47)
> > But I should have suggested something simpler yesterday.
> > When you convert your get_by_id into your specialized version, you
> > can stash the UniquedStringImpl* into the instruction stream itself if
> > you have an unused instruction offset. Do you have such a free location?
> 
> This is a good idea. We have an invariant that changing a Accessor on an
> object requires transitioning the structure of that object. This means that
> you can use, for example, pc[4] to hold the function pointer and pc[5] to
> point to the UniquedStringImpl* of the property.
> 
> (In reply to comment #46)
> > I would like to know if there is a way to test the changes in all
> > architectures.
> 
> It can be pretty tricky. I would recommend having the bot do it since, IIRC,
> we test all the architectures we care about.

Yeah it is indeed tricky. We do have some ARM EWS bots so if you upload relevant
tests in a LayoutTest, EWS will run it for you. Unfortunately, we don't have 
run-javascriptcore-tests on EWS, but we have internal bots that will alert us of
failure at Apple.
Comment 52 Saam Barati 2016-06-11 15:29:57 PDT
You won't be able to test ARM on your machine, but you can also
test x86 32-bit by building JavaScript core like:
Tools/Scripts/build-jsc --32-bit --release (or --debug)

This will give you some more coverage also because x86-32 doesn't have argument registers.
Comment 53 Caio Lima 2016-06-13 22:52:03 PDT
Created attachment 281234 [details]
Patch
Comment 54 Caio Lima 2016-06-13 23:05:16 PDT
Hi Guys. 

It is not the final patch (It is not ready yet and it is breaking).

I need some help from you. I am running my implementation of CustomAccessor prototype inline, however I think I am missing some important information.

I created a slowPath version to make debug easier and evaluate the performance gain without inlining it to assembly.

The problem that I am facing is because some customGetters are failing because the slot.slotBase() is not the correct parameter to them. In PropertySlot.cpp I found this code:

JSValue PropertySlot::customGetter(ExecState* exec, PropertyName propertyName) const
{
    // FIXME: Remove this differences in custom values and custom accessors.
    // https://bugs.webkit.org/show_bug.cgi?id=158014
    JSValue thisValue = m_attributes & CustomAccessor ? m_thisValue : JSValue(slotBase());
    return JSValue::decode(m_data.custom.getValue(exec, JSValue::encode(thisValue), propertyName));
}

The biggest problem I am facing is because m_thisValue.getObject() is not a valid pointer when called by the inlined version and I am not able to store EncodedJSValue to pc[6]. I spent yesterday night and today night trying to figure out what can I do.


Saam, about store the UniquedStringImpl into a pc slot, it is not possible because all slot are in use.
Comment 55 Caio Lima 2016-06-13 23:08:11 PDT
Created attachment 281235 [details]
Patch
Comment 56 Build Bot 2016-06-13 23:55:42 PDT
Comment on attachment 281235 [details]
Patch

Attachment 281235 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1498748

Number of test failures exceeded the failure limit.
Comment 57 Build Bot 2016-06-13 23:55:48 PDT
Created attachment 281237 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 58 Build Bot 2016-06-13 23:57:05 PDT
Comment on attachment 281235 [details]
Patch

Attachment 281235 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1498751

Number of test failures exceeded the failure limit.
Comment 59 Build Bot 2016-06-13 23:57:09 PDT
Created attachment 281238 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 60 Build Bot 2016-06-13 23:57:29 PDT
Comment on attachment 281235 [details]
Patch

Attachment 281235 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1498731

Number of test failures exceeded the failure limit.
Comment 61 Build Bot 2016-06-13 23:57:33 PDT
Created attachment 281239 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 62 Build Bot 2016-06-13 23:58:22 PDT
Comment on attachment 281235 [details]
Patch

Attachment 281235 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1498736

Number of test failures exceeded the failure limit.
Comment 63 Build Bot 2016-06-13 23:58:24 PDT
Created attachment 281240 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.4
Comment 64 Caio Lima 2016-06-14 11:12:18 PDT
Hi Guys. Finally I finished a version that is passing in the webkit-test for x86_64 on Mac. Now I need to fix the 32bit version and send to review. However, I am testing the performance to check the improvement of this implementation.

As a Benchmark for the Custom inlined I am using the following tool:
http://browserbench.org/Speedometer/

I already have some results but I didn't Analyze them. I am running this tool with the --release version of the baseline version and with the path applied.

In the overall score, both versions are close and I couldn't notice difference in the results. I am going to improve the analysis to check some behaviors of this benchmark with the prototype cache implementation (e.g.cache miss rate). I keep you informed. Probably today I am going to send my final version to review today.
Comment 65 Caio Lima 2016-06-15 20:30:45 PDT
Created attachment 281432 [details]
Patch
Comment 66 Build Bot 2016-06-15 21:28:16 PDT
Comment on attachment 281432 [details]
Patch

Attachment 281432 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1510609

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 67 Build Bot 2016-06-15 21:28:20 PDT
Created attachment 281436 [details]
Archive of layout-test-results from ews101 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 68 Build Bot 2016-06-15 21:32:25 PDT
Comment on attachment 281432 [details]
Patch

Attachment 281432 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1510612

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 69 Build Bot 2016-06-15 21:32:29 PDT
Created attachment 281437 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 70 Build Bot 2016-06-15 21:36:18 PDT
Comment on attachment 281432 [details]
Patch

Attachment 281432 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1510615

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 71 Build Bot 2016-06-15 21:36:23 PDT
Created attachment 281438 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.4
Comment 72 Build Bot 2016-06-15 23:48:44 PDT
Comment on attachment 281432 [details]
Patch

Attachment 281432 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1511098

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 73 Build Bot 2016-06-15 23:48:49 PDT
Created attachment 281446 [details]
Archive of layout-test-results from ews117 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 74 Caio Lima 2016-06-16 23:27:26 PDT
Created attachment 281539 [details]
Patch
Comment 75 Caio Lima 2016-06-16 23:32:07 PDT
Created attachment 281540 [details]
speedometer run comparison with and without patch applied

It is the result of speedometer benchmark comparing baseline and the patch applied.
Comment 76 Caio Lima 2016-06-16 23:44:42 PDT
(In reply to comment #75)
> Created attachment 281540 [details]
> speedometer run comparison with and without patch applied
> 
> It is the result of speedometer benchmark comparing baseline and the patch
> applied.

I noticed that there is no real gain in this benchmark in terms of performance for the custom accessor caching. So I decided to check the cache miss ratio of it to verify the behavior of this benchmark. The summary results I've got are:

----------------------------------------------------------------------------
Current CacheMiss Ratio:145467/319000
----------------------------------------------------------------------------

This Ration is 0.45600940438871473. I am not convinced that this benchmark is a good choice to test the performance gain of this patch. What I am doing right now is logging my tonight usage to collect the CacheMiss Ration using facebook, youtube and other real web applications.

I am going to send you the results soon.
Comment 77 Build Bot 2016-06-17 00:26:46 PDT
Comment on attachment 281539 [details]
Patch

Attachment 281539 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/1516671

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 78 Build Bot 2016-06-17 00:26:51 PDT
Created attachment 281542 [details]
Archive of layout-test-results from ews103 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 79 Build Bot 2016-06-17 00:28:58 PDT
Comment on attachment 281539 [details]
Patch

Attachment 281539 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/1516670

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 80 Build Bot 2016-06-17 00:29:02 PDT
Created attachment 281543 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 81 Build Bot 2016-06-17 00:36:10 PDT
Comment on attachment 281539 [details]
Patch

Attachment 281539 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1516674

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 82 Build Bot 2016-06-17 00:36:15 PDT
Created attachment 281544 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.4
Comment 83 Build Bot 2016-06-17 00:38:33 PDT
Comment on attachment 281539 [details]
Patch

Attachment 281539 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/1516676

New failing tests:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
Comment 84 Build Bot 2016-06-17 00:38:38 PDT
Created attachment 281545 [details]
Archive of layout-test-results from ews116 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 85 Caio Lima 2016-06-17 01:06:13 PDT
Created attachment 281547 [details]
Patch
Comment 86 WebKit Commit Bot 2016-06-17 01:08:49 PDT
Attachment 281547 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:64:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1666:  Missing space after ,  [whitespace/comma] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1675:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1687:  Missing space after ,  [whitespace/comma] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1688:  Missing space after ,  [whitespace/comma] [3]
Total errors found: 5 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 87 Caio Lima 2016-06-17 01:13:37 PDT
(In reply to comment #85)
> Created attachment 281547 [details]
> Patch

Hey Guys, I am sorry to post this patch, but it is the version that I am using to get Cache Miss Information. The strange behavior here is that this patch is not failing in tests in my local machine and I want to test it here:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html

However, the difference between this and the last version is just the call of Debug Functions... Do you guys can imagine why it is happening? BTW, the test is failing because the line of one CONSOLE MESSAGE is different (It is expected to be 66 and it is being 17).

Sorry for the delay in this patch and for this question, but I am trying to solve this specific bug since yesterday with no success.

Regards,
Caio Lima.
Comment 88 Caio Lima 2016-06-17 01:34:31 PDT
Created attachment 281550 [details]
Patch
Comment 89 Caio Lima 2016-06-18 01:55:30 PDT
Created attachment 281613 [details]
Patch
Comment 90 Caio Lima 2016-06-18 14:00:42 PDT
Hi Guys.

Finally, the patch is now Green! However I am in doubt if it is working for other architectures (Win, GTK, ios), because I saw some comments in the code about call C functions in Windows environment.

Thi is my final version to review =)
Comment 91 Caio Lima 2016-06-22 00:12:30 PDT
New information about Cache Miss Ratio. I traced the execution of my Safari since Sunday afternoon until now and I got the following result:

----------------------------------------------------------------------------
Current CacheMiss Ratio:15510/37691
----------------------------------------------------------------------------

Almost 70% of my usage was in Facebook, Youtube and Google. It is really close to results of speedometer benchmark, what validate the result of the last run.
Comment 92 Caio Lima 2016-06-26 15:53:19 PDT
Ping Review
Comment 93 Keith Miller 2016-06-28 11:44:49 PDT
Comment on attachment 281613 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281613&action=review

Looks good! I just have a couple of comments/questions.

> Source/JavaScriptCore/ChangeLog:12
> +        offset of a object to call a getter function and get_by_id_proto_custo

typo: git_by_id_proto_custom

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1299
> +   bineq 0, dest, .loadEncodedThisValueDone

Does this get assembled to: 
test: dest, dest 
jz: .loadEncodedThisValueDone
? If not, we should probably try to figure out how to make that happen.
Comment 94 Caio Lima 2016-06-28 23:28:21 PDT
(In reply to comment #93)
> Comment on attachment 281613 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=281613&action=review
> 
> Looks good! I just have a couple of comments/questions.
> 
> > Source/JavaScriptCore/ChangeLog:12
> > +        offset of a object to call a getter function and get_by_id_proto_custo
> 
> typo: git_by_id_proto_custom

Nice Catch =)

> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1299
> > +   bineq 0, dest, .loadEncodedThisValueDone
> 
> Does this get assembled to: 
> test: dest, dest 
> jz: .loadEncodedThisValueDone
> ? If not, we should probably try to figure out how to make that happen.

Yes, Here is the code compiled:

0x100dbaf1f <+13003>: movq   0x30(%r13,%r8,8), %rsi
0x100dbaf24 <+13008>: testl  %esi, %esi
0x100dbaf26 <+13010>: jne    0x100dbaf2f               ; <+13019>
0x100dbaf2c <+13016>: movq   %rcx, %rsi
0x100dbaf2f <+13019>: movq   0x28(%r13,%r8,8), %rax

I am going to rebase the code and send the Patch updated
Comment 95 Caio Lima 2016-06-28 23:59:28 PDT
Created attachment 282325 [details]
Patch
Comment 96 Build Bot 2016-06-29 01:02:38 PDT
Comment on attachment 282325 [details]
Patch

Attachment 282325 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1592963

New failing tests:
animations/multiple-backgrounds.html
Comment 97 Build Bot 2016-06-29 01:02:43 PDT
Created attachment 282331 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.4
Comment 98 Keith Miller 2016-06-29 09:31:55 PDT
Comment on attachment 282325 [details]
Patch

r=me. That failing test is just noise.
Comment 99 WebKit Commit Bot 2016-06-29 09:53:48 PDT
Comment on attachment 282325 [details]
Patch

Clearing flags on attachment: 282325

Committed r202627: <http://trac.webkit.org/changeset/202627>
Comment 100 WebKit Commit Bot 2016-06-29 09:53:57 PDT
All reviewed patches have been landed.  Closing bug.
Comment 102 WebKit Commit Bot 2016-06-29 11:23:29 PDT
Re-opened since this is blocked by bug 159266
Comment 103 Caio Lima 2016-06-30 01:15:54 PDT
Created attachment 282418 [details]
Patch
Comment 104 Caio Lima 2016-06-30 01:17:51 PDT
(In reply to comment #103)
> Created attachment 282418 [details]
> Patch

Considering ARM64 and C_LOOP cases.
Comment 105 Caio Lima 2016-07-07 23:52:08 PDT
Ping review. I think now it is ok now, because I am considering the last errors reported to ARM64 and C_LOOP.

The last problems were:

1. ARM64 doesn't support "push <reg>", so it is necessary push two registers at the same time.

2. C_LOOP doesn't support call operations, So I added a slow case to it.
Comment 106 Keith Miller 2016-07-10 12:59:46 PDT
Comment on attachment 282418 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=282418&action=review

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:686
> +        } else if (UNLIKELY(pc[7].u.operand && (slot.isValue() || slot.isUnset() || ((slot.isAccessor() || slot.isCustom()) && (slot.slotBase() != baseValue))))) {

I think this could just be:

if (UNLIKELY(pc[7].u.operand && (slot.slotBase() != baseValue)))))

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1476
> +        push t1 # Load arg3 JSObject *
> +        push t2 # Load arg2 PropertyName
> +        move CellTag, t3
> +        if BIG_ENDIAN
> +            push t1 # Load arg1 Payload of EncodedJSValue
> +            push t3 # Load arg1 Tag of EncodedJSValue
> +        else
> +            push t3 # Load arg1 Tag of EncodedJSValue
> +            push t1 # Load arg1 Payload of EncodedJSValue
> +        end
> +        push cfr # Loading exec

Looking at this closer. I don't think this is going to work on ARM. The ARM calling convention passes the first four arguments in r0-r3. Additionally, depending on whether or not the device is using the EABI 64-bit values need to start in r0 or r2.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1333
> +        loadp Callee[cfr], t0
> +        andp MarkedBlockMask, t0, t1
> +        loadp MarkedBlock::m_weakSet + WeakSet::m_vm[t1], t1

It is probably worthwhile to make this a macro in LowLevelInterpreter.asm, loadVM(register) or something.
Comment 107 Caio Lima 2016-07-11 01:31:09 PDT
Created attachment 283306 [details]
Patch
Comment 108 Keith Miller 2016-07-21 11:46:24 PDT
Comment on attachment 283306 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=283306&action=review

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1478
> +        loadEncodedThisValue(t3, a1)
> +        push t5, cfr
> +        loadp 20[PC], t5
> +        # Inlining the GetValueFunc call
> +        move t2, a3 # Load arg2 PropertyName
> +        if BIG_ENDIAN
> +            move a1, a2 # Load arg1 Payload of EncodedJSValue
> +            move CellTag, a1 # Load arg1 Tag of EncodedJSValue
> +        else
> +            move CellTag, a2 # Load arg1 Tag of EncodedJSValue
> +        end
> +        move cfr, a0 # Loading exec
> +        cCall4(t5)

Looks good. Although I still think you need to handle the EABI. For the EABI you'll need to push the PropertyName onto the stack put the EncodedJSValue into r2-r3. r1 should be empty.

> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1322
> +macro loadEncodedThisValue(baseValue, dest)
> +   loadpFromInstruction(6, dest)
> +   bineq 0, dest, .loadEncodedThisValueDone
> +   move baseValue, dest
> +   .loadEncodedThisValueDone:
> +end

I think this should be bpneq. bineq only compares the low 32 bits. Also, You could move it to LowLevelInterpreter.asm
Comment 109 Caio Lima 2016-07-31 17:34:15 PDT
(In reply to comment #108)
> Comment on attachment 283306 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=283306&action=review
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1478
> > +        loadEncodedThisValue(t3, a1)
> > +        push t5, cfr
> > +        loadp 20[PC], t5
> > +        # Inlining the GetValueFunc call
> > +        move t2, a3 # Load arg2 PropertyName
> > +        if BIG_ENDIAN
> > +            move a1, a2 # Load arg1 Payload of EncodedJSValue
> > +            move CellTag, a1 # Load arg1 Tag of EncodedJSValue
> > +        else
> > +            move CellTag, a2 # Load arg1 Tag of EncodedJSValue
> > +        end
> > +        move cfr, a0 # Loading exec
> > +        cCall4(t5)
> 
> Looks good. Although I still think you need to handle the EABI. For the EABI
> you'll need to push the PropertyName onto the stack put the EncodedJSValue
> into r2-r3. r1 should be empty.

Thank you to inform me about it. Which architectures use EABI? I suppose ARM, but I would like to be sure before send a new patch. Also, do you mind point me where you found this information?

> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1322
> > +macro loadEncodedThisValue(baseValue, dest)
> > +   loadpFromInstruction(6, dest)
> > +   bineq 0, dest, .loadEncodedThisValueDone
> > +   move baseValue, dest
> > +   .loadEncodedThisValueDone:
> > +end
> 
> I think this should be bpneq. bineq only compares the low 32 bits. Also, You
> could move it to LowLevelInterpreter.asm

Nice catch, Thanks!
Comment 110 Keith Miller 2016-08-01 11:16:45 PDT
(In reply to comment #109)
> (In reply to comment #108)
> > 
> > Looks good. Although I still think you need to handle the EABI. For the EABI
> > you'll need to push the PropertyName onto the stack put the EncodedJSValue
> > into r2-r3. r1 should be empty.
> 
> Thank you to inform me about it. Which architectures use EABI? I suppose
> ARM, but I would like to be sure before send a new patch. Also, do you mind
> point me where you found this information?

EABI AFAIK is used for both MIPS and ARM. I don't have any documentation for ARM I have some documentation for MIPS, which I will email to you.
Comment 111 Caio Lima 2016-08-02 00:21:26 PDT
Created attachment 285079 [details]
Patch
Comment 112 Caio Lima 2016-08-02 00:34:13 PDT
Created attachment 285082 [details]
Patch
Comment 113 Keith Miller 2016-08-05 10:44:25 PDT
Comment on attachment 285082 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285082&action=review

> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:679
> +                if (slot.isValue()) {

I don't think you need this check. This block is dominated by a check that the slot is a value.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1474
> +        if ARM or ARMv7 or ARMv7_TRADITIONAL or MIPS

I think you want. if (COMPILER_SUPPORTS(EABI) and ARM) or MIPS. Not all ARM devices use the EABI.
Comment 114 Caio Lima 2016-08-06 22:27:42 PDT
Created attachment 285522 [details]
Patch
Comment 115 Caio Lima 2016-08-14 16:51:22 PDT
Ping Review
Comment 116 Caio Lima 2016-10-08 22:47:19 PDT
Created attachment 291039 [details]
Patch
Comment 117 Caio Lima 2016-10-08 23:42:55 PDT
(In reply to comment #116)
> Created attachment 291039 [details]
> Patch

Rebased Patch
Comment 118 Geoffrey Garen 2016-10-11 13:53:55 PDT
Comment on attachment 291039 [details]
Patch

I remember Keith did a patch similar to this in the LLInt for prototype values and we didn't see a measurable speedup from it. 

Should we be doing optimizations like this in the LLInt? Is it worth the complexity and maintenance cost?

What's the best benchmark result we have from this? Do we have any results that are not tailored micro benchmarks?
Comment 119 Keith Miller 2016-10-11 13:57:41 PDT
(In reply to comment #118)
> Comment on attachment 291039 [details]
> Patch
> 
> I remember Keith did a patch similar to this in the LLInt for prototype
> values and we didn't see a measurable speedup from it. 
> 
> Should we be doing optimizations like this in the LLInt? Is it worth the
> complexity and maintenance cost?
> 
> What's the best benchmark result we have from this? Do we have any results
> that are not tailored micro benchmarks?

I think I agree with Geoff here. Looking at this patch again, it's difficult to understand exactly what it's doing. If we don't have a measurable speedup from this on a non-microbenchmark, I'm not sure we should take the patch.
Comment 120 Caio Lima 2016-10-11 14:13:11 PDT
(In reply to comment #118)
> Comment on attachment 291039 [details]
> Patch
> 
> I remember Keith did a patch similar to this in the LLInt for prototype
> values and we didn't see a measurable speedup from it. 
> 
> Should we be doing optimizations like this in the LLInt? Is it worth the
> complexity and maintenance 

I think I can't agree more. It is bringing a complex with practically no performance gains. I remember last time I ran Speedometer in x86_64 and I didn't saw gains. The point is that we can get better improvements if JIT=false, but it doesn't represent real world usage.

> What's the best benchmark result we have from this? Do we have any results
> that are not tailored micro benchmarks?

I don't.

What do you think if we just cache getters and forget about customGetter inline? The patch will be much more simpler and we also can take advantage on "super" access in a further case. I can take a look on that and see if it is worth or not.
Comment 121 Geoffrey Garen 2016-10-11 16:03:41 PDT
> What do you think if we just cache getters and forget about customGetter
> inline? The patch will be much more simpler and we also can take advantage
> on "super" access in a further case. I can take a look on that and see if it
> is worth or not.

If we have consensus that the performance motivation is not super strong, then I think we should adopt LLInt optimizations or not based on weighing their complexity. I don't know exactly what an optimization that kept caching but skipped customGetter would look like, but if it were simple enough I think it might be worth it.
Comment 122 Yusuke Suzuki 2017-07-20 19:44:51 PDT
I think non-JIT performance is important.

We already retired some old JIT arches like SH4.
And we do not actively maintain the existing but non-major JIT arch like ARM6.
If we would like to freely focus on existing high-performance JIT (lovely x64, ARM64!), I think we should keep LLInt fast for non-JIT.
In addition to slightly improve the performance of LLInt in JIT supported environments, it helps us to easily say that "OK, you do not need to worry about your non-JIT supported arch performance. LLInt is already super fast!".
Comment 123 Brady Eidson 2018-02-14 10:34:27 PST
Comment on attachment 291039 [details]
Patch

Patches that have been up for review since 2016 are almost certainly too stale to be relevant to trunk in their current form.

If this patch is still important please rebase it and post it for review again.