Hey Keith, I am interested in working on this bug, if possible.
Could you clarify some points for me?
1. What are prototype misses?
2. When you say cache accessors, do you mean getters and setters?
3. Also, what about customs?
It would be a big help if you could post some JavaScript code as an example.
I am trying to debug the case for accessors, and I noticed that the getter is emitted as a get_by_id op, so I suspect it is already being cached (I didn't check).
(In reply to comment #1)
Hi Caio,
It would be great if you fixed this feature and I would be happy to help. Feel free to message me on irc, my nick is keith_miller, if you have any other questions.
1) "Prototype miss" was the wrong terminology. I should have said that we should cache unset properties, meaning `let x = foo.bar` where `"bar" in foo === false`. I actually already handled this case in http://trac.webkit.org/changeset/201456.
2) Yep, that's right. In JavaScriptCore we use the word Accessor to mean a property with a getter and/or a setter.
For example, if you did the following:
foo = {};
Object.defineProperty(Object.prototype, "bar", { get: () => { return 1; } });
print(foo.bar);
The bytecodes will look something like:
[ 9] resolve_scope loc10, loc3, print(@id0), <GlobalProperty>, 1, 0x113de7900
[ 16] get_from_scope loc6, loc10, print(@id0), 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 122 predicting None
[ 24] resolve_scope loc11, loc3, foo(@id1), <GlobalProperty>, 1, 0x113de7900
[ 31] get_from_scope loc12, loc11, foo(@id1), 2048<ThrowIfNotFound|GlobalProperty|NotInitialization>, 191 predicting None
[ 39] get_by_id loc9, loc12, bar(@id2) predicting None
[ 48] call loc5, loc6, 2, 16 status(Could Take Slow Path) Original; predicting None
That get_by_id will attempt to look up the property on foo by calling foo's getOwnPropertySlot function, which will set the passed PropertySlot& to Object.prototype's "bar" GetterSetter object. In this case, the PropertySlot will say that the value is a GetterSetter, and we won't cache it.
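The mechanism described above, a lookup that fills a slot recording what kind of property was found so the caller can decide cacheability, can be sketched in C++ roughly like this. The types are invented stand-ins, not JSC's actual classes:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for JSC's PropertySlot machinery.
enum class SlotKind { Unset, Value, Accessor };

struct PropertySlot {
    SlotKind kind = SlotKind::Unset;
    int value = 0;              // filled for a plain data property
    int (*getter)() = nullptr;  // filled for an accessor property
};

struct Object {
    std::map<std::string, int> values;
    std::map<std::string, int (*)()> accessors;
    Object* prototype = nullptr;

    // Analogous to getOwnPropertySlot walking the prototype chain:
    // the slot records *what kind* of property was found, so the
    // caller can decide whether the result is safe to cache.
    bool getPropertySlot(const std::string& name, PropertySlot& slot) {
        for (Object* o = this; o; o = o->prototype) {
            if (auto it = o->values.find(name); it != o->values.end()) {
                slot.kind = SlotKind::Value;
                slot.value = it->second;
                return true;
            }
            if (auto it = o->accessors.find(name); it != o->accessors.end()) {
                slot.kind = SlotKind::Accessor;
                slot.getter = it->second;
                return true;
            }
        }
        return false;
    }
};
```

A caller would check `slot.kind` before caching, mirroring "the PropertySlot will say that the value is a GetterSetter and we won't cache it".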
3) Customs are mostly used as a performance optimization for DOM accessors. Basically, instead of allocating a JSFunction object for every accessor property in the DOM, we use a function pointer instead. You can see the typedef, GetValueFunc, in PropertySlot.h. Customs are also used in a small number of other places for special properties that are values but might change, although that is quite rare.
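The idea behind customs, a bare function pointer instead of a heap-allocated function object, might be sketched like this. All names below are invented for illustration; JSC's real typedef is GetValueFunc in PropertySlot.h:

```cpp
#include <cassert>
#include <cstdint>

// Invented stand-ins; the real signature lives in PropertySlot.h.
// The point is only that a custom accessor is a raw function pointer,
// not a heap-allocated function object.
struct ExecState;
using EncodedValue = int64_t;
using GetValueFunc = EncodedValue (*)(ExecState*, EncodedValue thisValue);

// A "custom" getter for a hypothetical DOM-ish property: no JSFunction
// allocation, just a C function the engine can call directly.
static EncodedValue lengthGetter(ExecState*, EncodedValue) {
    return 42; // a real engine would read the underlying native object
}

struct CustomProperty {
    const char* name;
    GetValueFunc get;
};

static constexpr CustomProperty kLengthProperty { "length", lengthGetter };
```

Since no object is allocated per property, a static table of these can describe an entire DOM interface.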
I would recommend for a first pass that you implement a new opcode, I guess called op_get_by_id_proto_accessor, that just calls into a slow path. In the slow path it can load the GetterSetter object and call the getter, returning the result. If you are feeling very ambitious, you could inline the code for the call in the interpreter assembly. Let me know if you plan on trying to inline the call, since you will need to know the JavaScriptCore calling convention to implement it.
(In reply to comment #3)
Thanks for the clarification, Keith.
I am sending a prototype of what you described, but I definitely want to learn how to inline a function call =).
Created attachment 280435 [details]
Archive of layout-test-results from ews103 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 280436 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 280437 [details]
Archive of layout-test-results from ews114 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 280438 [details]
Archive of layout-test-results from ews122 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.4
>
> You can run these tests locally using run-webkit-tests --debug
> --dump-render-tree.
I would recommend running "run-javascriptcore-tests --debug [--no-build if you already have a debug build of jsc]" first since that doesn't require building all of WebCore and runs most of these tests.
> Actually, I updated this patch today morning to avoid the crashes, because
> my implementation was incorrect. These tests are from my first patch, I am
> going try to run them locally tonight, but is there one way to retrigger
> them here?
>
> BTW, this patch is RFC, not the final Version, because I want to inline the
> call instead of using the slowPath.
It looks like those crashes were happening on the second patch. You can see by clicking on the mac-debug bubble and looking at the results log. Unfortunately, the only way I know to rerun the tests on the bots is to upload the patch again :/.
(In reply to comment #19)
No problem. I will fix them tonight, since I have no access to my computer right now.
Regarding your earlier question about inlining the getter call, I think the simplest way to implement it is to follow how we do it in PolymorphicAccess/operationGetByIdOptimize. PolymorphicAccess has the inline cache repatching code used in the JITs. The assembler in PolymorphicAccess more or less models x86, so it shouldn't be too hard to follow and translate to LLInt assembly. I'll also see if I can find some better information about the calling convention. Let me know if you have any other questions.
(In reply to comment #21)
Also, it might be helpful to look at JSStack.h, in particular, the CallFrameHeaderEntry enum since that represents the expected layout of a JS frame at the end of the function prologue.
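A hypothetical sketch of what such an enum encodes (the actual slot names and their order are whatever JSStack.h defines; the ones below are illustrative only): each enumerator is a word index from the frame base, so assembly can address header fields by constant byte offsets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch in the spirit of a CallFrameHeaderEntry enum.
// Each enumerator names one machine-word slot at a fixed index from
// the frame pointer; the real layout is defined by JSC, not here.
enum CallFrameHeaderEntry : int {
    CallerFrame = 0,
    ReturnPC,
    CodeBlock,
    Callee,
    ArgumentCount,
    CallFrameHeaderSize, // number of header slots in this sketch
};

constexpr size_t registerSize = 8; // one 64-bit slot per entry

// Assembly addresses a header field as: frameBase + index * slotSize.
constexpr size_t headerOffset(CallFrameHeaderEntry e) {
    return static_cast<size_t>(e) * registerSize;
}
```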
(In reply to comment #22)
Thank you for pointing out where I can look. I think I am almost done, because I now understand how to construct the JS frame for the 64-bit architecture; however, I am missing some very important information.
1) How do I perform the call to the getter function? I searched and found a makeJavascriptCall macro, but I am not sure whether it is necessary to use it. I would also like to understand the offline assembly "call" instruction.
2) Also, how can I get the getter JSFunction, since the cached offset points to a GetterSetter? Is it possible to use "loadp GetterSetter::m_getter[t3], t4", for example? I noticed that the code in PolymorphicAccess uses GetterSetter::offsetGetter(). I suppose this value should be placed in the Callee slot of the JS frame entry.
I didn't find information about the offline assembler; could you point me to some?
In this patch I fixed the regressions and tried to inline the opcode. I am sending my experimental code commented out. However, I am facing problems getting the getter function and storing it in the Callee slot of the JS stack entry. Could you help me with this?
(In reply to comment #25)
Thinking about this further, I realized that in order to inline the call we would need to have a CallLinkInfo for the getter callee. That would tell us what address to call when the function we are calling gets tiered up. Unfortunately, as far as I know, we don't have any kind of CallLinkInfo for the LLInt. Without CallLinkInfo, I think you would need to call into C++ to find the call location anyway. Let me look into this further, but I think it might not be worthwhile to inline normal getters. We should still be able to inline custom getters, however.
Hi guys,
I cleaned up the code; this is my patch version for caching the prototype accessor. I would like to know whether it is a good idea to send the implementation for caching prototype custom functions in this bug as well.
Also, could you point me to some code that uses prototype customs?
Comment on attachment 280560 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=280560&action=review
Have you run any benchmarks?
> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:713
> + LLINT_OP(1) = result;
> +
> + pc[OPCODE_LENGTH(op_get_by_id) - 1].u.profile->m_buckets[0] = JSValue::encode(result);
> + LLINT_END();
I think LLINT_RETURN_PROFILED does this for you
Checking the benchmarks, I noticed that the patch does not improve the results. The cases I paid attention to were:
#JSC with the patch
get-by-id-proto-or-self 11.6561+-2.5266
#JSC without the patch
get-by-id-proto-or-self 13.2753+-2.3457
--------------------------------------------------------------
#JSC with the patch
get-by-id-self-or-proto 13.0930+-1.7589
#JSC without the patch
get-by-id-self-or-proto 12.1011+-0.7154
--------------------------------------------------------------
#JSC with the patch
getter-prototype 5.8231+-0.6807
#JSC without the patch
getter-prototype 5.5745+-0.2236
--------------------------------------------------------------
#JSC with the patch
prototype-access-with-mutating-prototype 6.2859+-2.0017
#JSC without the patch
prototype-access-with-mutating-prototype 5.3830+-0.4957
--------------------------------------------------------------
It is important to notice that the errors reported by the JSC run with the patch are always higher than those of the run without the patch. IMHO this is because of the prototype cache maintenance and setup overhead. Also, it is important to consider that this implements just the slow_path to access the prototype accessor functions.
(In reply to comment #32)
> Created attachment 280573[details]
> Cached Accessor's benchmark output
You can compare two VMs directly using the `run-jsc-benchmarks` script. This will give you nice output showing the differences so you don't have to manually compare the results.
Like so:
`run-jsc-benchmarks baseline:<pathToBaselineBuild>/jsc changes:<pathToChanges>/jsc`
You can run individual benchmarks like this (this runs only Kraken):
`run-jsc-benchmarks --kraken baseline:<pathToBaselineBuild>/jsc changes:<pathToChanges>/jsc`
You can make run-jsc-benchmarks run more iterations, which is helpful for noisy benchmarks. The default is 4 runs; this will make Kraken run for 10 runs:
`run-jsc-benchmarks --kraken --outer 10 baseline:<pathToBaselineBuild>/jsc changes:<pathToChanges>/jsc`
You can also provide a regex to specify sub-tests inside a benchmark suite. This will run just ai-astar and audio-fft from Kraken:
`run-jsc-benchmarks --kraken --benchmarks "ai-astar|audio-fft" baseline:<pathToBaselineBuild>/jsc changes:<pathToChanges>/jsc`
Also, you should get yourself copies of Octane and Kraken. They can be found on GitHub, I think. We don't bundle those benchmarks inside WebKit itself, but run-jsc-benchmarks will look for the following JSON file:
`~/.run-jsc-benchmarks`
which specifies paths to where Octane/Kraken can be found.
My `~/.run-jsc-benchmarks` looks like:
```
{
    "OctanePath": "/Volumes/Data/jsc-benchmarks/octane",
    "KrakenPath": "/Volumes/Data/jsc-benchmarks/kraken/tests/kraken-1.1"
}
```
(In reply to comment #34)
That's nice information. BTW, how can I manage DYLD_FRAMEWORK_PATH between two different builds?
(In reply to comment #35)
run-jsc-benchmarks should do that for you
Comment on attachment 280560 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=280560&action=review
Looks good. I think it's almost ready to go. Saam and I have a few comments before the patch is ready to land however.
> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:659
> + if ((slot.isValue() || slot.isAccessor()) && slot.slotBase() == baseValue) {
Why clear the cache if we have a self accessor? There is no cost to leaving the old value cache in place.
(In reply to comment #38)
I see your point. Actually, the main reason I implemented it that way was because I thought it was the case of handling a cache miss and clearing the old cache. However, if I am not wrong, that only applies to self accessors, right?
In our implementation, we just cache the prototype once (when pc[7].u.operand reaches 0), and after that, all cache misses are handled by the _llint_get_by_id_proto_acessor opcode's slow-path case. The advantage of clearing the cache is to avoid all these misses; however, the original _llint_get_by_id_proto_acessor also checks for misses, so there is no real improvement in clearing the cache, right?
I am going to change it.
BTW, I already have a first implementation of custom accessors. What I need to finish is inlining the custom accessor call in assembly code. I think it is a good idea to send them in the same patch, since the files touched are the same. Since you already gave me instructions for inlining the getter, I have an idea of what I need to do to complete this implementation.
(In reply to comment #39)
That's right. Unless we have something else to put in the cache, or the cache becomes invalid, there is no benefit to clearing it.
> I am going to change it.
>
> BTW, I have already an first implementation of CustomAcessors. What I need
> to finish is inline the CustomAcessor call to Assembly code. I think it is a
> good idea send them in the same patch, since the files touched are the same.
> As you pointed me instructions to try inline the Getter, I already have an
> idea what I need to do to complete this implementation.
Cool, sounds like a plan. Let me know if you have any other questions.
Hi guys,
I am trying to inline the custom getter function call, but I am facing a weird error.
When I set up the cache, I put the pointer to the custom function, a "GetValueFunc", into pc[5].
When I try to load it into a register in LowLevelInterpreter64.asm, its content is not the same. I am doing the following operations:
loadisFromInstruction(5, t1)
PrepareForCCall
# load the arguments
cCall4(t0)
# return-value retrieval and dispatch
I debugged the value present in t1 (on my architecture it is rsi), and it does not correspond to the pointer set during setup. To test whether the call was working, I manually set rsi to the value of the function address and the function was called; however, some of the pointer arguments were incorrect.
I also tried loading the function pointer using:
loadpFromInstruction(5, t1)
and had no success either.
Am I doing anything wrong in getting the pc[] elements as pointers?
(In reply to comment #41)
Can you post the patch that you think should work?
What architecture are you on?
(In reply to comment #42)
Hi Saam. I just discovered that some pseudo-registers are mapped to the same physical register (e.g. t1 = a0 on my architecture). I fixed the register usage, and it is working now on my architecture.
Now the problem I am facing is that the "typedef EncodedJSValue (*GetValueFunc)(ExecState*, EncodedJSValue thisValue, PropertyName, JSObject* slotBase)" requires a PropertyName, and I don't know how I can pass it in the inlined assembly.
My current assembly is this one:
_llint_op_get_by_id_proto_custom:
    traceExecution()
    loadisFromInstruction(2, t0)
    loadConstantOrVariableCell(t0, t3, .opGetByIdProtoCustomSlow)
    loadi JSCell::m_structureID[t3], t1
    loadisFromInstruction(4, t2)
    bineq t2, t1, .opGetByIdProtoCustomSlow
    loadpFromInstruction(6, a1)
    loadpFromInstruction(5, t0)
    push t2
    push PC
    push PB
    prepareStateForCCall()
    move cfr, a0
    move 0, a2 # I am testing with a CustomGetter function and this parameter is ignored
    move a1, a3
    cCall4(t0)
    restoreStateAfterCCall()
    pop PB
    pop PC
    pop t2
    storeq r0, [cfr, t2, 8]
    valueProfile(r0, 8, t1)
    dispatch(9)

.opGetByIdProtoCustomSlow:
    callSlowPath(_llint_slow_path_get_by_id)
    dispatch(9)
A PropertyName is identical to a UniquedStringImpl*, so you can just pass that. A PropertyName is a struct with a single UniquedStringImpl* field.
I believe get_by_id has an index in its instruction corresponding to its UniquedStringImpl*. You may need to write some code that loads this off the CodeBlock based on that index. I'm not sure if we have code like that already in the LLInt, but it should be easy to write.
You should also make sure you don't have any other unintended register aliasing on all platforms that the LLInt supports.
(In reply to comment #44)
Actually, I don't think so. Checking the CodeBlock code, I found the method used to get the identifier based on the identifier index:
const Identifier& CodeBlock::identifier(int index) const
{
    size_t unlinkedIdentifiers = m_unlinkedCode->numberOfIdentifiers();
    if (static_cast<unsigned>(index) < unlinkedIdentifiers)
        return m_unlinkedCode->identifier(index);
}
The m_unlinkedCode stores the identifiers in a Vector<Identifier> m_identifiers. The problem is to retrieve the identifier contained in the Vector's buffer. My current implementation is like this:
macro loadIdentifier(index, dest)
    loadp CodeBlock[cfr], t2
    loadp CodeBlock::m_unlinkedCode[t2], t1 # t1 is pointing to UnlinkedCodeBlock
    addp UnlinkedCodeBlock::m_identifiers, t1 # offset to point to m_identifiers
    loadis index, t2
    mulp sizeof Identifier, t2
    addp t2, t1 # offset to point to m_identifiers[index]
    addp Identifier::m_string, t1 # m_identifiers[index].m_string
    loadp t1, dest
end
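The address computation this macro is trying to perform, buffer plus index times element size, then the field offset, corresponds to the following C++ sketch (minimal invented types, not JSC's real Identifier):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal stand-ins: an Identifier holds a pointer to its string.
struct StringImpl {};
struct Identifier { StringImpl* m_string; };

// What loadIdentifier computes in assembly, spelled out in C++:
// dest = *(buffer + index * sizeof(Identifier) + offsetof(Identifier, m_string))
StringImpl* loadIdentifier(const Identifier* buffer, int index) {
    auto base = reinterpret_cast<uintptr_t>(buffer);
    auto slot = base + static_cast<size_t>(index) * sizeof(Identifier)
                     + offsetof(Identifier, m_string);
    return *reinterpret_cast<StringImpl* const*>(slot);
}
```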
Any tip?
(In reply to comment #45)
Actually, this code is working now:
macro loadIdentifier(index, dest)
    loadp CodeBlock[cfr], t2
    loadp CodeBlock::m_unlinkedCode[t2], t1
    loadp UnlinkedCodeBlock::m_identifiers[t1], t2
    move t2, t1
    loadis index, t2
    mulp sizeof Identifier, t2
    addp t2, t1
    loadp Identifier::m_string[t1], dest
end
I would like to know if there is a way to test the changes on all architectures.
(In reply to comment #45)
This is how we solve such problems in the LLInt and JITs in general. We often
end up inlining particular functions. For the JIT, this is usually easier because it
can compile constants more easily. So, your proposed solution would definitely work,
but it's obviously not ideal.
But I should have suggested something simpler yesterday.
When you convert your get_by_id into your specialized version, you
can stash the UniquedStringImpl* into the instruction stream itself if
you have an unused instruction offset. Do you have such a free location?
(In reply to comment #46)
> (In reply to comment #45)
> > The m_unlikedCode stores the identifiers into a Vector<Identifier>
> > m_identifiers. The problem is to revive the identifier contained in the
> > Vector buffer. My current implementation is like this
> >
> > macro loadIdentifier(index, dest)
> > loadp CodeBlock[cfr], t2
> > loadp CodeBlock::m_unlinkedCode[t2], t1 #t1 is pointing to
> > UnlinkedCodeBlock
> > addp UnlinkedCodeBlock::m_identifiers, t1 #Offset to point to
> > m_identifiers
> > loadis index, t2
> > mulp sizeof Identifier, t2
> > addp t2, t1 # Offset to point to m_identifiers[index]
> > addp Identifier::m_string, t1 # m_identifiers[index].m_string
> > loadp t1, dest
> > end
> >
> > Any tip?
>
> Actually, this code working now:
>
> macro loadIdentifier(index, dest)
> loadp CodeBlock[cfr], t2
> loadp CodeBlock::m_unlinkedCode[t2], t1
> loadp UnlinkedCodeBlock::m_identifiers[t1], t2
> move t2, t1
> loadis index, t2
> mulp sizeof Identifier, t2
> addp t2, t1
> loadp Identifier::m_string[t1], dest
> end
>
> I would like to know if there is a way to test the changes in all
> architectures.
Which architecture are you running locally?
(In reply to comment #47)
> But I should have suggested something simpler yesterday.
> When you convert your get_by_id into your specialized version, you
> can stash the UniquedStringImpl* into the instruction stream itself if
> you have an unused instruction offset. Do you have such a free location?
This is a good idea. We have an invariant that changing an Accessor on an object requires transitioning the structure of that object. This means that you can use, for example, pc[4] to hold the function pointer and pc[5] to point to the UniquedStringImpl* of the property.
(In reply to comment #46)
> I would like to know if there is a way to test the changes in all
> architectures.
It can be pretty tricky. I would recommend having the bot do it since, IIRC, we test all the architectures we care about.
(In reply to comment #49)
> (In reply to comment #47)
> > But I should have suggested something simpler yesterday.
> > When you convert your get_by_id into your specialized version, you
> > can stash the UniquedStringImpl* into the instruction stream itself if
> > you have an unused instruction offset. Do you have such a free location?
>
> This is a good idea. We have an invariant that changing a Accessor on an
> object requires transitioning the structure of that object. This means that
> you can use, for example, pc[4] to hold the function pointer and pc[5] to
> point to the UniquedStringImpl* of the property.
Actually, my first thought was along those lines. However, pc[4] is used to check whether the inline cache is still valid:
_llint_op_get_by_id_proto_custom:
    traceExecution()
    loadisFromInstruction(2, t0)
    loadConstantOrVariableCell(t0, t3, .opGetByIdProtoCustomSlow)
    loadi JSCell::m_structureID[t3], t1
    loadisFromInstruction(4, t2)
    bineq t2, t1, .opGetByIdProtoCustomSlow
So I think it is not possible to use pc[4] for that. Could you confirm? If it is possible, I think it is the best way to implement this.
Another strategy I had in mind was to create a C struct holding the customGetter pointer and the UniquedStringImpl*, with "pc[5].u.cstruct" pointing to it. However, that is not efficient, since it would double the size of pc[].u. An improvement would be to store just a pointer to this struct, but then it is not possible to track that pointer and free its memory.
Another solution I was considering was to create a property on slot.slotBase() to store the identifier, but that would waste space on every other object.
So the most viable solution I came up with is the last one I commented on above.
> (In reply to comment #46)
> > I would like to know if there is a way to test the changes in all
> > architectures.
>
> It can be pretty tricky. I would recommend having the bot do it since, IIRC,
> we test all the architectures we care about.
No problem. I asked because you might have a way to do it without submitting a patch. BTW, my current architecture is x86_64 running Mac OS X.
(In reply to comment #49)
> (In reply to comment #47)
> > But I should have suggested something simpler yesterday.
> > When you convert your get_by_id into your specialized version, you
> > can stash the UniquedStringImpl* into the instruction stream itself if
> > you have an unused instruction offset. Do you have such a free location?
>
> This is a good idea. We have an invariant that changing a Accessor on an
> object requires transitioning the structure of that object. This means that
> you can use, for example, pc[4] to hold the function pointer and pc[5] to
> point to the UniquedStringImpl* of the property.
>
> (In reply to comment #46)
> > I would like to know if there is a way to test the changes in all
> > architectures.
>
> It can be pretty tricky. I would recommend having the bot do it since, IIRC,
> we test all the architectures we care about.
Yeah, it is indeed tricky. We do have some ARM EWS bots, so if you upload relevant tests as LayoutTests, EWS will run them for you. Unfortunately, we don't have run-javascriptcore-tests on EWS, but we have internal bots at Apple that will alert us of failures.
You won't be able to test ARM on your machine, but you can also test x86 32-bit by building JavaScriptCore like:
Tools/Scripts/build-jsc --32-bit --release (or --debug)
This will also give you some more coverage, because x86-32 doesn't have argument registers.
Hi guys.
This is not the final patch (it is not ready yet and it is breaking).
I need some help from you. I am running my implementation of the CustomAccessor prototype inline cache, but I think I am missing some important information.
I created a slowPath version to make debugging easier and to evaluate the performance gain before inlining it in assembly.
The problem I am facing is that some customGetters fail because slot.slotBase() is not the correct argument for them. In PropertySlot.cpp I found this code:
JSValue PropertySlot::customGetter(ExecState* exec, PropertyName propertyName) const
{
    // FIXME: Remove this differences in custom values and custom accessors.
    // https://bugs.webkit.org/show_bug.cgi?id=158014
    JSValue thisValue = m_attributes & CustomAccessor ? m_thisValue : JSValue(slotBase());
    return JSValue::decode(m_data.custom.getValue(exec, JSValue::encode(thisValue), propertyName));
}
The biggest problem is that m_thisValue.getObject() is not a valid pointer when called from the inlined version, and I am not able to store an EncodedJSValue into pc[6]. I spent last night and tonight trying to figure out what I can do.
Saam, about storing the UniquedStringImpl in a pc slot: it is not possible because all slots are in use.
Created attachment 281237 [details]
Archive of layout-test-results from ews100 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 281238 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 281239 [details]
Archive of layout-test-results from ews115 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 281240 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.4
Hi guys. I finally finished a version that passes the webkit tests for x86_64 on Mac. Now I need to fix the 32-bit version and send it for review. Meanwhile, I am testing the performance to check the improvement from this implementation.
As a benchmark for the custom getter inlining I am using the following tool:
http://browserbench.org/Speedometer/
I already have some results, but I haven't analyzed them yet. I am running this tool against --release builds of the baseline and of the tree with the patch applied.
In the overall score both versions are close, and I couldn't notice a difference in the results. I am going to deepen the analysis to check some behaviors of this benchmark under the prototype cache implementation (e.g. cache miss rate). I will keep you informed. I will probably send my final version for review today.
Created attachment 281436 [details]
Archive of layout-test-results from ews101 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 281437 [details]
Archive of layout-test-results from ews107 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 281438 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.4
Created attachment 281446 [details]
Archive of layout-test-results from ews117 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 281540 [details]
speedometer run comparison with and without patch applied
This is the result of the Speedometer benchmark comparing the baseline and the patch applied.
(In reply to comment #75)
> Created attachment 281540[details]
> speedometer run comparison with and without patch applied
>
> It is the result of speedometer benchmark comparing baseline and the patch
> applied.
I noticed that there is no real performance gain from custom accessor caching in this benchmark. So I decided to check its cache miss ratio to understand the benchmark's behavior. The summary results I got are:
----------------------------------------------------------------------------
Current CacheMiss Ratio: 145467/319000
----------------------------------------------------------------------------
This ratio is 0.45600940438871473. I am not convinced that this benchmark is a good choice to test the performance gain of this patch. What I am doing right now is logging my usage tonight to collect the cache miss ratio using Facebook, YouTube, and other real web applications.
I am going to send you the results soon.
Created attachment 281542 [details]
Archive of layout-test-results from ews103 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 281543 [details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 281544 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.4
Created attachment 281545 [details]
Archive of layout-test-results from ews116 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
Attachment 281547 [details] did not pass style-queue:
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:64: Alphabetical sorting problem. [build/include_order] [4]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1666: Missing space after , [whitespace/comma] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1675: Weird number of spaces at line-start. Are you using a 4-space indent? [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1687: Missing space after , [whitespace/comma] [3]
ERROR: Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1688: Missing space after , [whitespace/comma] [3]
Total errors found: 5 in 17 files
If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to comment #85)
> Created attachment 281547[details]
> Patch
Hey guys, I am sorry to post this patch, but it is the version that I am using to collect cache miss information. The strange behavior here is that this patch is not failing these tests on my local machine, and I want to test them here:
http/tests/dom/set-document-location-host-to-accepted-values.html
http/tests/dom/set-document-location-hostname-to-accepted-values.html
However, the only difference between this and the last version is the calls to debug functions... Can you guys imagine why this is happening? BTW, the test fails because the line number of one CONSOLE MESSAGE is different (it is expected to be 66 but it is 17).
Sorry for the delay on this patch and for this question; I have been trying to solve this specific bug since yesterday with no success.
Regards,
Caio Lima.
Hi guys.
Finally, the patch is now green! However, I am in doubt whether it works on other architectures (Windows, GTK, iOS), because I saw some comments in the code about calling C functions in the Windows environment.
This is my final version for review =)
New information about the cache miss ratio. I traced the execution of my Safari from Sunday afternoon until now and got the following result:
----------------------------------------------------------------------------
Current CacheMiss Ratio: 15510/37691
----------------------------------------------------------------------------
Almost 70% of my usage was on Facebook, YouTube, and Google. The ratio is really close to the Speedometer results, which validates the last run.
Comment on attachment 281613 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=281613&action=review
Looks good! I just have a couple of comments/questions.
> Source/JavaScriptCore/ChangeLog:12
> + offset of a object to call a getter function and get_by_id_proto_custo
typo: get_by_id_proto_custom
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1299
> + bineq 0, dest, .loadEncodedThisValueDone
Does this get assembled to:
test dest, dest
jz .loadEncodedThisValueDone
? If not, we should probably try to figure out how to make that happen.
(In reply to comment #93)
> Comment on attachment 281613[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=281613&action=review
>
> Looks good! I just have a couple of comments/questions.
>
> > Source/JavaScriptCore/ChangeLog:12
> > + offset of a object to call a getter function and get_by_id_proto_custo
>
> typo: git_by_id_proto_custom
Nice Catch =)
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1299
> > + bineq 0, dest, .loadEncodedThisValueDone
>
> Does this get assembled to:
> test: dest, dest
> jz: .loadEncodedThisValueDone
> ? If not, we should probably try to figure out how to make that happen.
Yes, here is the compiled code:
0x100dbaf1f <+13003>: movq 0x30(%r13,%r8,8), %rsi
0x100dbaf24 <+13008>: testl %esi, %esi
0x100dbaf26 <+13010>: jne 0x100dbaf2f ; <+13019>
0x100dbaf2c <+13016>: movq %rcx, %rsi
0x100dbaf2f <+13019>: movq 0x28(%r13,%r8,8), %rax
I am going to rebase the code and send the updated patch.
Created attachment 282331 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.4
Ping for review. I think it is OK now, because I have addressed the last errors reported for ARM64 and C_LOOP.
The last problems were:
1. ARM64 doesn't support "push <reg>", so it is necessary to push two registers at the same time.
2. C_LOOP doesn't support call operations, so I added a slow case for it.
Comment on attachment 282418 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=282418&action=review
> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:686
> + } else if (UNLIKELY(pc[7].u.operand && (slot.isValue() || slot.isUnset() || ((slot.isAccessor() || slot.isCustom()) && (slot.slotBase() != baseValue))))) {
I think this could just be:
if (UNLIKELY(pc[7].u.operand && (slot.slotBase() != baseValue)))
> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1476
> + push t1 # Load arg3 JSObject *
> + push t2 # Load arg2 PropertyName
> + move CellTag, t3
> + if BIG_ENDIAN
> + push t1 # Load arg1 Payload of EncodedJSValue
> + push t3 # Load arg1 Tag of EncodedJSValue
> + else
> + push t3 # Load arg1 Tag of EncodedJSValue
> + push t1 # Load arg1 Payload of EncodedJSValue
> + end
> + push cfr # Loading exec
Looking at this closer, I don't think this is going to work on ARM. The ARM calling convention passes the first four arguments in r0-r3. Additionally, depending on whether or not the device is using the EABI, 64-bit values need to start in r0 or r2.
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1333
> + loadp Callee[cfr], t0
> + andp MarkedBlockMask, t0, t1
> + loadp MarkedBlock::m_weakSet + WeakSet::m_vm[t1], t1
It is probably worthwhile to make this a macro in LowLevelInterpreter.asm, loadVM(register) or something.
Comment on attachment 283306 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=283306&action=review
> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1478
> + loadEncodedThisValue(t3, a1)
> + push t5, cfr
> + loadp 20[PC], t5
> + # Inlining the GetValueFunc call
> + move t2, a3 # Load arg2 PropertyName
> + if BIG_ENDIAN
> + move a1, a2 # Load arg1 Payload of EncodedJSValue
> + move CellTag, a1 # Load arg1 Tag of EncodedJSValue
> + else
> + move CellTag, a2 # Load arg1 Tag of EncodedJSValue
> + end
> + move cfr, a0 # Loading exec
> + cCall4(t5)
Looks good. Although I still think you need to handle the EABI. For the EABI you'll need to push the PropertyName onto the stack and put the EncodedJSValue into r2-r3; r1 should be empty.
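Based on the description above, the EABI argument layout for a call like getValue(ExecState*, EncodedJSValue, PropertyName) would be roughly the following (a sketch of the stated convention, not verified against hardware):

```
r0    <- exec (ExecState*)
r1    <- unused (padding so the 64-bit value starts in an even register)
r2/r3 <- EncodedJSValue (64-bit value split across the even register pair)
[sp]  <- PropertyName (passed on the stack)
```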
> Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1322
> +macro loadEncodedThisValue(baseValue, dest)
> + loadpFromInstruction(6, dest)
> + bineq 0, dest, .loadEncodedThisValueDone
> + move baseValue, dest
> + .loadEncodedThisValueDone:
> +end
I think this should be bpneq. bineq only compares the low 32 bits. Also, you could move it to LowLevelInterpreter.asm.
(In reply to comment #108)
> Comment on attachment 283306[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=283306&action=review
>
> > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1478
> > + loadEncodedThisValue(t3, a1)
> > + push t5, cfr
> > + loadp 20[PC], t5
> > + # Inlining the GetValueFunc call
> > + move t2, a3 # Load arg2 PropertyName
> > + if BIG_ENDIAN
> > + move a1, a2 # Load arg1 Payload of EncodedJSValue
> > + move CellTag, a1 # Load arg1 Tag of EncodedJSValue
> > + else
> > + move CellTag, a2 # Load arg1 Tag of EncodedJSValue
> > + end
> > + move cfr, a0 # Loading exec
> > + cCall4(t5)
>
> Looks good. Although I still think you need to handle the EABI. For the EABI
> you'll need to push the PropertyName onto the stack put the EncodedJSValue
> into r2-r3. r1 should be empty.
Thank you for informing me about it. Which architectures use the EABI? I suppose ARM, but I would like to be sure before sending a new patch. Also, would you mind pointing me to where you found this information?
> > Source/JavaScriptCore/llint/LowLevelInterpreter64.asm:1322
> > +macro loadEncodedThisValue(baseValue, dest)
> > + loadpFromInstruction(6, dest)
> > + bineq 0, dest, .loadEncodedThisValueDone
> > + move baseValue, dest
> > + .loadEncodedThisValueDone:
> > +end
>
> I think this should be bpneq. bineq only compares the low 32 bits. Also, You
> could move it to LowLevelInterpreter.asm
Nice catch, Thanks!
(In reply to comment #109)
> (In reply to comment #108)
> >
> > Looks good. Although I still think you need to handle the EABI. For the EABI
> > you'll need to push the PropertyName onto the stack put the EncodedJSValue
> > into r2-r3. r1 should be empty.
>
> Thank you to inform me about it. Which architectures use EABI? I suppose
> ARM, but I would like to be sure before send a new patch. Also, do you mind
> point me where you found this information?
AFAIK the EABI is used for both MIPS and ARM. I don't have any documentation for ARM, but I have some for MIPS, which I will email to you.
Comment on attachment 285082 [details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=285082&action=review
> Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:679
> + if (slot.isValue()) {
I don't think you need this check. This block is dominated by a check that the slot is a value.
> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:1474
> + if ARM or ARMv7 or ARMv7_TRADITIONAL or MIPS
I think you want: if (COMPILER_SUPPORTS(EABI) and ARM) or MIPS. Not all ARM devices use the EABI.
Comment on attachment 291039 [details]
Patch
I remember Keith did a patch similar to this in the LLInt for prototype values and we didn't see a measurable speedup from it.
Should we be doing optimizations like this in the LLInt? Is it worth the complexity and maintenance cost?
What's the best benchmark result we have from this? Do we have any results that are not tailored micro benchmarks?
(In reply to comment #118)
> Comment on attachment 291039[details]
> Patch
>
> I remember Keith did a patch similar to this in the LLInt for prototype
> values and we didn't see a measurable speedup from it.
>
> Should we be doing optimizations like this in the LLInt? Is it worth the
> complexity and maintenance cost?
>
> What's the best benchmark result we have from this? Do we have any results
> that are not tailored micro benchmarks?
I think I agree with Geoff here. Looking at this patch again, it's difficult to understand exactly what it's doing. If we don't have a measurable speedup from this on a non-microbenchmark, I'm not sure we should take the patch.
(In reply to comment #118)
> Comment on attachment 291039[details]
> Patch
>
> I remember Keith did a patch similar to this in the LLInt for prototype
> values and we didn't see a measurable speedup from it.
>
> Should we be doing optimizations like this in the LLInt? Is it worth the
> complexity and maintenance
I couldn't agree more. It brings complexity with practically no performance gains. I remember the last time I ran Speedometer on x86_64, I didn't see gains. The point is that we can get better improvements with JIT=false, but that doesn't represent real-world usage.
> What's the best benchmark result we have from this? Do we have any results
> that are not tailored micro benchmarks?
I don't.
What do you think about just caching getters and forgetting about the customGetter inline? The patch would be much simpler, and we could also take advantage of it for "super" access in the future. I can take a look at that and see whether it is worth it.
> What do you think if we just cache getters and forget about customGetter
> inline? The patch will be much more simpler and we also can take advantage
> on "super" access in a further case. I can take a look on that and see if it
> is worth or not.
If we have consensus that the performance motivation is not super strong, then I think we should adopt LLInt optimizations or not based on weighing their complexity. I don't know exactly what an optimization that kept caching but skipped customGetter would look like, but if it were simple enough I think it might be worth it.
I think non-JIT performance is important.
We already retired some old JIT arches like SH4, and we do not actively maintain existing but non-major JIT arches like ARMv6.
If we would like to focus freely on the existing high-performance JIT arches (lovely x64 and ARM64!), I think we should keep the LLInt fast for non-JIT platforms.
Besides slightly improving LLInt performance in JIT-supported environments, it lets us easily say, "OK, you do not need to worry about performance on your non-JIT arch. LLInt is already super fast!"
Comment on attachment 291039 [details]
Patch
Patches that have been up for review since 2016 are almost certainly too stale to be relevant to trunk in their current form.
If this patch is still important please rebase it and post it for review again.
(Attachment history: 39 uploads between 2016-06-03 and 2016-10-08 PDT, by Caio Lima and Build Bot.)