Bug 107185 - Prepare WebGL methods and binding generators for VM-allocated typed arrays
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebGL
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Kenneth Russell
URL:
Keywords:
Depends on: 106975
Blocks:
Reported: 2013-01-17 15:00 PST by Kenneth Russell
Modified: 2013-04-03 14:18 PDT
CC: 17 users

See Also:


Attachments
Patch (42.66 KB, patch)
2013-01-17 15:26 PST, Kenneth Russell
Patch (46.27 KB, patch)
2013-01-22 15:19 PST, Kenneth Russell

Description Kenneth Russell 2013-01-17 15:00:03 PST
Currently all instances of ArrayBuffer and ArrayBufferView created from JavaScript call down into WebCore to allocate their backing storage. This is problematic for two reasons:

1) A lot of C++ code is run when allocating these objects, and there are no hot spots in that code (measured by @ulan); it's just a large amount of "warm" code.
2) ECMAScript 6 subsumes these types into the language, which means that VMs like JSC and V8 will likely pull the entire implementation into the VM, and no longer call into the bindings at all when allocating these objects (at least in the common case).

To eliminate bottleneck (1) and make progress on (2), JavaScript VMs can allocate most typed arrays so their storage lives in the JS heap. When passing these instances across the bindings into WebCore, it will be necessary to describe them to WebCore as ArrayBuffer and ArrayBufferView instances without actually instantiating the C++ ArrayBuffer and ArrayBufferView classes, which would cause memory allocation and data copying.

Lightweight structs which describe ArrayBuffers and ArrayBufferViews need to be introduced to allow JavaScript VMs to perform this optimization. Additionally, unifying the handling of ArrayBuffer and ArrayBufferView using these descriptors allows more code sharing in the WebGL implementation and probably elsewhere.
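Though the patch itself is not reproduced here, the descriptor idea might be sketched as follows. The field names are assumptions for illustration; the actual ArrayBufferDescriptor/ArrayBufferViewDescriptor in the patch may differ.

```cpp
// Sketch of the lightweight descriptors (assumed fields). They carry just
// enough information for WebCore to use a typed array's storage without
// instantiating the C++ ArrayBuffer/ArrayBufferView classes.
enum class ViewType { Int8, Uint8, Int16, Uint16, Int32, Uint32, Float32, Float64 };

struct ArrayBufferDescriptor {
    void* baseAddress;   // start of the backing store (may live in the JS heap)
    unsigned byteLength; // size of the buffer in bytes
};

struct ArrayBufferViewDescriptor {
    void* baseAddress;   // start of the view's data
    unsigned byteLength; // size of the view in bytes
    ViewType type;       // element type; some WebGL APIs check it
};
```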
Comment 1 Filip Pizlo 2013-01-17 15:12:52 PST
(In reply to comment #0)
> Currently all instances of ArrayBuffer and ArrayBufferView created from JavaScript call down into WebCore to allocate their backing storage. This is problematic for two reasons:

I'm with you, so far.

> 
> 1) A lot of C++ code is run when allocating these objects, and there are no hot spots in that code (measured by @ulan); it's just a large amount of "warm" code.
> 2) ECMAScript 6 subsumes these types into the language, which means that VMs like JSC and V8 will likely pull the entire implementation into the VM, and no longer call into the bindings at all when allocating these objects (at least in the common case).
> 
> To eliminate bottleneck (1) and make progress on (2), JavaScript VMs can allocate most typed arrays so their storage lives in the JS heap. 

Agree.

> When passing these instances across the bindings into WebCore, it will be necessary to describe them as ArrayBuffer and ArrayBufferView instances to WebCore without actually instantiating the C++ ArrayBuffer and ArrayBufferView classes, thereby causing memory allocation and data copying.

You lost me.  A typed array *is* an ArrayBufferView, so there is nothing that needs to be done there.  The JS VM can allocate that directly.  No changes in the bindings are needed.  All that is needed is to un-dumb-ify the VM.

For ArrayBuffer, it's true that allocating that is redundant.  But we could just allocate it lazily when WebCore asks for it and the ArrayBufferView doesn't already have one.  I.e.: JSC allocates ArrayBufferViews sans ArrayBuffers, with a backing store allocated in the JSC CopiedSpace heap.  If you ask for an ArrayBuffer, or ask for the backing store from within WebCore, we then create the necessary data structures in the malloc heap.
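The lazy scheme described here could be sketched roughly as below (assumed names, not JSC's real classes): the view owns its backing store directly, and the heavier C++-side buffer object is only materialized the first time something asks for it.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the C++-side ArrayBuffer that WebCore expects.
struct CppArrayBuffer {
    void* data;
    std::size_t byteLength;
};

// A view allocated "sans ArrayBuffer": its storage lives with the view itself
// (analogous to storage in the JSC CopiedSpace heap). The CppArrayBuffer is
// created lazily, only when WebCore asks for it.
class LazyView {
public:
    explicit LazyView(std::size_t byteLength) : m_storage(byteLength) {}

    bool hasCppBuffer() const { return m_buffer != nullptr; }

    // Called from (hypothetical) glue code when a real ArrayBuffer is needed.
    CppArrayBuffer* ensureCppBuffer() {
        if (!m_buffer)
            m_buffer.reset(new CppArrayBuffer { m_storage.data(), m_storage.size() });
        return m_buffer.get();
    }

private:
    std::vector<unsigned char> m_storage;     // the view's own backing store
    std::unique_ptr<CppArrayBuffer> m_buffer; // created on demand
};
```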

> 
> Lightweight structs which describe ArrayBuffers and ArrayBufferViews need to be introduced to allow JavaScript VMs to perform this optimization. Additionally, unifying the handling of ArrayBuffer and ArrayBufferView using these descriptors allows more code sharing in the WebGL implementation and probably elsewhere.
Comment 2 Kenneth Russell 2013-01-17 15:26:15 PST
Created attachment 183297 [details]
Patch
Comment 3 Kenneth Russell 2013-01-17 15:33:46 PST
(In reply to comment #1)
> (In reply to comment #0)
> > When passing these instances across the bindings into WebCore, it will be necessary to describe them as ArrayBuffer and ArrayBufferView instances to WebCore without actually instantiating the C++ ArrayBuffer and ArrayBufferView classes, thereby causing memory allocation and data copying.
> 
> You lost me.  A typed array *is* an ArrayBufferView, so there is nothing that needs to be done there.  The JS VM can allocate that directly.  No changes in the bindings are needed.  All that is needed is to un-dumb-ify the VM.
> 
> For ArrayBuffer, it's true that allocating that is redundant.  But we could just allocate it lazily when WebCore asks for it, and the ArrayBufferView didn't already have one.  I.e.: JSC allocates ArrayBufferViews sans ArrayBuffers, with a backing store allocated in the JSC CopiedSpace heap.  If you ask for an ArrayBuffer, or ask for the backing store from within WebCore, we then create the necessary data structures in the malloc heap.

Creating these data structures in the malloc heap when touching the backing store from WebCore is exactly what we want to avoid. All of the WebGL entry points have the semantic that after they return, they no longer reference the incoming data. We want to pass direct pointers from the JS heap into WebCore, pinning the objects for the duration of the call, and avoid these allocations and data copies.
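The pinning-for-the-duration-of-the-call idea might be modeled with an RAII guard like this (hypothetical API; neither JSC nor V8 exposes exactly this):

```cpp
// While a ScopedPin is alive, the GC would be forbidden from moving or
// collecting the object, so a raw pointer into the JS heap can safely be
// handed to WebCore for the duration of the call.
struct JSHeapObject {
    int pinCount = 0; // > 0 means the GC must leave the object in place
};

class ScopedPin {
public:
    explicit ScopedPin(JSHeapObject& object) : m_object(object) { ++m_object.pinCount; }
    ~ScopedPin() { --m_object.pinCount; }
    ScopedPin(const ScopedPin&) = delete;
    ScopedPin& operator=(const ScopedPin&) = delete;
private:
    JSHeapObject& m_object;
};
```

The destructor runs when the binding frame unwinds, matching the WebGL semantic that the data is not referenced after the entry point returns.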
Comment 4 Filip Pizlo 2013-01-17 15:51:24 PST
(In reply to comment #3)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > When passing these instances across the bindings into WebCore, it will be necessary to describe them as ArrayBuffer and ArrayBufferView instances to WebCore without actually instantiating the C++ ArrayBuffer and ArrayBufferView classes, thereby causing memory allocation and data copying.
> > 
> > You lost me.  A typed array *is* an ArrayBufferView, so there is nothing that needs to be done there.  The JS VM can allocate that directly.  No changes in the bindings are needed.  All that is needed is to un-dumb-ify the VM.
> > 
> > For ArrayBuffer, it's true that allocating that is redundant.  But we could just allocate it lazily when WebCore asks for it, and the ArrayBufferView didn't already have one.  I.e.: JSC allocates ArrayBufferViews sans ArrayBuffers, with a backing store allocated in the JSC CopiedSpace heap.  If you ask for an ArrayBuffer, or ask for the backing store from within WebCore, we then create the necessary data structures in the malloc heap.
> 
> Creating these data structures in the malloc heap when touching the backing store from WebCore is exactly what we want to avoid. All of the WebGL entry points have the semantic that after they return, they no longer reference the incoming data. We want to pass direct pointers from the JS heap into WebCore, pinning the objects for the duration of the call, and avoid these allocations and data copies.

I looked at your patch.  Can you articulate how this helps (a) pinning, (b) not copying, and (c) allocating array buffer views in JSC?

It doesn't help pinning because it doesn't do pinning.  And anyway, the standard way of doing pinning doesn't require a separate descriptor.  Is this an attempt to refactor the code to do RAII for pinning?

This change adds copying of the void* base and unsigned length fields.  How does this reduce copying?

As I've already stated, it would be super simple to allocate array buffer views in the JS heap.  In the cases that your change covers, this would not require any changes.  I.e. without your patch, things would still work just fine: JSC will not copy an object while it is blocked on a call into the DOM.  It also won't copy an object while it is referenced from the C stack.  So, your change doesn't actually help things.

So as far as I can tell, this change is just adding noise - it doesn't achieve the thing that you claim it achieves.
Comment 5 Filip Pizlo 2013-01-17 15:52:55 PST
Comment on attachment 183297 [details]
Patch

Marking as r- to be as unambiguous as possible about the fact that I view this change as adding entirely unneeded code.
Comment 6 Kenneth Russell 2013-01-17 16:18:27 PST
(In reply to comment #4)
> I looked at your patch.  Can you articulate how this helps (a) pinning, (b) not copying, and (c) allocating array buffer views in JSC?
> 
> It doesn't help pinning because it doesn't do pinning.  And anyway, the standard way of doing pinning doesn't require a separate descriptor.  Is this an attempt to refactor the code to do RAII for pinning?
> 
> This change adds copying of the void* base and unsigned length fields.  How does this reduce copying?
> 
> As I've already stated, it would be super simple to allocate array buffer views in the JS heap.  In the cases that your change covers, this would not require any changes.  I.e. without your patch, things would still work just fine: JSC will not copy an object while it is blocked on a call into the DOM.  It also won't copy an object while it is referenced from the C stack.  So, your change doesn't actually help things.

WebCore methods exposed via the IDL which take ArrayBuffer or ArrayBufferView need new overloads in order for the bindings to handle passing typed arrays allocated on the JS heap.

A current ArrayBuffer* argument needs to be expanded into two arguments: the base pointer and size. A current ArrayBufferView* argument needs to be expanded into three: base pointer, size, and type. (The type is required; some WebGL APIs check it.) Without these changes it is not possible to efficiently pass a JS-heap-allocated typed array into WebCore.
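The expansion described above might look like the following sketch (hypothetical signatures; the real WebGLRenderingContext::bufferData takes an ArrayBufferView*, and the call-recording member exists only to make the sketch testable):

```cpp
#include <cstdint>

typedef std::uint32_t GC3Denum; // WebKit-style GL enum typedef

enum class ViewType { Uint8, Float32 /* ... */ };

// Expanded form: the view's raw base pointer, byte length, and element type
// are passed instead of an ArrayBufferView*, so no C++ wrapper object needs
// to be allocated when the data lives in the JS heap.
class WebGLContextSketch {
public:
    struct Call { GC3Denum target; unsigned byteLength; ViewType type; };

    void bufferData(GC3Denum target, const void* baseAddress, unsigned byteLength,
                    ViewType type, GC3Denum /*usage*/) {
        (void)baseAddress; // a real implementation would upload these bytes
        m_last = { target, byteLength, type };
    }

    Call lastCall() const { return m_last; }

private:
    Call m_last {};
};
```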

This initial patch does three things:

1) Supports passing JS-heap-allocated, pinned typed arrays to the modified WebGL methods.
2) Keeps the C++ method signatures readable.
3) Makes it easier to generate the binding code.

It does not implement pinning, JS heap allocation for typed arrays, or binding code generator changes.
Comment 7 Filip Pizlo 2013-01-17 16:25:01 PST
(In reply to comment #6)
> (In reply to comment #4)
> > I looked at your patch.  Can you articulate how this helps (a) pinning, (b) not copying, and (c) allocating array buffer views in JSC?
> > 
> > It doesn't help pinning because it doesn't do pinning.  And anyway, the standard way of doing pinning doesn't require a separate descriptor.  Is this an attempt to refactor the code to do RAII for pinning?
> > 
> > This change adds copying of the void* base and unsigned length fields.  How does this reduce copying?
> > 
> > As I've already stated, it would be super simple to allocate array buffer views in the JS heap.  In the cases that your change covers, this would not require any changes.  I.e. without your patch, things would still work just fine: JSC will not copy an object while it is blocked on a call into the DOM.  It also won't copy an object while it is referenced from the C stack.  So, your change doesn't actually help things.
> 
> WebCore methods exposed via the IDL which take ArrayBuffer or ArrayBufferView need new overloads in order for the bindings to handle passing typed arrays allocated on the JS heap.

This statement is unsubstantiated, and strictly untrue in JSC.

> 
> A current ArrayBuffer* argument needs to be expanded into two arguments: the base pointer and size. A current ArrayBufferView* argument needs to be expanded into three: base pointer, size, and type. (The type is required; some WebGL APIs check it.) Without these changes it is not possible to efficiently pass a JS-heap-allocated typed array into WebCore.

Not true.  Can you argue specifically why you believe this to be the case?  It is strictly not true in JSC.

Adding WebCore functionality to support flaws in the V8 object model is not, to my understanding, a goal of WebKit.

> 
> This initial patch does three things:
> 
> 1) Supports passing JS-heap-allocated, pinned typed arrays to the modified WebGL methods.

It doesn't do this.  You can already pass pinned typed arrays to the unmodified WebGL methods.

> 2) Keeps the C++ method signatures readable.
> 3) Makes it easier to generate the binding code.
> 
> It does not implement pinning, JS heap allocation for typed arrays, or binding code generator changes.
Comment 8 Kenneth Russell 2013-01-17 17:52:32 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> > > I looked at your patch.  Can you articulate how this helps (a) pinning, (b) not copying, and (c) allocating array buffer views in JSC?
> > > 
> > > It doesn't help pinning because it doesn't do pinning.  And anyway, the standard way of doing pinning doesn't require a separate descriptor.  Is this an attempt to refactor the code to do RAII for pinning?
> > > 
> > > This change adds copying of the void* base and unsigned length fields.  How does this reduce copying?
> > > 
> > > As I've already stated, it would be super simple to allocate array buffer views in the JS heap.  In the cases that your change covers, this would not require any changes.  I.e. without your patch, things would still work just fine: JSC will not copy an object while it is blocked on a call into the DOM.  It also won't copy an object while it is referenced from the C stack.  So, your change doesn't actually help things.
> > 
> > WebCore methods exposed via the IDL which take ArrayBuffer or ArrayBufferView need new overloads in order for the bindings to handle passing typed arrays allocated on the JS heap.
> 
> This statement is unsubstantiated, and strictly untrue in JSC.

I believe my statement is true. Taking the JSC bindings for WebGLRenderingContext's bufferData(in unsigned long target, in ArrayBufferView? data, in unsigned long usage) as an example:

static EncodedJSValue JSC_HOST_CALL jsWebGLRenderingContextPrototypeFunctionBufferData2(ExecState* exec)
{
    // ...
    ArrayBufferView* data(toArrayBufferView(MAYBE_MISSING_PARAMETER(exec, 1, DefaultIsUndefined)));
    // ...
    impl->bufferData(target, data, usage, ec);
    // ...
}

If JSC (or V8) allocated typed arrays' storage in the JS heap then the operation of fetching an ArrayBufferView C++ instance for it would require allocation of a lot of C++ objects: the underlying ArrayBuffer, the right ArrayBufferView subclass (which contains a RefPtr to the ArrayBuffer), the ArrayBuffer's ArrayBufferContents, etc. Are you proposing that all of those objects would actually be allocated from a contiguous block in the JS heap, and the JS GC modified to support moving them (in case of a compacting collector) so that pointers to them can be passed to C++? This seems incredibly complicated.

> > A current ArrayBuffer* argument needs to be expanded into two arguments: the base pointer and size. A current ArrayBufferView* argument needs to be expanded into three: base pointer, size, and type. (The type is required; some WebGL APIs check it.) Without these changes it is not possible to efficiently pass a JS-heap-allocated typed array into WebCore.
> 
> Not true.  Can you argue specifically why you believe this to be the case?  It is strictly not true in JSC.

Currently both the JSC and V8 bindings extract a C++ ArrayBuffer* or ArrayBufferView* from the incoming JavaScript wrappers. Right now the C++ classes own their storage. Without some change, whether it be to WTF or WebCore, there is no way to avoid a data copy, in either JSC or V8.

> Adding WebCore functionality to support flaws in the V8 object model is not, to my understanding, a goal of WebKit.

There is no need to be nasty. If I am misunderstanding your point then please help me understand exactly how you anticipate this working in JSC.

> > This initial patch does three things:
> > 
> > 1) Supports passing JS-heap-allocated, pinned typed arrays to the modified WebGL methods.
> 
> It doesn't do this.

The addition of the new WebGLRenderingContext methods taking ArrayBufferDescriptor and ArrayBufferViewDescriptor is what supports this. These can be stack-allocated in the bindings, and point into the JS heap during the native calls.
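The stack-allocation pattern described here could look roughly like this sketch (JSHeapView and the descriptor fields are invented for illustration):

```cpp
// A JS-heap typed array as the binding would see it (invented stand-in).
struct JSHeapView {
    void* base;
    unsigned byteLength;
};

struct ArrayBufferViewDescriptor {
    void* baseAddress;
    unsigned byteLength;
};

// What a generated binding might do: build the descriptor on the C stack, so
// it points into the JS heap only for the duration of the native call.
// Nothing is allocated on the malloc heap and no data is copied.
unsigned callWithDescriptor(JSHeapView& view,
                            unsigned (*impl)(const ArrayBufferViewDescriptor&)) {
    ArrayBufferViewDescriptor descriptor { view.base, view.byteLength };
    return impl(descriptor);
}
```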

>  You can already pass pinned typed arrays to the unmodified WebGL methods.

I don't believe this to be true for either JSC or V8.
Comment 9 Filip Pizlo 2013-01-17 18:19:18 PST
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > > > I looked at your patch.  Can you articulate how this helps (a) pinning, (b) not copying, and (c) allocating array buffer views in JSC?
> > > > 
> > > > It doesn't help pinning because it doesn't do pinning.  And anyway, the standard way of doing pinning doesn't require a separate descriptor.  Is this an attempt to refactor the code to do RAII for pinning?
> > > > 
> > > > This change adds copying of the void* base and unsigned length fields.  How does this reduce copying?
> > > > 
> > > > As I've already stated, it would be super simple to allocate array buffer views in the JS heap.  In the cases that your change covers, this would not require any changes.  I.e. without your patch, things would still work just fine: JSC will not copy an object while it is blocked on a call into the DOM.  It also won't copy an object while it is referenced from the C stack.  So, your change doesn't actually help things.
> > > 
> > > WebCore methods exposed via the IDL which take ArrayBuffer or ArrayBufferView need new overloads in order for the bindings to handle passing typed arrays allocated on the JS heap.
> > 
> > This statement is unsubstantiated, and strictly untrue in JSC.
> 
> I believe my statement is true. Taking the JSC bindings for WebGLRenderingContext's bufferData(in unsigned long target, in ArrayBufferView? data, in unsigned long usage) as an example:
> 
> static EncodedJSValue JSC_HOST_CALL jsWebGLRenderingContextPrototypeFunctionBufferData2(ExecState* exec)
> {
>     // ...
>     ArrayBufferView* data(toArrayBufferView(MAYBE_MISSING_PARAMETER(exec, 1, DefaultIsUndefined)));
>     // ...
>     impl->bufferData(target, data, usage, ec);
>     // ...
> }
> 
> If JSC (or V8) allocated typed arrays' storage in the JS heap then the operation of fetching an ArrayBufferView C++ instance for it would require allocation of a lot of C++ objects: the underlying ArrayBuffer, the right ArrayBufferView subclass (which contains a RefPtr to the ArrayBuffer), the ArrayBuffer's ArrayBufferContents, etc. Are you proposing that all of those objects would actually be allocated from a contiguous block in the JS heap, and the JS GC modified to support moving them (in case of a compacting collector) so that pointers to them can be passed to C++? This seems incredibly complicated.

JSC GC doesn't have to move objects, ever. So all of these objects can be allocated by the GC, and referenced from C++, and never moved. 

I view movable objects as a nice-to-have from a GC design standpoint. 

> 
> > > A current ArrayBuffer* argument needs to be expanded into two arguments: the base pointer and size. A current ArrayBufferView* argument needs to be expanded into three: base pointer, size, and type. (The type is required; some WebGL APIs check it.) Without these changes it is not possible to efficiently pass a JS-heap-allocated typed array into WebCore.
> > 
> > Not true.  Can you argue specifically why you believe this to be the case?  It is strictly not true in JSC.
> 
> Currently both the JSC and V8 bindings extract a C++ ArrayBuffer* or ArrayBufferView* from the incoming JavaScript wrappers. Right now the C++ classes own their storage. Without some change, whether it be to WTF or WebCore, there is no way to avoid a data copy, in either JSC or V8.

My point is that the change that we could make is to pass the objects in the JS heap directly into WebCore. 

My point is also that your change doesn't appear to move us in that direction. And you haven't articulated what direction your code is going in or how it would comprehensively allow typed arrays to be cheaper to allocate from JavaScript. 

> 
> > Adding WebCore functionality to support flaws in the V8 object model is not, to my understanding, a goal of WebKit.
> 
> There is no need to be nasty. If I am misunderstanding your point then please help me understand exactly how you anticipate this working in JSC.

See above. JSC can pin things for free, by design. 

> 
> > > This initial patch does three things:
> > > 
> > > 1) Supports passing JS-heap-allocated, pinned typed arrays to the modified WebGL methods.
> > 
> > It doesn't do this.
> 
> The addition of the new WebGLRenderingContext methods taking ArrayBufferDescriptor and ArrayBufferViewDescriptor are what supports this. These can be stack-allocated in the bindings, and point into the JS heap during the native calls.

I don't believe that the stack pattern covers all cases of typed arrays being passed from JS to DOM. 

> 
> >  You can already pass pinned typed arrays to the unmodified WebGL methods.
> 
> I don't believe this to be true for either JSC or V8.

Actually, you're right on this point - currently the WebCore object and the JS object are distinct. This is a bug, and we should fix that bug. My original point is that your change doesn't really get us into the world where a JS typed array can always be passed for free into the DOM. Also, it doesn't address the heart of the problem, which is that ArrayBuffer and the Views are not JS objects, but rather DOM objects with JS wrappers.
Comment 10 Kenneth Russell 2013-01-17 19:20:14 PST
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > If JSC (or V8) allocated typed arrays' storage in the JS heap then the operation of fetching an ArrayBufferView C++ instance for it would require allocation of a lot of C++ objects: the underlying ArrayBuffer, the right ArrayBufferView subclass (which contains a RefPtr to the ArrayBuffer), the ArrayBuffer's ArrayBufferContents, etc. Are you proposing that all of those objects would actually be allocated from a contiguous block in the JS heap, and the JS GC modified to support moving them (in case of a compacting collector) so that pointers to them can be passed to C++? This seems incredibly complicated.
> 
> JSC GC doesn't have to move objects, ever. So all of these objects can be allocated by the GC, and referenced from C++, and never moved. 

I see. Regardless, a lot of unnecessary initialization code would need to be run in order to set up these objects so that they look correct to C++. It's this initialization code which is making typed array allocations slow.

> > >  You can already pass pinned typed arrays to the unmodified WebGL methods.
> > 
> > I don't believe this to be true for either JSC or V8.
> 
> Actually, you're right on this point - currently the WebCore object and the JS object are distinct. This is a bug, and we should fix that bug. My original point is that your change doesn't really get us into the world where a JS types array can always be passed for free into the DOM. Also, it doesn't address the heart of the problem, which is that ArrayBuffer and the Views are not JS objects, but rather DOM objects with JS wrappers.

I'm glad we're in agreement on this point. Let's focus on the technical details.

The basic idea is to change the default representation for typed arrays in the VM so that the storage is allocated contiguously to the JS object representing the ArrayBuffer. Most likely there will still be two distinct JS objects; one for the typed array view (Uint8Array, Float32Array, etc.) and one for the underlying ArrayBuffer, to make it simpler to have multiple views for the same buffer. Both objects can still be allocated in one shot by the JS heap allocator and no C++ code would need to be run during their allocation, which is essential to improve allocation performance.
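The "storage allocated contiguously" layout could be sketched as a single allocation holding a header immediately followed by its data (illustrative only, not JSC's or V8's actual object model):

```cpp
#include <cstddef>
#include <new>

// One allocation holds the buffer header and the payload back to back, so no
// separate malloc block or C++ object is needed for the backing store.
struct JSArrayBufferSketch {
    std::size_t byteLength;

    // The payload starts right after the header.
    void* data() { return this + 1; }

    static JSArrayBufferSketch* allocate(std::size_t byteLength) {
        void* memory = ::operator new(sizeof(JSArrayBufferSketch) + byteLength);
        return new (memory) JSArrayBufferSketch { byteLength };
    }
};
```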

Methods would be added (e.g. JSValue::isHeapAllocatedArrayBuffer) to identify these pure JS ArrayBuffer and view objects, and to extract their base addresses and sizes, and the type from the views.

Most of the methods exposed in the IDL taking typed arrays (in APIs like AudioContext, WebSocket, XMLHttpRequest, and WebGL) don't reference the incoming ArrayBuffer or ArrayBufferView object after the call returns. These methods would be annotated with [SupportsHeapAllocatedTypedArrays] or similar. If called with one of these pure JS typed array objects, the bindings would call the runtime to fetch the base address and length (and type for views) from the JS object, and pass them to WebCore's C++ code. I think it would be nicer to pass one stack-allocated structure instead of multiple arguments, which is what this patch does for WebGL's C++ entry points.

In fact, for such methods, we can completely eliminate the overloads which take C++ ArrayBuffer and ArrayBufferView pointers. If an "externally allocated" typed array is passed in, the bindings can just extract the values from the C++ objects. This would drive home the point that typed arrays are mostly owned by the VM and not by WebCore.

The more difficult situation is when a JS-heap-allocated typed array is passed to a C++ method that references the object after the call returns (AudioContext.decodeAudioData) or does complex operations like transfer of ownership (Worker.postMessage). decodeAudioData can be supported by defining a JS handle type in WebCore which would prevent GC of the referenced object. Worker.postMessage with Transferables is harder and I expect the C++ backing objects would be allocated on demand, the data copied, and the JS objects transformed into simple wrappers.
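The "JS handle type" mentioned for the decodeAudioData case could be sketched as below (hypothetical API, modeled here as an external reference count the VM would treat as a GC root):

```cpp
// While any handle to an object exists, the VM would consider the object
// reachable, keeping it alive after the binding call has returned (e.g. for
// the duration of an asynchronous decode).
struct JSHeapObject {
    int externalRefCount = 0;
    bool keptAliveByWebCore() const { return externalRefCount > 0; }
};

class JSObjectHandle {
public:
    explicit JSObjectHandle(JSHeapObject& object) : m_object(&object) {
        ++m_object->externalRefCount;
    }
    ~JSObjectHandle() {
        if (m_object)
            --m_object->externalRefCount;
    }
    JSObjectHandle(JSObjectHandle&& other) : m_object(other.m_object) {
        other.m_object = nullptr; // ownership transferred; only one decrement
    }
    JSObjectHandle(const JSObjectHandle&) = delete;
    JSObjectHandle& operator=(const JSObjectHandle&) = delete;
private:
    JSHeapObject* m_object;
};
```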

What are your thoughts?
Comment 11 Filip Pizlo 2013-01-17 19:50:43 PST
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > If JSC (or V8) allocated typed arrays' storage in the JS heap then the operation of fetching an ArrayBufferView C++ instance for it would require allocation of a lot of C++ objects: the underlying ArrayBuffer, the right ArrayBufferView subclass (which contains a RefPtr to the ArrayBuffer), the ArrayBuffer's ArrayBufferContents, etc. Are you proposing that all of those objects would actually be allocated from a contiguous block in the JS heap, and the JS GC modified to support moving them (in case of a compacting collector) so that pointers to them can be passed to C++? This seems incredibly complicated.
> > 
> > JSC GC doesn't have to move objects, ever. So all of these objects can be allocated by the GC, and referenced from C++, and never moved. 
> 
> I see. Regardless, a lot of unnecessary initialization code would need to be run in order to set up these objects so that they look correct to C++. It's this initialization code which is making typed array allocations slow.

I don't believe that this initialization code would be slow.

Can you provide specific examples of things that would have to be set up, which would be slow?

Consider this: let's say that the buffer is a contiguous object that contains both the backing store and the buffer head, and that a view is a separate object.  That's two allocations.  Each allocation is ~5-6 instructions on the fast path and another ~5-6 instructions on the slow path, so total ~10-12 instructions for two allocations.  The fast path will have two loads and one store, per object.  After the allocation, each JS object needs two stores to initialize its header.  So we're up to four loads (2x 2 loads per allocation) and 6 stores (1 store per object allocation times 2 objects, plus 2 stores per object initialization times 2 objects).  Then at minimum, even if the objects didn't have to look like C++ objects at all, the view would have to have a pointer to the buffer (1 store), the view probably would have to have its own notion of length and offset (another 2 stores, or at least 1 additional store - depends on how smart we are), and the buffer would have to have its size (another store).  Optimistically, if we assume that view initialization only requires a buffer pointer and a length, that's 3 additional stores on top of the 6 stores we already had.  That's a total of 4 loads and 9 stores.

At this point, this allocation sequence, which doesn't even take into consideration anything that would make these objects look like C++ objects, has enough cruft that adding another couple of stores is unlikely to create a measurable difference in performance.  The surprising truth about inlined object allocators for managed languages is that adding a few instructions to them often doesn't impose significant overhead just because the true cost of a GC allocation ends up being the fact that it (1) accelerates you towards another GC, which is expensive and (2) you're most likely allocating the object in a part of memory that is less likely to be in cache than other parts of memory.  Btw, this is the part that moving GCs, and generational GCs, try to avoid - a moving GenGC will increase the likelihood that a newly allocated object is on a hot cache line - but still it won't be as hot as other objects.  Hence, paradoxically, the cost of an additional store is quite low: usually the first store to do the first initialization of an allocated object is super expensive because it is likely to miss cache, but then subsequent stores are ultra fast by comparison because they hit cache.

So, what additional stores would you have to do to make the objects look like C++ objects?  If it's only a handful of stores, then it's no biggie.

And by the way, the JSC object model should enable us to make the JS array buffer/view objects look like C++ objects by default without additional stores.  Unless I'm forgetting something...
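The fast-path cost accounting above can be illustrated with a toy bump allocator (a sketch; JSC's real allocators are more involved). The comments tally the memory operations the argument counts: roughly two loads and one store per allocation, before any initialization stores.

```cpp
#include <cstddef>

struct BumpAllocator {
    char* cursor; // next free byte
    char* end;    // one past the end of the current block

    void* allocate(std::size_t bytes) {
        char* result = cursor;                              // load cursor
        if (bytes > static_cast<std::size_t>(end - result)) // load end
            return nullptr;                                 // slow path (refill/GC) elided
        cursor = result + bytes;                            // store cursor
        return result;
    }
};
```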

> 
> > > >  You can already pass pinned typed arrays to the unmodified WebGL methods.
> > > 
> > > I don't believe this to be true for either JSC or V8.
> > 
> > Actually, you're right on this point - currently the WebCore object and the JS object are distinct. This is a bug, and we should fix that bug. My original point is that your change doesn't really get us into the world where a JS types array can always be passed for free into the DOM. Also, it doesn't address the heart of the problem, which is that ArrayBuffer and the Views are not JS objects, but rather DOM objects with JS wrappers.
> 
> I'm glad we're in agreement on this point. Let's focus on the technical details.
> 
> The basic idea is to change the default representation for typed arrays in the VM so that the storage is allocated contiguously to the JS object representing the ArrayBuffer. Most likely there will still be two distinct JS objects; one for the typed array view (Uint8Array, Float32Array, etc.) and one for the underlying ArrayBuffer, to make it simpler to have multiple views for the same buffer. Both objects can still be allocated in one shot by the JS heap allocator and no C++ code would need to be run during their allocation, which is essential to improve allocation performance.
> 
> Methods would be added (e.g. JSValue::isHeapAllocatedArrayBuffer) to identify these pure JS ArrayBuffer and view objects, and to extract their base addresses and sizes, and the type from the views.
> 
> Most of the methods exposed in the IDL taking typed arrays (in APIs like AudioContext, WebSocket, XMLHttpRequest, and WebGL) don't reference the incoming ArrayBuffer or ArrayBufferView object after the call returns. These methods would be annotated with [SupportsHeapAllocatedTypedArrays] or similar. If called with one of these pure JS typed array objects, the bindings would call the runtime to fetch the base address and pointer (and type for views) from the JS object, and pass them to WebCore's C++ code. I think it would be nicer to pass one stack-allocated structure instead of multiple arguments, which is what this patch does for WebGL's C++ entry points.
> 
> In fact, for such methods, we can completely eliminate the overloads which take C++ ArrayBuffer and ArrayBufferView pointers. If an "externally allocated" typed array is passed in, the bindings can just extract the values from the C++ objects. This would drive home the point that typed arrays are mostly owned by the VM and not by WebCore.
> 
> The more difficult situation is when a JS-heap-allocated typed array is passed to a C++ method that references the object after the call returns (AudioContext.decodeAudioData) or does complex operations like transfer of ownership (Worker.postMessage). decodeAudioData can be supported by defining a JS handle type in WebCore which would prevent GC of the referenced object. Worker.postMessage with Transferables is harder and I expect the C++ backing objects would be allocated on demand, the data copied, and the JS objects transformed into simple wrappers.
> 
> What are your thoughts?

I agree that for cases where the C++ code only "borrows" an array buffer or view and doesn't hold on to it after returning, we shouldn't be allocating C++ state.  

But I disagree that we need the lightweight descriptors that you add in this patch in order to achieve goodness.  If the goal is to allocate only JS objects for array buffers/views, then we can just pass pointers to those directly into the methods that only borrow state.  This works because the JS VM is not running at that point, and it already knows that the JS objects are live, since they had to be on the JS stack.  The only way this would break is if someone made the awful mistake of trying to implement a concurrent copying GC for JavaScript.  (If you want to know how bad that can get, read either my dissertation or McCloskey's on the horrors this leads to.)  In short, descriptors buy us nothing here.

For the cases of methods that do hold on to typed arrays after returning, your current patch doesn't help either, since there we need some mechanism to inform the JS VM that the objects are referenced from C++ land.  And I agree with your proposed way of doing that - it's just that the current patch doesn't actually accomplish it.

In short, my objection is not to your high-level idea of what you're doing.  I'm strictly disagreeing with this particular patch, which appears to not get you any closer to your goal.
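[Editorial note] For concreteness, here is a minimal sketch of the kind of descriptor being debated above. All names are hypothetical illustrations, not code from the actual patch: a stack-allocated struct carrying just the base address, byte length, and element type, handed to a borrow-only entry point that copies during the call and retains nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical element-type tag for views (illustrative only).
enum class ViewType { Uint8, Float32 };

// Lightweight, stack-allocated descriptor: the information a
// borrow-only WebCore entry point needs, with no C++ ArrayBufferView
// object behind it.
struct TypedArrayDescriptor {
    const void* baseAddress;
    size_t byteLength;
    ViewType type;
};

// A borrow-only entry point in the style of WebGL's bufferData: the
// data is copied during the call and no reference is kept afterwards,
// so a VM-heap-allocated array only needs to stay alive for the
// duration of the call -- which the JS stack already guarantees.
size_t uploadBufferData(const TypedArrayDescriptor& view,
                        uint8_t* staging, size_t stagingSize) {
    size_t n = view.byteLength < stagingSize ? view.byteLength : stagingSize;
    std::memcpy(staging, view.baseAddress, n);
    return n; // bytes copied
}
```

Whether such a struct pays for itself versus passing raw base/length arguments directly is exactly the disagreement in this thread; the sketch only shows that the borrow-only contract needs no C++ wrapper object.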
Comment 12 Kenneth Russell 2013-01-18 11:03:46 PST
(In reply to comment #11)
> I agree that for cases where the C++ code only "borrows" an array buffer or view and doesn't hold on to it after returning, we shouldn't be allocating C++ state.  

Good.

> In short, my objection is not to your high-level idea of what you're doing.  I'm strictly disagreeing with this particular patch, which appears to not get you any closer to your goal.

This patch would have made it easier for me to make the CodeGeneratorJS and CodeGeneratorV8 changes to support the new WebKit IDL extended attribute indicating that the callee doesn't retain the incoming typed arrays.

It's unfortunate that I won't be able to build upon this patch when making those changes, but I'll upload a new patch implementing them without using these descriptors.
Comment 13 Filip Pizlo 2013-01-18 11:08:25 PST
(In reply to comment #12)
> (In reply to comment #11)
> > I agree that for cases where the C++ code only "borrows" an array buffer or view and doesn't hold on to it after returning, we shouldn't be allocating C++ state.  
> 
> Good.
> 
> > In short, my objection is not to your high-level idea of what you're doing.  I'm strictly disagreeing with this particular patch, which appears to not get you any closer to your goal.
> 
> This patch would have made it easier for me to make the CodeGeneratorJS and CodeGeneratorV8 changes to support the new WebKit IDL extended attribute indicating that the callee doesn't retain the incoming typed arrays.

Can you elaborate on this?

> 
> It's unfortunate that I won't be able to build upon this patch when making those changes, but I'll upload a new patch implementing them without using these descriptors.

If you're making a change that by itself makes the code strictly more complex, you should be more specific about what the end goal is and what your plan is for getting there.

As it stands, I still don't see how adding descriptors makes life better.
Comment 14 Oliver Hunt 2013-01-18 13:44:59 PST
(In reply to comment #12)
> (In reply to comment #11)
> > I agree that for cases where the C++ code only "borrows" an array buffer or view and doesn't hold on to it after returning, we shouldn't be allocating C++ state.  
> 
> Good.
> 
> > In short, my objection is not to your high-level idea of what you're doing.  I'm strictly disagreeing with this particular patch, which appears to not get you any closer to your goal.
> 
> This patch would have made it easier for me to make the CodeGeneratorJS and CodeGeneratorV8 changes to support the new WebKit IDL extended attribute indicating that the callee doesn't retain the incoming typed arrays.
>

This is kind of terrifying to me - IDL doesn't generally specify the lifetime of parameters passed to it, and I'm not sure how you can verify that what an attribute claims ends up matching the implementation behaviour deep in C++ landia.

I'm also a little confused as to where this attribute would be applied, as WebGL is filled with typed array APIs that take ownership for an unbounded amount of time.  What am I missing?
Comment 15 Kenneth Russell 2013-01-18 14:05:52 PST
(In reply to comment #14)
> This is kind of terrifying to me - IDL doesn't generally specify the lifetime of parameters passed to it, and I'm not sure how you can verify that what an attribute claims ends up matching the implementation behaviour deep in C++ landia.

If this proposed extended attribute is supplied, then the WebCore code won't receive references to the WTF ArrayBuffer or ArrayBufferView objects at all. Of course it's possible that there could be a bug in the callee where it refers to the storage after it's allowed to, but that's no different than a refcounting bug where the callee fails to use RefPtr.

> I'm also a little confused as to where this attribute would be applied, as WebGL is filled with typed array APIs that take ownership for an unbounded amount of time.  What am I missing?

This is not the case. None of the WebGL APIs that take ArrayBuffer or ArrayBufferView reference the incoming buffer or view after the call returns.

I'm preparing a patch that should clarify things.
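[Editorial note] To picture the binding-generator change described above, consider the following hypothetical simulation (not real CodeGeneratorJS/V8 output): for a method carrying the proposed attribute, the generated binding extracts the base address and length from the VM's typed array object and forwards raw values, so no WTF ArrayBuffer/ArrayBufferView is ever materialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a VM-heap-allocated typed array: the VM
// knows its backing store address and length without any C++ wrapper.
struct FakeJSTypedArray {
    const uint8_t* base;
    size_t byteLength;
};

// What the implementation signature could look like when the proposed
// attribute is present: raw base/length arguments instead of a
// PassRefPtr<ArrayBufferView>.
int sumBytes(const uint8_t* data, size_t length) {
    int total = 0;
    for (size_t i = 0; i < length; ++i)
        total += data[i];
    return total;
}

// Sketch of the generated binding: fetch base/length from the JS
// object and forward them; no WTF::ArrayBufferView is ever allocated.
int generatedBinding(const FakeJSTypedArray& jsView) {
    return sumBytes(jsView.base, jsView.byteLength);
}
```

The security caveat raised in the comment above still applies: nothing in this shape stops a buggy callee from stashing the raw pointer, just as nothing stops a refcounting bug today.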
Comment 16 Oliver Hunt 2013-01-18 14:17:35 PST
(In reply to comment #15)
> (In reply to comment #14)
> > This is kind of terrifying to me - IDL doesn't generally specify the lifetime of parameters passed to it, and I'm not sure how you can verify that what an attribute claims ends up matching the implementation behaviour deep in C++ landia.
> 
> If this proposed extended attribute is supplied, then the WebCore code won't receive references to the WTF ArrayBuffer or ArrayBufferView objects at all. Of course it's possible that there could be a bug in the callee where it refers to the storage after it's allowed to, but that's no different than a refcounting bug where the callee fails to use RefPtr.
> 
> > I'm also a little confused as to where this attribute would be applied, as WebGL is filled with typed array APIs that take ownership for an unbounded amount of time.  What am I missing?
> 
> This is not the case. None of the WebGL APIs that take ArrayBuffer or ArrayBufferView reference the incoming buffer or view after the call returns.
> 
> I'm preparing a patch that should clarify things.

So essentially this would not apply to any of the typed array APIs? Or do we do security tracking in terms of the buffer objects, and the bufferData API copies immediately?
Comment 17 Kenneth Russell 2013-01-18 14:25:23 PST
(In reply to comment #16)
> So essentially this would not apply to any of the typed array APIs?

If I understand what you're asking, then no, the new attribute definitely won't be applied to the typed arrays' IDL files. To start, it will only be applied to methods on WebGLRenderingContext. 

> Or do we do security tracking in terms of the buffer objects, and the bufferData API copies immediately?

CORS security checks are done during texImage2D and texSubImage2D calls, and the call immediately throws if the check fails. There is no notion of a tainted WebGL context, nor of tainted WebGL texture or buffer objects.

The bufferData and bufferSubData APIs copy their incoming data immediately to the GPU.
Comment 18 Kenneth Russell 2013-01-22 14:34:12 PST
Changing description to more accurately reflect coming patch, from:
Add lightweight descriptors for ArrayBuffer and ArrayBufferView

to:
Prepare WebGL methods and binding generators for VM-allocated typed arrays
Comment 19 Kenneth Russell 2013-01-22 15:19:43 PST
Created attachment 184058 [details]
Patch
Comment 20 Filip Pizlo 2013-01-22 15:29:52 PST
Kenneth: I think the previous two patches, and the discussion, have left us talking at cross-purposes.

Can you directly answer these questions:

1) Do you plan to make the existing ArrayBuffer/ArrayBufferView classes be VM-allocated?

2) What type will these classes have?

My intuition is that making them VM allocated will result in us being able to pass an ArrayBuffer* or an ArrayBufferView* directly into WebGL code, without changing the APIs of those classes.

If there is some V8 restriction that prevents this, then we should be careful about adding complexity to WebCore just because of V8 bugs.

There is to my knowledge nothing in JavaScriptCore that would prevent the above.

As such, this new patch of yours seems just as superfluous as the previous patch.
Comment 21 Kenneth Russell 2013-01-22 16:54:06 PST
(In reply to comment #20)
> Kenneth: I think the previous two patches, and the discussion, have left us talking at cross-purposes.
> 
> Can you directly answer these questions:
> 
> 1) Do you plan to make the existing ArrayBuffer/ArrayBufferView classes be VM-allocated?

Not the C++ implementations. The intent is to move the implementation of the JavaScript ArrayBuffer type and all of the views entirely into the VM for the majority of use cases.

> 2) What type will these classes have?

From JavaScript's point of view they will have the same types. They will obey instanceof checks and have all of the properties and functions defined by the spec.

> My intuition is that making them VM allocated will result in us being able to pass an ArrayBuffer* or an ArrayBufferView* directly into WebGL code, without changing the APIs of those classes.
> 
> If there is some V8 restriction that prevents this, then we should be careful about adding complexity to WebCore just because of V8 bugs.

There is no bug nor restriction in V8. We do not want to have the overhead of setting up structures to make them look like instances of the WTF ArrayBuffer or ArrayBufferView classes. Ulan Degenbaev (CC'd) has profiled typed array allocation and found that the majority of the time is spent in this C++ code. We absolutely do not want to call out to C++ to initialize these objects, and we do not want to add code to the JIT to mimic what the C++ compiler does during initialization of these objects in the hope that simply avoiding the call out to C++ will speed things up. From the standpoint of most of the web APIs taking typed arrays, the setup of the C++ ArrayBuffer and ArrayBufferView structures is completely unnecessary overhead.

> There is to my knowledge nothing in JavaScriptCore that would prevent the above.
> 
> As such, this new patch of yours seems just as superfluous as the previous patch.

JSC will want to avoid this initialization overhead in exactly the same manner as V8. This patch is neither V8-specific nor superfluous. It allows the typed array internalization work to proceed in parallel, without the engineers having to carry around large patches to WebCore.
Comment 22 Filip Pizlo 2013-01-22 17:09:46 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Kenneth: I think the previous two patches, and the discussion, have left us talking at cross-purposes.
> > 
> > Can you directly answer these questions:
> > 
> > 1) Do you plan to make the existing ArrayBuffer/ArrayBufferView classes be VM-allocated?
> 
> Not the C++ implementations. The intent is to move the implementation of the JavaScript ArrayBuffer type and all of the views entirely into the VM for the majority of use cases.
> 
> > 2) What type will these classes have?
> 
> From JavaScript's point of view they will have the same types. They will obey instanceof checks and have all of the properties and functions defined by the spec.
> 
> > My intuition is that making them VM allocated will result in us being able to pass an ArrayBuffer* or an ArrayBufferView* directly into WebGL code, without changing the APIs of those classes.
> > 
> > If there is some V8 restriction that prevents this, then we should be careful about adding complexity to WebCore just because of V8 bugs.
> 
> There is no bug nor restriction in V8. We do not want to have the overhead of setting up structures to make them look like instances of the WTF ArrayBuffer or ArrayBufferView classes. Ulan Degenbaev (CC'd) has profiled typed array allocation and found that the majority of the time is spent in this C++ code. We absolutely do not want to call out to C++ to initialize these objects, and we do not want to add code to the JIT to mimic what the C++ compiler does during initialization of these objects in the hope that simply avoiding the call out to C++ will speed things up. From the standpoint of most of the web APIs taking typed arrays, the setup of the C++ ArrayBuffer and ArrayBufferView structures is completely unnecessary overhead.

Here's my desire: let's reach consensus on what exactly this means before moving forward.

What do you mean by C++ initializing the objects?  The dominant cost in the ArrayBuffer::create method appears to be the use of fastMalloc for allocation, rather than inlined GC allocators.  But I don't think even that is the current dominant cost: right now array buffer allocation is dominated by the fact that we have to allocate ~5 objects, each with its own fastMalloc call.  The problem is far more systemic than just the fact that it's done in C++.

I think it would be worthwhile if you guys could share more data on this.  In particular, I think when people blame performance of object allocations in a managed language on C++ allocators, they are often conflating these factors:

A) The cost of calling malloc, as opposed to calling the VM's GC allocator.

B) The ABI costs of making a call from managed code to native code.

C) The cost of C++ code performing accesses into the heap (if C++ does such accesses it's usually slower than JS doing it, since the C++ code won't quite benefit from inline caching).

D) The cost of the C++ code itself not counting (A).

In the case of ArrayBuffer allocators, I don't think (C) is the problem.  (A) is a problem, but I have a hard time believing that it's that big of a deal - and anyway, that's neither here nor there since the whole point is to make the allocation use the GC.  And I really have a hard time buying that it's (D) since as I've pointed out before, this is just a handful of stores.

Yet, it appears that you are arguing that (D) is indeed the problem.  Visual inspection of the code implies that this can't be the case because the code isn't really doing that much.

So please, if you want to convince me that the setup of C++ state is expensive, you're going to have to try harder.

> 
> > There is to my knowledge nothing in JavaScriptCore that would prevent the above.
> > 
> > As such, this new patch of yours seems just as superfluous as the previous patch.
> 
> JSC will want to avoid this initialization overhead in exactly the same manner as V8. This patch is neither V8-specific nor superfluous. It allows the typed array internalization work to proceed in parallel, without the engineers having to carry around large patches to WebCore.

We (JSC) will not want to avoid this initialization overhead, because we don't have reason to believe that the overhead is significant.

You need to do a better job of convincing me that the overhead is real.
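[Editorial note] As an aside on factor (A) above: the fast path of a GC nursery allocator really is just a bounds check plus a pointer bump, which is the contrast being drawn with per-object malloc. A toy illustration follows (not code from either VM):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy bump allocator modeling a GC nursery fast path: an allocation is
// a bounds check plus a pointer increment, with no per-object
// bookkeeping. A real GC would trigger a collection on exhaustion
// instead of returning null.
class BumpAllocator {
public:
    BumpAllocator(uint8_t* arena, size_t size)
        : m_cursor(arena), m_end(arena + size) { }

    void* allocate(size_t bytes) {
        if (m_cursor + bytes > m_end)
            return nullptr; // nursery exhausted
        void* result = m_cursor;
        m_cursor += bytes;
        return result;
    }

private:
    uint8_t* m_cursor;
    uint8_t* m_end;
};
```

Nothing here settles the thread's empirical question of how much of the profiled cost is (A) rather than (B) or (D); it only makes the categories concrete.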
Comment 23 Kenneth Russell 2013-04-03 14:18:21 PDT
Given recent developments, I am not going to pursue incorporating this patch into WebKit.