Bug 147393 - Parse the entire WebAssembly modules
Summary: Parse the entire WebAssembly modules
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 146064 147738
  Show dependency treegraph
 
Reported: 2015-07-28 18:26 PDT by Sukolsak Sakshuwong
Modified: 2015-08-06 17:22 PDT (History)
6 users (show)

See Also:


Attachments
Patch (21.99 KB, patch)
2015-07-28 18:31 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (24.00 KB, patch)
2015-07-29 01:37 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (30.72 KB, patch)
2015-07-31 00:08 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Rewrote WASMReader::readCompactUInt32() (30.69 KB, patch)
2015-07-31 15:57 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (31.99 KB, patch)
2015-08-03 16:04 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Override JSCell::destroy() (32.81 KB, patch)
2015-08-05 11:32 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (33.50 KB, patch)
2015-08-06 16:01 PDT, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sukolsak Sakshuwong 2015-07-28 18:26:23 PDT
Parse the entire WebAssembly modules from WebAssembly files produced by pack-asmjs <https://github.com/WebAssembly/polyfill-prototype-1>.
Comment 1 Sukolsak Sakshuwong 2015-07-28 18:31:10 PDT
Created attachment 257716 [details]
Patch
Comment 2 Sukolsak Sakshuwong 2015-07-29 01:37:27 PDT
Created attachment 257743 [details]
Patch
Comment 3 Geoffrey Garen 2015-07-29 11:07:14 PDT
Comment on attachment 257743 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=257743&action=review

The direction here looks good, but this patch needs some work.

> Source/JavaScriptCore/wasm/JSWASMModule.h:90
> +enum class WASMType : uint8_t {
> +    I32,
> +    F32,
> +    F64
> +};
> +
> +enum class WASMReturnType : uint8_t {
> +    I32,
> +    F32,
> +    F64,
> +    Void
> +};
> +
> +enum class WASMExportFormat : uint8_t {
> +    Default,
> +    Record
> +};
> +
> +struct WASMSignature {
> +    WASMReturnType returnType;
> +    Vector<WASMType> arguments;
> +};
> +
> +struct WASMFunctionImport {
> +    String functionName;
> +};
> +
> +struct WASMFunctionImportSignature {
> +    uint32_t signatureIndex;
> +    uint32_t functionImportIndex;
> +};
> +
> +struct WASMFunctionDeclaration {
> +    uint32_t signatureIndex;
> +};
> +
> +struct WASMFunctionPointerTable {
> +    uint32_t signatureIndex;
> +    Vector<uint32_t> elements;
> +};

In general, we keep one file per class in WebKit. Since these structs are separate from JSWASMModule, they should belong somewhere else.

It would be tedious to use one file per class for each of these structs, so let's use a shared file for all of the structs that define the format of a WASM tree. How about WASMFormat.h, as you suggested previously?

> Source/JavaScriptCore/wasm/JSWASMModule.h:113
> +    friend class WASMModuleParser;

"friend" tends toward spaghetti code. We use classes to establish a separation of concerns, so that we can think about one class, and change it, without worrying about all the other code that uses it. "friend" erases that separation and makes our code more brittle.

Instead of friend, in this case I recommend providing accessors methods for these underlying vectors.

> Source/JavaScriptCore/wasm/WASMModuleParser.h:60
> +    JSWASMModule* m_module;

You need to do something to prevent m_module from being garbage collected.
Comment 4 Sukolsak Sakshuwong 2015-07-29 13:35:46 PDT
Thank you for the review.

(In reply to comment #3)
> > Source/JavaScriptCore/wasm/WASMModuleParser.h:60
> > +    JSWASMModule* m_module;
> 
> You need to do something to prevent m_module from being garbage collected.

Since WASMModuleParser is allocated on the stack, doesn't that prevent m_module from being GC'd? Do I need to do something extra to protect it?
Comment 5 Geoffrey Garen 2015-07-30 14:13:45 PDT
> (In reply to comment #3)
> > > Source/JavaScriptCore/wasm/WASMModuleParser.h:60
> > > +    JSWASMModule* m_module;
> > 
> > You need to do something to prevent m_module from being garbage collected.
> 
> Since WASMModuleParser is allocated on the stack, doesn't that prevent
> m_module from being GC'd? Do I need to do something extra to protect it?

In practice, you can probably produce GC safety by allocating WASMModuleParser on the stack. But that's a dangerous thing to do because there's no way to prevent a future coder from moving WASMModuleParser to the heap, and in general, C++ is intentionally agnostic about whether an object was instantiated on the stack or the heap.

I think a better solution is to use Strong<T>.
Comment 6 Sukolsak Sakshuwong 2015-07-31 00:08:53 PDT
Created attachment 257910 [details]
Patch
Comment 7 Sukolsak Sakshuwong 2015-07-31 00:42:59 PDT
Thanks.

(In reply to comment #3)
> In general, we keep one file per class in WebKit. Since these structs are
> separate from JSWASMModule, they should belong somewhere else.
> 
> It would be tedious to use one file per class for each of these structs, so
> let's use a shared file for all of the structs that define the format of a
> WASM tree. How about WASMFormat.h, as you suggested previously?

Fixed.

> > Source/JavaScriptCore/wasm/JSWASMModule.h:113
> > +    friend class WASMModuleParser;
> 
> "friend" tends toward spaghetti code. We use classes to establish a
> separation of concerns, so that we can think about one class, and change it,
> without worrying about all the other code that uses it. "friend" erases that
> separation and makes our code more brittle.
> 
> Instead of friend, in this case I recommend providing accessors methods for
> these underlying vectors.

Fixed.

(In reply to comment #5)
> > (In reply to comment #3)
> > > > Source/JavaScriptCore/wasm/WASMModuleParser.h:60
> > > > +    JSWASMModule* m_module;
> > > 
> > > You need to do something to prevent m_module from being garbage collected.
> > 
> > Since WASMModuleParser is allocated on the stack, doesn't that prevent
> > m_module from being GC'd? Do I need to do something extra to protect it?
> 
> In practice, you can probably produce GC safety by allocating
> WASMModuleParser on the stack. But that's a dangerous thing to do because
> there's no way to prevent a future coder from moving WASMModuleParser to the
> heap, and in general, C++ is intentionally agnostic about whether an object
> was instantiated on the stack or the heap.
> 
> I think a better solution is to use Strong<T>.

Fixed. I guess one way to do it without using Strong<T> is probably

JSWASMModule* WASMModuleParser::parse(...)
{
    JSWASMModule* module = JSWASMModule::create(...);
    m_module = module;
    ...
    return module;
}

but it seems very hacky.
Comment 8 Saam Barati 2015-07-31 00:48:40 PDT
Comment on attachment 257910 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=257910&action=review

> Source/JavaScriptCore/wasm/WASMReader.cpp:83
> +    CHECK_READ(1);

Why not do everything in the loop?

> Source/JavaScriptCore/wasm/WASMReader.cpp:89
> +    while (true) {

What's stopping this from reading more than 4 bytes?

> Source/JavaScriptCore/wasm/WASMReader.cpp:96
> +        result |= (byte & 0x7f) << shift;

Or maybe the limit is we can read at most 5 bytes?
Can compact int values be larger than 2^28?
Comment 9 Sukolsak Sakshuwong 2015-07-31 02:40:28 PDT
Thanks.

(In reply to comment #8)
> Comment on attachment 257910 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=257910&action=review
> 
> > Source/JavaScriptCore/wasm/WASMReader.cpp:83
> > +    CHECK_READ(1);
> 
> Why not do everything in the loop?

It's just an optimization for small integers. If this seems like a micro-optimization, let me know and I will fix it.

> > Source/JavaScriptCore/wasm/WASMReader.cpp:89
> > +    while (true) {
> 
> What's stopping this from reading more than 4 bytes?
> 
> > Source/JavaScriptCore/wasm/WASMReader.cpp:96
> > +        result |= (byte & 0x7f) << shift;
> 
> Or maybe the limit is we can read at most 5 bytes?
> Can compact int values be larger than 2^28?

My first thought was that users couldn't do anything malicious with this anyway, because we always used CHECK_READ(), and unsigned integer overflow is well-defined. But I can see why this can be a problem. So, should we limit the number of bytes?

Compact (LEB128) uint32 values can be up to 2^32 - 1. So, the limit should be 5 bytes. If we check the number of bytes, should we check as well that the fifth byte is less than 2^(32-28) = 2^4 to prevent an integer overflow?
Comment 10 Mark Lam 2015-07-31 14:22:25 PDT
Comment on attachment 257910 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=257910&action=review

>>> Source/JavaScriptCore/wasm/WASMReader.cpp:83
>>> +    CHECK_READ(1);
>> 
>> Why not do everything in the loop?
> 
> It's just an optimization for small integers. If this seems like a micro-optimization, let me know and I will fix it.

I'm not sure that optimizing here will yield much difference because we're parsing values from an untrusted file, and we need to do all sorts of error checks anyway.  I suggest implementing this all as follows:

1. Use a do-while loop because it is more succinct and easier to read.
2. Use a uint64_t local value instead of storing into the result directly. Using the local allows the compiler to put it in a register, and register operations are cheap.  We only incur the memory write to result at the end.
3. When the computation is done,
    (1) break out of the loop,
    (2) validate that the computed value is within size of a uint32,  and
    (3) copy the local value into the result.  

4. The loop can loop while (value < numeric_limits<uint32_t>::max()).  This ensures that we don't overflow our uint64_t local value and that the validation check in 3.2 will work as expected.
Comment 11 Sukolsak Sakshuwong 2015-07-31 15:57:17 PDT
Created attachment 257974 [details]
Rewrote WASMReader::readCompactUInt32()
Comment 12 Sukolsak Sakshuwong 2015-07-31 16:10:25 PDT
Comment on attachment 257974 [details]
Rewrote WASMReader::readCompactUInt32()

Rewrote WASMReader::readCompactUInt32()
- check that the number of bytes does not exceed 5
- check that the fifth byte is less than 2^(32 - 7*4) = 2^4 to prevent integer overflows
- check that the last byte is not zero, unless it's the only byte of the number
Comment 13 Sukolsak Sakshuwong 2015-07-31 16:21:41 PDT
Thanks.

(In reply to comment #10)
> 1. Use a do-while loop because it is more succinct and easier to read.

Done.

> 2. Use a uint64_t local value instead of storing into the result directly.
> Using the local allows the compiler to put it in a register, and register
> operations are cheap.  We only incur the memory write to result at the end.

Used a local value.

> 3. When the computation is done,
>     (1) break out of the loop,
>     (2) validate that the computed value is within size of a uint32,  and
>     (3) copy the local value into the result.  
> 
> 4. The loop can loop while (value < numeric_limits<uint32_t>::max()).  This
> ensures that we don't overflow our uint64_t local value and that the
> validation check in 3.2 will work as expected.

As discussed on IRC, checking that (value < numeric_limits<uint32_t>::max()) is not enough, because the bytes could be 0x80 0x80 0x80 0x80 0x80 ... . Checking that the number of bytes doesn't exceed 5 and that the fifth byte is <= 0b00001111 should prevent that and also eliminate the need for uint64_t.
Comment 14 Sukolsak Sakshuwong 2015-08-03 16:04:53 PDT
Created attachment 258130 [details]
Patch
Comment 15 Sukolsak Sakshuwong 2015-08-05 11:32:57 PDT
Created attachment 258288 [details]
Override JSCell::destroy()
Comment 16 Saam Barati 2015-08-06 11:51:57 PDT
Comment on attachment 258288 [details]
Override JSCell::destroy()

View in context: https://bugs.webkit.org/attachment.cgi?id=258288&action=review

Looks pretty good to me. This was a fly-by review. I'll look at it in more detail and look at the polyfill when I have more time.

> Source/JavaScriptCore/wasm/JSWASMModule.h:57
> +    Vector<uint32_t>& i32Constants() { return m_i32Constants; }

Nit: These functions can all be marked as "const".

> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:38
> +#define READ_UINT32_OR_FAIL(x, errorMessage) do { if (!m_reader.readUInt32(x)) FAIL_WITH_MESSAGE(errorMessage); } while (0)

I think we should use better names than "x" here and below. Maybe like numConstants or inOutNumConstants?

> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108
> +        m_module->i32Constants().append(constant);

I think this (and all appends below) can be uncheckedAppend() since we're reserving their space up front.

> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:257
> +    // TODO: Support any functions.

Nit: We tend to use FIXME instead of TODO
Also, is there a bug open for this? If so, let's include the link to it here.
If not, we should open one and link it here.

> Source/JavaScriptCore/wasm/WASMReader.cpp:81
> +bool WASMReader::readCompactUInt32(uint32_t& result)

I wonder if some of this would be easier to read if we wrote numbers in binary or gave then names.
Like: 
uint8_t firstSevenBits = 0x7f;
or
uint8_t byteValueMask = 0x7f;

Or maybe this is silly and we should just stick with hex.
Comment 17 Geoffrey Garen 2015-08-06 14:24:24 PDT
Comment on attachment 258288 [details]
Override JSCell::destroy()

View in context: https://bugs.webkit.org/attachment.cgi?id=258288&action=review

r=me

>> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108
>> +        m_module->i32Constants().append(constant);
> 
> I think this (and all appends below) can be uncheckedAppend() since we're reserving their space up front.

I agree.

> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:138
> +        m_module->signatures().append(signature);

Same here, and a few places below..
Comment 18 Sukolsak Sakshuwong 2015-08-06 16:01:15 PDT
Created attachment 258411 [details]
Patch
Comment 19 Sukolsak Sakshuwong 2015-08-06 16:16:40 PDT
Thanks you for the review.

(In reply to comment #16)
> > Source/JavaScriptCore/wasm/JSWASMModule.h:57
> > +    Vector<uint32_t>& i32Constants() { return m_i32Constants; }
> 
> Nit: These functions can all be marked as "const".

They cannot be marked as const, because I modify them.

> > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:38
> > +#define READ_UINT32_OR_FAIL(x, errorMessage) do { if (!m_reader.readUInt32(x)) FAIL_WITH_MESSAGE(errorMessage); } while (0)
> 
> I think we should use better names than "x" here and below. Maybe like
> numConstants or inOutNumConstants?

Done. Changed it to "result" to be consistent with the name I use in WASMReader.

> > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108
> > +        m_module->i32Constants().append(constant);
> 
> I think this (and all appends below) can be uncheckedAppend() since we're
> reserving their space up front.

Done. Also added a check for numberOfSignatures in WASMModuleParser::parseFunctionImportSection() for safety.

> > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:257
> > +    // TODO: Support any functions.
> 
> Nit: We tend to use FIXME instead of TODO
> Also, is there a bug open for this? If so, let's include the link to it here.
> If not, we should open one and link it here.

Done. Changed it to TODO and added a link to https://bugs.webkit.org/show_bug.cgi?id=147738

> > Source/JavaScriptCore/wasm/WASMReader.cpp:81
> > +bool WASMReader::readCompactUInt32(uint32_t& result)
> 
> I wonder if some of this would be easier to read if we wrote numbers in
> binary or gave then names.
> Like: 
> uint8_t firstSevenBits = 0x7f;
> or
> uint8_t byteValueMask = 0x7f;
> 
> Or maybe this is silly and we should just stick with hex.

Done. Added "static const uint32_t firstSevenBitsMask = 0x7f;". This will be used again when I add a signed version of compact ints. I agree that it would be easier to read if it's in binary, but I can't find 0b... used anywhere other than in comments. I'm not sure what our policy is on binary literals.

(In reply to comment #17)
> > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:138
> > +        m_module->signatures().append(signature);
> 
> Same here, and a few places below..

Done.
Comment 20 Sukolsak Sakshuwong 2015-08-06 16:18:59 PDT
(In reply to comment #19)
> Done. Changed it to TODO and added a link to
> https://bugs.webkit.org/show_bug.cgi?id=147738

Sorry, I meant "Changed it to FIXME."
Comment 21 Geoffrey Garen 2015-08-06 16:31:46 PDT
Comment on attachment 258411 [details]
Patch

r=me
Comment 22 WebKit Commit Bot 2015-08-06 17:22:01 PDT
Comment on attachment 258411 [details]
Patch

Clearing flags on attachment: 258411

Committed r188099: <http://trac.webkit.org/changeset/188099>
Comment 23 WebKit Commit Bot 2015-08-06 17:22:04 PDT
All reviewed patches have been landed.  Closing bug.