RESOLVED FIXED 147393
Parse the entire WebAssembly modules
https://bugs.webkit.org/show_bug.cgi?id=147393
Summary Parse the entire WebAssembly modules
Sukolsak Sakshuwong
Reported 2015-07-28 18:26:23 PDT
Parse the entire WebAssembly modules from WebAssembly files produced by pack-asmjs <https://github.com/WebAssembly/polyfill-prototype-1>.
Attachments
Patch (21.99 KB, patch)
2015-07-28 18:31 PDT, Sukolsak Sakshuwong
no flags
Patch (24.00 KB, patch)
2015-07-29 01:37 PDT, Sukolsak Sakshuwong
no flags
Patch (30.72 KB, patch)
2015-07-31 00:08 PDT, Sukolsak Sakshuwong
no flags
Rewrote WASMReader::readCompactUInt32() (30.69 KB, patch)
2015-07-31 15:57 PDT, Sukolsak Sakshuwong
no flags
Patch (31.99 KB, patch)
2015-08-03 16:04 PDT, Sukolsak Sakshuwong
no flags
Override JSCell::destroy() (32.81 KB, patch)
2015-08-05 11:32 PDT, Sukolsak Sakshuwong
no flags
Patch (33.50 KB, patch)
2015-08-06 16:01 PDT, Sukolsak Sakshuwong
no flags
Sukolsak Sakshuwong
Comment 1 2015-07-28 18:31:10 PDT
Sukolsak Sakshuwong
Comment 2 2015-07-29 01:37:27 PDT
Geoffrey Garen
Comment 3 2015-07-29 11:07:14 PDT
Comment on attachment 257743 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=257743&action=review The direction here looks good, but this patch needs some work. > Source/JavaScriptCore/wasm/JSWASMModule.h:90 > +enum class WASMType : uint8_t { > + I32, > + F32, > + F64 > +}; > + > +enum class WASMReturnType : uint8_t { > + I32, > + F32, > + F64, > + Void > +}; > + > +enum class WASMExportFormat : uint8_t { > + Default, > + Record > +}; > + > +struct WASMSignature { > + WASMReturnType returnType; > + Vector<WASMType> arguments; > +}; > + > +struct WASMFunctionImport { > + String functionName; > +}; > + > +struct WASMFunctionImportSignature { > + uint32_t signatureIndex; > + uint32_t functionImportIndex; > +}; > + > +struct WASMFunctionDeclaration { > + uint32_t signatureIndex; > +}; > + > +struct WASMFunctionPointerTable { > + uint32_t signatureIndex; > + Vector<uint32_t> elements; > +}; In general, we keep one file per class in WebKit. Since these structs are separate from JSWASMModule, they should belong somewhere else. It would be tedious to use one file per class for each of these structs, so let's use a shared file for all of the structs that define the format of a WASM tree. How about WASMFormat.h, as you suggested previously? > Source/JavaScriptCore/wasm/JSWASMModule.h:113 > + friend class WASMModuleParser; "friend" tends toward spaghetti code. We use classes to establish a separation of concerns, so that we can think about one class, and change it, without worrying about all the other code that uses it. "friend" erases that separation and makes our code more brittle. Instead of friend, in this case I recommend providing accessors methods for these underlying vectors. > Source/JavaScriptCore/wasm/WASMModuleParser.h:60 > + JSWASMModule* m_module; You need to do something to prevent m_module from being garbage collected.
Sukolsak Sakshuwong
Comment 4 2015-07-29 13:35:46 PDT
Thank you for the review. (In reply to comment #3) > > Source/JavaScriptCore/wasm/WASMModuleParser.h:60 > > + JSWASMModule* m_module; > > You need to do something to prevent m_module from being garbage collected. Since WASMModuleParser is allocated on the stack, doesn't that prevent m_module from being GC'd? Do I need to do something extra to protect it?
Geoffrey Garen
Comment 5 2015-07-30 14:13:45 PDT
> (In reply to comment #3) > > > Source/JavaScriptCore/wasm/WASMModuleParser.h:60 > > > + JSWASMModule* m_module; > > > > You need to do something to prevent m_module from being garbage collected. > > Since WASMModuleParser is allocated on the stack, doesn't that prevent > m_module from being GC'd? Do I need to do something extra to protect it? In practice, you can probably produce GC safety by allocating WASMModuleParser on the stack. But that's a dangerous thing to do because there's no way to prevent a future coder from moving WASMModuleParser to the heap, and in general, C++ is intentionally agnostic about whether an object was instantiated on the stack or the heap. I think a better solution is to use Strong<T>.
Sukolsak Sakshuwong
Comment 6 2015-07-31 00:08:53 PDT
Sukolsak Sakshuwong
Comment 7 2015-07-31 00:42:59 PDT
Thanks. (In reply to comment #3) > In general, we keep one file per class in WebKit. Since these structs are > separate from JSWASMModule, they should belong somewhere else. > > It would be tedious to use one file per class for each of these structs, so > let's use a shared file for all of the structs that define the format of a > WASM tree. How about WASMFormat.h, as you suggested previously? Fixed. > > Source/JavaScriptCore/wasm/JSWASMModule.h:113 > > + friend class WASMModuleParser; > > "friend" tends toward spaghetti code. We use classes to establish a > separation of concerns, so that we can think about one class, and change it, > without worrying about all the other code that uses it. "friend" erases that > separation and makes our code more brittle. > > Instead of friend, in this case I recommend providing accessors methods for > these underlying vectors. Fixed. (In reply to comment #5) > > (In reply to comment #3) > > > > Source/JavaScriptCore/wasm/WASMModuleParser.h:60 > > > > + JSWASMModule* m_module; > > > > > > You need to do something to prevent m_module from being garbage collected. > > > > Since WASMModuleParser is allocated on the stack, doesn't that prevent > > m_module from being GC'd? Do I need to do something extra to protect it? > > In practice, you can probably produce GC safety by allocating > WASMModuleParser on the stack. But that's a dangerous thing to do because > there's no way to prevent a future coder from moving WASMModuleParser to the > heap, and in general, C++ is intentionally agnostic about whether an object > was instantiated on the stack or the heap. > > I think a better solution is to use Strong<T>. Fixed. I guess one way to do it without using Strong<T> is probably JSWASMModule* WASMModuleParser::parse(...) { JSWASMModule* module = JSWASMModule::create(...); m_module = module; ... return module; } but it seems very hacky.
Saam Barati
Comment 8 2015-07-31 00:48:40 PDT
Comment on attachment 257910 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=257910&action=review > Source/JavaScriptCore/wasm/WASMReader.cpp:83 > + CHECK_READ(1); Why not do everything in the loop? > Source/JavaScriptCore/wasm/WASMReader.cpp:89 > + while (true) { What's stopping this from reading more than 4 bytes? > Source/JavaScriptCore/wasm/WASMReader.cpp:96 > + result |= (byte & 0x7f) << shift; Or maybe the limit is we can read at most 5 bytes? Can compact int values be larger than 2^28?
Sukolsak Sakshuwong
Comment 9 2015-07-31 02:40:28 PDT
Thanks. (In reply to comment #8) > Comment on attachment 257910 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=257910&action=review > > > Source/JavaScriptCore/wasm/WASMReader.cpp:83 > > + CHECK_READ(1); > > Why not do everything in the loop? It's just an optimization for small integers. If this seems like a micro-optimization, let me know and I will fix it. > > Source/JavaScriptCore/wasm/WASMReader.cpp:89 > > + while (true) { > > What's stopping this from reading more than 4 bytes? > > > Source/JavaScriptCore/wasm/WASMReader.cpp:96 > > + result |= (byte & 0x7f) << shift; > > Or maybe the limit is we can read at most 5 bytes? > Can compact int values be larger than 2^28? My first thought was that users couldn't do anything malicious with this anyway, because we always used CHECK_READ(), and unsigned integer overflow is well-defined. But I can see why this can be a problem. So, should we limit the number of bytes? Compact (LEB128) uint32 values can be up to 2^32 - 1. So, the limit should be 5 bytes. If we check the number of bytes, should we check as well that the fifth byte is less than 2^(32-28) = 2^4 to prevent an integer overflow?
Mark Lam
Comment 10 2015-07-31 14:22:25 PDT
Comment on attachment 257910 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=257910&action=review >>> Source/JavaScriptCore/wasm/WASMReader.cpp:83 >>> + CHECK_READ(1); >> >> Why not do everything in the loop? > > It's just an optimization for small integers. If this seems like a micro-optimization, let me know and I will fix it. I'm not sure that optimizing here will yield much difference because we're parsing values from an untrusted file, and we need to do all sorts of error checks anyway. I suggest implementing this all as follows: 1. Use a do-while loop because it is more succinct and easier to read. 2. Use a uint64_t local value instead of storing into the result directly. Using the local allows the compiler to put it in a register, and register operations are cheap. We only incur the memory write to result at the end. 3. When the computation is done, (1) break out of the loop, (2) validate that the computed value is within size of a uint32, and (3) copy the local value into the result. 4. The loop can loop while (value < numeric_limits<uint32_t>::max()). This ensures that we don't overflow our uint64_t local value and that the validation check in 3.2 will work as expected.
Sukolsak Sakshuwong
Comment 11 2015-07-31 15:57:17 PDT
Created attachment 257974 [details] Rewrote WASMReader::readCompactUInt32()
Sukolsak Sakshuwong
Comment 12 2015-07-31 16:10:25 PDT
Comment on attachment 257974 [details] Rewrote WASMReader::readCompactUInt32() Rewrote WASMReader::readCompactUInt32() - check that the number of bytes does not exceed 5 - check that the fifth byte is less than 2^(32 - 7*4) = 2^4 to prevent integer overflows - check that the last byte is not zero, unless it's the only byte of the number
Sukolsak Sakshuwong
Comment 13 2015-07-31 16:21:41 PDT
Thanks. (In reply to comment #10) > 1. Use a do-while loop because it is more succinct and easier to read. Done. > 2. Use a uint64_t local value instead of storing into the result directly. > Using the local allows the compiler to put it in a register, and register > operations are cheap. We only incur the memory write to result at the end. Used a local value. > 3. When the computation is done, > (1) break out of the loop, > (2) validate that the computed value is within size of a uint32, and > (3) copy the local value into the result. > > 4. The loop can loop while (value < numeric_limits<uint32_t>::max()). This > ensures that we don't overflow our uint64_t local value and that the > validation check in 3.2 will work as expected. As discussed on IRC, checking that (value < numeric_limits<uint32_t>::max()) is not enough, because the bytes could be 0x80 0x80 0x80 0x80 0x80 ... . Checking that the number of bytes doesn't exceed 5 and that the fifth byte is <= 0b00001111 should prevent that and also eliminate the need for uint64_t.
Sukolsak Sakshuwong
Comment 14 2015-08-03 16:04:53 PDT
Sukolsak Sakshuwong
Comment 15 2015-08-05 11:32:57 PDT
Created attachment 258288 [details] Override JSCell::destroy()
Saam Barati
Comment 16 2015-08-06 11:51:57 PDT
Comment on attachment 258288 [details] Override JSCell::destroy() View in context: https://bugs.webkit.org/attachment.cgi?id=258288&action=review Looks pretty good to me. This was a fly-by review. I'll look at it in more detail and look at the polyfill when I have more time. > Source/JavaScriptCore/wasm/JSWASMModule.h:57 > + Vector<uint32_t>& i32Constants() { return m_i32Constants; } Nit: These functions can all be marked as "const". > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:38 > +#define READ_UINT32_OR_FAIL(x, errorMessage) do { if (!m_reader.readUInt32(x)) FAIL_WITH_MESSAGE(errorMessage); } while (0) I think we should use better names than "x" here and below. Maybe like numConstants or inOutNumConstants? > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108 > + m_module->i32Constants().append(constant); I think this (and all appends below) can be uncheckedAppend() since we're reserving their space up front. > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:257 > + // TODO: Support any functions. Nit: We tend to use FIXME instead of TODO Also, is there a bug open for this? If so, let's include the link to it here. If not, we should open one and link it here. > Source/JavaScriptCore/wasm/WASMReader.cpp:81 > +bool WASMReader::readCompactUInt32(uint32_t& result) I wonder if some of this would be easier to read if we wrote numbers in binary or gave then names. Like: uint8_t firstSevenBits = 0x7f; or uint8_t byteValueMask = 0x7f; Or maybe this is silly and we should just stick with hex.
Geoffrey Garen
Comment 17 2015-08-06 14:24:24 PDT
Comment on attachment 258288 [details] Override JSCell::destroy() View in context: https://bugs.webkit.org/attachment.cgi?id=258288&action=review r=me >> Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108 >> + m_module->i32Constants().append(constant); > > I think this (and all appends below) can be uncheckedAppend() since we're reserving their space up front. I agree. > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:138 > + m_module->signatures().append(signature); Same here, and a few places below..
Sukolsak Sakshuwong
Comment 18 2015-08-06 16:01:15 PDT
Sukolsak Sakshuwong
Comment 19 2015-08-06 16:16:40 PDT
Thanks you for the review. (In reply to comment #16) > > Source/JavaScriptCore/wasm/JSWASMModule.h:57 > > + Vector<uint32_t>& i32Constants() { return m_i32Constants; } > > Nit: These functions can all be marked as "const". They cannot be marked as const, because I modify them. > > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:38 > > +#define READ_UINT32_OR_FAIL(x, errorMessage) do { if (!m_reader.readUInt32(x)) FAIL_WITH_MESSAGE(errorMessage); } while (0) > > I think we should use better names than "x" here and below. Maybe like > numConstants or inOutNumConstants? Done. Changed it to "result" to be consistent with the name I use in WASMReader. > > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:108 > > + m_module->i32Constants().append(constant); > > I think this (and all appends below) can be uncheckedAppend() since we're > reserving their space up front. Done. Also added a check for numberOfSignatures in WASMModuleParser::parseFunctionImportSection() for safety. > > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:257 > > + // TODO: Support any functions. > > Nit: We tend to use FIXME instead of TODO > Also, is there a bug open for this? If so, let's include the link to it here. > If not, we should open one and link it here. Done. Changed it to TODO and added a link to https://bugs.webkit.org/show_bug.cgi?id=147738 > > Source/JavaScriptCore/wasm/WASMReader.cpp:81 > > +bool WASMReader::readCompactUInt32(uint32_t& result) > > I wonder if some of this would be easier to read if we wrote numbers in > binary or gave then names. > Like: > uint8_t firstSevenBits = 0x7f; > or > uint8_t byteValueMask = 0x7f; > > Or maybe this is silly and we should just stick with hex. Done. Added "static const uint32_t firstSevenBitsMask = 0x7f;". This will be used again when I add a signed version of compact ints. I agree that it would be easier to read if it's in binary, but I can't find 0b... used anywhere other than in comments. I'm not sure what our policy is on binary literals. (In reply to comment #17) > > Source/JavaScriptCore/wasm/WASMModuleParser.cpp:138 > > + m_module->signatures().append(signature); > > Same here, and a few places below.. Done.
Sukolsak Sakshuwong
Comment 20 2015-08-06 16:18:59 PDT
(In reply to comment #19) > Done. Changed it to TODO and added a link to > https://bugs.webkit.org/show_bug.cgi?id=147738 Sorry, I meant "Changed it to FIXME."
Geoffrey Garen
Comment 21 2015-08-06 16:31:46 PDT
Comment on attachment 258411 [details] Patch r=me
WebKit Commit Bot
Comment 22 2015-08-06 17:22:01 PDT
Comment on attachment 258411 [details] Patch Clearing flags on attachment: 258411 Committed r188099: <http://trac.webkit.org/changeset/188099>
WebKit Commit Bot
Comment 23 2015-08-06 17:22:04 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.