RESOLVED FIXED Bug 161866
Undefined behavior: Left shift negative number
https://bugs.webkit.org/show_bug.cgi?id=161866
Summary Undefined behavior: Left shift negative number
Jonathan Bedard
Reported 2016-09-12 10:13:49 PDT
Left shifting a negative number is undefined behavior in C/C++, although most implementations do defined this behavior. The following files left shift negative numbers: JavaScriptCore/dfg/DFGAbstractHeap.h
Attachments
Patch (5.75 KB, patch)
2016-09-12 13:32 PDT, Jonathan Bedard
no flags
Patch (7.17 KB, patch)
2016-09-15 09:58 PDT, Jonathan Bedard
no flags
Patch (7.94 KB, patch)
2016-09-15 13:33 PDT, Jonathan Bedard
no flags
Patch (8.04 KB, patch)
2016-09-20 09:00 PDT, Jonathan Bedard
no flags
Jonathan Bedard
Comment 1 2016-09-12 13:32:12 PDT
Jonathan Bedard
Comment 2 2016-09-12 13:41:06 PDT
wtf/text/Base64.cpp also left shifts negative numbers. It is possible that there are more cases of this, since the undefined behavior sanitizer works at runtime and there are other issues preventing a more complete search of WebKit for left shifting of negative numbers.
JF Bastien
Comment 3 2016-09-13 21:31:39 PDT
Comment on attachment 288605 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review Does this hit when compiling + running with ubsan? > Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301 > + return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift); isTop() returns bool, it seems better to cast it to uint64_t. valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one. > Source/WTF/wtf/text/Base64.cpp:119 > + const unsigned char* udata = reinterpret_cast<const unsigned char*>(data); You should change the function signature of base64EncodeInternal instead, it's only used in 4 places below and is cast from void* to char*. These casts need to be updated as well. Though either one isn't great because char* has magical meaning to C++ aliasing which unsigned char* doesn't in either solution. The other alternative is to case each data[] value about to be left-shifted to an unsigned. I'd rather take one of these approaches and not reinterpret_cast here. > Source/WTF/wtf/text/Base64.cpp:248 > + unsigned char* uout = reinterpret_cast<unsigned char*>(out.data()); Same here.
Jonathan Bedard
Comment 4 2016-09-14 08:26:28 PDT
Comment on attachment 288605 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review >> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301 >> + return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift); > > isTop() returns bool, it seems better to cast it to uint64_t. > valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one. Actually, both should be uint64. valueImpl() can (in some cases) be negative.
JF Bastien
Comment 5 2016-09-14 10:22:00 PDT
Comment on attachment 288605 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review >>> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301 >>> + return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift); >> >> isTop() returns bool, it seems better to cast it to uint64_t. >> valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one. > > Actually, both should be uint64. valueImpl() can (in some cases) be negative. So valueImpl should be bitwise_cast to uint64, and top bits debug-assert-checked to make sure information isn't shifted out.
Jonathan Bedard
Comment 6 2016-09-15 09:58:35 PDT
Jonathan Bedard
Comment 7 2016-09-15 10:06:47 PDT
Information is shifted out, and was shifted out in previous implementations. A debug check would be frequently triggered, and I believe that it is expected that some data is shifted out. accessibility/media-element.html provides an example of a test which will attempt to shift a negative payload.valueImpl() (on line 301 of JavaScriptCore/dfg/DFGAbstractHeap.h)
JF Bastien
Comment 8 2016-09-15 10:10:23 PDT
Comment on attachment 288964 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288964&action=review > Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301 > + return kindAsInt | (static_cast<uint64_t>(payload.isTop()) << topShift) | (bitwise_cast<uint64_t>(payload.valueImpl()) << valueShift); This relies on 2 things: - valueImpl goes in the very top of the packed value (so sext bits don't mask out kind nor isTop). - valueImpl's representation has insignificant 15 bits at the top. I'd like to assert on the second assumption, because it's dynamic: auto valueImpl = bitwise_cast<uint64_t>(payload.valueImpl(); ASSERT(payload.valueImpl() == ((payload.valueImpl() << valueShift) >> valueShift)); return kindAsInt | (static_cast<uint64_t>(payload.isTop()) << topShift) | (valueImpl << valueShift)); > Source/WTF/ChangeLog:34 > + ChangeLog needs to be merged. > Source/WTF/wtf/text/Base64.h:55 > + void clear() { m_vector.u->clear(); } Is the vector ever stored as a char and retrieved as a uint8_t (or the other way around)? If so then this is also undefined behavior: the union should only be accessed with its active member (roughly: the last one stored to the union).
JF Bastien
Comment 9 2016-09-15 10:13:37 PDT
(In reply to comment #7) > Information is shifted out, and was shifted out in previous implementations. > A debug check would be frequently triggered, and I believe that it is > expected that some data is shifted out. > > accessibility/media-element.html provides an example of a test which will > attempt to shift a negative payload.valueImpl() (on line 301 of > JavaScriptCore/dfg/DFGAbstractHeap.h) Sorry, replied without seeing this. Do you mean that sext bits are shifted out? That's fine, the assert I suggest above shouldn't trigger when this happens. If significant bits are shifted out then isn't that a correctness issue? Sorry for asking so many questions, I simply don't know the codebase well enough yet :-)
Jonathan Bedard
Comment 10 2016-09-15 10:48:57 PDT
(In reply to comment #9) > > ... > I am going to dig a little bit deeper here to see why we're getting negative values at all. In accessibility/media-element.html (just as an example, there are other that do this too) the payload value at one point is -7, which is 0xFFFFFFFFFFFFFFF9 when converted to unsigned. Obviously this is going to Currently, things seem to be working well, so throwing out the top 15 bits seems to be the intended behavior (although I agree that it seems weird) And as for the union bit, the previous version of the class was built as a conversion class (essentially a bitwise_cast) converting an unsigned char vector into a signed char vector. I need to ask around because you're right, using unions to preform bitwise_casts is undefined in the standard (although defined in all relevant implementations) but we use this technique often in WebKit (actually, take a look at our implementation of bitwise_cast, it actually uses a union to preform the cast).
JF Bastien
Comment 11 2016-09-15 11:45:30 PDT
(In reply to comment #10) > (In reply to comment #9) > > > > ... > > > > I am going to dig a little bit deeper here to see why we're getting negative > values at all. In accessibility/media-element.html (just as an example, > there are other that do this too) the payload value at one point is -7, > which is 0xFFFFFFFFFFFFFFF9 when converted to unsigned. Obviously this is > going to Currently, things seem to be working well, so throwing out the > top 15 bits seems to be the intended behavior (although I agree that it > seems weird) OK cool, ty. That could be done separately IMO: it may be intended, and the assert won't fire. > And as for the union bit, the previous version of the class was built as a > conversion class (essentially a bitwise_cast) converting an unsigned char > vector into a signed char vector. I need to ask around because you're > right, using unions to preform bitwise_casts is undefined in the standard > (although defined in all relevant implementations) but we use this technique > often in WebKit (actually, take a look at our implementation of > bitwise_cast, it actually uses a union to preform the cast). It doesn't anymore, I fixed it ;-)
Jonathan Bedard
Comment 12 2016-09-15 13:33:43 PDT
Jonathan Bedard
Comment 13 2016-09-15 13:35:30 PDT
This patch fixes the union issues, although it still leaves out the ASSERT because that (currently) will cause test failures. I will continue investigating to figure out if the negative integer is expected behavior in the first place, and if it is not, I will put in the ASSERT.
JF Bastien
Comment 14 2016-09-19 22:49:59 PDT
Comment on attachment 288990 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288990&action=review I think some of the diff is missing? > Source/WTF/wtf/text/Base64.h:80 > + m_vector.u->clear(); The non-return cases are wrong (below as well).
Jonathan Bedard
Comment 15 2016-09-20 09:00:54 PDT
JF Bastien
Comment 16 2016-09-20 09:12:28 PDT
lgtm. I don't have review access, so you'll need someone else to r+.
Keith Miller
Comment 17 2016-09-20 09:29:42 PDT
Comment on attachment 289355 [details] Patch r=me.
WebKit Commit Bot
Comment 18 2016-09-20 10:16:13 PDT
Comment on attachment 289355 [details] Patch Clearing flags on attachment: 289355 Committed r206151: <http://trac.webkit.org/changeset/206151>
WebKit Commit Bot
Comment 19 2016-09-20 10:16:19 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.