Bug 161866

Summary: Undefined behavior: Left shift negative number
Product: WebKit Reporter: Jonathan Bedard <jbedard>
Component: JavaScriptCoreAssignee: Jonathan Bedard <jbedard>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, benjamin, cdumez, cmarcelo, commit-queue, dbates, fpizlo, jfbastien, keith_miller, mark.lam, msaboff, saam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Jonathan Bedard 2016-09-12 10:13:49 PDT
Left shifting a negative number is undefined behavior in C/C++, although most implementations do defined this behavior.

The following files left shift negative numbers:
JavaScriptCore/dfg/DFGAbstractHeap.h
Comment 1 Jonathan Bedard 2016-09-12 13:32:12 PDT
Created attachment 288605 [details]
Patch
Comment 2 Jonathan Bedard 2016-09-12 13:41:06 PDT
wtf/text/Base64.cpp also left shifts negative numbers.  It is possible that there are more cases of this, since the undefined behavior sanitizer works at runtime and there are other issues preventing a more complete search of WebKit for left shifting of negative numbers.
Comment 3 JF Bastien 2016-09-13 21:31:39 PDT
Comment on attachment 288605 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review

Does this hit when compiling + running with ubsan?

> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301
> +        return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift);

isTop() returns bool, it seems better to cast it to uint64_t.
valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one.

> Source/WTF/wtf/text/Base64.cpp:119
> +    const unsigned char* udata = reinterpret_cast<const unsigned char*>(data);

You should change the function signature of base64EncodeInternal instead, it's only used in 4 places below and is cast from void* to char*. These casts need to be updated as well. Though either one isn't great because char* has magical meaning to C++ aliasing which unsigned char* doesn't in either solution. The other alternative is to case each data[] value about to be left-shifted to an unsigned. I'd rather take one of these approaches and not reinterpret_cast here.

> Source/WTF/wtf/text/Base64.cpp:248
> +    unsigned char* uout = reinterpret_cast<unsigned char*>(out.data());

Same here.
Comment 4 Jonathan Bedard 2016-09-14 08:26:28 PDT
Comment on attachment 288605 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review

>> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301
>> +        return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift);
> 
> isTop() returns bool, it seems better to cast it to uint64_t.
> valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one.

Actually, both should be uint64.  valueImpl() can (in some cases) be negative.
Comment 5 JF Bastien 2016-09-14 10:22:00 PDT
Comment on attachment 288605 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288605&action=review

>>> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301
>>> +        return kindAsInt | static_cast<int64_t>(static_cast<int64_t>(payload.isTop()) << topShift) | (static_cast<int64_t>(payload.valueImpl()) << valueShift);
>> 
>> isTop() returns bool, it seems better to cast it to uint64_t.
>> valueImpl() is already int64_t, do you have a repo of when it is negative? It seems good to at least assert that the bits being shifted out are all zero or all one.
> 
> Actually, both should be uint64.  valueImpl() can (in some cases) be negative.

So valueImpl should be bitwise_cast to uint64, and top bits debug-assert-checked to make sure information isn't shifted out.
Comment 6 Jonathan Bedard 2016-09-15 09:58:35 PDT
Created attachment 288964 [details]
Patch
Comment 7 Jonathan Bedard 2016-09-15 10:06:47 PDT
Information is shifted out, and was shifted out in previous implementations.  A debug check would be frequently triggered, and I believe that it is expected that some data is shifted out.

accessibility/media-element.html provides an example of a test which will attempt to shift a negative payload.valueImpl() (on line 301 of JavaScriptCore/dfg/DFGAbstractHeap.h)
Comment 8 JF Bastien 2016-09-15 10:10:23 PDT
Comment on attachment 288964 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288964&action=review

> Source/JavaScriptCore/dfg/DFGAbstractHeap.h:301
> +        return kindAsInt | (static_cast<uint64_t>(payload.isTop()) << topShift) | (bitwise_cast<uint64_t>(payload.valueImpl()) << valueShift);

This relies on 2 things:
 - valueImpl goes in the very top of the packed value (so sext bits don't mask out kind nor isTop).
 - valueImpl's representation has insignificant 15 bits at the top.

I'd like to assert on the second assumption, because it's dynamic:
auto valueImpl = bitwise_cast<uint64_t>(payload.valueImpl();
ASSERT(payload.valueImpl() == ((payload.valueImpl() << valueShift) >> valueShift));
return kindAsInt | (static_cast<uint64_t>(payload.isTop()) << topShift) | (valueImpl << valueShift));

> Source/WTF/ChangeLog:34
> +

ChangeLog needs to be merged.

> Source/WTF/wtf/text/Base64.h:55
> +    void clear() { m_vector.u->clear(); }

Is the vector ever stored as a char and retrieved as a uint8_t (or the other way around)? If so then this is also undefined behavior: the union should only be accessed with its active member (roughly: the last one stored to the union).
Comment 9 JF Bastien 2016-09-15 10:13:37 PDT
(In reply to comment #7)
> Information is shifted out, and was shifted out in previous implementations.
> A debug check would be frequently triggered, and I believe that it is
> expected that some data is shifted out.
> 
> accessibility/media-element.html provides an example of a test which will
> attempt to shift a negative payload.valueImpl() (on line 301 of
> JavaScriptCore/dfg/DFGAbstractHeap.h)

Sorry, replied without seeing this.

Do you mean that sext bits are shifted out? That's fine, the assert I suggest above shouldn't trigger when this happens.

If significant bits are shifted out then isn't that a correctness issue?

Sorry for asking so many questions, I simply don't know the codebase well enough yet :-)
Comment 10 Jonathan Bedard 2016-09-15 10:48:57 PDT
(In reply to comment #9)
> 
> ...
>

I am going to dig a little bit deeper here to see why we're getting negative values at all.  In accessibility/media-element.html (just as an example, there are other that do this too) the payload value at one point is -7, which is 0xFFFFFFFFFFFFFFF9 when converted to unsigned.  Obviously this is going to   Currently, things seem to be working well, so throwing out the top 15 bits seems to be the intended behavior (although I agree that it seems weird)

And as for the union bit, the previous version of the class was built as a conversion class (essentially a bitwise_cast) converting an unsigned char vector into a signed char vector.  I need to ask around because you're right, using unions to preform bitwise_casts is undefined in the standard (although defined in all relevant implementations) but we use this technique often in WebKit (actually, take a look at our implementation of bitwise_cast, it actually uses a union to preform the cast).
Comment 11 JF Bastien 2016-09-15 11:45:30 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > 
> > ...
> >
> 
> I am going to dig a little bit deeper here to see why we're getting negative
> values at all.  In accessibility/media-element.html (just as an example,
> there are other that do this too) the payload value at one point is -7,
> which is 0xFFFFFFFFFFFFFFF9 when converted to unsigned.  Obviously this is
> going to   Currently, things seem to be working well, so throwing out the
> top 15 bits seems to be the intended behavior (although I agree that it
> seems weird)

OK cool, ty. That could be done separately IMO: it may be intended, and the assert won't fire.

> And as for the union bit, the previous version of the class was built as a
> conversion class (essentially a bitwise_cast) converting an unsigned char
> vector into a signed char vector.  I need to ask around because you're
> right, using unions to preform bitwise_casts is undefined in the standard
> (although defined in all relevant implementations) but we use this technique
> often in WebKit (actually, take a look at our implementation of
> bitwise_cast, it actually uses a union to preform the cast).

It doesn't anymore, I fixed it ;-)
Comment 12 Jonathan Bedard 2016-09-15 13:33:43 PDT
Created attachment 288990 [details]
Patch
Comment 13 Jonathan Bedard 2016-09-15 13:35:30 PDT
This patch fixes the union issues, although it still leaves out the ASSERT because that (currently) will cause test failures.

I will continue investigating to figure out if the negative integer is expected behavior in the first place, and if it is not, I will put in the ASSERT.
Comment 14 JF Bastien 2016-09-19 22:49:59 PDT
Comment on attachment 288990 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=288990&action=review

I think some of the diff is missing?

> Source/WTF/wtf/text/Base64.h:80
> +        m_vector.u->clear();

The non-return cases are wrong (below as well).
Comment 15 Jonathan Bedard 2016-09-20 09:00:54 PDT
Created attachment 289355 [details]
Patch
Comment 16 JF Bastien 2016-09-20 09:12:28 PDT
lgtm. I don't have review access, so you'll need someone else to r+.
Comment 17 Keith Miller 2016-09-20 09:29:42 PDT
Comment on attachment 289355 [details]
Patch

r=me.
Comment 18 WebKit Commit Bot 2016-09-20 10:16:13 PDT
Comment on attachment 289355 [details]
Patch

Clearing flags on attachment: 289355

Committed r206151: <http://trac.webkit.org/changeset/206151>
Comment 19 WebKit Commit Bot 2016-09-20 10:16:19 PDT
All reviewed patches have been landed.  Closing bug.