Bug 142513

Summary: 8-bit version of weakCompareAndSwap() can cause an infinite loop
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: Web Template FrameworkAssignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, bfulgham, cmarcelo, commit-queue, fpizlo, ggaren, kevleyski, mmirman, msaboff, ossy, rniwa, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 141290    
Attachments:
Description Flags
the patch.
mark.lam: review-
take 2: with fixes for windows project. Make doesn't seem to build testapi.
fpizlo: review-, fpizlo: commit-queue-
take 3: splice the expectedValue and newValue using the same oldValue
none
take 4: fixed missing volatile
fpizlo: review+
follow up: Windows doesn't like pthreads
achristensen: review+
Speculative fix for Windows build.
none
Build fix for Windows. This time, I've tested this on a local Windows box. achristensen: review+

Description Mark Lam 2015-03-09 16:18:04 PDT
Presently, Bitmap::concurrentTestAndSet() uses the 8-bit version of weakCompareAndSwap() (which compares and swaps an uint8_t value).  Bitmap::concurrentTestAndSet() has a loop that checks if a bit in the byte of interest has been set.  If not, it will call the 8-bit CAS function to set the bit.

Under the covers, the 8-bit CAS function actually works with a 32-bit CAS.  The 8-bit CAS will first fetch the 32-bit value in memory that contains the 8-bit value, and check if it contains the expected byte.  If the value in memory doesn't have the expected byte, it will return early to its caller.  The expectation is that the caller will reload the byte from memory and call the 8-bit CAS again.

Unfortunately, this code path that returns early does not have a compiler fence.  Without that compiler fence, the C++ compiler can optimize away the reloading of the expected byte value, leaving it unchanged.  As a result, we'll have a infinite loop here that checks a value that will never change, and the loop will not terminate until the value changes.

The fix is to eliminate the early return check in the 8-bit CAS, and have it always call down to the 32-bit CAS.  The 32-bit CAS has a compiler fence which will prevent this issue.
Comment 1 Mark Lam 2015-03-09 16:19:09 PDT
<rdar://problem/20028910>
Comment 2 Mark Lam 2015-03-09 17:36:41 PDT
Created attachment 248298 [details]
the patch.
Comment 3 Mark Lam 2015-03-09 17:51:33 PDT
Comment on attachment 248298 [details]
the patch.

Need to add test file to other ports.
Comment 4 Mark Lam 2015-03-09 17:59:28 PDT
Created attachment 248303 [details]
take 2: with fixes for windows project.  Make doesn't seem to build testapi.
Comment 5 Filip Pizlo 2015-03-09 18:20:46 PDT
Comment on attachment 248303 [details]
take 2: with fixes for windows project.  Make doesn't seem to build testapi.

View in context: https://bugs.webkit.org/attachment.cgi?id=248303&action=review

> Source/WTF/wtf/Atomics.h:335
> +    struct CompositeValue {
> +    public:
> +        CompositeValue(volatile unsigned* baseWordAddr, uint8_t replacementByte, uintptr_t replacementIndex)
> +        {
> +            u.word = *baseWordAddr;
> +            u.bytes[replacementIndex] = replacementByte;
> +        }
> +
> +        unsigned word() { return u.word; }
> +
> +    private:
> +        union {
> +            unsigned word;
> +            uint8_t bytes[sizeof(unsigned)];
> +        } u;
> +    };
> +    
> +    CompositeValue oldAlignedValue(alignedLocation, expected, locationOffset);
> +    CompositeValue newAlignedValue(alignedLocation, newValue, locationOffset);

This is wrong.  You load from *alignedLocation twice: once t get the side-bytes of the old value and once to get the side-bytes of the new value.  This has a race.  If the bytes other than the ones you are CASing inside the aligned word change between the two loads, you could cause badness.

I suggest not using the CompositeValue helper.  Instead, all that this needs is a compiler fence before the load of the oldAlignedValue in the original code.
Comment 6 Brent Fulgham 2015-03-09 18:23:27 PDT
We're missing the path to "config.h" on the Windows project settings for testapi. I'll land a fix for that independent of this change.
Comment 7 Filip Pizlo 2015-03-09 18:49:11 PDT
(In reply to comment #5)
> Comment on attachment 248303 [details]
> take 2: with fixes for windows project.  Make doesn't seem to build testapi.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=248303&action=review
> 
> > Source/WTF/wtf/Atomics.h:335
> > +    struct CompositeValue {
> > +    public:
> > +        CompositeValue(volatile unsigned* baseWordAddr, uint8_t replacementByte, uintptr_t replacementIndex)
> > +        {
> > +            u.word = *baseWordAddr;
> > +            u.bytes[replacementIndex] = replacementByte;
> > +        }
> > +
> > +        unsigned word() { return u.word; }
> > +
> > +    private:
> > +        union {
> > +            unsigned word;
> > +            uint8_t bytes[sizeof(unsigned)];
> > +        } u;
> > +    };
> > +    
> > +    CompositeValue oldAlignedValue(alignedLocation, expected, locationOffset);
> > +    CompositeValue newAlignedValue(alignedLocation, newValue, locationOffset);
> 
> This is wrong.  You load from *alignedLocation twice: once t get the
> side-bytes of the old value and once to get the side-bytes of the new value.
> This has a race.  If the bytes other than the ones you are CASing inside the
> aligned word change between the two loads, you could cause badness.
> 
> I suggest not using the CompositeValue helper.  Instead, all that this needs
> is a compiler fence before the load of the oldAlignedValue in the original
> code.

Here's the race that breaks your code:

char buf[4];

// initial state
buf[0] = 0;
buf[1] = 0;
buf[2] = 0;
buf[3] = 0;

Thread #1 takes the following steps:
T1: 1) CAS(&buf[0], 0, 1);

Thread #2 takes the following steps:
T2: 1) buf[1] = 1;
T2: 2) buf[1] = 0;

We would expect the following to be valid outcomes of executing this program, regardless of the three possible interleavings.  Here, "T1.1->T2.1" implies a particular sequencing of executions, so in particular it means that "T1 executes step 1 (i.e. the CAS) and then T2 executes step 1 (i.e. buf[1] = 1)".

T1.1->T2.1->T2.2: buf == {1, 0, 0, 0}
T2.1->T1.1->t2.2: buf == {1, 0, 0, 0}
T2.1->T2.2->T1.1: buf == {1, 0, 0, 0}

In particular there is no valid outcome with buf[1] != 0.

But if the 1-byte CAS does what you do, then it actually has two loads and a CAS.  The first load influences what we compare and the second load influences what we set the word to.  Let's say that the CAS is decomposed into three steps:

a) load the whole word, put in the expected byte
b) load the whole word, put in the newValue byte
c) do the word CAS

This means that we will have the following evil interleaving:

T1.1.a->T2.1->T2.2->T1.1.b->T1.1.c

After T1.1.a executes, the expected value word will be {0, 0, 0, 0}. After T2.1, buf[1] == 1, but then T2.2 runs and then we have buf[1] == 0 again. Now T1.1.b runs and sets the new value word to {1, 1, 0, 0}. Then T1.1.c runs, and we CAS with expected = {0, 0, 0, 0} and newValue = {1, 1, 0, 0}.  So the result is {1, 1, 0, 0,} which is wrong.
Comment 8 Filip Pizlo 2015-03-09 18:59:52 PDT
I believe the composite value thing doesn't produce a net benefit. I think it makes it easier to make the sort of mistake that you made. CAS implementations should be as straight-line and free of abstractions as possible because we want to be able to see, by glancing at the code, how many loads there are. I believe that a correct implementation must only involve one load. The abstraction via CompositeLoad means that the code appears to have one load but you only realize that it executes twice by thinking about it a bit more carefully.

There are two possible changes that make the current trunk code correct:

1) Put a compiler fence before and after the whole 8-byte CAS function - that is, the early return would instead become a control flow diamond and the whole 8-byte CAS function would look something like this:

compilerFence();
... address math
oldAlignedValue = *load;
if (u.bytes[thingy] == stuff)
    result = false;
else {
    ...
    result = CAS(...);
}
compilerFence();
return result;

The reason for both compiler fences is to prevent code motion into or out of the CAS implementation. The top compiler fence prevents hoisting the load. The bottom compiler fence prevents hoisting a store into the fast-exit path.

This doesn't prevent any optimizations other than those that would be invalid.

2) Use an unconditional CAS like your change does, but without CompositeValue. You could just construct oldValue and newValue by storing a byte into the union and then loading the word out. This kind of very C-like simple implementation is trivial to verify by visual inspection. The upside is that it's easier to reason about why it's right. But there is a downside: the splicing of the old value, which trunk does not do, is probably just as expensive as that byte check. And if the check would fail (early return) in trunk, this new modified code would have to execute a full CAS. Also, I suspect that we still would need a compiler fence at the top to prevent removing the load. I worry that a clever compiler could CSE the load with an earlier load, and that could cause some subtle issues.

I don't really care much if we use (1) or (2). But the CompositeValue abstraction feels dangerous to me - the less abstraction this code has, the more obvious it will be what the interleavings could be.
Comment 9 Mark Lam 2015-03-09 19:09:44 PDT
Created attachment 248307 [details]
take 3: splice the expectedValue and newValue using the same oldValue
Comment 10 Mark Lam 2015-03-09 19:11:43 PDT
Created attachment 248308 [details]
take 4: fixed missing volatile
Comment 11 Mark Lam 2015-03-09 19:12:52 PDT
Comment on attachment 248308 [details]
take 4: fixed missing volatile

View in context: https://bugs.webkit.org/attachment.cgi?id=248308&action=review

> Source/WTF/wtf/Atomics.h:-316
> -    // Make sure that this load is always issued and never optimized away.

I should probably restore this comment.
Comment 12 Filip Pizlo 2015-03-09 19:12:55 PDT
Comment on attachment 248308 [details]
take 4: fixed missing volatile

View in context: https://bugs.webkit.org/attachment.cgi?id=248308&action=review

> Source/WTF/wtf/Atomics.h:-316
> -    // Make sure that this load is always issued and never optimized away.

Why remove this comment?  I think it's useful.
Comment 13 Mark Lam 2015-03-09 19:21:18 PDT
Thanks for the review.  The comment has been restored.

Landed in r181305: <http://trac.webkit.org/r181305>.
Comment 14 Ryosuke Niwa 2015-03-09 20:13:16 PDT
(In reply to comment #13)
> Thanks for the review.  The comment has been restored.
> 
> Landed in r181305: <http://trac.webkit.org/r181305>.

This patch broke Windows builds:

  CompareAndSwapTest.cpp
..\..\API\tests\CompareAndSwapTest.cpp(95): error C3861: 'pthread_exit': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'pthread_t' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(102): error C2146: syntax error : missing ';' before identifier 'threadIDs' [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(110): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(110): error C3861: 'pthread_create': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(115): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
..\..\API\tests\CompareAndSwapTest.cpp(115): error C3861: 'pthread_join': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
Comment 15 Mark Lam 2015-03-09 20:22:39 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > Thanks for the review.  The comment has been restored.
> > 
> > Landed in r181305: <http://trac.webkit.org/r181305>.
> 
> This patch broke Windows builds:
> 
>   CompareAndSwapTest.cpp
> ..\..\API\tests\CompareAndSwapTest.cpp(95): error C3861: 'pthread_exit':
> identifier not found
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'pthread_t' :
> undeclared identifier
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2146: syntax error :
> missing ';' before identifier 'threadIDs'
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'threadIDs' :
> undeclared identifier
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(110): error C2065: 'threadIDs' :
> undeclared identifier
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(110): error C3861: 'pthread_create':
> identifier not found
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(115): error C2065: 'threadIDs' :
> undeclared identifier
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]
> ..\..\API\tests\CompareAndSwapTest.cpp(115): error C3861: 'pthread_join':
> identifier not found
> [C:\cygwin\home\buildbot\slave\win-
> debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.
> vcxproj]

Yeah, Brent just told me this on the way out.  I’m working on a fix.
Comment 16 Mark Lam 2015-03-09 21:05:09 PDT
Created attachment 248314 [details]
follow up: Windows doesn't like pthreads
Comment 17 Mark Lam 2015-03-09 21:23:11 PDT
Windows fix landed in r181308: <http://trac.webkit.org/r181308>.
Comment 18 Mark Lam 2015-03-09 21:37:39 PDT
Created attachment 248316 [details]
Speculative fix for Windows build.
Comment 19 Mark Lam 2015-03-09 21:43:03 PDT
The EWS bots are not cooperating.  I went ahead and landed the speculative Windows fix in r181309: <http://trac.webkit.org/r181309>.
Comment 20 Mark Lam 2015-03-09 23:40:04 PDT
Created attachment 248318 [details]
Build fix for Windows.  This time, I've tested this on a local Windows box.
Comment 21 Mark Lam 2015-03-09 23:57:48 PDT
Final build fix for Windows landed in r181315: <http://trac.webkit.org/r181315>.
Comment 22 Csaba Osztrogonác 2015-03-10 01:04:19 PDT
(In reply to comment #17)
> Windows fix landed in r181308: <http://trac.webkit.org/r181308>.

It broke the JSC API tests on the Apple bots.
Comment 23 Mark Lam 2015-03-10 01:06:25 PDT
(In reply to comment #22)
> (In reply to comment #17)
> > Windows fix landed in r181308: <http://trac.webkit.org/r181308>.
> 
> It broke the JSC API tests on the Apple bots.

Which ones?  Can you provide a link please?
Comment 24 Csaba Osztrogonác 2015-03-10 01:07:07 PDT
(In reply to comment #23)
> (In reply to comment #22)
> > (In reply to comment #17)
> > > Windows fix landed in r181308: <http://trac.webkit.org/r181308>.
> > 
> > It broke the JSC API tests on the Apple bots.
> 
> Which ones?  Can you provide a link please?

All Apple Mac bots on build.webkit.org
Comment 25 Mark Lam 2015-03-10 01:48:53 PDT
(In reply to comment #24)
> All Apple Mac bots on build.webkit.org

I wasn’t seeing this on the dashboard except for the Mavericks debug bots.  Anyway, it’s a trivial test only fix.  

Landed in r181319: <http://trac.webkit.org/r181319>.
Comment 26 Csaba Osztrogonác 2015-03-10 04:09:11 PDT
(In reply to comment #25)
> (In reply to comment #24)
> > All Apple Mac bots on build.webkit.org
> 
> I wasn’t seeing this on the dashboard except for the Mavericks debug bots. 
> Anyway, it’s a trivial test only fix.  
Ah, because the JSC test coverage is poor on the dashboard:
- CLOOP and 32 bit bots aren't included
- Yosemite WK1 and WK2 bots don't run JSC tests
  and the dedicated JSC tester bot isn't on dashboard
- Only the Mavericks WK1 bots can be found on it

> Landed in r181319: <http://trac.webkit.org/r181319>.
Thanks for the quick fix.
Comment 27 Kevin Staunton-Lambert 2015-11-18 10:00:39 PST
Thanks also