RESOLVED FIXED 142513
8-bit version of weakCompareAndSwap() can cause an infinite loop
https://bugs.webkit.org/show_bug.cgi?id=142513
Summary 8-bit version of weakCompareAndSwap() can cause an infinite loop
Mark Lam
Reported 2015-03-09 16:18:04 PDT
Presently, Bitmap::concurrentTestAndSet() uses the 8-bit version of weakCompareAndSwap() (which compares and swaps an uint8_t value). Bitmap::concurrentTestAndSet() has a loop that checks if a bit in the byte of interest has been set. If not, it will call the 8-bit CAS function to set the bit. Under the covers, the 8-bit CAS function actually works with a 32-bit CAS. The 8-bit CAS will first fetch the 32-bit value in memory that contains the 8-bit value, and check if it contains the expected byte. If the value in memory doesn't have the expected byte, it will return early to its caller. The expectation is that the caller will reload the byte from memory and call the 8-bit CAS again. Unfortunately, this code path that returns early does not have a compiler fence. Without that compiler fence, the C++ compiler can optimize away the reloading of the expected byte value, leaving it unchanged. As a result, we'll have a infinite loop here that checks a value that will never change, and the loop will not terminate until the value changes. The fix is to eliminate the early return check in the 8-bit CAS, and have it always call down to the 32-bit CAS. The 32-bit CAS has a compiler fence which will prevent this issue.
Attachments
the patch. (13.04 KB, patch)
2015-03-09 17:36 PDT, Mark Lam
mark.lam: review-
take 2: with fixes for windows project. Make doesn't seem to build testapi. (14.48 KB, patch)
2015-03-09 17:59 PDT, Mark Lam
fpizlo: review-
fpizlo: commit-queue-
take 3: splice the expectedValue and newValue using the same oldValue (14.47 KB, patch)
2015-03-09 19:09 PDT, Mark Lam
no flags
take 4: fixed missing volatile (14.41 KB, patch)
2015-03-09 19:11 PDT, Mark Lam
fpizlo: review+
follow up: Windows doesn't like pthreads (2.54 KB, patch)
2015-03-09 21:05 PDT, Mark Lam
achristensen: review+
Speculative fix for Windows build. (1.04 KB, patch)
2015-03-09 21:37 PDT, Mark Lam
no flags
Build fix for Windows. This time, I've tested this on a local Windows box. (7.65 KB, patch)
2015-03-09 23:40 PDT, Mark Lam
achristensen: review+
Mark Lam
Comment 1 2015-03-09 16:19:09 PDT
Mark Lam
Comment 2 2015-03-09 17:36:41 PDT
Created attachment 248298 [details] the patch.
Mark Lam
Comment 3 2015-03-09 17:51:33 PDT
Comment on attachment 248298 [details] the patch. Need to add test file to other ports.
Mark Lam
Comment 4 2015-03-09 17:59:28 PDT
Created attachment 248303 [details] take 2: with fixes for windows project. Make doesn't seem to build testapi.
Filip Pizlo
Comment 5 2015-03-09 18:20:46 PDT
Comment on attachment 248303 [details] take 2: with fixes for windows project. Make doesn't seem to build testapi. View in context: https://bugs.webkit.org/attachment.cgi?id=248303&action=review > Source/WTF/wtf/Atomics.h:335 > + struct CompositeValue { > + public: > + CompositeValue(volatile unsigned* baseWordAddr, uint8_t replacementByte, uintptr_t replacementIndex) > + { > + u.word = *baseWordAddr; > + u.bytes[replacementIndex] = replacementByte; > + } > + > + unsigned word() { return u.word; } > + > + private: > + union { > + unsigned word; > + uint8_t bytes[sizeof(unsigned)]; > + } u; > + }; > + > + CompositeValue oldAlignedValue(alignedLocation, expected, locationOffset); > + CompositeValue newAlignedValue(alignedLocation, newValue, locationOffset); This is wrong. You load from *alignedLocation twice: once t get the side-bytes of the old value and once to get the side-bytes of the new value. This has a race. If the bytes other than the ones you are CASing inside the aligned word change between the two loads, you could cause badness. I suggest not using the CompositeValue helper. Instead, all that this needs is a compiler fence before the load of the oldAlignedValue in the original code.
Brent Fulgham
Comment 6 2015-03-09 18:23:27 PDT
We're missing the path to "config.h" on the Windows project settings for testapi. I'll land a fix for that independent of this change.
Filip Pizlo
Comment 7 2015-03-09 18:49:11 PDT
(In reply to comment #5) > Comment on attachment 248303 [details] > take 2: with fixes for windows project. Make doesn't seem to build testapi. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=248303&action=review > > > Source/WTF/wtf/Atomics.h:335 > > + struct CompositeValue { > > + public: > > + CompositeValue(volatile unsigned* baseWordAddr, uint8_t replacementByte, uintptr_t replacementIndex) > > + { > > + u.word = *baseWordAddr; > > + u.bytes[replacementIndex] = replacementByte; > > + } > > + > > + unsigned word() { return u.word; } > > + > > + private: > > + union { > > + unsigned word; > > + uint8_t bytes[sizeof(unsigned)]; > > + } u; > > + }; > > + > > + CompositeValue oldAlignedValue(alignedLocation, expected, locationOffset); > > + CompositeValue newAlignedValue(alignedLocation, newValue, locationOffset); > > This is wrong. You load from *alignedLocation twice: once t get the > side-bytes of the old value and once to get the side-bytes of the new value. > This has a race. If the bytes other than the ones you are CASing inside the > aligned word change between the two loads, you could cause badness. > > I suggest not using the CompositeValue helper. Instead, all that this needs > is a compiler fence before the load of the oldAlignedValue in the original > code. Here's the race that breaks your code: char buf[4]; // initial state buf[0] = 0; buf[1] = 0; buf[2] = 0; buf[3] = 0; Thread #1 takes the following steps: T1: 1) CAS(&buf[0], 0, 1); Thread #2 takes the following steps: T2: 1) buf[1] = 1; T2: 2) buf[1] = 0; We would expect the following to be valid outcomes of executing this program, regardless of the three possible interleavings. Here, "T1.1->T2.1" implies a particular sequencing of executions, so in particular it means that "T1 executes step 1 (i.e. the CAS) and then T2 executes step 1 (i.e. buf[1] = 1)". T1.1->T2.1->T2.2: buf == {1, 0, 0, 0} T2.1->T1.1->t2.2: buf == {1, 0, 0, 0} T2.1->T2.2->T1.1: buf == {1, 0, 0, 0} In particular there is no valid outcome with buf[1] != 0. But if the 1-byte CAS does what you do, then it actually has two loads and a CAS. The first load influences what we compare and the second load influences what we set the word to. Let's say that the CAS is decomposed into three steps: a) load the whole word, put in the expected byte b) load the whole word, put in the newValue byte c) do the word CAS This means that we will have the following evil interleaving: T1.1.a->T2.1->T2.2->T1.1.b->T1.1.c After T1.1.a executes, the expected value word will be {0, 0, 0, 0}. After T2.1, buf[1] == 1, but then T2.2 runs and then we have buf[1] == 0 again. Now T1.1.b runs and sets the new value word to {1, 1, 0, 0}. Then T1.1.c runs, and we CAS with expected = {0, 0, 0, 0} and newValue = {1, 1, 0, 0}. So the result is {1, 1, 0, 0,} which is wrong.
Filip Pizlo
Comment 8 2015-03-09 18:59:52 PDT
I believe the composite value thing doesn't produce a net benefit. I think it makes it easier to make the sort of mistake that you made. CAS implementations should be as straight-line and free of abstractions as possible because we want to be able to see, by glancing at the code, how many loads there are. I believe that a correct implementation must only involve one load. The abstraction via CompositeLoad means that the code appears to have one load but you only realize that it executes twice by thinking about it a bit more carefully. There are two possible changes that make the current trunk code correct: 1) Put a compiler fence before and after the whole 8-byte CAS function - that is, the early return would instead become a control flow diamond and the whole 8-byte CAS function would look something like this: compilerFence(); ... address math oldAlignedValue = *load; if (u.bytes[thingy] == stuff) result = false; else { ... result = CAS(...); } compilerFence(); return result; The reason for both compiler fences is to prevent code motion into or out of the CAS implementation. The top compiler fence prevents hoisting the load. The bottom compiler fence prevents hoisting a store into the fast-exit path. This doesn't prevent any optimizations other than those that would be invalid. 2) Use an unconditional CAS like your change does, but without CompositeValue. You could just construct oldValue and newValue by storing a byte into the union and then loading the word out. This kind of very C-like simple implementation is trivial to verify by visual inspection. The upside is that it's easier to reason about why it's right. But there is a downside: the splicing of the old value, which trunk does not do, is probably just as expensive as that byte check. And if the check would fail (early return) in trunk, this new modified code would have to execute a full CAS. Also, I suspect that we still would need a compiler fence at the top to prevent removing the load. I worry that a clever compiler could CSE the load with an earlier load, and that could cause some subtle issues. I don't really care much if we use (1) or (2). But the CompositeValue abstraction feels dangerous to me - the less abstraction this code has, the more obvious it will be what the interleavings could be.
Mark Lam
Comment 9 2015-03-09 19:09:44 PDT
Created attachment 248307 [details] take 3: splice the expectedValue and newValue using the same oldValue
Mark Lam
Comment 10 2015-03-09 19:11:43 PDT
Created attachment 248308 [details] take 4: fixed missing volatile
Mark Lam
Comment 11 2015-03-09 19:12:52 PDT
Comment on attachment 248308 [details] take 4: fixed missing volatile View in context: https://bugs.webkit.org/attachment.cgi?id=248308&action=review > Source/WTF/wtf/Atomics.h:-316 > - // Make sure that this load is always issued and never optimized away. I should probably restore this comment.
Filip Pizlo
Comment 12 2015-03-09 19:12:55 PDT
Comment on attachment 248308 [details] take 4: fixed missing volatile View in context: https://bugs.webkit.org/attachment.cgi?id=248308&action=review > Source/WTF/wtf/Atomics.h:-316 > - // Make sure that this load is always issued and never optimized away. Why remove this comment? I think it's useful.
Mark Lam
Comment 13 2015-03-09 19:21:18 PDT
Thanks for the review. The comment has been restored. Landed in r181305: <http://trac.webkit.org/r181305>.
Ryosuke Niwa
Comment 14 2015-03-09 20:13:16 PDT
(In reply to comment #13) > Thanks for the review. The comment has been restored. > > Landed in r181305: <http://trac.webkit.org/r181305>. This patch broke Windows builds: CompareAndSwapTest.cpp ..\..\API\tests\CompareAndSwapTest.cpp(95): error C3861: 'pthread_exit': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'pthread_t' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2146: syntax error : missing ';' before identifier 'threadIDs' [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(110): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(110): error C3861: 'pthread_create': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(115): error C2065: 'threadIDs' : undeclared identifier [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj] ..\..\API\tests\CompareAndSwapTest.cpp(115): error C3861: 'pthread_join': identifier not found [C:\cygwin\home\buildbot\slave\win-debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi.vcxproj]
Mark Lam
Comment 15 2015-03-09 20:22:39 PDT
(In reply to comment #14) > (In reply to comment #13) > > Thanks for the review. The comment has been restored. > > > > Landed in r181305: <http://trac.webkit.org/r181305>. > > This patch broke Windows builds: > > CompareAndSwapTest.cpp > ..\..\API\tests\CompareAndSwapTest.cpp(95): error C3861: 'pthread_exit': > identifier not found > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'pthread_t' : > undeclared identifier > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2146: syntax error : > missing ';' before identifier 'threadIDs' > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(102): error C2065: 'threadIDs' : > undeclared identifier > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(110): error C2065: 'threadIDs' : > undeclared identifier > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(110): error C3861: 'pthread_create': > identifier not found > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(115): error C2065: 'threadIDs' : > undeclared identifier > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] > ..\..\API\tests\CompareAndSwapTest.cpp(115): error C3861: 'pthread_join': > identifier not found > [C:\cygwin\home\buildbot\slave\win- > debug\build\Source\JavaScriptCore\JavaScriptCore.vcxproj\testapi\testapi. > vcxproj] Yeah, Brent just told me this on the way out. I’m working on a fix.
Mark Lam
Comment 16 2015-03-09 21:05:09 PDT
Created attachment 248314 [details] follow up: Windows doesn't like pthreads
Mark Lam
Comment 17 2015-03-09 21:23:11 PDT
Windows fix landed in r181308: <http://trac.webkit.org/r181308>.
Mark Lam
Comment 18 2015-03-09 21:37:39 PDT
Created attachment 248316 [details] Speculative fix for Windows build.
Mark Lam
Comment 19 2015-03-09 21:43:03 PDT
The EWS bots are not cooperating. I went ahead and landed the speculative Windows fix in r181309: <http://trac.webkit.org/r181309>.
Mark Lam
Comment 20 2015-03-09 23:40:04 PDT
Created attachment 248318 [details] Build fix for Windows. This time, I've tested this on a local Windows box.
Mark Lam
Comment 21 2015-03-09 23:57:48 PDT
Final build fix for Windows landed in r181315: <http://trac.webkit.org/r181315>.
Csaba Osztrogonác
Comment 22 2015-03-10 01:04:19 PDT
(In reply to comment #17) > Windows fix landed in r181308: <http://trac.webkit.org/r181308>. It broke the JSC API tests on the Apple bots.
Mark Lam
Comment 23 2015-03-10 01:06:25 PDT
(In reply to comment #22) > (In reply to comment #17) > > Windows fix landed in r181308: <http://trac.webkit.org/r181308>. > > It broke the JSC API tests on the Apple bots. Which ones? Can you provide a link please?
Csaba Osztrogonác
Comment 24 2015-03-10 01:07:07 PDT
(In reply to comment #23) > (In reply to comment #22) > > (In reply to comment #17) > > > Windows fix landed in r181308: <http://trac.webkit.org/r181308>. > > > > It broke the JSC API tests on the Apple bots. > > Which ones? Can you provide a link please? All Apple Mac bots on build.webkit.org
Mark Lam
Comment 25 2015-03-10 01:48:53 PDT
(In reply to comment #24) > All Apple Mac bots on build.webkit.org I wasn’t seeing this on the dashboard except for the Mavericks debug bots. Anyway, it’s a trivial test only fix. Landed in r181319: <http://trac.webkit.org/r181319>.
Csaba Osztrogonác
Comment 26 2015-03-10 04:09:11 PDT
(In reply to comment #25) > (In reply to comment #24) > > All Apple Mac bots on build.webkit.org > > I wasn’t seeing this on the dashboard except for the Mavericks debug bots. > Anyway, it’s a trivial test only fix. Ah, because the JSC test coverage is poor on the dashboard: - CLOOP and 32 bit bots aren't included - Yosemite WK1 and WK2 bots don't run JSC tests and the dedicated JSC tester bot isn't on dashboard - Only the Mavericks WK1 bots can be found on it > Landed in r181319: <http://trac.webkit.org/r181319>. Thanks for the quick fix.
Kevin Staunton-Lambert
Comment 27 2015-11-18 10:00:39 PST
Thanks also
Note You need to log in before you can comment on or make changes to this bug.