RESOLVED FIXED 203228
Don't use memmove/memcpy/memset for memory that can be scanned concurrently
https://bugs.webkit.org/show_bug.cgi?id=203228
Summary Don't use memmove/memcpy/memset for memory that can be scanned concurrently
Saam Barati
Reported 2019-10-21 18:11:24 PDT
otherwise, we might see torn values in the collector
Attachments
patch (27.27 KB, patch)
2019-10-22 16:48 PDT, Saam Barati
no flags
patch (27.26 KB, patch)
2019-10-22 16:51 PDT, Saam Barati
ysuzuki: review+
patch (36.44 KB, patch)
2019-10-29 17:34 PDT, Saam Barati
no flags
patch (36.04 KB, patch)
2019-10-29 17:40 PDT, Saam Barati
rmorisset: review+
ews-watchlist: commit-queue-
Archive of layout-test-results from ews213 for win-future (14.43 MB, application/zip)
2019-10-29 22:44 PDT, EWS Watchlist
no flags
patch for landing (35.70 KB, patch)
2019-10-30 13:56 PDT, Saam Barati
saam: commit-queue-
patch for landing (36.30 KB, patch)
2019-10-30 17:14 PDT, Saam Barati
no flags
patch for landing (36.29 KB, patch)
2019-10-30 19:08 PDT, Saam Barati
ews-watchlist: commit-queue-
Archive of layout-test-results from ews212 for win-future (14.22 MB, application/zip)
2019-10-31 06:17 PDT, EWS Watchlist
no flags
patch for landing (37.63 KB, patch)
2019-10-31 10:19 PDT, Saam Barati
no flags
patch for landing (36.19 KB, patch)
2019-10-31 11:22 PDT, Saam Barati
no flags
patch for landing (36.19 KB, patch)
2019-11-04 14:01 PST, Saam Barati
saam: commit-queue-
patch for landing (36.20 KB, patch)
2019-11-04 14:36 PST, Saam Barati
no flags
Saam Barati
Comment 1 2019-10-21 18:11:53 PDT
Keith Miller
Comment 2 2019-10-21 20:07:11 PDT
When we fix this we should figure out if there's a way to tell if the system memmove will tear or not for a given set of parameters. If it doesn't exist we should ask for it anyway.
Geoffrey Garen
Comment 3 2019-10-21 20:30:25 PDT
An alternative to avoiding memcpy is to make sure that the object continues pointing to the original buffer until the memcpy into the new buffer is complete. (In the case of in-place memmove, though, that's not possible, and you either need a manual loop or a lock.)
Saam Barati
Comment 4 2019-10-22 13:27:41 PDT
(In reply to Keith Miller from comment #2) > When we fix this we should figure out if there's a way to tell if the system > memmove will tear or not for a given set of parameters. If it doesn't exist > we should ask for it anyway. We could ask.
Saam Barati
Comment 5 2019-10-22 13:29:07 PDT
(In reply to Geoffrey Garen from comment #3) > An alternative to avoiding memcpy is to make sure that the object continues > pointing to the original buffer until the memcpy into the new buffer is > complete. (In the case of in-place memmove, though, that's not possible, and > you either need a manual loop or a lock.) Yeah, Mark and I last night looked at the code llvm emits. It appears if you emit a memcpy loop in C++ with 64-bit items, llvm will inline its memcpy, but won't emit the byte copy part. So it seems like we may be able to just emit these loops in terms of JSValue and it will Just Work.
Saam Barati
Comment 6 2019-10-22 16:48:13 PDT
Saam Barati
Comment 7 2019-10-22 16:51:34 PDT
Yusuke Suzuki
Comment 8 2019-10-22 16:57:57 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review r=me with one comment. > Source/JavaScriptCore/heap/GCMemoryOperations.h:72 > + bitwise_cast<JSValue*>(destination)[i] = JSValue(); In 32bit arch, JSValue() is not zero IIRC (EmptyTag exists). But it is also possible that caller of gcSafeZeroMemory intends to fill them with JSEmpty (and we were using memset with 0 incorrectly). Can you check, 1. Whether the caller wants to fill it with 0, or wants to fill it with JSEmpty 2. If the caller want to fill it with JSEmpty, I think we need to have some other name here since JSEmpty in 32bit is not zero.
Robin Morisset
Comment 9 2019-10-22 17:00:11 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/ChangeLog:19 > + store into 8 single byte stores, and in practice, based on the disassembly Nit: "and in" => "in" > Source/JavaScriptCore/heap/GCMemoryOperations.h:43 > + static_assert(sizeof(T) == sizeof(JSValue)); Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards. > Source/JavaScriptCore/heap/GCMemoryOperations.h:46 > + if (source < destination) { technically UB to compare pointers to different allocations. It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison.
Saam Barati
Comment 10 2019-10-22 17:03:58 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review >> Source/JavaScriptCore/heap/GCMemoryOperations.h:43 >> + static_assert(sizeof(T) == sizeof(JSValue)); > > Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards. I'm just keeping it as bytes for a few reasons: - As you said, many callers trivially multiply. Since this is inlined, the compiler will just get rid of the multiply and this assert branch - there are places where it's a bit uglier. See stuff with indexing header and array storage. (These also though, should be optimized away by the compiler.) - It's nice to have the same API as memmove/memcpy >> Source/JavaScriptCore/heap/GCMemoryOperations.h:46 >> + if (source < destination) { > > technically UB to compare pointers to different allocations. > It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison. c++ is terrible. I'll cast
Saam Barati
Comment 11 2019-10-22 17:18:32 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:66 > +ALWAYS_INLINE void gcSafeZeroMemory(T* destination, size_t bytes) unfortunately this is turned into a call to memset. Need to figure it out
Mark Lam
Comment 12 2019-10-22 17:23:15 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037 > - FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */, Why this removal? Please undo.
Saam Barati
Comment 13 2019-10-22 17:24:06 PDT
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review >> Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037 >> - FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */, > > Why this removal? Please undo. no idea. I blame Xcode.
Saam Barati
Comment 14 2019-10-22 17:44:47 PDT
I'm going to just write these in inline asm, and then we won't have to worry about it
Saam Barati
Comment 15 2019-10-29 17:34:39 PDT
Saam Barati
Comment 16 2019-10-29 17:37:24 PDT
Comment on attachment 382257 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=382257&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:39 > +// We use these memory operations when modifying memory that might be scanned by the concurrent collector. > +// We don't call the default operations because they're not guaranteed to store to memory in eight byte aligned > +// chunks. If we happened to fall into the system's normal byte copy loop, we may see a torn JSValue in the > +// concurrent collector. Even though it's legal according to the C++ spec to turn something like an 8 byte store > +// to memory into 8 single byte stores, we rely on LLVM (and other compilers) to not do such transformations. > +// In practice, it appears that they don't. (If they did, this isn't the only place we would have an issue. > +// For example, everywhere we do a store in a WriteBarrier would be subject to the same issue.) Oops, I will update this comment.
Saam Barati
Comment 17 2019-10-29 17:40:39 PDT
Robin Morisset
Comment 18 2019-10-29 18:16:49 PDT
Comment on attachment 382258 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=382258&action=review Do you have any performance numbers for this? In particular I'd like at least some checking that it is not too massive of a regression on each of x86_64, ARM64 and ARM32 (which I think we still support on some watches?) > Source/JavaScriptCore/heap/GCMemoryOperations.h:50 > + size_t count = bytes / 8; Maybe extract this line and the next two as their own function (slowPathForwardMemcpy or something), since they are reused a below. > Source/JavaScriptCore/heap/GCMemoryOperations.h:56 > + size_t tmp; Why did you declare tmp here, instead of just using a register that you mark clobbered (as you did for ymm0 and ymm1) ? > Source/JavaScriptCore/heap/GCMemoryOperations.h:118 > + RELEASE_ASSERT(!isARM64()); Why not a more explicit check that this is x86_64? I don't think we currently support a 64-bit version of MIPS or anything like that, but having an ASSERT here could save us trouble down the line. > Source/JavaScriptCore/heap/GCMemoryOperations.h:219 > + } else { It might be cleaner to have this be outside of all the #if madness, since I believe it always does the right thing as long as dst < src. Just start the function with: ``` if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) { gcSafeMemcpy(dst, src, bytes); return; } ``` It would both reduce the level of nesting of the branches, and let you remove the forward loop in the non-GCC-compatible (or non-64-bits) case below.
Saam Barati
Comment 19 2019-10-29 20:27:19 PDT
(In reply to Robin Morisset from comment #18) > Comment on attachment 382258 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=382258&action=review > > Do you have any performance numbers for this? In particular I'd like at > least some checking that it is not too massive of a regression on each of > x86_64, ARM64 and ARM32 (which I think we still support on some watches?) My plan is to just monitor the perf bots after landing. For 32bit, I’ll just make the code call memset/memmove/memcpy. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:50 > > + size_t count = bytes / 8; > > Maybe extract this line and the next two as their own function > (slowPathForwardMemcpy or something), since they are reused a below. Sounds good. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:56 > > + size_t tmp; > > Why did you declare tmp here, instead of just using a register that you mark > clobbered (as you did for ymm0 and ymm1) ? I used tmp so the compiler could pick the register. I’m not sure how to do the same with ymm registers. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:118 > > + RELEASE_ASSERT(!isARM64()); > > Why not a more explicit check that this is x86_64? I don't think we > currently support a 64-bit version of MIPS or anything like that, but having > an ASSERT here could save us trouble down the line. Sure. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:219 > > + } else { > > It might be cleaner to have this be outside of all the #if madness, since I > believe it always does the right thing as long as dst < src. > Just start the function with: > ``` > if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) { > gcSafeMemcpy(dst, src, bytes); > return; > } > ``` > It would both reduce the level of nesting of the branches, and let you > remove the forward loop in the non-GCC-compatible (or non-64-bits) case > below. Sounds good.
EWS Watchlist
Comment 20 2019-10-29 22:44:21 PDT
Comment on attachment 382258 [details] patch Attachment 382258 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/13190416 New failing tests: fast/repaint/backgroundSizeRepaint.html
EWS Watchlist
Comment 21 2019-10-29 22:44:23 PDT
Created attachment 382283 [details] Archive of layout-test-results from ews213 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews213 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Robin Morisset
Comment 22 2019-10-30 13:25:22 PDT
Comment on attachment 382258 [details] patch r=me, we discussed testing offline (land + run benchmarks concurrently).
Saam Barati
Comment 23 2019-10-30 13:56:28 PDT
Created attachment 382347 [details] patch for landing
Aakash Jain
Comment 24 2019-10-30 16:24:28 PDT
(In reply to Saam Barati from comment #23) > Created attachment 382347 [details] > patch for landing Seems to cause timeouts on api-tests on mac, https://ews-build.webkit.org/#/builders/3/builds/11216 build is running for 3+ hours, similar thing for https://ews-build.webkit.org/#/builders/3/builds/11221 Please verify before landing.
Saam Barati
Comment 25 2019-10-30 17:14:22 PDT
Created attachment 382396 [details] patch for landing
Saam Barati
Comment 26 2019-10-30 19:08:15 PDT
Created attachment 382407 [details] patch for landing
EWS Watchlist
Comment 27 2019-10-30 21:40:37 PDT
Comment on attachment 382407 [details] patch for landing Attachment 382407 [details] did not pass jsc-ews (mac): Output: https://webkit-queues.webkit.org/results/13193684 New failing tests: mozilla-tests.yaml/js1_5/Array/regress-101964.js.mozilla-baseline
EWS Watchlist
Comment 28 2019-10-31 06:17:32 PDT
Comment on attachment 382407 [details] patch for landing Attachment 382407 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/13195226 New failing tests: fast/repaint/backgroundSizeRepaint.html
EWS Watchlist
Comment 29 2019-10-31 06:17:35 PDT
Created attachment 382455 [details] Archive of layout-test-results from ews212 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews212 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Saam Barati
Comment 30 2019-10-31 10:19:24 PDT
Created attachment 382475 [details] patch for landing ok, I think it should all work now.
Saam Barati
Comment 31 2019-10-31 11:22:27 PDT
Created attachment 382480 [details] patch for landing
WebKit Commit Bot
Comment 32 2019-10-31 13:51:03 PDT
Comment on attachment 382480 [details] patch for landing Clearing flags on attachment: 382480 Committed r251875: <https://trac.webkit.org/changeset/251875>
WebKit Commit Bot
Comment 33 2019-10-31 13:51:06 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 34 2019-11-02 17:51:54 PDT
Reverted in r251967, because this is suspected to have broken perf tests on iOS.
Saam Barati
Comment 35 2019-11-04 13:49:05 PST
Comment on attachment 382480 [details] patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=382480&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:284 > + "str d0, [%x[dst]], #0x10\t\n" I believe this is the bug. It should be 0x8 not 0x10
Saam Barati
Comment 36 2019-11-04 14:01:22 PST
Created attachment 382770 [details] patch for landing
Saam Barati
Comment 37 2019-11-04 14:36:33 PST
Created attachment 382775 [details] patch for landing
WebKit Commit Bot
Comment 38 2019-11-04 15:57:41 PST
Comment on attachment 382775 [details] patch for landing Clearing flags on attachment: 382775 Committed r252024: <https://trac.webkit.org/changeset/252024>
WebKit Commit Bot
Comment 39 2019-11-04 15:57:44 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.