Summary: | Don't use memmove/memcpy/memset for memory that can be scanned concurrently | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Saam Barati <saam> | ||||||||||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Saam Barati <saam> | ||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||
Severity: | Normal | CC: | aakash_jain, annulen, benjamin, commit-queue, ews-watchlist, fpizlo, ggaren, gskachkov, guijemont, gyuyoung.kim, keith_miller, mark.lam, msaboff, rmorisset, ryuan.choi, sergio, ticaiolima, tzagallo, webkit-bot-watchers-bugzilla, webkit-bug-importer, ysuzuki | ||||||||||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||||
Attachments: |
|
Description
Saam Barati
2019-10-21 18:11:24 PDT
When we fix this we should figure out if there's a way to tell if the system memmove will tear or not for a given set of parameters. If it doesn't exist we should ask for it anyway. An alternative to avoiding memcpy is to make sure that the object continues pointing to the original buffer until the memcpy into the new buffer is complete. (In the case of in-place memmove, though, that's not possible, and you either need a manual loop or a lock.) (In reply to Keith Miller from comment #2) > When we fix this we should figure out if there's a way to tell if the system > memmove will tear or not for a given set of parameters. If it doesn't exist > we should ask for it anyway. We could ask. (In reply to Geoffrey Garen from comment #3) > An alternative to avoiding memcpy is to make sure that the object continues > pointing to the original buffer until the memcpy into the new buffer is > complete. (In the case of in-place memmove, though, that's not possible, and > you either need a manual loop or a lock.) Yeah, Mark and I last night looked at the code llvm emits. It appears if you emit a memcpy loop in C++ with 64-bit items, llvm will inline its memcpy, but won't emit the byte copy part. So it seems like we may be able to just emit these loops in terms of JSValue and it will Just Work. Created attachment 381629 [details]
patch
Created attachment 381630 [details]
patch
Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review r=me with one comment. > Source/JavaScriptCore/heap/GCMemoryOperations.h:72 > + bitwise_cast<JSValue*>(destination)[i] = JSValue(); In 32bit arch, JSValue() is not zero IIRC (EmptyTag exists). But it is also possible that caller of gcSafeZeroMemory intends to fill them with JSEmpty (and we were using memset with 0 incorrectly). Can you check, 1. Whether the caller wants to fill it with 0, or wants to fill it with JSEmpty 2. If the caller want to fill it with JSEmpty, I think we need to have some other name here since JSEmpty in 32bit is not zero. Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/ChangeLog:19 > + store into 8 single byte stores, and in practice, based on the disassembly Nit: "and in" => "in" > Source/JavaScriptCore/heap/GCMemoryOperations.h:43 > + static_assert(sizeof(T) == sizeof(JSValue)); Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards. > Source/JavaScriptCore/heap/GCMemoryOperations.h:46 > + if (source < destination) { technically UB to compare pointers to different allocations. It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison. Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review >> Source/JavaScriptCore/heap/GCMemoryOperations.h:43 >> + static_assert(sizeof(T) == sizeof(JSValue)); > > Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards. I'm just keeping it as bytes for a few reasons: - As you said, many callers trivially multiply. Since this is inlined, the compiler will just get rid of the multiply and this assert branch - there are places where it's a bit uglier. See stuff with indexing header and array storage. (These also though, should be optimized away by the compiler.) - It's nice to have the same API as memmove/memcpy >> Source/JavaScriptCore/heap/GCMemoryOperations.h:46 >> + if (source < destination) { > > technically UB to compare pointers to different allocations. > It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison. c++ is terrible. I'll cast Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:66 > +ALWAYS_INLINE void gcSafeZeroMemory(T* destination, size_t bytes) unfortunately this is turned into a call to memset. Need to figure it out Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review > Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037 > - FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */, Why this removal? Please undo. Comment on attachment 381630 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review >> Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037 >> - FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */, > > Why this removal? Please undo. no idea. I blame Xcode. I'm going to just write these in inline asm, and then we won't have to worry about it Created attachment 382257 [details]
patch
Comment on attachment 382257 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=382257&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:39 > +// We use these memory operations when modifying memory that might be scanned by the concurrent collector. > +// We don't call the default operations because they're not guaranteed to store to memory in eight byte aligned > +// chunks. If we happened to fall into the system's normal byte copy loop, we may see a torn JSValue in the > +// concurrent collector. Even though it's legal according to the C++ spec to turn something like an 8 byte store > +// to memory into 8 single byte stores, we rely on LLVM (and other compilers) to not do such transformations. > +// In practice, it appears that they don't. (If they did, this isn't the only place we would have an issue. > +// For example, everywhere we do a store in a WriteBarrier would be subject to the same issue.) Oops, I will update this comment. Created attachment 382258 [details]
patch
Comment on attachment 382258 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=382258&action=review Do you have any performance numbers for this? In particular I'd like at least some checking that it is not too massive of a regression on each of x86_64, ARM64 and ARM32 (which I think we still support on some watches?) > Source/JavaScriptCore/heap/GCMemoryOperations.h:50 > + size_t count = bytes / 8; Maybe extract this line and the next two as their own function (slowPathForwardMemcpy or something), since they are reused a below. > Source/JavaScriptCore/heap/GCMemoryOperations.h:56 > + size_t tmp; Why did you declare tmp here, instead of just using a register that you mark clobbered (as you did for ymm0 and ymm1) ? > Source/JavaScriptCore/heap/GCMemoryOperations.h:118 > + RELEASE_ASSERT(!isARM64()); Why not a more explicit check that this is x86_64? I don't think we currently support a 64-bit version of MIPS or anything like that, but having an ASSERT here could save us trouble down the line. > Source/JavaScriptCore/heap/GCMemoryOperations.h:219 > + } else { It might be cleaner to have this be outside of all the #if madness, since I believe it always does the right thing as long as dst < src. Just start the function with: ``` if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) { gcSafeMemcpy(dst, src, bytes); return; } ``` It would both reduce the level of nesting of the branches, and let you remove the forward loop in the non-GCC-compatible (or non-64-bits) case below. (In reply to Robin Morisset from comment #18) > Comment on attachment 382258 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=382258&action=review > > Do you have any performance numbers for this? In particular I'd like at > least some checking that it is not too massive of a regression on each of > x86_64, ARM64 and ARM32 (which I think we still support on some watches?) My plan is to just monitor the perf bots after landing. For 32bit, I’ll just make the code call memset/memmove/memcpy. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:50 > > + size_t count = bytes / 8; > > Maybe extract this line and the next two as their own function > (slowPathForwardMemcpy or something), since they are reused a below. Sounds good. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:56 > > + size_t tmp; > > Why did you declare tmp here, instead of just using a register that you mark > clobbered (as you did for ymm0 and ymm1) ? I used tmp so the compiler could pick the register. I’m not sure how to do the same with ymm registers. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:118 > > + RELEASE_ASSERT(!isARM64()); > > Why not a more explicit check that this is x86_64? I don't think we > currently support a 64-bit version of MIPS or anything like that, but having > an ASSERT here could save us trouble down the line. Sure. > > > Source/JavaScriptCore/heap/GCMemoryOperations.h:219 > > + } else { > > It might be cleaner to have this be outside of all the #if madness, since I > believe it always does the right thing as long as dst < src. > Just start the function with: > ``` > if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) { > gcSafeMemcpy(dst, src, bytes); > return; > } > ``` > It would both reduce the level of nesting of the branches, and let you > remove the forward loop in the non-GCC-compatible (or non-64-bits) case > below. Sounds good. Comment on attachment 382258 [details] patch Attachment 382258 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/13190416 New failing tests: fast/repaint/backgroundSizeRepaint.html Created attachment 382283 [details]
Archive of layout-test-results from ews213 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews213 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment on attachment 382258 [details]
patch
r=me, we discussed testing offline (land + run benchmarks concurrently).
Created attachment 382347 [details]
patch for landing
(In reply to Saam Barati from comment #23) > Created attachment 382347 [details] > patch for landing Seems to cause timeouts on api-tests on mac, https://ews-build.webkit.org/#/builders/3/builds/11216 build is running for 3+ hours, similar thing for https://ews-build.webkit.org/#/builders/3/builds/11221 Please verify before landing. Created attachment 382396 [details]
patch for landing
Created attachment 382407 [details]
patch for landing
Comment on attachment 382407 [details] patch for landing Attachment 382407 [details] did not pass jsc-ews (mac): Output: https://webkit-queues.webkit.org/results/13193684 New failing tests: mozilla-tests.yaml/js1_5/Array/regress-101964.js.mozilla-baseline Comment on attachment 382407 [details] patch for landing Attachment 382407 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/13195226 New failing tests: fast/repaint/backgroundSizeRepaint.html Created attachment 382455 [details]
Archive of layout-test-results from ews212 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews212 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Created attachment 382475 [details]
patch for landing
ok, I think it should all work now.
Created attachment 382480 [details]
patch for landing
Comment on attachment 382480 [details] patch for landing Clearing flags on attachment: 382480 Committed r251875: <https://trac.webkit.org/changeset/251875> All reviewed patches have been landed. Closing bug. Reverted in r251967, because this is suspected to have broken perf tests on iOS. Comment on attachment 382480 [details] patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=382480&action=review > Source/JavaScriptCore/heap/GCMemoryOperations.h:284 > + "str d0, [%x[dst]], #0x10\t\n" I believe this is the bug. It should be 0x8 not 0x10 Created attachment 382770 [details]
patch for landing
Created attachment 382775 [details]
patch for landing
Comment on attachment 382775 [details] patch for landing Clearing flags on attachment: 382775 Committed r252024: <https://trac.webkit.org/changeset/252024> All reviewed patches have been landed. Closing bug. |