Bug 203228

Summary: Don't use memmove/memcpy/memset for memory that can be scanned concurrently
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, annulen, benjamin, commit-queue, ews-watchlist, fpizlo, ggaren, gskachkov, guijemont, gyuyoung.kim, keith_miller, mark.lam, msaboff, rmorisset, ryuan.choi, sergio, ticaiolima, tzagallo, webkit-bot-watchers-bugzilla, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch
none
patch
ysuzuki: review+
patch
none
patch
rmorisset: review+, ews-watchlist: commit-queue-
Archive of layout-test-results from ews213 for win-future
none
patch for landing
saam: commit-queue-
patch for landing
none
patch for landing
ews-watchlist: commit-queue-
Archive of layout-test-results from ews212 for win-future
none
patch for landing
none
patch for landing
none
patch for landing
saam: commit-queue-
patch for landing none

Description Saam Barati 2019-10-21 18:11:24 PDT
otherwise, we might see torn values in the collector
Comment 1 Saam Barati 2019-10-21 18:11:53 PDT
<rdar://problem/56401852>
Comment 2 Keith Miller 2019-10-21 20:07:11 PDT
When we fix this we should figure out if there's a way to tell if the system memmove will tear or not for a given set of parameters. If it doesn't exist we should ask for it anyway.
Comment 3 Geoffrey Garen 2019-10-21 20:30:25 PDT
An alternative to avoiding memcpy is to make sure that the object continues pointing to the original buffer until the memcpy into the new buffer is complete. (In the case of in-place memmove, though, that's not possible, and you either need a manual loop or a lock.)
Comment 4 Saam Barati 2019-10-22 13:27:41 PDT
(In reply to Keith Miller from comment #2)
> When we fix this we should figure out if there's a way to tell if the system
> memmove will tear or not for a given set of parameters. If it doesn't exist
> we should ask for it anyway.

We could ask.
Comment 5 Saam Barati 2019-10-22 13:29:07 PDT
(In reply to Geoffrey Garen from comment #3)
> An alternative to avoiding memcpy is to make sure that the object continues
> pointing to the original buffer until the memcpy into the new buffer is
> complete. (In the case of in-place memmove, though, that's not possible, and
> you either need a manual loop or a lock.)

Yeah, Mark and I last night looked at the code llvm emits. It appears if you emit a memcpy loop in C++ with 64-bit items, llvm will inline its memcpy, but won't emit the byte copy part. So it seems like we may be able to just emit these loops in terms of JSValue and it will Just Work.
Comment 6 Saam Barati 2019-10-22 16:48:13 PDT
Created attachment 381629 [details]
patch
Comment 7 Saam Barati 2019-10-22 16:51:34 PDT
Created attachment 381630 [details]
patch
Comment 8 Yusuke Suzuki 2019-10-22 16:57:57 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

r=me with one comment.

> Source/JavaScriptCore/heap/GCMemoryOperations.h:72
> +        bitwise_cast<JSValue*>(destination)[i] = JSValue();

In 32bit arch, JSValue() is not zero IIRC (EmptyTag exists). But it is also possible that caller of gcSafeZeroMemory intends to fill them with JSEmpty (and we were using memset with 0 incorrectly).
Can you check,

1. Whether the caller wants to fill it with 0, or wants to fill it with JSEmpty
2. If the caller want to fill it with JSEmpty, I think we need to have some other name here since JSEmpty in 32bit is not zero.
Comment 9 Robin Morisset 2019-10-22 17:00:11 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

> Source/JavaScriptCore/ChangeLog:19
> +        store into 8 single byte stores, and in practice, based on the disassembly

Nit: "and in" => "in"

> Source/JavaScriptCore/heap/GCMemoryOperations.h:43
> +    static_assert(sizeof(T) == sizeof(JSValue));

Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards.

> Source/JavaScriptCore/heap/GCMemoryOperations.h:46
> +    if (source < destination) {

technically UB to compare pointers to different allocations.
It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison.
Comment 10 Saam Barati 2019-10-22 17:03:58 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

>> Source/JavaScriptCore/heap/GCMemoryOperations.h:43
>> +    static_assert(sizeof(T) == sizeof(JSValue));
> 
> Since there is this static_assert, I don't see the point of passing the number of bytes. Why not pass the count directly? In practice it appears that most callers are pointlessly multiplying the count by the sizeof(T), just for it to be divided by sizeof(JSValue) just afterwards.

I'm just keeping it as bytes for a few reasons:
- As you said, many callers trivially multiply. Since this is inlined, the compiler will just get rid of the multiply and this assert branch
- there are places where it's a bit uglier. See stuff with indexing header and array storage. (These also though, should be optimized away by the compiler.)
- It's nice to have the same API as memmove/memcpy

>> Source/JavaScriptCore/heap/GCMemoryOperations.h:46
>> +    if (source < destination) {
> 
> technically UB to compare pointers to different allocations.
> It is unlikely to cause problems, but ideally they should be casted to uintptr_t before the comparison.

c++ is terrible. I'll cast
Comment 11 Saam Barati 2019-10-22 17:18:32 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

> Source/JavaScriptCore/heap/GCMemoryOperations.h:66
> +ALWAYS_INLINE void gcSafeZeroMemory(T* destination, size_t bytes)

unfortunately this is turned into a call to memset. Need to figure it out
Comment 12 Mark Lam 2019-10-22 17:23:15 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

> Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037
> -				FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */,

Why this removal?  Please undo.
Comment 13 Saam Barati 2019-10-22 17:24:06 PDT
Comment on attachment 381630 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=381630&action=review

>> Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:-11037
>> -				FE1D6D4523580E1F007A5C26 /* Options.cpp in Sources */,
> 
> Why this removal?  Please undo.

no idea. I blame Xcode.
Comment 14 Saam Barati 2019-10-22 17:44:47 PDT
I'm going to just write these in inline asm, and then we won't have to worry about it
Comment 15 Saam Barati 2019-10-29 17:34:39 PDT
Created attachment 382257 [details]
patch
Comment 16 Saam Barati 2019-10-29 17:37:24 PDT
Comment on attachment 382257 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=382257&action=review

> Source/JavaScriptCore/heap/GCMemoryOperations.h:39
> +// We use these memory operations when modifying memory that might be scanned by the concurrent collector.
> +// We don't call the default operations because they're not guaranteed to store to memory in eight byte aligned
> +// chunks. If we happened to fall into the system's normal byte copy loop, we may see a torn JSValue in the
> +// concurrent collector. Even though it's legal according to the C++ spec to turn something like an 8 byte store
> +// to memory into 8 single byte stores, we rely on LLVM (and other compilers) to not do such transformations.
> +// In practice, it appears that they don't. (If they did, this isn't the only place we would have an issue.
> +// For example, everywhere we do a store in a WriteBarrier would be subject to the same issue.)

Oops, I will update this comment.
Comment 17 Saam Barati 2019-10-29 17:40:39 PDT
Created attachment 382258 [details]
patch
Comment 18 Robin Morisset 2019-10-29 18:16:49 PDT
Comment on attachment 382258 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=382258&action=review

Do you have any performance numbers for this? In particular I'd like at least some checking that it is not too massive of a regression on each of x86_64, ARM64 and ARM32 (which I think we still support on some watches?)

> Source/JavaScriptCore/heap/GCMemoryOperations.h:50
> +        size_t count = bytes / 8;

Maybe extract this line and the next two as their own function (slowPathForwardMemcpy or something), since they are reused a below.

> Source/JavaScriptCore/heap/GCMemoryOperations.h:56
> +        size_t tmp;

Why did you declare tmp here, instead of just using a register that you mark clobbered (as you did for ymm0 and ymm1) ?

> Source/JavaScriptCore/heap/GCMemoryOperations.h:118
> +        RELEASE_ASSERT(!isARM64());

Why not a more explicit check that this is x86_64? I don't think we currently support a 64-bit version of MIPS or anything like that, but having an ASSERT here could save us trouble down the line.

> Source/JavaScriptCore/heap/GCMemoryOperations.h:219
> +    } else {

It might be cleaner to have this be outside of all the #if madness, since I believe it always does the right thing as long as dst < src.
Just start the function with:
```
if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) {
    gcSafeMemcpy(dst, src, bytes);
    return;
}
```
It would both reduce the level of nesting of the branches, and let you remove the forward loop in the non-GCC-compatible (or non-64-bits) case below.
Comment 19 Saam Barati 2019-10-29 20:27:19 PDT
(In reply to Robin Morisset from comment #18)
> Comment on attachment 382258 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=382258&action=review
> 
> Do you have any performance numbers for this? In particular I'd like at
> least some checking that it is not too massive of a regression on each of
> x86_64, ARM64 and ARM32 (which I think we still support on some watches?)
My plan is to just monitor the perf bots after landing. For 32bit, I’ll just make the code call memset/memmove/memcpy.

> 
> > Source/JavaScriptCore/heap/GCMemoryOperations.h:50
> > +        size_t count = bytes / 8;
> 
> Maybe extract this line and the next two as their own function
> (slowPathForwardMemcpy or something), since they are reused a below.
Sounds good.
> 
> > Source/JavaScriptCore/heap/GCMemoryOperations.h:56
> > +        size_t tmp;
> 
> Why did you declare tmp here, instead of just using a register that you mark
> clobbered (as you did for ymm0 and ymm1) ?
I used tmp so the compiler could pick the register. I’m not sure how to do the same with ymm registers.
> 
> > Source/JavaScriptCore/heap/GCMemoryOperations.h:118
> > +        RELEASE_ASSERT(!isARM64());
> 
> Why not a more explicit check that this is x86_64? I don't think we
> currently support a 64-bit version of MIPS or anything like that, but having
> an ASSERT here could save us trouble down the line.
Sure.

> 
> > Source/JavaScriptCore/heap/GCMemoryOperations.h:219
> > +    } else {
> 
> It might be cleaner to have this be outside of all the #if madness, since I
> believe it always does the right thing as long as dst < src.
> Just start the function with:
> ```
> if (bitwise_cast<uintptr_t>(src) > bitwise_cast<uintptr_t>(dst)) {
>     gcSafeMemcpy(dst, src, bytes);
>     return;
> }
> ```
> It would both reduce the level of nesting of the branches, and let you
> remove the forward loop in the non-GCC-compatible (or non-64-bits) case
> below.
Sounds good.
Comment 20 EWS Watchlist 2019-10-29 22:44:21 PDT
Comment on attachment 382258 [details]
patch

Attachment 382258 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/13190416

New failing tests:
fast/repaint/backgroundSizeRepaint.html
Comment 21 EWS Watchlist 2019-10-29 22:44:23 PDT
Created attachment 382283 [details]
Archive of layout-test-results from ews213 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews213  Port: win-future  Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment 22 Robin Morisset 2019-10-30 13:25:22 PDT
Comment on attachment 382258 [details]
patch

r=me, we discussed testing offline (land + run benchmarks concurrently).
Comment 23 Saam Barati 2019-10-30 13:56:28 PDT
Created attachment 382347 [details]
patch for landing
Comment 24 Aakash Jain 2019-10-30 16:24:28 PDT
(In reply to Saam Barati from comment #23)
> Created attachment 382347 [details]
> patch for landing
Seems to cause timeouts on api-tests on mac, https://ews-build.webkit.org/#/builders/3/builds/11216 build is running for 3+ hours, similar thing for https://ews-build.webkit.org/#/builders/3/builds/11221

Please verify before landing.
Comment 25 Saam Barati 2019-10-30 17:14:22 PDT
Created attachment 382396 [details]
patch for landing
Comment 26 Saam Barati 2019-10-30 19:08:15 PDT
Created attachment 382407 [details]
patch for landing
Comment 27 EWS Watchlist 2019-10-30 21:40:37 PDT
Comment on attachment 382407 [details]
patch for landing

Attachment 382407 [details] did not pass jsc-ews (mac):
Output: https://webkit-queues.webkit.org/results/13193684

New failing tests:
mozilla-tests.yaml/js1_5/Array/regress-101964.js.mozilla-baseline
Comment 28 EWS Watchlist 2019-10-31 06:17:32 PDT
Comment on attachment 382407 [details]
patch for landing

Attachment 382407 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/13195226

New failing tests:
fast/repaint/backgroundSizeRepaint.html
Comment 29 EWS Watchlist 2019-10-31 06:17:35 PDT
Created attachment 382455 [details]
Archive of layout-test-results from ews212 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews212  Port: win-future  Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment 30 Saam Barati 2019-10-31 10:19:24 PDT
Created attachment 382475 [details]
patch for landing

ok, I think it should all work now.
Comment 31 Saam Barati 2019-10-31 11:22:27 PDT
Created attachment 382480 [details]
patch for landing
Comment 32 WebKit Commit Bot 2019-10-31 13:51:03 PDT
Comment on attachment 382480 [details]
patch for landing

Clearing flags on attachment: 382480

Committed r251875: <https://trac.webkit.org/changeset/251875>
Comment 33 WebKit Commit Bot 2019-10-31 13:51:06 PDT
All reviewed patches have been landed.  Closing bug.
Comment 34 Alexey Proskuryakov 2019-11-02 17:51:54 PDT
Reverted in r251967, because this is suspected to have broken perf tests on iOS.
Comment 35 Saam Barati 2019-11-04 13:49:05 PST
Comment on attachment 382480 [details]
patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=382480&action=review

> Source/JavaScriptCore/heap/GCMemoryOperations.h:284
> +        "str d0, [%x[dst]], #0x10\t\n"

I believe this is the bug. It should be 0x8 not 0x10
Comment 36 Saam Barati 2019-11-04 14:01:22 PST
Created attachment 382770 [details]
patch for landing
Comment 37 Saam Barati 2019-11-04 14:36:33 PST
Created attachment 382775 [details]
patch for landing
Comment 38 WebKit Commit Bot 2019-11-04 15:57:41 PST
Comment on attachment 382775 [details]
patch for landing

Clearing flags on attachment: 382775

Committed r252024: <https://trac.webkit.org/changeset/252024>
Comment 39 WebKit Commit Bot 2019-11-04 15:57:44 PST
All reviewed patches have been landed.  Closing bug.