Bug 196160 - [JSC] Butterfly allocation from LargeAllocation should try "realloc" behavior if collector thread is not active
Summary: [JSC] Butterfly allocation from LargeAllocation should try "realloc" behavior...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-22 14:45 PDT by Yusuke Suzuki
Modified: 2019-03-31 23:54 PDT (History)
12 users (show)

See Also:


Attachments
Patch (15.09 KB, patch)
2019-03-22 17:08 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (15.12 KB, patch)
2019-03-22 17:24 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews107 for mac-highsierra-wk2 (1.90 MB, application/zip)
2019-03-22 18:32 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews103 for mac-highsierra (1.39 MB, application/zip)
2019-03-22 18:36 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews112 for mac-highsierra (2.15 MB, application/zip)
2019-03-22 18:42 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (10.76 MB, application/zip)
2019-03-22 19:21 PDT, EWS Watchlist
no flags Details
Patch (24.83 KB, patch)
2019-03-22 19:40 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (24.96 KB, patch)
2019-03-22 19:44 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (26.21 KB, patch)
2019-03-22 20:32 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (26.27 KB, patch)
2019-03-22 21:17 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews205 for win-future (12.86 MB, application/zip)
2019-03-22 23:58 PDT, EWS Watchlist
no flags Details
Patch (30.39 KB, patch)
2019-03-23 00:05 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (33.37 KB, patch)
2019-03-25 13:07 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (33.37 KB, patch)
2019-03-25 13:45 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (32.96 KB, patch)
2019-03-26 18:03 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (32.96 KB, patch)
2019-03-26 18:27 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (29.55 KB, patch)
2019-03-26 18:30 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (33.26 KB, patch)
2019-03-26 18:35 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (32.84 KB, patch)
2019-03-26 18:40 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews123 for ios-simulator-wk2 (2.59 MB, application/zip)
2019-03-26 20:56 PDT, EWS Watchlist
no flags Details
Patch (32.85 KB, patch)
2019-03-26 23:19 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (31.93 KB, patch)
2019-03-29 17:26 PDT, Yusuke Suzuki
sbarati: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2019-03-22 14:45:16 PDT
...
Comment 1 Yusuke Suzuki 2019-03-22 14:47:18 PDT
I don't think this hurts performance even if we make it by default.
This is simply because allocating LargeAllocation should be already super slow.
And this cost is amortized by the exponential size increase policy.
But anyway, we can measure it. And if we hit some performance problems, we could do,

1. enable it only if sweepSynchronously() is true
2. find some way for mitigation

I think we do not need to do this and believe it is performance-neutral.
Comment 2 Yusuke Suzuki 2019-03-22 17:08:04 PDT
Created attachment 365781 [details]
Patch

Start working
Comment 3 Yusuke Suzuki 2019-03-22 17:24:05 PDT
Created attachment 365786 [details]
Patch

more
Comment 4 EWS Watchlist 2019-03-22 18:32:33 PDT
Comment on attachment 365786 [details]
Patch

Attachment 365786 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/11619055

Number of test failures exceeded the failure limit.
Comment 5 EWS Watchlist 2019-03-22 18:32:35 PDT
Created attachment 365789 [details]
Archive of layout-test-results from ews107 for mac-highsierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6
Comment 6 EWS Watchlist 2019-03-22 18:36:40 PDT
Comment on attachment 365786 [details]
Patch

Attachment 365786 [details] did not pass mac-ews (mac):
Output: https://webkit-queues.webkit.org/results/11619108

Number of test failures exceeded the failure limit.
Comment 7 EWS Watchlist 2019-03-22 18:36:42 PDT
Created attachment 365790 [details]
Archive of layout-test-results from ews103 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 8 EWS Watchlist 2019-03-22 18:42:36 PDT
Comment on attachment 365786 [details]
Patch

Attachment 365786 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/11619062

Number of test failures exceeded the failure limit.
Comment 9 EWS Watchlist 2019-03-22 18:42:37 PDT
Created attachment 365791 [details]
Archive of layout-test-results from ews112 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 10 EWS Watchlist 2019-03-22 19:21:48 PDT
Comment on attachment 365786 [details]
Patch

Attachment 365786 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/11619219

Number of test failures exceeded the failure limit.
Comment 11 EWS Watchlist 2019-03-22 19:21:50 PDT
Created attachment 365795 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.6
Comment 12 Yusuke Suzuki 2019-03-22 19:39:54 PDT
It turns out that "realloc" behavior is very important. While bmalloc uses "malloc", "memcpy" and "free" operations for "realloc" request, system malloc attempts to use in-place expansion.
Comment 13 Yusuke Suzuki 2019-03-22 19:40:11 PDT
Created attachment 365796 [details]
Patch
Comment 14 Yusuke Suzuki 2019-03-22 19:44:52 PDT
Created attachment 365797 [details]
Patch
Comment 15 Yusuke Suzuki 2019-03-22 20:32:26 PDT
Created attachment 365800 [details]
Patch
Comment 16 Yusuke Suzuki 2019-03-22 21:17:47 PDT
Created attachment 365801 [details]
Patch
Comment 17 Yusuke Suzuki 2019-03-22 23:43:58 PDT
(In reply to Yusuke Suzuki from comment #12)
> It turns out that "realloc" behavior is very important. While bmalloc uses
> "malloc", "memcpy" and "free" operations for "realloc" request, system
> malloc attempts to use in-place expansion.

Darwin libmalloc explicitly states that "posix_memalign + realloc is valid" in its man page.
In glibc, there is no documentation, but it works.
In bmalloc, it is valid.

In windows, using realloc with _aligned_malloc is not valid. But it has _aligned_realloc. So if we want, we can use it.
Comment 18 EWS Watchlist 2019-03-22 23:58:29 PDT
Comment on attachment 365801 [details]
Patch

Attachment 365801 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/11620968

New failing tests:
js/sort-large-array.html
js/dom/put-direct-index-beyond-vector-length-resize.html
js/array-iterate-backwards.html
js/regress-155776.html
fast/events/bogus-event-listener-invalidation.html
js/promises-tests/promises-in-workers.html
js/dom/array-with-double-assign.html
fast/xpath/xpath-result-eventlistener-crash.html
js/arrowfunction-lexical-bind-arguments-strict.html
js/reentrant-caching.html
js/dom/array-with-double-push.html
editing/selection/move-left-right.html
fast/workers/worker-gc2.html
js/function-apply-many-args.html
js/arrowfunction-lexical-bind-arguments-non-strict.html
fast/dom/SelectorAPI/resig-SelectorsAPI-test.xhtml
editing/selection/move-by-word-visually-crash-test-5.html
fast/css/css-selector-deeply-nested.html
fast/css/rule-selector-overflow.html
js/dom/document-all-between-frames.html
js/slow-stress/array-prototype-filter.html
Comment 19 EWS Watchlist 2019-03-22 23:58:41 PDT
Created attachment 365803 [details]
Archive of layout-test-results from ews205 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews205  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 20 Yusuke Suzuki 2019-03-23 00:05:46 PDT
Created attachment 365805 [details]
Patch
Comment 21 Yusuke Suzuki 2019-03-25 13:07:20 PDT
Created attachment 365888 [details]
Patch
Comment 22 Yusuke Suzuki 2019-03-25 13:45:47 PDT
Created attachment 365893 [details]
Patch
Comment 23 Yusuke Suzuki 2019-03-25 20:19:16 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

> Source/JavaScriptCore/heap/LargeAllocation.cpp:81
> +    if (isOnList())
> +        remove();
> +    unsigned originalCellSize = m_cellSize;
> +
> +    void* space = subspace->alignedMemoryAllocator()->tryReallocate(this, sizeIncludingDistancingAndAlignmentAdjustment);
> +    if (!space)
> +        return nullptr;

I do not want to call LargeAllocation's destructor here since we still use oldAllocation if tryReallocate fails.
Comment 24 Geoffrey Garen 2019-03-26 14:16:34 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

> Source/WTF/wtf/FastMalloc.cpp:271
> +bool canUseReallocOnToAlignedMallocedPointer()
> +{
> +#if OS(DARWIN)
> +    // `man posix_memalign` explicitly states that this is allowed.
> +    return true;
> +#else
> +    return false;
> +#endif
> +}

Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
Comment 25 Yusuke Suzuki 2019-03-26 14:29:40 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>> Source/WTF/wtf/FastMalloc.cpp:271
>> +}
> 
> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?

1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
So, in Darwin, we can just return `true`, and the code is doing this.

2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
And yes. As you said, we can make
1) fastMalloc to call _aligned_malloc with 8 or 16
2) fastRealloc to call _aligned_realloc with 8 or 16
3) fastFree to call _aligned_free
But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.

3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.

So, due to (3), anyway, we need to have this concept.
Comment 26 Geoffrey Garen 2019-03-26 16:09:39 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>> Source/WTF/wtf/FastMalloc.cpp:271
>>> +}
>> 
>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
> 
> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
> So, in Darwin, we can just return `true`, and the code is doing this.
> 
> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
> And yes. As you said, we can make
> 1) fastMalloc to call _aligned_malloc with 8 or 16
> 2) fastRealloc to call _aligned_realloc with 8 or 16
> 3) fastFree to call _aligned_free
> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
> 
> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
> 
> So, due to (3), anyway, we need to have this concept.

I see. Thanks for the explanation!

I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
Comment 27 Yusuke Suzuki 2019-03-26 16:39:27 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>> +}
>>> 
>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>> 
>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>> So, in Darwin, we can just return `true`, and the code is doing this.
>> 
>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>> And yes. As you said, we can make
>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>> 3) fastFree to call _aligned_free
>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>> 
>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>> 
>> So, due to (3), anyway, we need to have this concept.
> 
> I see. Thanks for the explanation!
> 
> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)

Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.

1. fastMalloced memory can be freed by fastFree
2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).

This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".
Comment 28 Yusuke Suzuki 2019-03-26 16:50:56 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>>> +}
>>>> 
>>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>>> 
>>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>>> So, in Darwin, we can just return `true`, and the code is doing this.
>>> 
>>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>>> And yes. As you said, we can make
>>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>>> 3) fastFree to call _aligned_free
>>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>>> 
>>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>>> 
>>> So, due to (3), anyway, we need to have this concept.
>> 
>> I see. Thanks for the explanation!
>> 
>> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
> 
> Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
> The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.
> 
> 1. fastMalloced memory can be freed by fastFree
> 2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).
> 
> This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".

Ah! No, unfortunately :(
"realloc" only takes (1) a pointer to the previously allocated memory and (2) size for newly allocated memory. But we need to know the size of (1)'s memory region, since,

realloc(pointer, largerSize)
=>

void* memory = malloc(largerSize);
memcpy(memory, pointer, min(largerSize, sizeOfPointer));  // !!!
free(pointer);

But we do not know "sizeOfPointer" here. We can do that if we pass the allocated memory size to "realloc" function, but it adds additional complexity, and I think it is more complex and error prone than checking "canUseReallocOnToAlignedMallocedPointer".
Comment 29 Yusuke Suzuki 2019-03-26 16:59:26 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>>>> +}
>>>>> 
>>>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>>>> 
>>>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>>>> So, in Darwin, we can just return `true`, and the code is doing this.
>>>> 
>>>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>>>> And yes. As you said, we can make
>>>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>>>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>>>> 3) fastFree to call _aligned_free
>>>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>>>> 
>>>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>>>> 
>>>> So, due to (3), anyway, we need to have this concept.
>>> 
>>> I see. Thanks for the explanation!
>>> 
>>> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
>> 
>> Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
>> The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.
>> 
>> 1. fastMalloced memory can be freed by fastFree
>> 2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).
>> 
>> This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".
> 
> Ah! No, unfortunately :(
> "realloc" only takes (1) a pointer to the previously allocated memory and (2) size for newly allocated memory. But we need to know the size of (1)'s memory region, since,
> 
> realloc(pointer, largerSize)
> =>
> 
> void* memory = malloc(largerSize);
> memcpy(memory, pointer, min(largerSize, sizeOfPointer));  // !!!
> free(pointer);
> 
> But we do not know "sizeOfPointer" here. We can do that if we pass the allocated memory size to "realloc" function, but it adds additional complexity, and I think it is more complex and error prone than checking "canUseReallocOnToAlignedMallocedPointer".

Another way is stopping using "posix_memalign" for allocating LargeAllocation in JSC GC. Use "malloc" and adjust the pointer to align it to 16B.
This way, we can always use "realloc" onto LargeAllocation since we know that this is allocated by "malloc". I think it would be cleaner since,

1. anyway, we already have a code handling misalignment for realloc-ed memory.
2. we do not need to rely on this "realloc"'s platform behavior.

I'll try to do this.
Comment 30 Yusuke Suzuki 2019-03-26 17:17:49 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>>>>> +}
>>>>>> 
>>>>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>>>>> 
>>>>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>>>>> So, in Darwin, we can just return `true`, and the code is doing this.
>>>>> 
>>>>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>>>>> And yes. As you said, we can make
>>>>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>>>>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>>>>> 3) fastFree to call _aligned_free
>>>>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>>>>> 
>>>>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>>>>> 
>>>>> So, due to (3), anyway, we need to have this concept.
>>>> 
>>>> I see. Thanks for the explanation!
>>>> 
>>>> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
>>> 
>>> Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
>>> The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.
>>> 
>>> 1. fastMalloced memory can be freed by fastFree
>>> 2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).
>>> 
>>> This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".
>> 
>> Ah! No, unfortunately :(
>> "realloc" only takes (1) a pointer to the previously allocated memory and (2) size for newly allocated memory. But we need to know the size of (1)'s memory region, since,
>> 
>> realloc(pointer, largerSize)
>> =>
>> 
>> void* memory = malloc(largerSize);
>> memcpy(memory, pointer, min(largerSize, sizeOfPointer));  // !!!
>> free(pointer);
>> 
>> But we do not know "sizeOfPointer" here. We can do that if we pass the allocated memory size to "realloc" function, but it adds additional complexity, and I think it is more complex and error prone than checking "canUseReallocOnToAlignedMallocedPointer".
> 
> Another way is stopping using "posix_memalign" for allocating LargeAllocation in JSC GC. Use "malloc" and adjust the pointer to align it to 16B.
> This way, we can always use "realloc" onto LargeAllocation since we know that this is allocated by "malloc". I think it would be cleaner since,
> 
> 1. anyway, we already have a code handling misalignment for realloc-ed memory.
> 2. we do not need to rely on this "realloc"'s platform behavior.
> 
> I'll try to do this.

I've tried, but it seems that this is also not so good than I thought. We need to add non-aligned version of allocation-related member functions to JSC::AlignedMemoryAllocator,
Comment 31 Yusuke Suzuki 2019-03-26 17:18:20 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>>>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>>>>>> +}
>>>>>>> 
>>>>>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>>>>>> 
>>>>>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>>>>>> So, in Darwin, we can just return `true`, and the code is doing this.
>>>>>> 
>>>>>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>>>>>> And yes. As you said, we can make
>>>>>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>>>>>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>>>>>> 3) fastFree to call _aligned_free
>>>>>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>>>>>> 
>>>>>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>>>>>> 
>>>>>> So, due to (3), anyway, we need to have this concept.
>>>>> 
>>>>> I see. Thanks for the explanation!
>>>>> 
>>>>> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
>>>> 
>>>> Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
>>>> The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.
>>>> 
>>>> 1. fastMalloced memory can be freed by fastFree
>>>> 2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).
>>>> 
>>>> This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".
>>> 
>>> Ah! No, unfortunately :(
>>> "realloc" only takes (1) a pointer to the previously allocated memory and (2) size for newly allocated memory. But we need to know the size of (1)'s memory region, since,
>>> 
>>> realloc(pointer, largerSize)
>>> =>
>>> 
>>> void* memory = malloc(largerSize);
>>> memcpy(memory, pointer, min(largerSize, sizeOfPointer));  // !!!
>>> free(pointer);
>>> 
>>> But we do not know "sizeOfPointer" here. We can do that if we pass the allocated memory size to "realloc" function, but it adds additional complexity, and I think it is more complex and error prone than checking "canUseReallocOnToAlignedMallocedPointer".
>> 
>> Another way is stopping using "posix_memalign" for allocating LargeAllocation in JSC GC. Use "malloc" and adjust the pointer to align it to 16B.
>> This way, we can always use "realloc" onto LargeAllocation since we know that this is allocated by "malloc". I think it would be cleaner since,
>> 
>> 1. anyway, we already have a code handling misalignment for realloc-ed memory.
>> 2. we do not need to rely on this "realloc"'s platform behavior.
>> 
>> I'll try to do this.
> 
> I've tried, but it seems that this is also not so good than I thought. We need to add non-aligned version of allocation-related member functions to JSC::AlignedMemoryAllocator,

But it would be better. I'll continue trying.
Comment 32 Yusuke Suzuki 2019-03-26 18:00:50 PDT
Comment on attachment 365893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365893&action=review

>>>>>>>>> Source/WTF/wtf/FastMalloc.cpp:271
>>>>>>>>> +}
>>>>>>>> 
>>>>>>>> Do we really need this concept? Which OS will fail a realloc on an aligned malloc? Is it just Windows? If so, can the Windows FastMalloc path just add a call to _aligned_realloc?
>>>>>>> 
>>>>>>> 1. In Darwin libmalloc, it succeeds and it is guaranteed, and documented in `man`.
>>>>>>> So, in Darwin, we can just return `true`, and the code is doing this.
>>>>>>> 
>>>>>>> 2. In Windows, _aligned_malloc & realloc fail. _aligned_malloc & _aligned_realloc succeed.
>>>>>>> And yes. As you said, we can make
>>>>>>> 1) fastMalloc to call _aligned_malloc with 8 or 16
>>>>>>> 2) fastRealloc to call _aligned_realloc with 8 or 16
>>>>>>> 3) fastFree to call _aligned_free
>>>>>>> But not sure about the impact in terms of performance, but I think it does not have an impact as long as we use 8 or 16 for default alignment.
>>>>>>> 
>>>>>>> 3. In Linux GLibC, it succeeds, but it is not guaranteed. It depends on the current implementation. In the other libc, it depends on the implementation of each Libc.
>>>>>>> 
>>>>>>> So, due to (3), anyway, we need to have this concept.
>>>>>> 
>>>>>> I see. Thanks for the explanation!
>>>>>> 
>>>>>> I wonder if it would be better to have a Windows path in FastMalloc (to call _aligned_realloc) and libc path in the non-Darwin DebugHeap (to call posix_memalign and memcpy). Would that work? (It's nice for client code when FastMalloc can paper over these platform differences.)
>>>>> 
>>>>> Sounds like a good idea! If we change Windows to always using _aligned_malloc / _aligned_realloc / _aligned_free, we can completely hide this concept behind the FastMalloc.
>>>>> The problem was Windows' fastAlignedMalloc and fastAlignedFree. We cannot call realloc and free on fastAlignedMalloc memory, and this problem leaded to this complexity. But if we change Windows' fastMalloc / fastFree implementations to always using _aligned_malloc etc. we can meet the following features.
>>>>> 
>>>>> 1. fastMalloced memory can be freed by fastFree
>>>>> 2. fastAlignedMalloced memory can be freed by fastFree (this was not possible, and it prevented us from introducing fallback implementation to fastRealloc (we need to perform "free" onto the given memory, but we do not know whether the memory is allocated from fastMalloc or fastAlignedMalloc)).
>>>>> 
>>>>> This allows us to implement fastRealloc for the non Darwin & non Windows environments as "malloc, memcpy and free".
>>>> 
>>>> Ah! No, unfortunately :(
>>>> "realloc" only takes (1) a pointer to the previously allocated memory and (2) size for newly allocated memory. But we need to know the size of (1)'s memory region, since,
>>>> 
>>>> realloc(pointer, largerSize)
>>>> =>
>>>> 
>>>> void* memory = malloc(largerSize);
>>>> memcpy(memory, pointer, min(largerSize, sizeOfPointer));  // !!!
>>>> free(pointer);
>>>> 
>>>> But we do not know "sizeOfPointer" here. We can do that if we pass the allocated memory size to "realloc" function, but it adds additional complexity, and I think it is more complex and error prone than checking "canUseReallocOnToAlignedMallocedPointer".
>>> 
>>> Another way is stopping using "posix_memalign" for allocating LargeAllocation in JSC GC. Use "malloc" and adjust the pointer to align it to 16B.
>>> This way, we can always use "realloc" onto LargeAllocation since we know that this is allocated by "malloc". I think it would be cleaner since,
>>> 
>>> 1. anyway, we already have a code handling misalignment for realloc-ed memory.
>>> 2. we do not need to rely on this "realloc"'s platform behavior.
>>> 
>>> I'll try to do this.
>> 
>> I've tried, but it seems that this is also not so good than I thought. We need to add non-aligned version of allocation-related member functions to JSC::AlignedMemoryAllocator,
> 
> But it would be better. I'll continue trying.

I've uploaded the patch. In this patch, I just stop using this platform-specific feature. LargeAllocation is always managed by malloc & free. Stop using posix_memalign.
Then, we can use realloc safely in all the platforms.
Comment 33 Yusuke Suzuki 2019-03-26 18:03:05 PDT
Created attachment 366030 [details]
Patch
Comment 34 Yusuke Suzuki 2019-03-26 18:27:45 PDT
Created attachment 366033 [details]
Patch
Comment 35 Yusuke Suzuki 2019-03-26 18:30:25 PDT
Created attachment 366034 [details]
Patch
Comment 36 Yusuke Suzuki 2019-03-26 18:35:21 PDT
Created attachment 366035 [details]
Patch
Comment 37 Yusuke Suzuki 2019-03-26 18:40:28 PDT
Created attachment 366039 [details]
Patch
Comment 38 EWS Watchlist 2019-03-26 20:56:51 PDT
Comment on attachment 366039 [details]
Patch

Attachment 366039 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/11678869

New failing tests:
fast/visual-viewport/ios/min-scale-greater-than-one.html
Comment 39 EWS Watchlist 2019-03-26 20:56:54 PDT
Created attachment 366044 [details]
Archive of layout-test-results from ews123 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews123  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.6
Comment 40 Yusuke Suzuki 2019-03-26 23:19:40 PDT
Created attachment 366054 [details]
Patch
Comment 41 Saam Barati 2019-03-28 22:56:36 PDT
Comment on attachment 366054 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366054&action=review

> Source/JavaScriptCore/heap/CompleteSubspace.cpp:167
> +    ASSERT(oldAllocation->weakSet().isTriviallyDestructible());

How is this guaranteed? Why is this needed?

> Source/JavaScriptCore/heap/CompleteSubspace.cpp:189
> +        m_largeAllocations.append(oldAllocation);

Why do we need to do this if we failed to allocate?

> Source/JavaScriptCore/heap/LargeAllocation.cpp:77
> +    // This includes padding at the end of the allocation to maintain the distancing constraint.

We should remove this.

> Source/JavaScriptCore/heap/LargeAllocation.cpp:105
> +    // We have 4 patterns.

This is super unfortunate if we have to do this memmove.

Doesn’t Bmalloc have an aligned malloc? Can’t we just use that?
Comment 42 Yusuke Suzuki 2019-03-29 00:39:55 PDT
Comment on attachment 366054 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366054&action=review

>> Source/JavaScriptCore/heap/CompleteSubspace.cpp:167
>> +    ASSERT(oldAllocation->weakSet().isTriviallyDestructible());
> 
> How is this guaranteed? Why is this needed?

Weak<> only accept JSCell*, so Auxiliary CompleteSubspace never uses it.
Since we perform "realloc" without calling destructors, this needs to be ensured.
If we call a destructor in "realloc", recovering the oldAllocation becomes super tricky instead when "realloc" fails.

>> Source/JavaScriptCore/heap/CompleteSubspace.cpp:189
>> +        m_largeAllocations.append(oldAllocation);
> 
> Why do we need to do this if we failed to allocate?

Because we removed this from Subspace in LargeAllocation::tryReallocate. If we do not remove oldAllocation from the linked list in "tryReallocate", we miss the chance to remove oldAllocation from Subspace's doubly linked list when "realloc" succeeds since address may be changed by "realloc".

>> Source/JavaScriptCore/heap/LargeAllocation.cpp:77
>> +    // This includes padding at the end of the allocation to maintain the distancing constraint.
> 
> We should remove this.

I'll remove this in "distancing removing patch" side :)

>> Source/JavaScriptCore/heap/LargeAllocation.cpp:105
>> +    // We have 4 patterns.
> 
> This is super unfortunate if we have to do this memmove.
> 
> Doesn’t Bmalloc have an aligned malloc? Can’t we just use that?

bmalloc has aligned malloc. But it is not sufficient... What we need is aligned realloc. and nobody has it! (except for Windows).

Even if we use aligned malloc in the first malloc (LargeAllocation::tryCreate), we do not have any guarantee that realloc-ed malloc has the same alignment (this is explicitly described in `man posix_memalign`).
So, anyway, we need this complicated code. Let's consider the following example,

1. 1st LargeAllocation is allocated from aligned-malloc (posix_memalign)
2. After that, this LargeAllocation is realloc-ed (This is only valid on Darwin platforms. In Windows, it crashes. In Linux, it works but no guarantee).
3. Then, when we again want to expand this LargeAllocation, we need this adjustment anyway since result in (2)'s realloc may not have an alignment specified in (1).

If we have an API like, posix_memaligned_realloc or aligned_realloc thing, we can consistently use that. But it does not exist in Darwin unfortunately :(
We can add it to bmalloc. But since system malloc does not have that, we cannot use it anyway here.

Good news: In Darwin libmalloc, returned memory from malloc is always 16B aligned (undocumented, but this is correct in the current implementation IIRC). So basically, memmove does not happen if we use this code with system malloc in Darwin.
Comment 43 Yusuke Suzuki 2019-03-29 17:13:39 PDT
Comment on attachment 366054 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366054&action=review

>>> Source/JavaScriptCore/heap/CompleteSubspace.cpp:189
>>> +        m_largeAllocations.append(oldAllocation);
>> 
>> Why do we need to do this if we failed to allocate?
> 
> Because we removed this from Subspace in LargeAllocation::tryReallocate. If we do not remove oldAllocation from the linked list in "tryReallocate", we miss the chance to remove oldAllocation from Subspace's doubly linked list when "realloc" succeeds since address may be changed by "realloc".

I changed this code a bit: moving 

    if (isOnList())
        remove();

operations out of tryReallocate to make code cleaner.

>>> Source/JavaScriptCore/heap/LargeAllocation.cpp:77
>>> +    // This includes padding at the end of the allocation to maintain the distancing constraint.
>> 
>> We should remove this.
> 
> I'll remove this in "distancing removing patch" side :)

Removed. So I'll update the patch.
Comment 44 Yusuke Suzuki 2019-03-29 17:26:18 PDT
Created attachment 366331 [details]
Patch
Comment 45 Yusuke Suzuki 2019-03-29 18:18:31 PDT
It is ready.
Comment 46 Saam Barati 2019-03-31 16:44:36 PDT
Comment on attachment 366331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366331&action=review

r=me

> Source/JavaScriptCore/heap/AlignedMemoryAllocator.h:53
> +    // Only some of carefully-designed allocator has this.

“Only some carefully-designed allocators have this.”

That said, not sure this comment adds much. It’s better to explain why

> Source/JavaScriptCore/heap/LargeAllocation.cpp:70
> +    size_t allocationSizeAndAlignmentAdjustment = headerSize() + size + halfAlignment;

Nit: i think a better name is “alignmentAdjustedAllocationSize”

> Source/JavaScriptCore/heap/LargeAllocation.cpp:76
> +    bool oldAlignmentAdjusted = m_alignmentAdjusted;

Nit: another name suggestion, I think this makes more sense as “adjusted alignment” instead of “alignment adjusted”, e.g “m_adjustedAlignment”
Comment 47 Yusuke Suzuki 2019-03-31 23:40:36 PDT
Comment on attachment 366331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=366331&action=review

Thank you!

>> Source/JavaScriptCore/heap/AlignedMemoryAllocator.h:53
>> +    // Only some of carefully-designed allocator has this.
> 
> “Only some carefully-designed allocators have this.”
> 
> That said, not sure this comment adds much. It’s better to explain why

We do not add realloc features to IsoAlignedMemoryAllocator, this is simply because it is not necessary: IsoAlignedMemoryAllocator is used for IsoSubspace, so allocation size is always the same. It means that we never have "realloc" use case.
Changed the comment to explain this.

>> Source/JavaScriptCore/heap/LargeAllocation.cpp:70
>> +    size_t allocationSizeAndAlignmentAdjustment = headerSize() + size + halfAlignment;
> 
> Nit: i think a better name is “alignmentAdjustedAllocationSize”

Sounds good. Changed.

>> Source/JavaScriptCore/heap/LargeAllocation.cpp:76
>> +    bool oldAlignmentAdjusted = m_alignmentAdjusted;
> 
> Nit: another name suggestion, I think this makes more sense as “adjusted alignment” instead of “alignment adjusted”, e.g “m_adjustedAlignment”

Sounds nice. Fixed.
Comment 48 Yusuke Suzuki 2019-03-31 23:51:22 PDT
Committed r243688: <https://trac.webkit.org/changeset/243688>
Comment 49 Radar WebKit Bug Importer 2019-03-31 23:54:45 PDT
<rdar://problem/49469428>