WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
94810
Vector::shrinkToFit should use realloc when canMoveWithMemcpy is true
https://bugs.webkit.org/show_bug.cgi?id=94810
Summary
Vector::shrinkToFit should use realloc when canMoveWithMemcpy is true
Yong Li
Reported
2012-08-23 07:15:38 PDT
This should be safe. But I don't know why there are both canCopyWithMemcpy and canMoveWithMemcpy. Is there any case they are not equivalent?
Attachments
the patch
(3.52 KB, patch)
2012-08-27 10:03 PDT
,
Yong Li
benjamin
: review-
buildbot
: commit-queue-
Details
Formatted Diff
Diff
only for blocks >=1k
(3.69 KB, patch)
2012-08-28 11:07 PDT
,
Yong Li
yong.li.webkit
: review-
Details
Formatted Diff
Diff
Try again
(4.17 KB, patch)
2012-08-28 12:17 PDT
,
Yong Li
no flags
Details
Formatted Diff
Diff
should use canMoveWithMemcpy
(4.17 KB, patch)
2012-08-28 12:26 PDT
,
Yong Li
no flags
Details
Formatted Diff
Diff
the test code I'm using...
(1.52 KB, text/x-c++src)
2012-08-28 12:33 PDT
,
Yong Li
no flags
Details
Patch
(4.01 KB, patch)
2012-08-28 12:52 PDT
,
Yong Li
benjamin
: review-
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Updated
(4.17 KB, patch)
2012-08-29 08:37 PDT
,
Yong Li
buildbot
: commit-queue-
Details
Formatted Diff
Diff
try again
(4.39 KB, patch)
2012-08-29 09:35 PDT
,
Yong Li
peter+ews
: commit-queue-
Details
Formatted Diff
Diff
Again
(4.38 KB, patch)
2012-08-29 11:16 PDT
,
Yong Li
benjamin
: review+
benjamin
: commit-queue-
Details
Formatted Diff
Diff
To commit (see if GTK can build now)
(4.38 KB, patch)
2012-08-30 10:09 PDT
,
Yong Li
no flags
Details
Formatted Diff
Diff
Show Obsolete
(8)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2012-08-23 11:52:20 PDT
Can realloc result in more memory fragmentation?
Yong Li
Comment 2
2012-08-23 12:07:31 PDT
(In reply to
comment #1
)
> Can realloc result in more memory fragmentation?
It could... But that is up to the implementation of realloc, because realloc doesn't have to keep the address unchanged when shrinking.
Alexey Proskuryakov
Comment 3
2012-08-23 12:14:54 PDT
I think that at least OS X man page for realloc implies that it cannot move the allocation when shrinking.
Yong Li
Comment 4
2012-08-23 12:22:17 PDT
(In reply to
comment #3
)
> I think that at least OS X man page for realloc implies that it cannot move the allocation when shrinking.
Hm... good to know. For large blocks, the memory allocator can decommit/munmap the free space. For small blocks, probably we don't care that much? Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway.
Yong Li
Comment 5
2012-08-27 10:03:42 PDT
Created
attachment 160738
[details]
the patch It doesn't show any noticeable diff on sunspider and octane, though. Any other suggested perf tests?
Build Bot
Comment 6
2012-08-27 10:38:33 PDT
Comment on
attachment 160738
[details]
the patch
Attachment 160738
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13619208
Benjamin Poulain
Comment 7
2012-08-27 22:19:57 PDT
Comment on
attachment 160738
[details]
the patch r- because you do not provide any rationale for making the change. 1) Please microbenchmark this to provide numbers for the improvement. 2) Please find what is the memory impact. Exactly how does fastMalloc work with those Realloc.
> Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway.
This is pretty much irrelevant. The memory allocated there is small on average and the lost memory on shrink is at most a little less the size of the useful payload.
Yong Li
Comment 8
2012-08-28 07:33:22 PDT
(In reply to
comment #7
)
> (From update of
attachment 160738
[details]
) > r- because you do not provide any rationale for making the change. > > 1) Please microbenchmark this to provide numbers for the improvement. > 2) Please find what is the memory impact. Exactly how does fastMalloc work with those Realloc.
OK
> > > > Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway. > > This is pretty much irrelevant. The memory allocated there is small on average and the lost memory on shrink is at most a little less the size of the useful payload.
Isn't this the worst case of using realloc (causing fragments)? If the memory allocated isn't small, and realloc() would keep the address unchanged, it saves a memcpy on a large block. If realloc() uses different addresses, it is same as malloc/memcpy/free. I understand this depends very much on the actual malloc algorithm. But what Vector is trying to do here is just a re-alloc. It is very natural to use realloc, and lets the malloc algorithm to decide what to do for best performance. Probably adding an ENABLE switch is better?
Benjamin Poulain
Comment 9
2012-08-28 08:37:59 PDT
> Isn't this the worst case of using realloc (causing fragments)? If the memory allocated isn't small, and realloc() would keep the address unchanged, it saves a memcpy on a large block. If realloc() uses different addresses, it is same as malloc/memcpy/free. > > I understand this depends very much on the actual malloc algorithm. But what Vector is trying to do here is just a re-alloc. It is very natural to use realloc, and lets the malloc algorithm to decide what to do for best performance.
I am not sure what you arguing for? (against?) What you say is correct, and I agree using realloc seems like the good thing to do. But you should not make changes like this one without measuring the benefits. You are modifying a abstract data type used everywhere in WebKit in a non obvious way, you should be able to prove the change is correct. It is not obvious if 1) This will increase memory fragmentation because realloc is not flexible for shrink. 2) This will not provide any improvement and it is just extra code to support. 3) Fastmalloc does the right thing and the patch avoid some memcopy. (3) should go in WebKit, (1) and (2) should just be discarded. I only ask you make sure you have the third case.
> Probably adding an ENABLE switch is better?
No, please. ENABLE switch is a bad idea here.
Yong Li
Comment 10
2012-08-28 10:20:26 PDT
(In reply to
comment #9
)
> It is not obvious if > 1) This will increase memory fragmentation because realloc is not flexible for shrink. > 2) This will not provide any improvement and it is just extra code to support. > 3) Fastmalloc does the right thing and the patch avoid some memcopy. > > (3) should go in WebKit, (1) and (2) should just be discarded. I only ask you make sure you have the third case. > > > Probably adding an ENABLE switch is better? > > No, please. ENABLE switch is a bad idea here.
It seems FastMalloc is OK, as its realloc() allocates a new block when "the new size is significantly smaller than the old size". // Reallocate if the new size is larger than the old size, // or if the new size is significantly smaller than the old size. if ((new_size > old_size) || (AllocationSize(new_size) < old_size)) { But not all ports are using same memory allocator. Some ports have USE_SYSTEM_MALLOC defined. As Alexey said, "at least OS X man page for realloc implies that it cannot move the allocation when shrinking". PLATFORM(BLACKBERRY) also uses the system memory allocator. I'm trying to get some numbers on QNX, but the results won't apply to all ports. BTW, I've seen in reality WebKit runs out-of-memory in shrinkToFit, which is a little bit weird: it is trying to shrink memory usage, but ends up with allocating more memory and runs OOM even faster. Reusing the old memory address for larges blocks can also reduce memory spikes in some cases.
Yong Li
Comment 11
2012-08-28 11:06:06 PDT
I tested with the following code, and found that realloc() gives noticeable performance boost only on large blocks (>=1k). It is even slower than malloc/memcpy/free solution on very small blocks. The larger the blocks are, the more performance boost realloc() gives. So it seems it is better to do this only when both old and new capacities are larger than 1k. double start = currentTime(); for (int i = 0; i < 100000; ++i) { Vector<char> vector; vector.reserveCapacity(<capacity>); vector.resize(<size>); vector.shrinkToFit(); } double elapsed = currentTime() - start;
Yong Li
Comment 12
2012-08-28 11:07:10 PDT
Created
attachment 161015
[details]
only for blocks >=1k Can anyone also try the patch?
Yong Li
Comment 13
2012-08-28 11:29:35 PDT
Comment on
attachment 161015
[details]
only for blocks >=1k actually we should use canMoveWithMemcpy
Benjamin Poulain
Comment 14
2012-08-28 11:36:58 PDT
View in context:
https://bugs.webkit.org/attachment.cgi?id=161015&action=review
So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. And you set the patch for review for all platforms. Don't you see anything wrong with that?
> Source/WTF/ChangeLog:10 > + Use realloc to shrink buffer when it is changing from a large malloc buffer > + to another large one (> 1k), and canCopyWithMemcpy is true. > +
You should give the performance analysis and the numbers in your ChangeLog. Here is an example of a good ChangeLog of performance work:
http://trac.webkit.org/changeset/124990
Yong Li
Comment 15
2012-08-28 11:45:53 PDT
(In reply to
comment #14
)
> View in context:
https://bugs.webkit.org/attachment.cgi?id=161015&action=review
> > So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. > > And you set the patch for review for all platforms. Don't you see anything wrong with that?
yes I do :). Probably I shouldn't set it for review. This is why I suggested to use some macros, and I'm asking for help to investigate this plan on other ports...
Benjamin Poulain
Comment 16
2012-08-28 11:50:19 PDT
> > So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. > > > > And you set the patch for review for all platforms. Don't you see anything wrong with that? > > yes I do :). Probably I shouldn't set it for review. This is why I suggested to use some macros, and I'm asking for help to investigate this plan on other ports...
Having some #ifdef for Blackberry is perfectly reasonable here. It is the only platform for which you can demonstrate your patch is an improvement.
Benjamin Poulain
Comment 17
2012-08-28 12:00:30 PDT
> Having some #ifdef for Blackberry is perfectly reasonable here. It is the only platform for which you can demonstrate your patch is an improvement.
To be clear, I think ENABLE() is a bad idea, PLATFORM(BLACKBERRY) or something even more specific to the allocator is perfectly reasonable. E.g. #if USE(SYSTEM_MALLOC) && PLATFORM(BLACKBERRY) const unsigned reallocOnShrinkSizeThreshold = 1024; #define VECTOR_REALLOC_ON_SHRINK #endif // USE(SYSTEM_MALLOC) && PLATFORM(BLACKBERRY)
Yong Li
Comment 18
2012-08-28 12:17:23 PDT
Created
attachment 161029
[details]
Try again How about this? Hopefully we can get some numbers for realloc vs malloc/memcpy/free comparison on other ports very soon...
Yong Li
Comment 19
2012-08-28 12:26:17 PDT
Created
attachment 161031
[details]
should use canMoveWithMemcpy actually it is not making copies, but just moving.
Yong Li
Comment 20
2012-08-28 12:33:44 PDT
Created
attachment 161033
[details]
the test code I'm using... It is a little bit ugly I know :) Basically in every cycle, it creates a vector, allocates buffer, sets the size, and then call shrinkToFit(), and destructs the buffer.
Yong Li
Comment 21
2012-08-28 12:52:58 PDT
Created
attachment 161042
[details]
Patch Sorry for keeping changing it... Actually we don't want the 1K threshold. 1) the measurement for small blocks is not stable. It is actually very close as realloc() doesn't nothing but malloc/memcpy/free. 2) if realloc() was slower, it should be fixed and improved anyway.
Build Bot
Comment 22
2012-08-28 18:00:54 PDT
Comment on
attachment 161042
[details]
Patch
Attachment 161042
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13656552
Benjamin Poulain
Comment 23
2012-08-28 22:55:31 PDT
Comment on
attachment 161042
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=161042&action=review
I just r- for the minor issues, but everything looks good really. I'll have to double check the VectorTraits<T>::canMoveWithMemcpy, I do not remember by heart if it is the right one.
> Source/WTF/ChangeLog:8 > + Only tested on BlackBerry. So it is wrapped with PLATFORM(BLACKBERRY) at the mean time.
"at the mean time" -> "in the meantime"/"for now"
> Source/WTF/ChangeLog:12 > + Use realloc to shrink buffer when it is changing from a large malloc buffer > + to another large one (> 1k), and canMoveWithMemcpy is true. When a Vector<char> shrinks > + capacity from 2K to 1K, this gives more than 10x times performance boost on shrinkToFit() > + (Tested on BlackBerry).
This is outdated, your removed the 1K limit. Please use a new paragraph for the performance detail, that'll make it easier to track. 10 times faster!!! I should really try your bench with fastMalloc!
> Source/WTF/wtf/Vector.h:288 > + // FIXME: Return true when realloc() can give better performance.
The "when" is ambiguous IMHO. More like: // FIXME: Return true on the platform where realloc() give better performance. ? + UNUSED_PARAM(newCapacity); (for the Mac break).
Yong Li
Comment 24
2012-08-29 07:47:11 PDT
(In reply to
comment #23
)
> (From update of
attachment 161042
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=161042&action=review
> I'll have to double check the VectorTraits<T>::canMoveWithMemcpy, I do not remember by heart if it is the right one.
If I understand correctly, this is for the objects with something like ref-counter. For example, RefPtr<T> cannot be C++-copied with memcpy, as creating a copy should increase the ref-count, but memcpy doesn't do that. However, it can be moved with memcpy as it is nothing but a pointer to the RefCounted object.
> 10 times faster!!! I should really try your bench with fastMalloc!
That is only for large blocks. And it is reasonable because it just releases the free space to system in this case, so it takes no time compared to allocating a new block, memcpying, and freeing the old block. Thanks a lot for reviewing and all the suggestions!
Yong Li
Comment 25
2012-08-29 08:37:19 PDT
Created
attachment 161235
[details]
Updated I changed the number "10x" because it is actually an estimate after deducting the extra malloc/free in every cycle. Also I made a mistake that it was on 8K->4K shrinking. To be clear, for 2K-1K shrinking, the time it spends given by the test reduces from 100% to 60-70%, For 8k-4k shrinking, the elapsed time reduces from 100% to ~55%.. I'm not sure how to calculate this performance boost percentage. Now I'm using this formula: (<old time cost> - <new time cost>) / <old time cost>. Then this cannot explain why we see "2x times faster" in some logs...
Build Bot
Comment 26
2012-08-29 09:14:58 PDT
Comment on
attachment 161235
[details]
Updated
Attachment 161235
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13680259
Build Bot
Comment 27
2012-08-29 09:19:44 PDT
Comment on
attachment 161235
[details]
Updated
Attachment 161235
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13690259
Yong Li
Comment 28
2012-08-29 09:35:43 PDT
Created
attachment 161248
[details]
try again
Peter Beverloo (cr-android ews)
Comment 29
2012-08-29 11:09:40 PDT
Comment on
attachment 161248
[details]
try again
Attachment 161248
[details]
did not pass cr-android-ews (chromium-android): Output:
http://queues.webkit.org/results/13687288
Early Warning System Bot
Comment 30
2012-08-29 11:11:40 PDT
Comment on
attachment 161248
[details]
try again
Attachment 161248
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13689318
Yong Li
Comment 31
2012-08-29 11:13:02 PDT
(In reply to
comment #29
)
> (From update of
attachment 161248
[details]
) >
Attachment 161248
[details]
did not pass cr-android-ews (chromium-android): > Output:
http://queues.webkit.org/results/13687288
:(. So many senseless warnings!
Yong Li
Comment 32
2012-08-29 11:16:55 PDT
Created
attachment 161271
[details]
Again
Yong Li
Comment 33
2012-08-30 07:55:03 PDT
GTK build failure: CXXLD Programs/TestWebKitAPI/TestWebKit2 make[1]: Leaving directory `/home/phil/gst/jhbuild/makes/WebKit/WebKitBuild/Release' Segmentation fault Don't think it is any issue in the patch. Ping reviewers?
Benjamin Poulain
Comment 34
2012-08-30 09:44:28 PDT
Comment on
attachment 161271
[details]
Again View in context:
https://bugs.webkit.org/attachment.cgi?id=161271&action=review
The patch looks correct. Just make sure there is no problem with GTK when you land it.
> Source/WTF/ChangeLog:8 > + Only tested on BlackBerry. So it is wrapped with PLATFORM(BLACKBERRY) at the mean time.
"at the mean time" -> in the meantime.
Yong Li
Comment 35
2012-08-30 10:05:09 PDT
Comment on
attachment 161271
[details]
Again Thanks Ben. Commit bot should be able to verify it builds. I'll keep an eye on it.
Benjamin Poulain
Comment 36
2012-08-30 10:06:12 PDT
Comment on
attachment 161271
[details]
Again You also need to update the changelog.
Yong Li
Comment 37
2012-08-30 10:09:25 PDT
Created
attachment 161504
[details]
To commit (see if GTK can build now)
WebKit Review Bot
Comment 38
2012-08-30 13:51:45 PDT
Comment on
attachment 161504
[details]
To commit (see if GTK can build now) Clearing flags on attachment: 161504 Committed
r127186
: <
http://trac.webkit.org/changeset/127186
>
WebKit Review Bot
Comment 39
2012-08-30 13:51:51 PDT
All reviewed patches have been landed. Closing bug.
Darin Adler
Comment 40
2012-08-30 14:56:39 PDT
(In reply to
comment #0
)
> But I don't know why there are both canCopyWithMemcpy and canMoveWithMemcpy. Is there any case they are not equivalent?
Yes. A class like RefPtr can return true for canMoveWithMemcpy but must return false for canCopyWithMemcpy because a copy requires incrementing the reference count.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug