Bug 94810

Summary: Vector::shrinkToFit should use realloc when canMoveWithMemcpy is true
Product: WebKit Reporter: Yong Li <yong.li.webkit>
Component: Web Template FrameworkAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, benjamin, darin, dglazkov, eric, gustavo, kling, mjs, peter+ews, philn, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
the patch
benjamin: review-, buildbot: commit-queue-
only for blocks >=1k
yong.li.webkit: review-
Try again
none
should use canMoveWithMemcpy
none
the test code I'm using...
none
Patch
benjamin: review-, buildbot: commit-queue-
Updated
buildbot: commit-queue-
try again
peter+ews: commit-queue-
Again
benjamin: review+, benjamin: commit-queue-
To commit (see if GTK can build now) none

Yong Li
Reported 2012-08-23 07:15:38 PDT
This should be safe. But I don't know why there are both canCopyWithMemcpy and canMoveWithMemcpy. Is there any case they are not equivalent?
Attachments
the patch (3.52 KB, patch)
2012-08-27 10:03 PDT, Yong Li
benjamin: review-
buildbot: commit-queue-
only for blocks >=1k (3.69 KB, patch)
2012-08-28 11:07 PDT, Yong Li
yong.li.webkit: review-
Try again (4.17 KB, patch)
2012-08-28 12:17 PDT, Yong Li
no flags
should use canMoveWithMemcpy (4.17 KB, patch)
2012-08-28 12:26 PDT, Yong Li
no flags
the test code I'm using... (1.52 KB, text/x-c++src)
2012-08-28 12:33 PDT, Yong Li
no flags
Patch (4.01 KB, patch)
2012-08-28 12:52 PDT, Yong Li
benjamin: review-
buildbot: commit-queue-
Updated (4.17 KB, patch)
2012-08-29 08:37 PDT, Yong Li
buildbot: commit-queue-
try again (4.39 KB, patch)
2012-08-29 09:35 PDT, Yong Li
peter+ews: commit-queue-
Again (4.38 KB, patch)
2012-08-29 11:16 PDT, Yong Li
benjamin: review+
benjamin: commit-queue-
To commit (see if GTK can build now) (4.38 KB, patch)
2012-08-30 10:09 PDT, Yong Li
no flags
Alexey Proskuryakov
Comment 1 2012-08-23 11:52:20 PDT
Can realloc result in more memory fragmentation?
Yong Li
Comment 2 2012-08-23 12:07:31 PDT
(In reply to comment #1) > Can realloc result in more memory fragmentation? It could... But that is up to the implementation of realloc, because realloc doesn't have to keep the address unchanged when shrinking.
Alexey Proskuryakov
Comment 3 2012-08-23 12:14:54 PDT
I think that at least OS X man page for realloc implies that it cannot move the allocation when shrinking.
Yong Li
Comment 4 2012-08-23 12:22:17 PDT
(In reply to comment #3) > I think that at least OS X man page for realloc implies that it cannot move the allocation when shrinking. Hm... good to know. For large blocks, the memory allocator can decommit/munmap the free space. For small blocks, probably we don't care that much? Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway.
Yong Li
Comment 5 2012-08-27 10:03:42 PDT
Created attachment 160738 [details] the patch It doesn't show any noticeable diff on sunspider and octane, though. Any other suggested perf tests?
Build Bot
Comment 6 2012-08-27 10:38:33 PDT
Benjamin Poulain
Comment 7 2012-08-27 22:19:57 PDT
Comment on attachment 160738 [details] the patch r- because you do not provide any rationale for making the change. 1) Please microbenchmark this to provide numbers for the improvement. 2) Please find what is the memory impact. Exactly how does fastMalloc work with those Realloc. > Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway. This is pretty much irrelevant. The memory allocated there is small on average and the lost memory on shrink is at most a little less the size of the useful payload.
Yong Li
Comment 8 2012-08-28 07:33:22 PDT
(In reply to comment #7) > (From update of attachment 160738 [details]) > r- because you do not provide any rationale for making the change. > > 1) Please microbenchmark this to provide numbers for the improvement. > 2) Please find what is the memory impact. Exactly how does fastMalloc work with those Realloc. OK > > > > Also StringBuilder/StringImpl is using fastRealloc to shrink memory anyway. > > This is pretty much irrelevant. The memory allocated there is small on average and the lost memory on shrink is at most a little less the size of the useful payload. Isn't this the worst case of using realloc (causing fragments)? If the memory allocated isn't small, and realloc() would keep the address unchanged, it saves a memcpy on a large block. If realloc() uses different addresses, it is same as malloc/memcpy/free. I understand this depends very much on the actual malloc algorithm. But what Vector is trying to do here is just a re-alloc. It is very natural to use realloc, and lets the malloc algorithm to decide what to do for best performance. Probably adding an ENABLE switch is better?
Benjamin Poulain
Comment 9 2012-08-28 08:37:59 PDT
> Isn't this the worst case of using realloc (causing fragments)? If the memory allocated isn't small, and realloc() would keep the address unchanged, it saves a memcpy on a large block. If realloc() uses different addresses, it is same as malloc/memcpy/free. > > I understand this depends very much on the actual malloc algorithm. But what Vector is trying to do here is just a re-alloc. It is very natural to use realloc, and lets the malloc algorithm to decide what to do for best performance. I am not sure what you arguing for? (against?) What you say is correct, and I agree using realloc seems like the good thing to do. But you should not make changes like this one without measuring the benefits. You are modifying a abstract data type used everywhere in WebKit in a non obvious way, you should be able to prove the change is correct. It is not obvious if 1) This will increase memory fragmentation because realloc is not flexible for shrink. 2) This will not provide any improvement and it is just extra code to support. 3) Fastmalloc does the right thing and the patch avoid some memcopy. (3) should go in WebKit, (1) and (2) should just be discarded. I only ask you make sure you have the third case. > Probably adding an ENABLE switch is better? No, please. ENABLE switch is a bad idea here.
Yong Li
Comment 10 2012-08-28 10:20:26 PDT
(In reply to comment #9) > It is not obvious if > 1) This will increase memory fragmentation because realloc is not flexible for shrink. > 2) This will not provide any improvement and it is just extra code to support. > 3) Fastmalloc does the right thing and the patch avoid some memcopy. > > (3) should go in WebKit, (1) and (2) should just be discarded. I only ask you make sure you have the third case. > > > Probably adding an ENABLE switch is better? > > No, please. ENABLE switch is a bad idea here. It seems FastMalloc is OK, as its realloc() allocates a new block when "the new size is significantly smaller than the old size". // Reallocate if the new size is larger than the old size, // or if the new size is significantly smaller than the old size. if ((new_size > old_size) || (AllocationSize(new_size) < old_size)) { But not all ports are using same memory allocator. Some ports have USE_SYSTEM_MALLOC defined. As Alexey said, "at least OS X man page for realloc implies that it cannot move the allocation when shrinking". PLATFORM(BLACKBERRY) also uses the system memory allocator. I'm trying to get some numbers on QNX, but the results won't apply to all ports. BTW, I've seen in reality WebKit runs out-of-memory in shrinkToFit, which is a little bit weird: it is trying to shrink memory usage, but ends up with allocating more memory and runs OOM even faster. Reusing the old memory address for larges blocks can also reduce memory spikes in some cases.
Yong Li
Comment 11 2012-08-28 11:06:06 PDT
I tested with the following code, and found that realloc() gives noticeable performance boost only on large blocks (>=1k). It is even slower than malloc/memcpy/free solution on very small blocks. The larger the blocks are, the more performance boost realloc() gives. So it seems it is better to do this only when both old and new capacities are larger than 1k. double start = currentTime(); for (int i = 0; i < 100000; ++i) { Vector<char> vector; vector.reserveCapacity(<capacity>); vector.resize(<size>); vector.shrinkToFit(); } double elapsed = currentTime() - start;
Yong Li
Comment 12 2012-08-28 11:07:10 PDT
Created attachment 161015 [details] only for blocks >=1k Can anyone also try the patch?
Yong Li
Comment 13 2012-08-28 11:29:35 PDT
Comment on attachment 161015 [details] only for blocks >=1k actually we should use canMoveWithMemcpy
Benjamin Poulain
Comment 14 2012-08-28 11:36:58 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=161015&action=review So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. And you set the patch for review for all platforms. Don't you see anything wrong with that? > Source/WTF/ChangeLog:10 > + Use realloc to shrink buffer when it is changing from a large malloc buffer > + to another large one (> 1k), and canCopyWithMemcpy is true. > + You should give the performance analysis and the numbers in your ChangeLog. Here is an example of a good ChangeLog of performance work: http://trac.webkit.org/changeset/124990
Yong Li
Comment 15 2012-08-28 11:45:53 PDT
(In reply to comment #14) > View in context: https://bugs.webkit.org/attachment.cgi?id=161015&action=review > > So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. > > And you set the patch for review for all platforms. Don't you see anything wrong with that? yes I do :). Probably I shouldn't set it for review. This is why I suggested to use some macros, and I'm asking for help to investigate this plan on other ports...
Benjamin Poulain
Comment 16 2012-08-28 11:50:19 PDT
> > So you picked the 1K based on measurement on your platform, one of the few that does not use USE_SYSTEM_MALLOC. > > > > And you set the patch for review for all platforms. Don't you see anything wrong with that? > > yes I do :). Probably I shouldn't set it for review. This is why I suggested to use some macros, and I'm asking for help to investigate this plan on other ports... Having some #ifdef for Blackberry is perfectly reasonable here. It is the only platform for which you can demonstrate your patch is an improvement.
Benjamin Poulain
Comment 17 2012-08-28 12:00:30 PDT
> Having some #ifdef for Blackberry is perfectly reasonable here. It is the only platform for which you can demonstrate your patch is an improvement. To be clear, I think ENABLE() is a bad idea, PLATFORM(BLACKBERRY) or something even more specific to the allocator is perfectly reasonable. E.g. #if USE(SYSTEM_MALLOC) && PLATFORM(BLACKBERRY) const unsigned reallocOnShrinkSizeThreshold = 1024; #define VECTOR_REALLOC_ON_SHRINK #endif // USE(SYSTEM_MALLOC) && PLATFORM(BLACKBERRY)
Yong Li
Comment 18 2012-08-28 12:17:23 PDT
Created attachment 161029 [details] Try again How about this? Hopefully we can get some numbers for realloc vs malloc/memcpy/free comparison on other ports very soon...
Yong Li
Comment 19 2012-08-28 12:26:17 PDT
Created attachment 161031 [details] should use canMoveWithMemcpy actually it is not making copies, but just moving.
Yong Li
Comment 20 2012-08-28 12:33:44 PDT
Created attachment 161033 [details] the test code I'm using... It is a little bit ugly I know :) Basically in every cycle, it creates a vector, allocates buffer, sets the size, and then call shrinkToFit(), and destructs the buffer.
Yong Li
Comment 21 2012-08-28 12:52:58 PDT
Created attachment 161042 [details] Patch Sorry for keeping changing it... Actually we don't want the 1K threshold. 1) the measurement for small blocks is not stable. It is actually very close as realloc() doesn't nothing but malloc/memcpy/free. 2) if realloc() was slower, it should be fixed and improved anyway.
Build Bot
Comment 22 2012-08-28 18:00:54 PDT
Benjamin Poulain
Comment 23 2012-08-28 22:55:31 PDT
Comment on attachment 161042 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=161042&action=review I just r- for the minor issues, but everything looks good really. I'll have to double check the VectorTraits<T>::canMoveWithMemcpy, I do not remember by heart if it is the right one. > Source/WTF/ChangeLog:8 > + Only tested on BlackBerry. So it is wrapped with PLATFORM(BLACKBERRY) at the mean time. "at the mean time" -> "in the meantime"/"for now" > Source/WTF/ChangeLog:12 > + Use realloc to shrink buffer when it is changing from a large malloc buffer > + to another large one (> 1k), and canMoveWithMemcpy is true. When a Vector<char> shrinks > + capacity from 2K to 1K, this gives more than 10x times performance boost on shrinkToFit() > + (Tested on BlackBerry). This is outdated, your removed the 1K limit. Please use a new paragraph for the performance detail, that'll make it easier to track. 10 times faster!!! I should really try your bench with fastMalloc! > Source/WTF/wtf/Vector.h:288 > + // FIXME: Return true when realloc() can give better performance. The "when" is ambiguous IMHO. More like: // FIXME: Return true on the platform where realloc() give better performance. ? + UNUSED_PARAM(newCapacity); (for the Mac break).
Yong Li
Comment 24 2012-08-29 07:47:11 PDT
(In reply to comment #23) > (From update of attachment 161042 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=161042&action=review > I'll have to double check the VectorTraits<T>::canMoveWithMemcpy, I do not remember by heart if it is the right one. If I understand correctly, this is for the objects with something like ref-counter. For example, RefPtr<T> cannot be C++-copied with memcpy, as creating a copy should increase the ref-count, but memcpy doesn't do that. However, it can be moved with memcpy as it is nothing but a pointer to the RefCounted object. > 10 times faster!!! I should really try your bench with fastMalloc! That is only for large blocks. And it is reasonable because it just releases the free space to system in this case, so it takes no time compared to allocating a new block, memcpying, and freeing the old block. Thanks a lot for reviewing and all the suggestions!
Yong Li
Comment 25 2012-08-29 08:37:19 PDT
Created attachment 161235 [details] Updated I changed the number "10x" because it is actually an estimate after deducting the extra malloc/free in every cycle. Also I made a mistake that it was on 8K->4K shrinking. To be clear, for 2K-1K shrinking, the time it spends given by the test reduces from 100% to 60-70%, For 8k-4k shrinking, the elapsed time reduces from 100% to ~55%.. I'm not sure how to calculate this performance boost percentage. Now I'm using this formula: (<old time cost> - <new time cost>) / <old time cost>. Then this cannot explain why we see "2x times faster" in some logs...
Build Bot
Comment 26 2012-08-29 09:14:58 PDT
Build Bot
Comment 27 2012-08-29 09:19:44 PDT
Yong Li
Comment 28 2012-08-29 09:35:43 PDT
Created attachment 161248 [details] try again
Peter Beverloo (cr-android ews)
Comment 29 2012-08-29 11:09:40 PDT
Comment on attachment 161248 [details] try again Attachment 161248 [details] did not pass cr-android-ews (chromium-android): Output: http://queues.webkit.org/results/13687288
Early Warning System Bot
Comment 30 2012-08-29 11:11:40 PDT
Yong Li
Comment 31 2012-08-29 11:13:02 PDT
(In reply to comment #29) > (From update of attachment 161248 [details]) > Attachment 161248 [details] did not pass cr-android-ews (chromium-android): > Output: http://queues.webkit.org/results/13687288 :(. So many senseless warnings!
Yong Li
Comment 32 2012-08-29 11:16:55 PDT
Yong Li
Comment 33 2012-08-30 07:55:03 PDT
GTK build failure: CXXLD Programs/TestWebKitAPI/TestWebKit2 make[1]: Leaving directory `/home/phil/gst/jhbuild/makes/WebKit/WebKitBuild/Release' Segmentation fault Don't think it is any issue in the patch. Ping reviewers?
Benjamin Poulain
Comment 34 2012-08-30 09:44:28 PDT
Comment on attachment 161271 [details] Again View in context: https://bugs.webkit.org/attachment.cgi?id=161271&action=review The patch looks correct. Just make sure there is no problem with GTK when you land it. > Source/WTF/ChangeLog:8 > + Only tested on BlackBerry. So it is wrapped with PLATFORM(BLACKBERRY) at the mean time. "at the mean time" -> in the meantime.
Yong Li
Comment 35 2012-08-30 10:05:09 PDT
Comment on attachment 161271 [details] Again Thanks Ben. Commit bot should be able to verify it builds. I'll keep an eye on it.
Benjamin Poulain
Comment 36 2012-08-30 10:06:12 PDT
Comment on attachment 161271 [details] Again You also need to update the changelog.
Yong Li
Comment 37 2012-08-30 10:09:25 PDT
Created attachment 161504 [details] To commit (see if GTK can build now)
WebKit Review Bot
Comment 38 2012-08-30 13:51:45 PDT
Comment on attachment 161504 [details] To commit (see if GTK can build now) Clearing flags on attachment: 161504 Committed r127186: <http://trac.webkit.org/changeset/127186>
WebKit Review Bot
Comment 39 2012-08-30 13:51:51 PDT
All reviewed patches have been landed. Closing bug.
Darin Adler
Comment 40 2012-08-30 14:56:39 PDT
(In reply to comment #0) > But I don't know why there are both canCopyWithMemcpy and canMoveWithMemcpy. Is there any case they are not equivalent? Yes. A class like RefPtr can return true for canMoveWithMemcpy but must return false for canCopyWithMemcpy because a copy requires incrementing the reference count.
Note You need to log in before you can comment on or make changes to this bug.