RESOLVED WONTFIX 103027
[Chromium] fastMalloc has an extra branch on Windows
https://bugs.webkit.org/show_bug.cgi?id=103027
Summary [Chromium] fastMalloc has an extra branch on Windows
Adam Barth
Reported 2012-11-21 23:39:42 PST
On Windows, we route WebKit allocations through the USE(SYSTEM_ALLOCATOR) path in FastMalloc.cpp. That code path checks whether malloc() returns 0 in order to crash when we run out of memory. However, the crash stacks we get when we really run out of memory on Windows look like the following: 0x5e3a9d97 [chrome.dll] - process_util_win.cc:109] base::`anonymous namespace'::OnNoMemory() 0x5de8165f [chrome.dll] - allocator_shim.cc:135] malloc 0x5dec8c18 [chrome.dll] - fastmalloc.cpp:268] WTF::fastMalloc(unsigned int) 0x5e0ebff3 [chrome.dll] - vector.h:903] WTF::Vector<char,0>::reserveCapacity(unsigned int) 0x5e0ebfc7 [chrome.dll] - vector.h:820] WTF::Vector<char,0>::expandCapacity(unsigned int) 0x5e271993 [chrome.dll] - sharedbuffer.cpp:224] WebCore::SharedBuffer::buffer() 0x5e85df0d [chrome.dll] - cachedrawresource.cpp:53] WebCore::CachedRawResource::data(WTF::PassRefPtr<WebCore::SharedBuffer>,bool) 0x5e26cf82 [chrome.dll] - subresourceloader.cpp:253] WebCore::SubresourceLoader::sendDataToResource(char const *,int) 0x5e26cba8 [chrome.dll] - subresourceloader.cpp:227] WebCore::SubresourceLoader::didReceiveData(char const *,int,__int64,bool) (See, for example, <https://code.google.com/p/chromium/issues/detail?id=138506>.) Notice that we actually crash inside malloc rather than in FastMalloc.cpp. That means that the branch for malloc() returning zero is not needed. We should remove it so that WebKit can go fast.
Attachments
Causes many test to time out; need to investigate (2.08 KB, patch)
2012-11-25 09:05 PST, Adam Barth
no flags
Patch (3.13 KB, patch)
2012-11-26 11:53 PST, Adam Barth
no flags
Adam Barth
Comment 1 2012-11-21 23:40:11 PST
This bug is related to bug 102866, which is about optimizing malloc on Mac.
Eric Seidel (no email)
Comment 2 2012-11-21 23:59:22 PST
This is as simple as defining ENABLE_GLOBAL_FASTMALLOC_NEW=0, no? Or will that still leave the per-object overrides on?
Eric Seidel (no email)
Comment 3 2012-11-22 00:00:11 PST
I guess we could also just conditionalize the branch. With a SYSTEM_MALLOC_CRASHES_ON_OOM or similar.
Adam Barth
Comment 4 2012-11-22 08:48:21 PST
> Or will that still leave the per-object overrides on? I believe that will still leave the per-object overrides. We might need a ENABLE_PER_OBJECT_FASTMALLOC_NEW as well. We should think about how to structure these defines in the context of bug 102866 as well because we'll like need to keep the global overrides but use the system malloc rather than TCMalloc. Here are the independent choices: 1) Override global new. 2) Override per-object new. 3) When overriding global new, use the system malloc (versus TCMalloc). 4) When overriding per-object new, use the system malloc (versus TCMalloc). Currently I believe (2) is required and (3) and (4) are coupled. For Windows (and I believe Linux), Chromium seems to want to disable all of these options. For Mac, Chromium seems to want (1), (2), and (4)---but not (3).
Nico Weber
Comment 5 2012-11-22 10:37:08 PST
mac malloc questions -> +avi
Avi Drissman
Comment 6 2012-11-22 12:46:24 PST
I'm not sure of the terminology you're using (global vs per-object new). On Mac Chrome we override the world when it comes to allocators, probably redundantly. http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_mac.mm We override: - C malloc/calloc/valloc/realloc/posix_memalign for default and purgeable zones - C++ operator new - Core Foundation CFAllocators - Cocoa +[NSObject allocWithZone:] If you have specific questions about implementation, ask away.
Adam Barth
Comment 7 2012-11-22 22:21:05 PST
This bug is about Windows. The bug relating to Mac is bug 102866.
Adam Barth
Comment 8 2012-11-22 22:27:24 PST
To answer your questions... > I'm not sure of the terminology you're using (global vs per-object new). These are WebKit-specific overrides in http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastMalloc.h#L259 and http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastAllocBase.h#L96 > On Mac Chrome we override the world when it comes to allocators, probably redundantly. In bug 102866, we observe a speedup by switch to using WebKit's copy of TCMalloc instead of defining the USE(SYSTEM_MALLOC) symbol. Based on profiles and looking at symbols, it appears that currently on Mac, WebKit uses the OS-provided malloc implementation, which isn't particularly fast. Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation? (Please feel free to respond on bug 102866 as that bug is actually about Mac.)
Avi Drissman
Comment 9 2012-11-23 06:39:53 PST
(In reply to comment #8) > Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation? No, there is nothing currently in process_util_mac intended to do anything of the sort.
Adam Barth
Comment 10 2012-11-25 09:05:57 PST
Created attachment 175884 [details] Causes many test to time out; need to investigate
Adam Barth
Comment 11 2012-11-26 11:53:03 PST
Adam Barth
Comment 12 2012-11-26 11:53:50 PST
I marked this patch cq- because I want to land the patch when the tree is quiet and I can see the perf effects clearly.
Eric Seidel (no email)
Comment 13 2012-11-26 17:31:13 PST
Comment on attachment 176043 [details] Patch LGTM. Other ports which USE_SYSTEM_MALLOC might want this implied as well, but we can make that decision if/when others switch it on. Also, I wonder if these shouldn't be USE_ macros. ENABLE has historically been for features.
WebKit Review Bot
Comment 14 2012-11-26 23:35:03 PST
Comment on attachment 176043 [details] Patch Clearing flags on attachment: 176043 Committed r135828: <http://trac.webkit.org/changeset/135828>
WebKit Review Bot
Comment 15 2012-11-26 23:35:08 PST
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 16 2012-11-27 05:41:40 PST
Re-opened since this is blocked by bug 103405
Adam Barth
Comment 17 2012-11-27 07:30:15 PST
I can't find any evidence that this improves performance on the perf bots. It's possible I screwed it up, but it's also possible the branch is extremely well-predicted and doesn't cost performance.
Note You need to log in before you can comment on or make changes to this bug.