On Windows, we route WebKit allocations through the USE(SYSTEM_ALLOCATOR) path in FastMalloc.cpp. That code path checks whether malloc() returns 0 in order to crash when we run out of memory. However, the crash stacks we get when we really run out of memory on Windows look like the following:
0x5e3a9d97 [chrome.dll] - process_util_win.cc:109] base::`anonymous namespace'::OnNoMemory()
0x5de8165f [chrome.dll] - allocator_shim.cc:135] malloc
0x5dec8c18 [chrome.dll] - fastmalloc.cpp:268] WTF::fastMalloc(unsigned int)
0x5e0ebff3 [chrome.dll] - vector.h:903] WTF::Vector<char,0>::reserveCapacity(unsigned int)
0x5e0ebfc7 [chrome.dll] - vector.h:820] WTF::Vector<char,0>::expandCapacity(unsigned int)
0x5e271993 [chrome.dll] - sharedbuffer.cpp:224] WebCore::SharedBuffer::buffer()
0x5e85df0d [chrome.dll] - cachedrawresource.cpp:53] WebCore::CachedRawResource::data(WTF::PassRefPtr<WebCore::SharedBuffer>,bool)
0x5e26cf82 [chrome.dll] - subresourceloader.cpp:253] WebCore::SubresourceLoader::sendDataToResource(char const *,int)
0x5e26cba8 [chrome.dll] - subresourceloader.cpp:227] WebCore::SubresourceLoader::didReceiveData(char const *,int,__int64,bool)
(See, for example, <https://code.google.com/p/chromium/issues/detail?id=138506>.)
Notice that we actually crash inside malloc rather than in FastMalloc.cpp. That means that the branch for malloc() returning zero is not needed. We should remove it so that WebKit can go fast.
This bug is related to bug 102866, which is about optimizing malloc on Mac.
This is as simple as defining ENABLE_GLOBAL_FASTMALLOC_NEW=0, no? Or will that still leave the per-object overrides on?
I guess we could also just conditionalize the branch. With a SYSTEM_MALLOC_CRASHES_ON_OOM or similar.
> Or will that still leave the per-object overrides on?
I believe that will still leave the per-object overrides. We might need a ENABLE_PER_OBJECT_FASTMALLOC_NEW as well. We should think about how to structure these defines in the context of bug 102866 as well because we'll like need to keep the global overrides but use the system malloc rather than TCMalloc.
Here are the independent choices:
1) Override global new.
2) Override per-object new.
3) When overriding global new, use the system malloc (versus TCMalloc).
4) When overriding per-object new, use the system malloc (versus TCMalloc).
Currently I believe (2) is required and (3) and (4) are coupled. For Windows (and I believe Linux), Chromium seems to want to disable all of these options. For Mac, Chromium seems to want (1), (2), and (4)---but not (3).
mac malloc questions -> +avi
I'm not sure of the terminology you're using (global vs per-object new). On Mac Chrome we override the world when it comes to allocators, probably redundantly.
- C malloc/calloc/valloc/realloc/posix_memalign for default and purgeable zones
- C++ operator new
- Core Foundation CFAllocators
- Cocoa +[NSObject allocWithZone:]
If you have specific questions about implementation, ask away.
This bug is about Windows. The bug relating to Mac is bug 102866.
To answer your questions...
> I'm not sure of the terminology you're using (global vs per-object new).
These are WebKit-specific overrides in http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastMalloc.h#L259 and http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastAllocBase.h#L96
> On Mac Chrome we override the world when it comes to allocators, probably redundantly.
In bug 102866, we observe a speedup by switch to using WebKit's copy of TCMalloc instead of defining the USE(SYSTEM_MALLOC) symbol. Based on profiles and looking at symbols, it appears that currently on Mac, WebKit uses the OS-provided malloc implementation, which isn't particularly fast.
Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation? (Please feel free to respond on bug 102866 as that bug is actually about Mac.)
(In reply to comment #8)
> Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation?
No, there is nothing currently in process_util_mac intended to do anything of the sort.
Created attachment 175884 [details]
Causes many test to time out; need to investigate
Created attachment 176043 [details]
I marked this patch cq- because I want to land the patch when the tree is quiet and I can see the perf effects clearly.
Comment on attachment 176043 [details]
LGTM. Other ports which USE_SYSTEM_MALLOC might want this implied as well, but we can make that decision if/when others switch it on. Also, I wonder if these shouldn't be USE_ macros. ENABLE has historically been for features.
Comment on attachment 176043 [details]
Clearing flags on attachment: 176043
Committed r135828: <http://trac.webkit.org/changeset/135828>
All reviewed patches have been landed. Closing bug.
Re-opened since this is blocked by bug 103405
I can't find any evidence that this improves performance on the perf bots. It's possible I screwed it up, but it's also possible the branch is extremely well-predicted and doesn't cost performance.