Bug 103027

Summary: [Chromium] fastMalloc has an extra branch on Windows
Product: WebKit Reporter: Adam Barth <abarth>
Component: Web Template FrameworkAssignee: Adam Barth <abarth>
Status: RESOLVED WONTFIX    
Severity: Normal CC: avi, benjamin, eric, jamesr, ojan, schenney, thakis, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 103405    
Bug Blocks:    
Attachments:
Description Flags
Causes many test to time out; need to investigate
none
Patch none

Description Adam Barth 2012-11-21 23:39:42 PST
On Windows, we route WebKit allocations through the USE(SYSTEM_ALLOCATOR) path in FastMalloc.cpp.  That code path checks whether malloc() returns 0 in order to crash when we run out of memory.  However, the crash stacks we get when we really run out of memory on Windows look like the following:

0x5e3a9d97	 [chrome.dll]	 - process_util_win.cc:109]	base::`anonymous namespace'::OnNoMemory()
0x5de8165f	 [chrome.dll]	 - allocator_shim.cc:135]	malloc
0x5dec8c18	 [chrome.dll]	 - fastmalloc.cpp:268]	WTF::fastMalloc(unsigned int)
0x5e0ebff3	 [chrome.dll]	 - vector.h:903]	WTF::Vector<char,0>::reserveCapacity(unsigned int)
0x5e0ebfc7	 [chrome.dll]	 - vector.h:820]	WTF::Vector<char,0>::expandCapacity(unsigned int)
0x5e271993	 [chrome.dll]	 - sharedbuffer.cpp:224]	WebCore::SharedBuffer::buffer()
0x5e85df0d	 [chrome.dll]	 - cachedrawresource.cpp:53]	WebCore::CachedRawResource::data(WTF::PassRefPtr<WebCore::SharedBuffer>,bool)
0x5e26cf82	 [chrome.dll]	 - subresourceloader.cpp:253]	WebCore::SubresourceLoader::sendDataToResource(char const *,int)
0x5e26cba8	 [chrome.dll]	 - subresourceloader.cpp:227]	WebCore::SubresourceLoader::didReceiveData(char const *,int,__int64,bool)

(See, for example, <https://code.google.com/p/chromium/issues/detail?id=138506>.)

Notice that we actually crash inside malloc rather than in FastMalloc.cpp.  That means that the branch for malloc() returning zero is not needed.  We should remove it so that WebKit can go fast.
Comment 1 Adam Barth 2012-11-21 23:40:11 PST
This bug is related to bug 102866, which is about optimizing malloc on Mac.
Comment 2 Eric Seidel (no email) 2012-11-21 23:59:22 PST
This is as simple as defining ENABLE_GLOBAL_FASTMALLOC_NEW=0, no?  Or will that still leave the per-object overrides on?
Comment 3 Eric Seidel (no email) 2012-11-22 00:00:11 PST
I guess we could also just conditionalize the branch.  With a SYSTEM_MALLOC_CRASHES_ON_OOM or similar.
Comment 4 Adam Barth 2012-11-22 08:48:21 PST
> Or will that still leave the per-object overrides on?

I believe that will still leave the per-object overrides.  We might need a ENABLE_PER_OBJECT_FASTMALLOC_NEW as well.  We should think about how to structure these defines in the context of bug 102866 as well because we'll like need to keep the global overrides but use the system malloc rather than TCMalloc.

Here are the independent choices:

1) Override global new.
2) Override per-object new.
3) When overriding global new, use the system malloc (versus TCMalloc).
4) When overriding per-object new, use the system malloc (versus TCMalloc).

Currently I believe (2) is required and (3) and (4) are coupled.  For Windows (and I believe Linux), Chromium seems to want to disable all of these options.  For Mac, Chromium seems to want (1), (2), and (4)---but not (3).
Comment 5 Nico Weber 2012-11-22 10:37:08 PST
mac malloc questions -> +avi
Comment 6 Avi Drissman 2012-11-22 12:46:24 PST
I'm not sure of the terminology you're using (global vs per-object new). On Mac Chrome we override the world when it comes to allocators, probably redundantly.

http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_mac.mm

We override:
- C malloc/calloc/valloc/realloc/posix_memalign for default and purgeable zones
- C++ operator new
- Core Foundation CFAllocators
- Cocoa +[NSObject allocWithZone:]

If you have specific questions about implementation, ask away.
Comment 7 Adam Barth 2012-11-22 22:21:05 PST
This bug is about Windows.  The bug relating to Mac is bug 102866.
Comment 8 Adam Barth 2012-11-22 22:27:24 PST
To answer your questions...

> I'm not sure of the terminology you're using (global vs per-object new).

These are WebKit-specific overrides in http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastMalloc.h#L259 and http://trac.webkit.org/browser/trunk/Source/WTF/wtf/FastAllocBase.h#L96

> On Mac Chrome we override the world when it comes to allocators, probably redundantly.

In bug 102866, we observe a speedup by switch to using WebKit's copy of TCMalloc instead of defining the USE(SYSTEM_MALLOC) symbol.  Based on profiles and looking at symbols, it appears that currently on Mac, WebKit uses the OS-provided malloc implementation, which isn't particularly fast.

Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation?  (Please feel free to respond on bug 102866 as that bug is actually about Mac.)
Comment 9 Avi Drissman 2012-11-23 06:39:53 PST
(In reply to comment #8)
> Is process_util_mac.mm supposed to cause WebKit to use a malloc implementation other than the system-provided implementation?

No, there is nothing currently in process_util_mac intended to do anything of the sort.
Comment 10 Adam Barth 2012-11-25 09:05:57 PST
Created attachment 175884 [details]
Causes many test to time out; need to investigate
Comment 11 Adam Barth 2012-11-26 11:53:03 PST
Created attachment 176043 [details]
Patch
Comment 12 Adam Barth 2012-11-26 11:53:50 PST
I marked this patch cq- because I want to land the patch when the tree is quiet and I can see the perf effects clearly.
Comment 13 Eric Seidel (no email) 2012-11-26 17:31:13 PST
Comment on attachment 176043 [details]
Patch

LGTM.  Other ports which USE_SYSTEM_MALLOC might want this implied as well, but we can make that decision if/when others switch it on.  Also, I wonder if these shouldn't be USE_ macros.  ENABLE has historically been for features.
Comment 14 WebKit Review Bot 2012-11-26 23:35:03 PST
Comment on attachment 176043 [details]
Patch

Clearing flags on attachment: 176043

Committed r135828: <http://trac.webkit.org/changeset/135828>
Comment 15 WebKit Review Bot 2012-11-26 23:35:08 PST
All reviewed patches have been landed.  Closing bug.
Comment 16 WebKit Review Bot 2012-11-27 05:41:40 PST
Re-opened since this is blocked by bug 103405
Comment 17 Adam Barth 2012-11-27 07:30:15 PST
I can't find any evidence that this improves performance on the perf bots.  It's possible I screwed it up, but it's also possible the branch is extremely well-predicted and doesn't cost performance.