Bug 5243 - Numerous performance improvements to new FastMalloc to fix regression
Summary: Numerous performance improvements to new FastMalloc to fix regression
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Major
Assignee: Maciej Stachowiak
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-02 21:22 PDT by Maciej Stachowiak
Modified: 2005-10-03 14:08 PDT (History)
0 users

See Also:


Attachments
numerous optimizations to the new malloc (242.18 KB, patch)
2005-10-02 21:23 PDT, Maciej Stachowiak
darin: review-
Details | Formatted Diff | Diff
addressed darin's comments (260.84 KB, patch)
2005-10-03 00:35 PDT, Maciej Stachowiak
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maciej Stachowiak 2005-10-02 21:22:37 PDT
The new threadsafe fast allocator caused a performance regression relative to dlmalloc, however by 
optimizing it a bit I have fixed this. Also I reduced memory consumption to be as good as the system 
malloc. Here is what I did:

- use fastMalloc for everything - it now gets applied to all new/delete allocations via a private inline 
operator new that is now included into every file via config.h.

- tweaked some of the numeric parameters for size classes and amount of wasted memory allowed per 
allocation - this saves on memory use and consequently improves speed.

- so long as the allocator is not being used on background threads, get the per-thread cache from a 
global variable instead of from pthread_getspecific, since the latter is slow.

- inline more functions, and force the ones GCC refuses to inline with  attribute(always_inline), nearly 
all of these have one call site so inlining them has to be a win.

- use some tricks to calculate allocation size more efficiently and fewer times for small allocations, to 
avoid hitting the huge size table array.

- avoid hitting the per-thread cache on code paths that don't need it.

- implement inline assembly version of spinlock for PowerPC (was already done for x86)
Comment 1 Maciej Stachowiak 2005-10-02 21:23:24 PDT
Created attachment 4165 [details]
numerous optimizations to the new malloc
Comment 2 Maciej Stachowiak 2005-10-02 21:45:18 PDT
Comment on attachment 4165 [details]
numerous optimizations to the new malloc

Uh, I meant "?"
Comment 3 Darin Adler 2005-10-02 21:52:03 PDT
Comment on attachment 4165 [details]
numerous optimizations to the new malloc

I'm puzzled by the use of "config.h" -- did we discover that was the best
practice and better than a prefix file?

Also, it seems that if we do use "config.h" it needs to be the first include in
each file, because other headers might have inline code that depends on it, but
it instead seems to be the second include in every file. Is this intentional or
perhaps a bug in the script you used to add it?

To echo a complaint you've made to me many times, it was really a pain to
search through all the boilerplate changes to the other files to find the
substantive changes to review.

I'm concerned that changes to constants in the FastMalloc.cpp source file seem
to be near comments that have not been updated. Are there any comments here
that need to be updated?

The use of __attribute__((always_inline)) is not inside an ifdef. Lets instead
do this in a more-portable way using a macro and __GNUC__ ifdef as we do with
PRIVATE_INLINE.

I suggest that PRIVATE_INLINE use a KXMLCORE prefix since it's a macro and in a
header.

Should remove the FIXME comment talking about PPC.

Is there an easy way to catch the mistake of pairing fastMalloc with plain old
free or vice versa in a build where NDEBUG is not defined? If not, we should
try to come up with something.
Comment 4 Maciej Stachowiak 2005-10-02 23:57:03 PDT
It sounds from the comments on webkit-dev like the prefix solution will not work for Symbian or Linux.

Changes made in response to comments:

- config.h is now the first file everywhere
- if you don't include config.h but use new or delete, you now get a compile failure (on OS X at least)
- comments updated to match constants
- made an ifdef for use of ALWAYS_INLINE in the file
- put prefix on PRIVATE_INLINE
- removed now unneeded FIXME
Comment 5 Maciej Stachowiak 2005-10-03 00:35:26 PDT
Created attachment 4171 [details]
addressed darin's comments
Comment 6 Darin Adler 2005-10-03 09:38:25 PDT
Comment on attachment 4171 [details]
addressed darin's comments

Why remove the include of config.h from fpconst.cpp? It needs the definition of
WORDS_BIGENDIAN.

Otherwise looks fine. I'll mark review+, but please consider the fpconst.cpp
thing.
Comment 7 Maciej Stachowiak 2005-10-03 13:59:00 PDT
This is also <rdar://problem/4283967>