Bug 27236 - Memory is not released back to the system from TCMalloc on Windows
Summary: Memory is not released back to the system from TCMalloc on Windows
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Ada Chan
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2009-07-13 13:48 PDT by Ada Chan
Modified: 2009-07-28 18:11 PDT (History)
5 users (show)

See Also:


Attachments
This patch implements TCMalloc_SystemRelease() on windows and adds a new mechanism to release memory back to the system via a background thread. (17.54 KB, patch)
2009-07-13 13:57 PDT, Ada Chan
darin: review+
Details | Formatted Diff | Diff
Patch to remove TCMALLOC_TRACK_DECOMMITED_SPANS (4.27 KB, patch)
2009-07-24 18:34 PDT, Ada Chan
mrowe: review+
Details | Formatted Diff | Diff
Revised patch to use a background thread to release memory back to the system. (15.45 KB, patch)
2009-07-27 16:56 PDT, Ada Chan
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ada Chan 2009-07-13 13:48:22 PDT
Right now TCMalloc_SystemRelease() is not implemented on Windows, which means we are not releasing memory back to the system.
Comment 1 Ada Chan 2009-07-13 13:57:34 PDT
Created attachment 32677 [details]
This patch implements TCMalloc_SystemRelease() on windows and adds a new mechanism to release memory back to the system via a background thread.

This patch implements TCMalloc_SystemRelease() on windows and adds a new mechanism to release memory back to the system via a background thread.  That new mechanism is turned on by default.  

Every 5 seconds, the background thread wakes up and does the following:
- Check if the number of pages allocated in the last 5 seconds is greater than 1/3 of the free committed pages.  If so, skip this scavenge because it's a sign that there is a lot of activity going on right now and thrashing may happen if we release memory back to the system now.
- Otherwise, go through the list of free spans (from largest to smallest) and release up to 1/3 of the free committed pages back to the system.  We can release more if there's absolutely no activity going on in the last 5 seconds.
- If the number of free committed pages reaches kMinimumFreeCommittedPageCount, we can stop the scavenging and block the scavenging thread until the number of free committed pages goes above kMinimumFreeCommittedPageCount.
Comment 2 Ada Chan 2009-07-13 14:01:28 PDT
<rdar://problem/5649058>
Comment 3 Mark Rowe (bdash) 2009-07-13 14:19:31 PDT
Comment on attachment 32677 [details]
This patch implements TCMalloc_SystemRelease() on windows and adds a new mechanism to release memory back to the system via a background thread.

In theory our changes to TCMalloc should be wrapped in "#ifdef WTF_CHANGES".

> +#if USE_BACKGROUND_THREAD_TO_SCAVENGE_MEMORY
> +#if PLATFORM(WIN)
> +  static void* runScavengerThread(void*);
> +
> +  void scavengerThread();
> +#else
> +  static NO_RETURN void* runScavengerThread(void*);
> +
> +  NO_RETURN void scavengerThread();
> +#endif

Why does this need to be #if'd for PLATFORM(WIN)?  NO_RETURN should expand away to nothing for non-GCC compilers already.

> +#if TCMALLOC_TRACK_DECOMMITED_SPANS
> +  inline bool shouldContinueScavenging() const { return free_committed_pages_ > kMinimumFreeCommittedPageCount; }
> +#else
> +  inline bool shouldContinueScavenging() const { return true; }
> +#endif

It seems that this will lead to Tiger and Leopard to continue waking up and attempting to scavenge even when there's nothing left for it to do.

Have we tested the performance and memory use impact of this patch Tiger or Leopard, where we don't track decommitted spans?

> Index: wtf/TCSpinLock.h
> ===================================================================
> --- wtf/TCSpinLock.h	(revision 45828)
> +++ wtf/TCSpinLock.h	(working copy)
> @@ -209,6 +209,12 @@ struct TCMalloc_SpinLock {
>    inline void Unlock() {
>      if (pthread_mutex_unlock(&private_lock_) != 0) CRASH();
>    }
> +  inline bool IsHeld() {
> +    if (pthread_mutex_trylock(&private_lock_))
> +        return true;
> +    pthread_mutex_unlock(&private_lock_);
> +    return false;
> +  }
>  };

This change seems unrelated to the rest of the patch.
Comment 4 Ada Chan 2009-07-13 15:08:26 PDT
I'll remove the #if PLATFORM(WIN) around the declarations for runScavengerThread() and scavengerThread().

I'll make a separate patch later to define NO_RETURN for Windows.

I'll remove the changes in TCSpinLock.h.

What do you recommend we do for Tiger and Leopard that don't track decommitted spans?
Comment 5 Ada Chan 2009-07-13 15:14:02 PDT
Mark, regarding your suggestion about wrapping the changes with "#ifdef WTF_CHANGES": since all the changes in FastMalloc.cpp are wrapped with USE_BACKGROUND_THREAD_TO_SCAVENGE_MEMORY, can I just wrap the #define for USE_BACKGROUND_THREAD_TO_SCAVENGE_MEMORY with "#ifdef WTF_CHANGES"?

Do I need wrap changes in TCSystemAlloc.cpp with "#ifdef WTF_CHANGES" also?  They are already wrapped with #if HAVE(VIRTUALALLOC).
Comment 6 Mark Rowe (bdash) 2009-07-13 19:10:38 PDT
(In reply to comment #5)
> Mark, regarding your suggestion about wrapping the changes with "#ifdef
> WTF_CHANGES": since all the changes in FastMalloc.cpp are wrapped with
> USE_BACKGROUND_THREAD_TO_SCAVENGE_MEMORY, can I just wrap the #define for
> USE_BACKGROUND_THREAD_TO_SCAVENGE_MEMORY with "#ifdef WTF_CHANGES"?
> 
> Do I need wrap changes in TCSystemAlloc.cpp with "#ifdef WTF_CHANGES" also? 
> They are already wrapped with #if HAVE(VIRTUALALLOC).

Yeah, I guess it's not really worth the hassle.
Comment 7 Mark Rowe (bdash) 2009-07-13 19:15:06 PDT
(In reply to comment #3)
> It seems that this will lead to Tiger and Leopard to continue waking up and
> attempting to scavenge even when there's nothing left for it to do.
> 
> Have we tested the performance and memory use impact of this patch Tiger or
> Leopard, where we don't track decommitted spans?

Another thing to  consider would be always enabling the tracking of decommitted spans, even on Tiger or Leopard.  This would simplify a bunch of the #ifdefs that this patch introduces.  I initially added the #ifdef because the code was a slight performance regression on Leopard.  We could re-measure the impact that this code has and if it's small enough remove the #ifdefs.
Comment 8 Ada Chan 2009-07-13 21:15:38 PDT
(In reply to comment #7)
> Another thing to  consider would be always enabling the tracking of decommitted
> spans, even on Tiger or Leopard.  This would simplify a bunch of the #ifdefs
> that this patch introduces.  I initially added the #ifdef because the code was
> a slight performance regression on Leopard.  We could re-measure the impact
> that this code has and if it's small enough remove the #ifdefs.

I've asked Stephanie to measure the impact of always enabling the tracking of decommitted spans on Leopard.
Comment 9 Darin Adler 2009-07-14 12:06:54 PDT
Comment on attachment 32677 [details]
This patch implements TCMalloc_SystemRelease() on windows and adds a new mechanism to release memory back to the system via a background thread.

This patch has a lot of uses of reinterpret_cast in cases where static_cast can be used. To convert from void* to a specific type, static_cast works and is preferred.

> +#if PLATFORM(WIN)
> +  static void* runScavengerThread(void*);
> +
> +  void scavengerThread();
> +#else
> +  static NO_RETURN void* runScavengerThread(void*);
> +
> +  NO_RETURN void scavengerThread();
> +#endif

Can we just use the return 0 all the time and avoid the conditional code here and in the definition of the function? Or does GCC give us a warning about NO_RETURN if we don't specify it?

> +#if TCMALLOC_TRACK_DECOMMITED_SPANS
> +  inline bool shouldContinueScavenging() const { return free_committed_pages_ > kMinimumFreeCommittedPageCount; }
> +#else
> +  inline bool shouldContinueScavenging() const { return true; }
> +#endif

I'd prefer to have the function declaration be outside an ifdef, and have the implementation be ifdef'd.

Also, the "inline" here is not needed. If you define a function inside a class definition then it's automatically treated as if "inline" was specified.

> +void TCMalloc_PageHeap::scavenge() {

Since this is new code, you could follow the normal WebKit style of putting the brace on a new line.

> +#if PLATFORM(WIN)
> +      ::Sleep(kScavengeTimerDelayInSeconds * 1000);
> +#else
> +      sleep(kScavengeTimerDelayInSeconds);
> +#endif

Can we make a helper function for this so we don't have to have the #if right in the middle of the code's logic? Maybe even define a sleep function for Windows, or if not, just define a function that has the call to either sleep or Sleep in it.

> +    void* end = (char*)start + length;

This should be static_cast, not a C-style cast.

> +        size_t resultSize = VirtualQuery(ptr, &info, sizeof(info));
> +        UNUSED_PARAM(resultSize);
> +        ASSERT(resultSize == sizeof(info));

You should use ASSERT_UNUSED here, not UNUSED_PARAM. For one thing, this is not a parameter!

> +        BOOL success = VirtualFree(ptr, length < info.RegionSize ? length : info.RegionSize, MEM_DECOMMIT);
> +        UNUSED_PARAM(success);
> +        ASSERT(success);

Ditto.

> +    void* end = (char*)start + length;

Ditto.

> +        size_t resultSize = VirtualQuery(ptr, &info, sizeof(info));
> +        UNUSED_PARAM(resultSize);
> +        ASSERT(resultSize == sizeof(info));

Ditto.

> +        UNUSED_PARAM(newAddress);
> +        ASSERT(newAddress == ptr);

Ditto.

r=me
Comment 10 Ada Chan 2009-07-14 15:35:34 PDT
(In reply to comment #9)
> (From update of attachment 32677 [details])
> This patch has a lot of uses of reinterpret_cast in cases where static_cast can
> be used. To convert from void* to a specific type, static_cast works and is
> preferred.
> 
> > +#if PLATFORM(WIN)
> > +  static void* runScavengerThread(void*);
> > +
> > +  void scavengerThread();
> > +#else
> > +  static NO_RETURN void* runScavengerThread(void*);
> > +
> > +  NO_RETURN void scavengerThread();
> > +#endif
> 
> Can we just use the return 0 all the time and avoid the conditional code here
> and in the definition of the function? Or does GCC give us a warning about
> NO_RETURN if we don't specify it?

If I always return 0 in runScavengerThread(), the release build in Visual Studio will fail complaining about unreachable code.  I've talked to Mark about this and we've decided to just always have:  

static NO_RETURN void* runScavengerThread(void*);
NO_RETURN void scavengerThread();

This should compile as is because this code is turned off in the debug build anyway.  Later on, we'll have a separate patch to define NO_RETURN appropriately on windows.

> 
> > +#if TCMALLOC_TRACK_DECOMMITED_SPANS
> > +  inline bool shouldContinueScavenging() const { return free_committed_pages_ > kMinimumFreeCommittedPageCount; }
> > +#else
> > +  inline bool shouldContinueScavenging() const { return true; }
> > +#endif
> 
> I'd prefer to have the function declaration be outside an ifdef, and have the
> implementation be ifdef'd.
> 
> Also, the "inline" here is not needed. If you define a function inside a class
> definition then it's automatically treated as if "inline" was specified.

OK.

Mark also pointed out that we should test this code on Leopard/Tiger because apparently TCMALLOC_TRACK_DECOMMITTED_SPANS is not defined for those cases.  We may end up turning that on for those platforms if it no longer hurts performance.

> > +void TCMalloc_PageHeap::scavenge() {
> 
> Since this is new code, you could follow the normal WebKit style of putting the
> brace on a new line.

Done.

> 
> > +#if PLATFORM(WIN)
> > +      ::Sleep(kScavengeTimerDelayInSeconds * 1000);
> > +#else
> > +      sleep(kScavengeTimerDelayInSeconds);
> > +#endif
> 
> Can we make a helper function for this so we don't have to have the #if right
> in the middle of the code's logic? Maybe even define a sleep function for
> Windows, or if not, just define a function that has the call to either sleep or
> Sleep in it.

I can define a sleep function on windows.  Should I define it locally in FastMalloc.cpp or do you recommend a better place for it?

> 
> > +    void* end = (char*)start + length;
> 
> This should be static_cast, not a C-style cast.

Done.

> 
> > +        size_t resultSize = VirtualQuery(ptr, &info, sizeof(info));
> > +        UNUSED_PARAM(resultSize);
> > +        ASSERT(resultSize == sizeof(info));
> 
> You should use ASSERT_UNUSED here, not UNUSED_PARAM. For one thing, this is not
> a parameter!

Done.  And I'll fix the other instances.

Thanks for the review!  I'm waiting for results from Stephanie in testing this patch on Leopard.
Comment 11 Stephanie Lewis 2009-07-15 20:24:51 PDT
I test the patch and tested tracking decommitted regions on Leopard.  Neither were a regression on SunSpider.

Tracking decommitted regions is a .4% PLT regression on Leopard (i.e. just big enough to tell that we are slower)

The patch was a .6% PLT regression on Leopard.
Comment 12 Ada Chan 2009-07-16 13:59:55 PDT
(In reply to comment #11)
> I test the patch and tested tracking decommitted regions on Leopard.  Neither
> were a regression on SunSpider.
> 
> Tracking decommitted regions is a .4% PLT regression on Leopard (i.e. just big
> enough to tell that we are slower)
> 
> The patch was a .6% PLT regression on Leopard.

Stephanie, have you tried the patch + turning on tracking decommitted regions on Leopard?
Comment 13 Ada Chan 2009-07-24 18:34:28 PDT
Created attachment 33482 [details]
Patch to remove TCMALLOC_TRACK_DECOMMITED_SPANS
Comment 14 Mark Rowe (bdash) 2009-07-24 18:39:32 PDT
Comment on attachment 33482 [details]
Patch to remove TCMALLOC_TRACK_DECOMMITED_SPANS

> Index: JavaScriptCore/ChangeLog
> ===================================================================
> --- JavaScriptCore/ChangeLog	(revision 46386)
> +++ JavaScriptCore/ChangeLog	(working copy)
> @@ -1,3 +1,18 @@
> +2009-07-24  Ada Chan  <adachan@apple.com>
> +
> +        In preparation for https://bugs.webkit.org/show_bug.cgi?id=27236:
> +        Remove TCMALLOC_TRACK_DECOMMITED_SPANS.  We'll always track decommitted spans.

It's probably worth mentioning that this was performance tested and had little impact.

r=me
Comment 15 Ada Chan 2009-07-27 16:56:47 PDT
Created attachment 33581 [details]
Revised patch to use a background thread to release memory back to the system.

This patch has been revised from the previous version.  The condition of whether we should hold off on the scavenge has changed.  We'll check whether we need to commit any memory in the last 5 seconds.  If so, it's a sign that there are not enough free committed pages around for the amount of allocations that we have done recently, so we should hold off on releasing memory.
Comment 16 Darin Adler 2009-07-28 15:06:23 PDT
Comment on attachment 33581 [details]
Revised patch to use a background thread to release memory back to the system.

> +            if (!s->decommitted)
> +                pagesDecommitted += s->length;
> +            s->decommitted = true;

Since you already have the if statement, you could move the assignment into it.

> +#if PLATFORM(WIN)
> +static void sleep(unsigned seconds)
> +{
> +    ::Sleep(seconds * 1000);
> +}
> +#endif

I usually prefer to put these platform-specific hacks into WTF headers. This is OK for now but I'd like to consider later having this be in some header with other <unistd.h> implementations for non-Unix platforms. Or a thread helper function header.

> +        BOOL success = VirtualFree(ptr, length < info.RegionSize ? length : info.RegionSize, MEM_DECOMMIT);
> +        ASSERT_UNUSED(success, success);
> +        length -= info.RegionSize;
> +        ptr = static_cast<char*>(ptr) + info.RegionSize;

I think the way the code modifies length here is a little confusing and unnecessary. Instead I would write something like this:

    size_t decommitSize = min(info.RegionSize, end - ptr);

And use that instead of the ?: and not change length each time through the loop.

> +    return;
> +}

You don't need a return here.

> +        void *newAddress = VirtualAlloc(ptr, length < info.RegionSize ? length : info.RegionSize, MEM_COMMIT, PAGE_READWRITE);
> +        ASSERT_UNUSED(newAddress, newAddress == ptr);
> +        length -= info.RegionSize;
> +        ptr = static_cast<char*>(ptr) + info.RegionSize;
> +    }
> +    return;
>  }

Same comments as above.

r=me, but please consider the tweaks I mentioned.
Comment 17 Ada Chan 2009-07-28 18:11:47 PDT
Fixed in r46511.