Bug 44806

Summary: Use purgeable memory to keep more dead resources in cache
Product: WebKit Reporter: Pratik Solanki <psolanki>
Component: PlatformAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, charles-webkit, dacarson, darin, ddkilzer, eric, ggaren, psolanki, socket.h, webkit.review.bot
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch ggaren: review+

Description Pratik Solanki 2010-08-27 17:40:08 PDT
When we have purgeable memory available, we should be able to hold on to more objects that the dead size limit of the WebCore cache. Since the data is kept in purgeable pages, we can keep the CachedResource objects around until the kernel decides to reclaim the pages. At which point we can delete the cached resources.
Comment 1 Pratik Solanki 2010-08-28 00:25:31 PDT
<rdar://problem/8350901>
Comment 2 Pratik Solanki 2010-08-28 00:37:16 PDT
Created attachment 65814 [details]
Patch
Comment 3 Darin Adler 2010-08-28 19:59:30 PDT
Comment on attachment 65814 [details]
Patch

Thanks, great to see you taking on this work!

> +    bool wasPurgeable = Cache::markMemoryPurgeableOnEvict() && resource && resource->isPurgeable();

A better name for this function would be shouldMarkMemoryPurgeableOnEviction. The phrase "mark memory purgeable on evict" is not as good, because it sounds like a function that will take action, not return a boolean, and "on evict" is unnecessarily ungrammatical and hence slightly confusing.

> +        // Add the size back since we had subtracted it when we marked the memory
> +        // as purgeable.

I suggest putting this comment all in one line. The line wouldn't be that long.

> +        int delta = resource->size();
> +        if (delta)
> +            adjustSize(resource->hasClients(), delta);

Do we really need the optimization for a resource with a size of zero? Is that a common case that needs to be faster than other cases? I would write:

    adjustSize(resource->hasClients(), resource->size());

By the way, if we did need it, I still think that scoping the local variable to the if would be nicer style.

> +                    // Pass false if we had already subtracted the resource size
> +                    // when we marked it as purgeable.

This comment would be better if it explained that passing false means. It's tantalizingly close to a comment that explains things, but I have to look up the contents of the evict function to understand what this mean. Also, would read better as a single line instead of broken up like this. It wouldn't be too long.

> +                    // evict the resource since we couldn't use purgeable
> +                    // memory.

Should capitalize the word "evict" and put this comment all on one line.

> +        if (!resource->makePurgeable(true))
> +            return false;

Not your fault, but call sites like this are the reason we don’t like

> +        // Subtract from our size totals.
> +        int delta = -static_cast<int>(resource->size());
> +        if (delta)
> +            adjustSize(resource->hasClients(), delta);

Is the static_cast here really needed? Why exactly?

I don't think the comment says anything the code doesn't already say, so you should omit it.

Same comment as earlier about optimizing the zero size case. Why do we need to optimize that? Is it particularly common?

> +        return true;
> +    }
> +    return false;

I think an early return would be better style than nesting all the code inside the resource->inCache check.

> +        if (decrementSize) {
> +            // Subtract from our size totals.
> +            int delta = -static_cast<int>(resource->size());
> +            if (delta)
> +                adjustSize(resource->hasClients(), delta);
> +        }

Same comments as above again.

> +#if ENABLE(PURGEABLE_MEMORY_MARK_ON_EVICTION)
> +/* With this define enabled, we no longer have an explicit bound on the number
> + * of dead resources. The normal behaviour is that dead resources have their
> + * memory marked as purgeable and as we pruned the dead resources list, we
> + * evicted them from cache and deleted them. We change this behaviour to the
> + * following
> + *
> + * 1. Dead resources in the cache are kept in non-purgeable memory.
> + * 2. When we prune dead resources, instead of freeing them, we mark their
> + *    memory as purgeable and keep the resources until the kernel reclaims the
> + *    purgeable memory.
> + *
> + * By leaving the in-cache dead resources in dirty resident memory, we decrease
> + * the likelyhood of the kernel claiming that memory and forcing us to refetch
> + * the resource (for example when a user presses back).
> + *
> + * And by having an unbounded number of resource objects using purgeable
> + * memory, we can utilize as much memory as is available on the machine. We are
> + * bounded by how much purgeable memory the kernel will alow us to have. We do
> + * delete CachedResource objects - it happens as part of pruneDeadResources if
> + * we notice that the memory used by the resource has been purged by the
> + * kernel. The trade-off here is that the CachedResource object (and its member
> + * variables) itself is allocated in non-purgeable TC-malloc'd memory so we
> + * would see slightly more memory use due to this.
> + */
> +#endif

We use // for comments, not /* */. If your reason to use /* */ is because it's a long comment and you want to make it easier to edit, that is defeated by putting a * on every line.

It doesn't make a lot of sense to have a comment inside an #if, so you should find another way to tie it together.

We use US spelling in comments in WebKit, so it should be "behavior" rather than "behaviour".

The word "utilize" is no better than the word "use", so please don't utilize it. ;-)

I believe the word is spelled "likelihood", not "likelyhood".

You should also revisit this comment and see if you can say the same thing with fewer words. It's a little too long for people to read, so not as valuable as a terser comment that says the same thing.


> +#if ENABLE(PURGEABLE_MEMORY_MARK_ON_EVICTION)
> +    static bool markMemoryPurgeableOnEvict() { return true; }
> +#else
> +    static bool markMemoryPurgeableOnEvict() { return false; }
> +#endif

It would be better to have the #if around the body of the function rather than the definition. I suggest moving the function outside the class definition. It can still be inline:

    inline bool Cache::shouldMarkMemoryPurgableOnEviction()
    {
#if /* SUITABLE CONDITION HERE, PERHAPS PLATFORM(IOS) */
        return true;
#else
        return false;
#endif
    }

I'm not sure we should have a special #if for this, given that it's controlled in exactly one place. It seems that one place could have the suitable #if. But I suppose you can leave that for now. It's a real shame to have something in Platform.h that really only needs to be in Cache.h, though.

> +    void evict(CachedResource*, bool decrementSize = true);

It's quite unpleasant to have a boolean argument to this function that tells it to "decrement size". First of all, it should be "shouldDecrementSize", but also, that's too low level a concept for the function argument. It's just not clear why would you pass false in some cases and true in others? Can't evict make the decision itself based on looking at resource->wasPurged()? That would be cleaner. Or the boolean could be "size already removed from cache at purge time", which would be clearer and easier to set correctly for callers.

> -    if (isSafeToMakePurgeable())
> -        makePurgeable(true);
> +    if (!Cache::markMemoryPurgeableOnEvict()) {
> +        if (isSafeToMakePurgeable())
> +            makePurgeable(true);
> +    }

I think && would be better here than nesting ifs, but also, can the check go lower level, inside isSafeToMakePurgeable and markPurgeable?

> -        makePurgeable(true);
> +        if (!Cache::markMemoryPurgeableOnEvict())
> +            makePurgeable(true);

Same question.

> -    if (isSafeToMakePurgeable())
> -        makePurgeable(true);
> +    if (!Cache::markMemoryPurgeableOnEvict()) {
> +        if (isSafeToMakePurgeable())
> +            makePurgeable(true);
> +    }

And again.

I'm sure at least some of my suggestions aren't practical, but at least some of them are good ideas. I'm going to say review+, but I think you may want another round of review after taking at least some of my advice.
Comment 4 Pratik Solanki 2010-08-29 23:46:03 PDT
(In reply to comment #3)
> (From update of attachment 65814 [details])

> > +        int delta = resource->size();
> > +        if (delta)
> > +            adjustSize(resource->hasClients(), delta);
> 
> Do we really need the optimization for a resource with a size of zero? Is that a common case that needs to be faster than other cases? I would write:
> 
>     adjustSize(resource->hasClients(), resource->size());

That part of the code was copied from existing code. It seems to be the pattern in Cache.cpp to test for a non-zero delta before calling adjustSize). Looking at the code though, I don't see why we need to test for 0. I could change it if you would prefer.
 
> By the way, if we did need it, I still think that scoping the local variable to the if would be nicer style.

Sure I can do that.

> > +        if (!resource->makePurgeable(true))
> > +            return false;
> 
> Not your fault, but call sites like this are the reason we don’t like

I think you forgot to finish this sentence.

> > +        // Subtract from our size totals.
> > +        int delta = -static_cast<int>(resource->size());
> > +        if (delta)
> > +            adjustSize(resource->hasClients(), delta);
> 
> Is the static_cast here really needed? Why exactly?
> 
> I don't think the comment says anything the code doesn't already say, so you should omit it.
> 
> Same comment as earlier about optimizing the zero size case. Why do we need to optimize that? Is it particularly common?

This part of the code was copied from evict(). It does the static_cast as well. resource->size() returns an unsigned int so maybe the cast was added for that reason?

> > +        return true;
> > +    }
> > +    return false;
> 
> I think an early return would be better style than nesting all the code inside the resource->inCache check.

Hazards of copy-paste code... But you're right. I'll fix that.

> > +        if (decrementSize) {
> > +            // Subtract from our size totals.
> > +            int delta = -static_cast<int>(resource->size());
> > +            if (delta)
> > +                adjustSize(resource->hasClients(), delta);
> > +        }
> 
> Same comments as above again.
> 
> > +#if ENABLE(PURGEABLE_MEMORY_MARK_ON_EVICTION)
> > +/* With this define enabled, we no longer have an explicit bound on the number
> > + * of dead resources. The normal behaviour is that dead resources have their
> > + * memory marked as purgeable and as we pruned the dead resources list, we
> > + * evicted them from cache and deleted them. We change this behaviour to the
> > + * following
> > + *
> > + * 1. Dead resources in the cache are kept in non-purgeable memory.
> > + * 2. When we prune dead resources, instead of freeing them, we mark their
> > + *    memory as purgeable and keep the resources until the kernel reclaims the
> > + *    purgeable memory.
> > + *
> > + * By leaving the in-cache dead resources in dirty resident memory, we decrease
> > + * the likelyhood of the kernel claiming that memory and forcing us to refetch
> > + * the resource (for example when a user presses back).
> > + *
> > + * And by having an unbounded number of resource objects using purgeable
> > + * memory, we can utilize as much memory as is available on the machine. We are
> > + * bounded by how much purgeable memory the kernel will alow us to have. We do
> > + * delete CachedResource objects - it happens as part of pruneDeadResources if
> > + * we notice that the memory used by the resource has been purged by the
> > + * kernel. The trade-off here is that the CachedResource object (and its member
> > + * variables) itself is allocated in non-purgeable TC-malloc'd memory so we
> > + * would see slightly more memory use due to this.
> > + */
> > +#endif
> 
> We use // for comments, not /* */. If your reason to use /* */ is because it's a long comment and you want to make it easier to edit, that is defeated by putting a * on every line.

Ok. I can change it to //. /* */ is just a style I like using. I use vim so editing such comments is actually not a problem at all for me :). But I'll fix it. 

> It doesn't make a lot of sense to have a comment inside an #if, so you should find another way to tie it together.

It explains what happens when the macro is defined. Sort of like saying, this comment only valid when the #if is 1. Again, something I had seen in other projects and liked. I can just change the comment to say that it only applies when the ENABLE is 1.
 
> We use US spelling in comments in WebKit, so it should be "behavior" rather than "behaviour".

Sure. I can fix that.

> The word "utilize" is no better than the word "use", so please don't utilize it. ;-)

:)
 
> You should also revisit this comment and see if you can say the same thing with fewer words. It's a little too long for people to read, so not as valuable as a terser comment that says the same thing.

Good point. I'll try to strip it down to fewer words. 

> > +#if ENABLE(PURGEABLE_MEMORY_MARK_ON_EVICTION)
> > +    static bool markMemoryPurgeableOnEvict() { return true; }
> > +#else
> > +    static bool markMemoryPurgeableOnEvict() { return false; }
> > +#endif
> 
> It would be better to have the #if around the body of the function rather than the definition. I suggest moving the function outside the class definition. It can still be inline:

Ok.

>     inline bool Cache::shouldMarkMemoryPurgableOnEviction()
>     {
> #if /* SUITABLE CONDITION HERE, PERHAPS PLATFORM(IOS) */
>         return true;
> #else
>         return false;
> #endif
>     }
> 
> I'm not sure we should have a special #if for this, given that it's controlled in exactly one place. It seems that one place could have the suitable #if. But I suppose you can leave that for now. It's a real shame to have something in Platform.h that really only needs to be in Cache.h, though.

I'll try to rework this. Seems like it might be possible to remove this from Platform.h and just have it in Cache.h

> > +    void evict(CachedResource*, bool decrementSize = true);
> 
> It's quite unpleasant to have a boolean argument to this function that tells it to "decrement size". First of all, it should be "shouldDecrementSize", but also, that's too low level a concept for the function argument. It's just not clear why would you pass false in some cases and true in others? Can't evict make the decision itself based on looking at resource->wasPurged()? That would be cleaner. Or the boolean could be "size already removed from cache at purge time", which would be clearer and easier to set correctly for callers.
> 
> > -    if (isSafeToMakePurgeable())
> > -        makePurgeable(true);
> > +    if (!Cache::markMemoryPurgeableOnEvict()) {
> > +        if (isSafeToMakePurgeable())
> > +            makePurgeable(true);
> > +    }
> 
> I think && would be better here than nesting ifs, but also, can the check go lower level, inside isSafeToMakePurgeable and markPurgeable?

&& is easy to do. I'll have to investigate if the check can go lower into makePurgeable(). 

> > -        makePurgeable(true);
> > +        if (!Cache::markMemoryPurgeableOnEvict())
> > +            makePurgeable(true);
> 
> Same question.
> 
> > -    if (isSafeToMakePurgeable())
> > -        makePurgeable(true);
> > +    if (!Cache::markMemoryPurgeableOnEvict()) {
> > +        if (isSafeToMakePurgeable())
> > +            makePurgeable(true);
> > +    }
> 
> And again.
> 
> I'm sure at least some of my suggestions aren't practical, but at least some of them are good ideas. I'm going to say review+, but I think you may want another round of review after taking at least some of my advice.

I'll have a new patch up addressing the comments. Thanks a lot for the review!
Comment 5 Pratik Solanki 2010-08-30 15:33:54 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > (From update of attachment 65814 [details] [details])

> > Do we really need the optimization for a resource with a size of zero? Is that a common case that needs to be faster than other cases? I would write:
 
> > By the way, if we did need it, I still think that scoping the local variable to the if would be nicer style.

I'm leaving the check for 0 in to be consistent. If we want to get rid of the check, let me know and I can clean up all the call sites in a follow on bug.
 
> > Is the static_cast here really needed? Why exactly?
> This part of the code was copied from evict(). It does the static_cast as well. resource->size() returns an unsigned int so maybe the cast was added for that reason?

Leaving the static_cast in as well since resource->size() returns unsigned.
  
> > > -    if (isSafeToMakePurgeable())
> > > -        makePurgeable(true);
> > > +    if (!Cache::markMemoryPurgeableOnEvict()) {
> > > +        if (isSafeToMakePurgeable())
> > > +            makePurgeable(true);
> > > +    }
> > 
> > I think && would be better here than nesting ifs, but also, can the check go lower level, inside isSafeToMakePurgeable and markPurgeable?
> 
> && is easy to do. I'll have to investigate if the check can go lower into makePurgeable(). 

It didn't look straightforward to move the check down to makePurgeable() so I'm leaving it where it is right now.
Comment 6 Pratik Solanki 2010-08-30 15:34:50 PDT
Created attachment 65971 [details]
Patch
Comment 7 Darin Adler 2010-08-30 15:46:31 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Do we really need the optimization for a resource with a size of zero? Is that a common case that needs to be faster than other cases?
> 
> I'm leaving the check for 0 in to be consistent. If we want to get rid of the check, let me know and I can clean up all the call sites in a follow on bug.

Yes, I think we should get rid of the check.

> > > Is the static_cast here really needed? Why exactly?
> > This part of the code was copied from evict(). It does the static_cast as well. resource->size() returns an unsigned int so maybe the cast was added for that reason?
> 
> Leaving the static_cast in as well since resource->size() returns unsigned.

That’s not a good reason.

We can use a "-" with an unsigned. The extra cast doesn’t do anything helpful; just makes the code harder to read. It’s not good to copy that mistake from the other code.

> > > > -    if (isSafeToMakePurgeable())
> > > > -        makePurgeable(true);
> > > > +    if (!Cache::markMemoryPurgeableOnEvict()) {
> > > > +        if (isSafeToMakePurgeable())
> > > > +            makePurgeable(true);
> > > > +    }
> > > 
> > > I think && would be better here than nesting ifs, but also, can the check go lower level, inside isSafeToMakePurgeable and markPurgeable?
> > 
> > && is easy to do. I'll have to investigate if the check can go lower into makePurgeable(). 
> 
> It didn't look straightforward to move the check down to makePurgeable() so I'm leaving it where it is right now.

You don’t say anything concrete about why it didn’t look straightforward. It’s unfortunate to have checks in multiple places and wonder if we forgot one of the call sites.
Comment 8 Pratik Solanki 2010-08-30 16:27:41 PDT
(In reply to comment #7)

> Yes, I think we should get rid of the check.

Ok.
 
> That’s not a good reason.
> 
> We can use a "-" with an unsigned. The extra cast doesn’t do anything helpful; just makes the code harder to read. It’s not good to copy that mistake from the other code.

Ok. Changed.
 
> > It didn't look straightforward to move the check down to makePurgeable() so I'm leaving it where it is right now.
> 
> You don’t say anything concrete about why it didn’t look straightforward. It’s unfortunate to have checks in multiple places and wonder if we forgot one of the call sites.

Mainly because CachedResource::makePurgeable() gets called from Cache class as well and it needs to do its work when called from there. So I can't just have makePurgeable() look at the flag and decide what to do. I need to look at the caller as well. Which is why the check is at the call site. How can I design this better?
Comment 9 Dmitry Gorbik 2010-08-30 16:38:06 PDT
I suppose it is better to track purgeable and in memory data consumption separately to make the code more clear.
https://bugs.webkit.org/show_bug.cgi?id=44878
Comment 10 Pratik Solanki 2010-08-30 16:51:44 PDT
Created attachment 65979 [details]
Patch
Comment 11 WebKit Review Bot 2010-08-30 21:53:49 PDT
Attachment 65979 [details] did not build on win:
Build output: http://queues.webkit.org/results/3828169
Comment 12 Pratik Solanki 2010-09-20 16:28:35 PDT
I was on vacation and just got back. I'm not sure why the Windows bot failed and the error log is quite unhelpful. I am updating the patch to latest webkit sources and asking for review again.
Comment 13 Pratik Solanki 2010-09-20 16:30:38 PDT
Created attachment 68154 [details]
Patch
Comment 14 WebKit Review Bot 2010-09-21 01:32:39 PDT
Attachment 68154 [details] did not build on win:
Build output: http://queues.webkit.org/results/4092022
Comment 15 Pratik Solanki 2010-09-21 13:17:32 PDT
(In reply to comment #14)
> Attachment 68154 [details] did not build on win:
> Build output: http://queues.webkit.org/results/4092022

Very odd. I have no idea why my patch failed to build on windows and the output is not at all useful.
Comment 16 Darin Adler 2010-09-21 13:38:17 PDT
(In reply to comment #15)
> Very odd. I have no idea why my patch failed to build on windows and the output is not at all useful.

Yes, the Early Warning System uses the free version of Microsoft’s development tools, which don’t put errors where we can see them. In some cases people just land the patches and then fix the failure as seen on the bots, which have the non-free versions and so we can see the errors.
Comment 17 Geoffrey Garen 2010-09-21 16:22:27 PDT
Comment on attachment 68154 [details]
Patch

r=me

I'd recommend landing and watching the Windows buildbot to get real build output. Should be easy enough to fix right away, but if it's not, we can always roll out and try again later.
Comment 18 Pratik Solanki 2010-09-21 16:27:01 PDT
Thanks for the review Geoff. I'll keep an eye on the Windows buildbot.
Comment 19 Geoffrey Garen 2010-09-21 17:46:31 PDT
Committed revision 67998.

Committed revision 67996.
Comment 20 WebKit Review Bot 2010-09-21 17:52:01 PDT
http://trac.webkit.org/changeset/67996 might have broken GTK Linux 32-bit Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/67994
http://trac.webkit.org/changeset/67995
http://trac.webkit.org/changeset/67996