Bug 67823 - [NRWT] Provides a simple LRU cache class in Python.
Summary: [NRWT] Provides a simple LRU cache class in Python.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Hayato Ito
URL:
Keywords:
Depends on:
Blocks: 67479
  Show dependency treegraph
 
Reported: 2011-09-08 17:42 PDT by Hayato Ito
Modified: 2011-09-25 18:12 PDT (History)
3 users (show)

See Also:


Attachments
Patch (8.25 KB, patch)
2011-09-14 00:01 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
Patch (8.88 KB, patch)
2011-09-14 06:25 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
Patch (5.91 KB, patch)
2011-09-15 07:27 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
Patch (9.27 KB, patch)
2011-09-15 18:50 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
Patch (9.29 KB, patch)
2011-09-15 22:27 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
Patch (9.29 KB, patch)
2011-09-16 00:26 PDT, Ai Makabi
no flags Details | Formatted Diff | Diff
fix_leak (9.38 KB, patch)
2011-09-21 22:34 PDT, Hayato Ito
tony: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hayato Ito 2011-09-08 17:42:27 PDT
In CSSWG reftests, there are a lot of cases that the same HTML file is used as a reference file in many tests.
It should be avoided that calling DumpRenderTree for the same file many times. We should cache the rendering result somehow for the same HTML file.
Note that we need image itself as weel as image hash so that we can 'diff' images later.
At the same time, we shouldn't cache all results because it consumes a lot of memory.

It'd be nice to have simple LRU cache class because there is locality of reference in such HTML files.
Comment 1 Ai Makabi 2011-09-14 00:01:24 PDT
Created attachment 107297 [details]
Patch
Comment 2 Hayato Ito 2011-09-14 01:33:58 PDT
Comment on attachment 107297 [details]
Patch

Ai-san, thank you for the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=107297&action=review

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1
> +#!/usr/bin/env python

I think this file should move to webkitpy/common/ or webkitpy/common/system directory.
And lru_cache.py might be better file name than lru.py

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28
> +class Node():

It might be better that Node class should have 'value' member in addition to 'prev' and 'next'.
That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)).

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36
> +

L36-L37: One blank line instead of two blank lines.

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40
> +    def __init__(self, size):

Could you add a blank line between Line39 and Line40?

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44
> +        self._size = size

What if size <= 0? Could you assert the value of |size| here?
  assert size > 0

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49
> +            self.first = key

I think self.first and self.last should point to the first Node and the last Node directly.

def __setitem__(self, key, value):
    if not self.first:
        node = Node(value, ,,,)
        self._dict[key] = node
        self.first = node
        self.last = node
        self.last_key = key  // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict.

And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|.
That might make the implementation of this LRU class more natural.

That'd be nice if you could try to rewrite this class as I suggested.

> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114
> +    def pop(self, key):

Do we need this |pop| method?
To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method.

> Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28
> +import lru

We should use absolute path in import.
like:
  from webkipy.common.xxx import lru
Comment 3 Ai Makabi 2011-09-14 06:25:54 PDT
Created attachment 107322 [details]
Patch
Comment 4 Ai Makabi 2011-09-14 06:27:34 PDT
I addressed your comment.(In reply to comment #2)
> (From update of attachment 107297 [details])
> Ai-san, thank you for the patch.
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=107297&action=review
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1
> > +#!/usr/bin/env python
> 
> I think this file should move to webkitpy/common/ or webkitpy/common/system directory.
> And lru_cache.py might be better file name than lru.py
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28
> > +class Node():
> 
> It might be better that Node class should have 'value' member in addition to 'prev' and 'next'.
> That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)).
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36
> > +
> 
> L36-L37: One blank line instead of two blank lines.
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40
> > +    def __init__(self, size):
> 
> Could you add a blank line between Line39 and Line40?
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44
> > +        self._size = size
> 
> What if size <= 0? Could you assert the value of |size| here?
>   assert size > 0
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49
> > +            self.first = key
> 
> I think self.first and self.last should point to the first Node and the last Node directly.
> 
> def __setitem__(self, key, value):
>     if not self.first:
>         node = Node(value, ,,,)
>         self._dict[key] = node
>         self.first = node
>         self.last = node
>         self.last_key = key  // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict.
> 
> And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|.
> That might make the implementation of this LRU class more natural.
> 
> That'd be nice if you could try to rewrite this class as I suggested.
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114
> > +    def pop(self, key):
> 
> Do we need this |pop| method?
> To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method.
> 
> > Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28
> > +import lru
> 
> We should use absolute path in import.
> like:
>   from webkipy.common.xxx import lru
Comment 5 Hayato Ito 2011-09-14 22:04:14 PDT
Comment on attachment 107322 [details]
Patch

Thank you! The patch looks good!
I think there are still some corner cases we must handle carefully.

View in context: https://bugs.webkit.org/attachment.cgi?id=107322&action=review

> Tools/Scripts/webkitpy/common/lru_cache.py:39
> +        size: a cache memory size

'capacity' might be a better name than 'size' in this context.

> Tools/Scripts/webkitpy/common/lru_cache.py:42
> +        assert size > 0, "You should set the LRUCache size to not less than zero."

You might want to include the actual value of |capacity| in the assertion message.
like:
  assert capacity > 0, "capacity (%s) must be greater than zero." % capacity

> Tools/Scripts/webkitpy/common/lru_cache.py:43
> +        self.first = None

if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|.

> Tools/Scripts/webkitpy/common/lru_cache.py:49
> +        if self.first == None:

I think 
  if not self.first:
is more pythonic than
  if self.first == None:

At least it should be 'self.first is None' instead of 'self.first == None'.

> Tools/Scripts/webkitpy/common/lru_cache.py:65
> +        node = Node(key, value, self.first)

When _one_node is called, self.first is always None, isn't it?
Therefore this should be:
  node = Node(key, value, None)

> Tools/Scripts/webkitpy/common/lru_cache.py:71
> +        if self.first.key == key:

If self.first is None, this will fail.

> Tools/Scripts/webkitpy/common/lru_cache.py:99
> +            self.first = node.prev

If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this?

And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Comment 6 Ai Makabi 2011-09-15 07:27:23 PDT
Created attachment 107494 [details]
Patch
Comment 7 Ai Makabi 2011-09-15 07:29:49 PDT
I addressed your comment.

(In reply to comment #5)
> (From update of attachment 107322 [details])
> Thank you! The patch looks good!
> I think there are still some corner cases we must handle carefully.
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=107322&action=review
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:39
> > +        size: a cache memory size
> 
> 'capacity' might be a better name than 'size' in this context.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:42
> > +        assert size > 0, "You should set the LRUCache size to not less than zero."
> 
> You might want to include the actual value of |capacity| in the assertion message.
> like:
>   assert capacity > 0, "capacity (%s) must be greater than zero." % capacity
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:43
> > +        self.first = None
> 
> if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:49
> > +        if self.first == None:
> 
> I think 
>   if not self.first:
> is more pythonic than
>   if self.first == None:
> 
> At least it should be 'self.first is None' instead of 'self.first == None'.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:65
> > +        node = Node(key, value, self.first)
> 
> When _one_node is called, self.first is always None, isn't it?
> Therefore this should be:
>   node = Node(key, value, None)
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:71
> > +        if self.first.key == key:
> 
> If self.first is None, this will fail.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:99
> > +            self.first = node.prev
> 
> If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this?
> 
> And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Comment 8 Ai Makabi 2011-09-15 18:50:14 PDT
Created attachment 107582 [details]
Patch
Comment 9 Hayato Ito 2011-09-15 21:34:52 PDT
Comment on attachment 107582 [details]
Patch

Thank you for addressing my comments! I added some comments. Sorry for the back and forth again.
I hope you will enjoy your last day of the internship. :)

View in context: https://bugs.webkit.org/attachment.cgi?id=107582&action=review

> Tools/Scripts/webkitpy/common/lru_cache.py:59
> +        node = Node(key, value, self._first)

I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list.
Without that, we are likely to have two nodes with the same key in the doubly liked list.

> Tools/Scripts/webkitpy/common/lru_cache.py:77
> +            next__last = self._last.next

Nit: Two underscores '__'.
You meant |next_last|, |next_first|?

> Tools/Scripts/webkitpy/common/lru_cache.py:88
> +        key_next = self._dict[key].next

You might use |node| here.
  key_next = node.next
  key_prev = node.prev

Or You might rewrite L88-L91 with the following two lines:
  node.next.prev = node.prev
  node.prev.next = node.next

> Tools/Scripts/webkitpy/common/lru_cache.py:99
> +        if self._first == self._last:

It seems that this should be 'self._first is self._last'.

> Tools/Scripts/webkitpy/common/lru_cache.py:102
> +            del self._dict[key]

It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None).

It might be better to call 'node = self._dict[key]' at the beginning of the function.
e.g.

   def __delitem__(self, key):
       node = self._dict[key]
       del self._dict[key]
       if self._first is self._last:
            self._first = None
            self._last = None
            return
       if self._first is node:
            self._first = node.prev
            self._first.next = None
            return
       if self._last is node:
            self._last = node.next
            self._last.prev = None
            return
       node.next.prev = node.prev
       node.prev.next = node.next

I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Comment 10 Ai Makabi 2011-09-15 22:27:00 PDT
Created attachment 107604 [details]
Patch
Comment 11 Ai Makabi 2011-09-15 22:27:49 PDT
I addressed your comment.

(In reply to comment #9)
> (From update of attachment 107582 [details])
> Thank you for addressing my comments! I added some comments. Sorry for the back and forth again.
> I hope you will enjoy your last day of the internship. :)
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=107582&action=review
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:59
> > +        node = Node(key, value, self._first)
> 
> I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list.
> Without that, we are likely to have two nodes with the same key in the doubly liked list.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:77
> > +            next__last = self._last.next
> 
> Nit: Two underscores '__'.
> You meant |next_last|, |next_first|?
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:88
> > +        key_next = self._dict[key].next
> 
> You might use |node| here.
>   key_next = node.next
>   key_prev = node.prev
> 
> Or You might rewrite L88-L91 with the following two lines:
>   node.next.prev = node.prev
>   node.prev.next = node.next
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:99
> > +        if self._first == self._last:
> 
> It seems that this should be 'self._first is self._last'.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:102
> > +            del self._dict[key]
> 
> It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None).
> 
> It might be better to call 'node = self._dict[key]' at the beginning of the function.
> e.g.
> 
>    def __delitem__(self, key):
>        node = self._dict[key]
>        del self._dict[key]
>        if self._first is self._last:
>             self._first = None
>             self._last = None
>             return
>        if self._first is node:
>             self._first = node.prev
>             self._first.next = None
>             return
>        if self._last is node:
>             self._last = node.next
>             self._last.prev = None
>             return
>        node.next.prev = node.prev
>        node.prev.next = node.next
> 
> I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Comment 12 Hayato Ito 2011-09-15 23:04:23 PDT
Comment on attachment 107604 [details]
Patch

Thank you for updating the patch! I gave you one comment!

View in context: https://bugs.webkit.org/attachment.cgi?id=107604&action=review

> Tools/Scripts/webkitpy/common/lru_cache.py:60
> +        if key in self._dict:

I think L60-61 lines should move at the beginning of this function.
If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Comment 13 Ai Makabi 2011-09-16 00:26:54 PDT
Created attachment 107615 [details]
Patch
Comment 14 Ai Makabi 2011-09-16 00:27:50 PDT
I addressed your comment.

(In reply to comment #12)
> (From update of attachment 107604 [details])
> Thank you for updating the patch! I gave you one comment!
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=107604&action=review
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:60
> > +        if key in self._dict:
> 
> I think L60-61 lines should move at the beginning of this function.
> If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Comment 15 Hayato Ito 2011-09-16 00:34:04 PDT
Looks good to me. Thank you!
Comment 16 Tony Chang 2011-09-16 10:11:56 PDT
Comment on attachment 107615 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review

FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list.  As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update.  But this approach is fine too if you fix the memory leak I mentioned.

> Tools/Scripts/webkitpy/common/lru_cache.py:39
> +    Args:
> +        capacity: a cache memory capacity

Nit: This should be in the docstring for __init__.

> Tools/Scripts/webkitpy/common/lru_cache.py:57
> +            if self._capacity == 1:
> +                self._one_node(key, value)

I think you need to clear self._first here or the new node will point to the old value.  If the capacity is 1, we leak all the old objects because we hold a ref to them.
Comment 17 Hayato Ito 2011-09-20 01:10:42 PDT
Hi Tony, thank you for the review.

Since Ai-san, the author of the patch, has already left Google, I'll update the patch later, addressing your comment.

(In reply to comment #16)
> (From update of attachment 107615 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review
> 
> FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list.  As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update.  But this approach is fine too if you fix the memory leak I mentioned.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:39
> > +    Args:
> > +        capacity: a cache memory capacity
> 
> Nit: This should be in the docstring for __init__.
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:57
> > +            if self._capacity == 1:
> > +                self._one_node(key, value)
> 
> I think you need to clear self._first here or the new node will point to the old value.  If the capacity is 1, we leak all the old objects because we hold a ref to them.
Comment 18 Hayato Ito 2011-09-21 22:34:17 PDT
Created attachment 108278 [details]
fix_leak
Comment 19 Hayato Ito 2011-09-21 22:38:05 PDT
Comment on attachment 107615 [details]
Patch

I've addressed your comments. Could you take another look?

View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review

>>> Tools/Scripts/webkitpy/common/lru_cache.py:39
>>> +        capacity: a cache memory capacity
>> 
>> Nit: This should be in the docstring for __init__.
> 
> 

Done

>> Tools/Scripts/webkitpy/common/lru_cache.py:57
>> +                self._one_node(key, value)
> 
> I think you need to clear self._first here or the new node will point to the old value.  If the capacity is 1, we leak all the old objects because we hold a ref to them.

Nice catch. Instead of clearing self._first here, I've changed Node's __init__ method so that it doesn't receive |prev| and |next| parameters, which looks better to me.
Comment 20 Tony Chang 2011-09-22 10:11:29 PDT
Comment on attachment 108278 [details]
fix_leak

View in context: https://bugs.webkit.org/attachment.cgi?id=108278&action=review

> Tools/Scripts/webkitpy/common/lru_cache.py:32
> +class Node():
> +    def __init__(self, key, value):
> +        self.key = key
> +        self.value = value
> +        self.prev = None
> +        self.next = None

Nit: You can save a bit of memory by making prev and next class variables.  E.g.:
class Node():
  prev = None
  next = None
  def __init__(self, key, value):
    self.key = key
    self.value = value
Comment 21 Hayato Ito 2011-09-25 18:01:41 PDT
Thank you for the review. I am going to land that.

(In reply to comment #20)
> (From update of attachment 108278 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=108278&action=review
> 
> > Tools/Scripts/webkitpy/common/lru_cache.py:32
> > +class Node():
> > +    def __init__(self, key, value):
> > +        self.key = key
> > +        self.value = value
> > +        self.prev = None
> > +        self.next = None
> 
> Nit: You can save a bit of memory by making prev and next class variables.  E.g.:
> class Node():
>   prev = None
>   next = None
>   def __init__(self, key, value):
>     self.key = key
>     self.value = value

Nice tips, but in each lru_cache instance, there are at most two nodes, _first and _last, whose next (or prev) value is None. Therefore it might not be worth making these class variables.
Comment 22 Hayato Ito 2011-09-25 18:12:13 PDT
Committed r95928: <http://trac.webkit.org/changeset/95928>