WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 67823
[NRWT] Provides a simple LRU cache class in Python.
https://bugs.webkit.org/show_bug.cgi?id=67823
Summary
[NRWT] Provides a simple LRU cache class in Python.
Hayato Ito
Reported
2011-09-08 17:42:27 PDT
In CSSWG reftests, there are a lot of cases that the same HTML file is used as a reference file in many tests. It should be avoided that calling DumpRenderTree for the same file many times. We should cache the rendering result somehow for the same HTML file. Note that we need image itself as weel as image hash so that we can 'diff' images later. At the same time, we shouldn't cache all results because it consumes a lot of memory. It'd be nice to have simple LRU cache class because there is locality of reference in such HTML files.
Attachments
Patch
(8.25 KB, patch)
2011-09-14 00:01 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
Patch
(8.88 KB, patch)
2011-09-14 06:25 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
Patch
(5.91 KB, patch)
2011-09-15 07:27 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
Patch
(9.27 KB, patch)
2011-09-15 18:50 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
Patch
(9.29 KB, patch)
2011-09-15 22:27 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
Patch
(9.29 KB, patch)
2011-09-16 00:26 PDT
,
Ai Makabi
no flags
Details
Formatted Diff
Diff
fix_leak
(9.38 KB, patch)
2011-09-21 22:34 PDT
,
Hayato Ito
tony
: review+
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Ai Makabi
Comment 1
2011-09-14 00:01:24 PDT
Created
attachment 107297
[details]
Patch
Hayato Ito
Comment 2
2011-09-14 01:33:58 PDT
Comment on
attachment 107297
[details]
Patch Ai-san, thank you for the patch. View in context:
https://bugs.webkit.org/attachment.cgi?id=107297&action=review
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1 > +#!/usr/bin/env python
I think this file should move to webkitpy/common/ or webkitpy/common/system directory. And lru_cache.py might be better file name than lru.py
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28 > +class Node():
It might be better that Node class should have 'value' member in addition to 'prev' and 'next'. That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)).
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36 > +
L36-L37: One blank line instead of two blank lines.
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40 > + def __init__(self, size):
Could you add a blank line between Line39 and Line40?
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44 > + self._size = size
What if size <= 0? Could you assert the value of |size| here? assert size > 0
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49 > + self.first = key
I think self.first and self.last should point to the first Node and the last Node directly. def __setitem__(self, key, value): if not self.first: node = Node(value, ,,,) self._dict[key] = node self.first = node self.last = node self.last_key = key // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict. And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|. That might make the implementation of this LRU class more natural. That'd be nice if you could try to rewrite this class as I suggested.
> Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114 > + def pop(self, key):
Do we need this |pop| method? To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method.
> Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28 > +import lru
We should use absolute path in import. like: from webkipy.common.xxx import lru
Ai Makabi
Comment 3
2011-09-14 06:25:54 PDT
Created
attachment 107322
[details]
Patch
Ai Makabi
Comment 4
2011-09-14 06:27:34 PDT
I addressed your comment.(In reply to
comment #2
)
> (From update of
attachment 107297
[details]
) > Ai-san, thank you for the patch. > > View in context:
https://bugs.webkit.org/attachment.cgi?id=107297&action=review
> > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1 > > +#!/usr/bin/env python > > I think this file should move to webkitpy/common/ or webkitpy/common/system directory. > And lru_cache.py might be better file name than lru.py > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28 > > +class Node(): > > It might be better that Node class should have 'value' member in addition to 'prev' and 'next'. > That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)). > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36 > > + > > L36-L37: One blank line instead of two blank lines. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40 > > + def __init__(self, size): > > Could you add a blank line between Line39 and Line40? > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44 > > + self._size = size > > What if size <= 0? Could you assert the value of |size| here? > assert size > 0 > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49 > > + self.first = key > > I think self.first and self.last should point to the first Node and the last Node directly. > > def __setitem__(self, key, value): > if not self.first: > node = Node(value, ,,,) > self._dict[key] = node > self.first = node > self.last = node > self.last_key = key // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict. > > And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|. > That might make the implementation of this LRU class more natural. > > That'd be nice if you could try to rewrite this class as I suggested. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114 > > + def pop(self, key): > > Do we need this |pop| method? > To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28 > > +import lru > > We should use absolute path in import. > like: > from webkipy.common.xxx import lru
Hayato Ito
Comment 5
2011-09-14 22:04:14 PDT
Comment on
attachment 107322
[details]
Patch Thank you! The patch looks good! I think there are still some corner cases we must handle carefully. View in context:
https://bugs.webkit.org/attachment.cgi?id=107322&action=review
> Tools/Scripts/webkitpy/common/lru_cache.py:39 > + size: a cache memory size
'capacity' might be a better name than 'size' in this context.
> Tools/Scripts/webkitpy/common/lru_cache.py:42 > + assert size > 0, "You should set the LRUCache size to not less than zero."
You might want to include the actual value of |capacity| in the assertion message. like: assert capacity > 0, "capacity (%s) must be greater than zero." % capacity
> Tools/Scripts/webkitpy/common/lru_cache.py:43 > + self.first = None
if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|.
> Tools/Scripts/webkitpy/common/lru_cache.py:49 > + if self.first == None:
I think if not self.first: is more pythonic than if self.first == None: At least it should be 'self.first is None' instead of 'self.first == None'.
> Tools/Scripts/webkitpy/common/lru_cache.py:65 > + node = Node(key, value, self.first)
When _one_node is called, self.first is always None, isn't it? Therefore this should be: node = Node(key, value, None)
> Tools/Scripts/webkitpy/common/lru_cache.py:71 > + if self.first.key == key:
If self.first is None, this will fail.
> Tools/Scripts/webkitpy/common/lru_cache.py:99 > + self.first = node.prev
If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this? And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Ai Makabi
Comment 6
2011-09-15 07:27:23 PDT
Created
attachment 107494
[details]
Patch
Ai Makabi
Comment 7
2011-09-15 07:29:49 PDT
I addressed your comment. (In reply to
comment #5
)
> (From update of
attachment 107322
[details]
) > Thank you! The patch looks good! > I think there are still some corner cases we must handle carefully. > > View in context:
https://bugs.webkit.org/attachment.cgi?id=107322&action=review
> > > Tools/Scripts/webkitpy/common/lru_cache.py:39 > > + size: a cache memory size > > 'capacity' might be a better name than 'size' in this context. > > > Tools/Scripts/webkitpy/common/lru_cache.py:42 > > + assert size > 0, "You should set the LRUCache size to not less than zero." > > You might want to include the actual value of |capacity| in the assertion message. > like: > assert capacity > 0, "capacity (%s) must be greater than zero." % capacity > > > Tools/Scripts/webkitpy/common/lru_cache.py:43 > > + self.first = None > > if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|. > > > Tools/Scripts/webkitpy/common/lru_cache.py:49 > > + if self.first == None: > > I think > if not self.first: > is more pythonic than > if self.first == None: > > At least it should be 'self.first is None' instead of 'self.first == None'. > > > Tools/Scripts/webkitpy/common/lru_cache.py:65 > > + node = Node(key, value, self.first) > > When _one_node is called, self.first is always None, isn't it? > Therefore this should be: > node = Node(key, value, None) > > > Tools/Scripts/webkitpy/common/lru_cache.py:71 > > + if self.first.key == key: > > If self.first is None, this will fail. > > > Tools/Scripts/webkitpy/common/lru_cache.py:99 > > + self.first = node.prev > > If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this? > > And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Ai Makabi
Comment 8
2011-09-15 18:50:14 PDT
Created
attachment 107582
[details]
Patch
Hayato Ito
Comment 9
2011-09-15 21:34:52 PDT
Comment on
attachment 107582
[details]
Patch Thank you for addressing my comments! I added some comments. Sorry for the back and forth again. I hope you will enjoy your last day of the internship. :) View in context:
https://bugs.webkit.org/attachment.cgi?id=107582&action=review
> Tools/Scripts/webkitpy/common/lru_cache.py:59 > + node = Node(key, value, self._first)
I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list. Without that, we are likely to have two nodes with the same key in the doubly liked list.
> Tools/Scripts/webkitpy/common/lru_cache.py:77 > + next__last = self._last.next
Nit: Two underscores '__'. You meant |next_last|, |next_first|?
> Tools/Scripts/webkitpy/common/lru_cache.py:88 > + key_next = self._dict[key].next
You might use |node| here. key_next = node.next key_prev = node.prev Or You might rewrite L88-L91 with the following two lines: node.next.prev = node.prev node.prev.next = node.next
> Tools/Scripts/webkitpy/common/lru_cache.py:99 > + if self._first == self._last:
It seems that this should be 'self._first is self._last'.
> Tools/Scripts/webkitpy/common/lru_cache.py:102 > + del self._dict[key]
It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None). It might be better to call 'node = self._dict[key]' at the beginning of the function. e.g. def __delitem__(self, key): node = self._dict[key] del self._dict[key] if self._first is self._last: self._first = None self._last = None return if self._first is node: self._first = node.prev self._first.next = None return if self._last is node: self._last = node.next self._last.prev = None return node.next.prev = node.prev node.prev.next = node.next I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Ai Makabi
Comment 10
2011-09-15 22:27:00 PDT
Created
attachment 107604
[details]
Patch
Ai Makabi
Comment 11
2011-09-15 22:27:49 PDT
I addressed your comment. (In reply to
comment #9
)
> (From update of
attachment 107582
[details]
) > Thank you for addressing my comments! I added some comments. Sorry for the back and forth again. > I hope you will enjoy your last day of the internship. :) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=107582&action=review
> > > Tools/Scripts/webkitpy/common/lru_cache.py:59 > > + node = Node(key, value, self._first) > > I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list. > Without that, we are likely to have two nodes with the same key in the doubly liked list. > > > Tools/Scripts/webkitpy/common/lru_cache.py:77 > > + next__last = self._last.next > > Nit: Two underscores '__'. > You meant |next_last|, |next_first|? > > > Tools/Scripts/webkitpy/common/lru_cache.py:88 > > + key_next = self._dict[key].next > > You might use |node| here. > key_next = node.next > key_prev = node.prev > > Or You might rewrite L88-L91 with the following two lines: > node.next.prev = node.prev > node.prev.next = node.next > > > Tools/Scripts/webkitpy/common/lru_cache.py:99 > > + if self._first == self._last: > > It seems that this should be 'self._first is self._last'. > > > Tools/Scripts/webkitpy/common/lru_cache.py:102 > > + del self._dict[key] > > It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None). > > It might be better to call 'node = self._dict[key]' at the beginning of the function. > e.g. > > def __delitem__(self, key): > node = self._dict[key] > del self._dict[key] > if self._first is self._last: > self._first = None > self._last = None > return > if self._first is node: > self._first = node.prev > self._first.next = None > return > if self._last is node: > self._last = node.next > self._last.prev = None > return > node.next.prev = node.prev > node.prev.next = node.next > > I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Hayato Ito
Comment 12
2011-09-15 23:04:23 PDT
Comment on
attachment 107604
[details]
Patch Thank you for updating the patch! I gave you one comment! View in context:
https://bugs.webkit.org/attachment.cgi?id=107604&action=review
> Tools/Scripts/webkitpy/common/lru_cache.py:60 > + if key in self._dict:
I think L60-61 lines should move at the beginning of this function. If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Ai Makabi
Comment 13
2011-09-16 00:26:54 PDT
Created
attachment 107615
[details]
Patch
Ai Makabi
Comment 14
2011-09-16 00:27:50 PDT
I addressed your comment. (In reply to
comment #12
)
> (From update of
attachment 107604
[details]
) > Thank you for updating the patch! I gave you one comment! > > View in context:
https://bugs.webkit.org/attachment.cgi?id=107604&action=review
> > > Tools/Scripts/webkitpy/common/lru_cache.py:60 > > + if key in self._dict: > > I think L60-61 lines should move at the beginning of this function. > If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Hayato Ito
Comment 15
2011-09-16 00:34:04 PDT
Looks good to me. Thank you!
Tony Chang
Comment 16
2011-09-16 10:11:56 PDT
Comment on
attachment 107615
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=107615&action=review
FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list. As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update. But this approach is fine too if you fix the memory leak I mentioned.
> Tools/Scripts/webkitpy/common/lru_cache.py:39 > + Args: > + capacity: a cache memory capacity
Nit: This should be in the docstring for __init__.
> Tools/Scripts/webkitpy/common/lru_cache.py:57 > + if self._capacity == 1: > + self._one_node(key, value)
I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them.
Hayato Ito
Comment 17
2011-09-20 01:10:42 PDT
Hi Tony, thank you for the review. Since Ai-san, the author of the patch, has already left Google, I'll update the patch later, addressing your comment. (In reply to
comment #16
)
> (From update of
attachment 107615
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=107615&action=review
> > FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list. As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update. But this approach is fine too if you fix the memory leak I mentioned. > > > Tools/Scripts/webkitpy/common/lru_cache.py:39 > > + Args: > > + capacity: a cache memory capacity > > Nit: This should be in the docstring for __init__. > > > Tools/Scripts/webkitpy/common/lru_cache.py:57 > > + if self._capacity == 1: > > + self._one_node(key, value) > > I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them.
Hayato Ito
Comment 18
2011-09-21 22:34:17 PDT
Created
attachment 108278
[details]
fix_leak
Hayato Ito
Comment 19
2011-09-21 22:38:05 PDT
Comment on
attachment 107615
[details]
Patch I've addressed your comments. Could you take another look? View in context:
https://bugs.webkit.org/attachment.cgi?id=107615&action=review
>>> Tools/Scripts/webkitpy/common/lru_cache.py:39 >>> + capacity: a cache memory capacity >> >> Nit: This should be in the docstring for __init__. > >
Done
>> Tools/Scripts/webkitpy/common/lru_cache.py:57 >> + self._one_node(key, value) > > I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them.
Nice catch. Instead of clearing self._first here, I've changed Node's __init__ method so that it doesn't receive |prev| and |next| parameters, which looks better to me.
Tony Chang
Comment 20
2011-09-22 10:11:29 PDT
Comment on
attachment 108278
[details]
fix_leak View in context:
https://bugs.webkit.org/attachment.cgi?id=108278&action=review
> Tools/Scripts/webkitpy/common/lru_cache.py:32 > +class Node(): > + def __init__(self, key, value): > + self.key = key > + self.value = value > + self.prev = None > + self.next = None
Nit: You can save a bit of memory by making prev and next class variables. E.g.: class Node(): prev = None next = None def __init__(self, key, value): self.key = key self.value = value
Hayato Ito
Comment 21
2011-09-25 18:01:41 PDT
Thank you for the review. I am going to land that. (In reply to
comment #20
)
> (From update of
attachment 108278
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=108278&action=review
> > > Tools/Scripts/webkitpy/common/lru_cache.py:32 > > +class Node(): > > + def __init__(self, key, value): > > + self.key = key > > + self.value = value > > + self.prev = None > > + self.next = None > > Nit: You can save a bit of memory by making prev and next class variables. E.g.: > class Node(): > prev = None > next = None > def __init__(self, key, value): > self.key = key > self.value = value
Nice tips, but in each lru_cache instance, there are at most two nodes, _first and _last, whose next (or prev) value is None. Therefore it might not be worth making these class variables.
Hayato Ito
Comment 22
2011-09-25 18:12:13 PDT
Committed
r95928
: <
http://trac.webkit.org/changeset/95928
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug