RESOLVED FIXED Bug 67823
[NRWT] Provides a simple LRU cache class in Python.
https://bugs.webkit.org/show_bug.cgi?id=67823
Summary [NRWT] Provides a simple LRU cache class in Python.
Hayato Ito
Reported 2011-09-08 17:42:27 PDT
In CSSWG reftests, there are a lot of cases that the same HTML file is used as a reference file in many tests. It should be avoided that calling DumpRenderTree for the same file many times. We should cache the rendering result somehow for the same HTML file. Note that we need image itself as weel as image hash so that we can 'diff' images later. At the same time, we shouldn't cache all results because it consumes a lot of memory. It'd be nice to have simple LRU cache class because there is locality of reference in such HTML files.
Attachments
Patch (8.25 KB, patch)
2011-09-14 00:01 PDT, Ai Makabi
no flags
Patch (8.88 KB, patch)
2011-09-14 06:25 PDT, Ai Makabi
no flags
Patch (5.91 KB, patch)
2011-09-15 07:27 PDT, Ai Makabi
no flags
Patch (9.27 KB, patch)
2011-09-15 18:50 PDT, Ai Makabi
no flags
Patch (9.29 KB, patch)
2011-09-15 22:27 PDT, Ai Makabi
no flags
Patch (9.29 KB, patch)
2011-09-16 00:26 PDT, Ai Makabi
no flags
fix_leak (9.38 KB, patch)
2011-09-21 22:34 PDT, Hayato Ito
tony: review+
Ai Makabi
Comment 1 2011-09-14 00:01:24 PDT
Hayato Ito
Comment 2 2011-09-14 01:33:58 PDT
Comment on attachment 107297 [details] Patch Ai-san, thank you for the patch. View in context: https://bugs.webkit.org/attachment.cgi?id=107297&action=review > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1 > +#!/usr/bin/env python I think this file should move to webkitpy/common/ or webkitpy/common/system directory. And lru_cache.py might be better file name than lru.py > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28 > +class Node(): It might be better that Node class should have 'value' member in addition to 'prev' and 'next'. That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)). > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36 > + L36-L37: One blank line instead of two blank lines. > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40 > + def __init__(self, size): Could you add a blank line between Line39 and Line40? > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44 > + self._size = size What if size <= 0? Could you assert the value of |size| here? assert size > 0 > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49 > + self.first = key I think self.first and self.last should point to the first Node and the last Node directly. def __setitem__(self, key, value): if not self.first: node = Node(value, ,,,) self._dict[key] = node self.first = node self.last = node self.last_key = key // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict. And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|. That might make the implementation of this LRU class more natural. That'd be nice if you could try to rewrite this class as I suggested. > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114 > + def pop(self, key): Do we need this |pop| method? To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method. > Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28 > +import lru We should use absolute path in import. like: from webkipy.common.xxx import lru
Ai Makabi
Comment 3 2011-09-14 06:25:54 PDT
Ai Makabi
Comment 4 2011-09-14 06:27:34 PDT
I addressed your comment.(In reply to comment #2) > (From update of attachment 107297 [details]) > Ai-san, thank you for the patch. > > View in context: https://bugs.webkit.org/attachment.cgi?id=107297&action=review > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:1 > > +#!/usr/bin/env python > > I think this file should move to webkitpy/common/ or webkitpy/common/system directory. > And lru_cache.py might be better file name than lru.py > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:28 > > +class Node(): > > It might be better that Node class should have 'value' member in addition to 'prev' and 'next'. > That would make the implementation of LRU class simpler because we don't have to have a tuple (value, Node(...)). > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:36 > > + > > L36-L37: One blank line instead of two blank lines. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:40 > > + def __init__(self, size): > > Could you add a blank line between Line39 and Line40? > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:44 > > + self._size = size > > What if size <= 0? Could you assert the value of |size| here? > assert size > 0 > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:49 > > + self.first = key > > I think self.first and self.last should point to the first Node and the last Node directly. > > def __setitem__(self, key, value): > if not self.first: > node = Node(value, ,,,) > self._dict[key] = node > self.first = node > self.last = node > self.last_key = key // I am not sure, but I guess we should keep track of the last_key to delete the last Node from the self_dict. > > And Node class's |prev| and |next| also point to other Nodes directly, instead of having |key|. > That might make the implementation of this LRU class more natural. > > That'd be nice if you could try to rewrite this class as I suggested. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru.py:114 > > + def pop(self, key): > > Do we need this |pop| method? > To look up a value by a key, 'lru[key]' is good enough because this class defines __getitems__(..) method. > > > Tools/Scripts/webkitpy/layout_tests/reftests/lru_unittest.py:28 > > +import lru > > We should use absolute path in import. > like: > from webkipy.common.xxx import lru
Hayato Ito
Comment 5 2011-09-14 22:04:14 PDT
Comment on attachment 107322 [details] Patch Thank you! The patch looks good! I think there are still some corner cases we must handle carefully. View in context: https://bugs.webkit.org/attachment.cgi?id=107322&action=review > Tools/Scripts/webkitpy/common/lru_cache.py:39 > + size: a cache memory size 'capacity' might be a better name than 'size' in this context. > Tools/Scripts/webkitpy/common/lru_cache.py:42 > + assert size > 0, "You should set the LRUCache size to not less than zero." You might want to include the actual value of |capacity| in the assertion message. like: assert capacity > 0, "capacity (%s) must be greater than zero." % capacity > Tools/Scripts/webkitpy/common/lru_cache.py:43 > + self.first = None if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|. > Tools/Scripts/webkitpy/common/lru_cache.py:49 > + if self.first == None: I think if not self.first: is more pythonic than if self.first == None: At least it should be 'self.first is None' instead of 'self.first == None'. > Tools/Scripts/webkitpy/common/lru_cache.py:65 > + node = Node(key, value, self.first) When _one_node is called, self.first is always None, isn't it? Therefore this should be: node = Node(key, value, None) > Tools/Scripts/webkitpy/common/lru_cache.py:71 > + if self.first.key == key: If self.first is None, this will fail. > Tools/Scripts/webkitpy/common/lru_cache.py:99 > + self.first = node.prev If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this? And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Ai Makabi
Comment 6 2011-09-15 07:27:23 PDT
Ai Makabi
Comment 7 2011-09-15 07:29:49 PDT
I addressed your comment. (In reply to comment #5) > (From update of attachment 107322 [details]) > Thank you! The patch looks good! > I think there are still some corner cases we must handle carefully. > > View in context: https://bugs.webkit.org/attachment.cgi?id=107322&action=review > > > Tools/Scripts/webkitpy/common/lru_cache.py:39 > > + size: a cache memory size > > 'capacity' might be a better name than 'size' in this context. > > > Tools/Scripts/webkitpy/common/lru_cache.py:42 > > + assert size > 0, "You should set the LRUCache size to not less than zero." > > You might want to include the actual value of |capacity| in the assertion message. > like: > assert capacity > 0, "capacity (%s) must be greater than zero." % capacity > > > Tools/Scripts/webkitpy/common/lru_cache.py:43 > > + self.first = None > > if you don't want to expose |first| and |last|, that should be named like |_first| and |_last|. > > > Tools/Scripts/webkitpy/common/lru_cache.py:49 > > + if self.first == None: > > I think > if not self.first: > is more pythonic than > if self.first == None: > > At least it should be 'self.first is None' instead of 'self.first == None'. > > > Tools/Scripts/webkitpy/common/lru_cache.py:65 > > + node = Node(key, value, self.first) > > When _one_node is called, self.first is always None, isn't it? > Therefore this should be: > node = Node(key, value, None) > > > Tools/Scripts/webkitpy/common/lru_cache.py:71 > > + if self.first.key == key: > > If self.first is None, this will fail. > > > Tools/Scripts/webkitpy/common/lru_cache.py:99 > > + self.first = node.prev > > If LRUCache contains one item, (k, v) and del[k] is called, it seems self.first will become 'None' here. So the following self.first.next will fail. Could you add a testcase for this? > > And if self.first and self.last point to the same node, it seems we also have to set self.last = None here.
Ai Makabi
Comment 8 2011-09-15 18:50:14 PDT
Hayato Ito
Comment 9 2011-09-15 21:34:52 PDT
Comment on attachment 107582 [details] Patch Thank you for addressing my comments! I added some comments. Sorry for the back and forth again. I hope you will enjoy your last day of the internship. :) View in context: https://bugs.webkit.org/attachment.cgi?id=107582&action=review > Tools/Scripts/webkitpy/common/lru_cache.py:59 > + node = Node(key, value, self._first) I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list. Without that, we are likely to have two nodes with the same key in the doubly liked list. > Tools/Scripts/webkitpy/common/lru_cache.py:77 > + next__last = self._last.next Nit: Two underscores '__'. You meant |next_last|, |next_first|? > Tools/Scripts/webkitpy/common/lru_cache.py:88 > + key_next = self._dict[key].next You might use |node| here. key_next = node.next key_prev = node.prev Or You might rewrite L88-L91 with the following two lines: node.next.prev = node.prev node.prev.next = node.next > Tools/Scripts/webkitpy/common/lru_cache.py:99 > + if self._first == self._last: It seems that this should be 'self._first is self._last'. > Tools/Scripts/webkitpy/common/lru_cache.py:102 > + del self._dict[key] It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None). It might be better to call 'node = self._dict[key]' at the beginning of the function. e.g. def __delitem__(self, key): node = self._dict[key] del self._dict[key] if self._first is self._last: self._first = None self._last = None return if self._first is node: self._first = node.prev self._first.next = None return if self._last is node: self._last = node.next self._last.prev = None return node.next.prev = node.prev node.prev.next = node.next I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Ai Makabi
Comment 10 2011-09-15 22:27:00 PDT
Ai Makabi
Comment 11 2011-09-15 22:27:49 PDT
I addressed your comment. (In reply to comment #9) > (From update of attachment 107582 [details]) > Thank you for addressing my comments! I added some comments. Sorry for the back and forth again. > I hope you will enjoy your last day of the internship. :) > > View in context: https://bugs.webkit.org/attachment.cgi?id=107582&action=review > > > Tools/Scripts/webkitpy/common/lru_cache.py:59 > > + node = Node(key, value, self._first) > > I think if self._dict already contains the key, we have to delete the node with the key from the doubly linked list. > Without that, we are likely to have two nodes with the same key in the doubly liked list. > > > Tools/Scripts/webkitpy/common/lru_cache.py:77 > > + next__last = self._last.next > > Nit: Two underscores '__'. > You meant |next_last|, |next_first|? > > > Tools/Scripts/webkitpy/common/lru_cache.py:88 > > + key_next = self._dict[key].next > > You might use |node| here. > key_next = node.next > key_prev = node.prev > > Or You might rewrite L88-L91 with the following two lines: > node.next.prev = node.prev > node.prev.next = node.next > > > Tools/Scripts/webkitpy/common/lru_cache.py:99 > > + if self._first == self._last: > > It seems that this should be 'self._first is self._last'. > > > Tools/Scripts/webkitpy/common/lru_cache.py:102 > > + del self._dict[key] > > It seems that if the key is not found here, a lru_cache will become inconsistant state (self._dict has one item, but self._first is None). > > It might be better to call 'node = self._dict[key]' at the beginning of the function. > e.g. > > def __delitem__(self, key): > node = self._dict[key] > del self._dict[key] > if self._first is self._last: > self._first = None > self._last = None > return > if self._first is node: > self._first = node.prev > self._first.next = None > return > if self._last is node: > self._last = node.next > self._last.prev = None > return > node.next.prev = node.prev > node.prev.next = node.next > > I've not tested this. I guess your existing unittests (or new tests) will find any bugs in this code :)
Hayato Ito
Comment 12 2011-09-15 23:04:23 PDT
Comment on attachment 107604 [details] Patch Thank you for updating the patch! I gave you one comment! View in context: https://bugs.webkit.org/attachment.cgi?id=107604&action=review > Tools/Scripts/webkitpy/common/lru_cache.py:60 > + if key in self._dict: I think L60-61 lines should move at the beginning of this function. If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Ai Makabi
Comment 13 2011-09-16 00:26:54 PDT
Ai Makabi
Comment 14 2011-09-16 00:27:50 PDT
I addressed your comment. (In reply to comment #12) > (From update of attachment 107604 [details]) > Thank you for updating the patch! I gave you one comment! > > View in context: https://bugs.webkit.org/attachment.cgi?id=107604&action=review > > > Tools/Scripts/webkitpy/common/lru_cache.py:60 > > + if key in self._dict: > > I think L60-61 lines should move at the beginning of this function. > If self._dict has one item and L61 deletes that item, self._first becomes None, so L62 fails...:(
Hayato Ito
Comment 15 2011-09-16 00:34:04 PDT
Looks good to me. Thank you!
Tony Chang
Comment 16 2011-09-16 10:11:56 PDT
Comment on attachment 107615 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list. As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update. But this approach is fine too if you fix the memory leak I mentioned. > Tools/Scripts/webkitpy/common/lru_cache.py:39 > + Args: > + capacity: a cache memory capacity Nit: This should be in the docstring for __init__. > Tools/Scripts/webkitpy/common/lru_cache.py:57 > + if self._capacity == 1: > + self._one_node(key, value) I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them.
Hayato Ito
Comment 17 2011-09-20 01:10:42 PDT
Hi Tony, thank you for the review. Since Ai-san, the author of the patch, has already left Google, I'll update the patch later, addressing your comment. (In reply to comment #16) > (From update of attachment 107615 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review > > FWIW, I think this would be less confusing if you just kept a list of keys rather than the linked list. As long as the capacity is small (maybe less than 20 entries), it'll still be plenty fast to update. But this approach is fine too if you fix the memory leak I mentioned. > > > Tools/Scripts/webkitpy/common/lru_cache.py:39 > > + Args: > > + capacity: a cache memory capacity > > Nit: This should be in the docstring for __init__. > > > Tools/Scripts/webkitpy/common/lru_cache.py:57 > > + if self._capacity == 1: > > + self._one_node(key, value) > > I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them.
Hayato Ito
Comment 18 2011-09-21 22:34:17 PDT
Created attachment 108278 [details] fix_leak
Hayato Ito
Comment 19 2011-09-21 22:38:05 PDT
Comment on attachment 107615 [details] Patch I've addressed your comments. Could you take another look? View in context: https://bugs.webkit.org/attachment.cgi?id=107615&action=review >>> Tools/Scripts/webkitpy/common/lru_cache.py:39 >>> + capacity: a cache memory capacity >> >> Nit: This should be in the docstring for __init__. > > Done >> Tools/Scripts/webkitpy/common/lru_cache.py:57 >> + self._one_node(key, value) > > I think you need to clear self._first here or the new node will point to the old value. If the capacity is 1, we leak all the old objects because we hold a ref to them. Nice catch. Instead of clearing self._first here, I've changed Node's __init__ method so that it doesn't receive |prev| and |next| parameters, which looks better to me.
Tony Chang
Comment 20 2011-09-22 10:11:29 PDT
Comment on attachment 108278 [details] fix_leak View in context: https://bugs.webkit.org/attachment.cgi?id=108278&action=review > Tools/Scripts/webkitpy/common/lru_cache.py:32 > +class Node(): > + def __init__(self, key, value): > + self.key = key > + self.value = value > + self.prev = None > + self.next = None Nit: You can save a bit of memory by making prev and next class variables. E.g.: class Node(): prev = None next = None def __init__(self, key, value): self.key = key self.value = value
Hayato Ito
Comment 21 2011-09-25 18:01:41 PDT
Thank you for the review. I am going to land that. (In reply to comment #20) > (From update of attachment 108278 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=108278&action=review > > > Tools/Scripts/webkitpy/common/lru_cache.py:32 > > +class Node(): > > + def __init__(self, key, value): > > + self.key = key > > + self.value = value > > + self.prev = None > > + self.next = None > > Nit: You can save a bit of memory by making prev and next class variables. E.g.: > class Node(): > prev = None > next = None > def __init__(self, key, value): > self.key = key > self.value = value Nice tips, but in each lru_cache instance, there are at most two nodes, _first and _last, whose next (or prev) value is None. Therefore it might not be worth making these class variables.
Hayato Ito
Comment 22 2011-09-25 18:12:13 PDT
Note You need to log in before you can comment on or make changes to this bug.