| Summary: | [webkitcorepy] Add NestedFuzzyDict | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | Jonathan Bedard <jbedard> | ||||||||||||
| Component: | Tools / Tests | Assignee: | Jonathan Bedard <jbedard> | ||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||
| Severity: | Normal | CC: | aakash_jain, dewei_zhu, webkit-bug-importer | ||||||||||||
| Priority: | P2 | Keywords: | InRadar | ||||||||||||
| Version: | WebKit Nightly Build | ||||||||||||||
| Hardware: | Unspecified | ||||||||||||||
| OS: | Unspecified | ||||||||||||||
| See Also: | https://bugs.webkit.org/show_bug.cgi?id=225616 | ||||||||||||||
| Attachments: |
|
||||||||||||||
|
Description
Jonathan Bedard
2021-06-17 16:03:10 PDT
Created attachment 431735 [details]
Patch
Created attachment 431737 [details]
Patch
Created attachment 432048 [details]
Patch
Created attachment 432425 [details]
Patch
Comment on attachment 432425 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=432425&action=review > Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:26 > +class NestedFuzzyDict(object): Do we expect to support delete in this function? > Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:33 > + self.primary_size = int(primary_size or 4) Do we expect it raise an exception when primary_size is something int() will raise an exception e.g.'invalid' ? Also, why the primary size is set to 4? Would it be considered 'over engineering' if we implement this using Trie? > Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:47 > + self.assert_valid_key(keyname) > + key_a, key_b = keyname[:self.primary_size], keyname[self.primary_size:] > + value = None > + found = False > + for key, result in self._data.get(key_a, dict()).items(): > + if key.startswith(key_b): > + if found: > + raise KeyError("Multiple values match '{}'".format(keyname)) > + found = True > + value = result Can we invoke self.getitem(keyname, None) instead? Comment on attachment 432425 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=432425&action=review >> Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:26 >> +class NestedFuzzyDict(object): > > Do we expect to support delete in this function? I don't have a need for it currently, although I could add it relatively easily. I didn't want to completely recreate the API for dict, this isn't something I expect we'll use much. >> Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:33 >> + self.primary_size = int(primary_size or 4) > > Do we expect it raise an exception when primary_size is something int() will raise an exception e.g.'invalid' ? > Also, why the primary size is set to 4? > Would it be considered 'over engineering' if we implement this using Trie? I think the problem with a Trie is that we really only want the single nested dict. The point of nesting is so that we can narrow down the set of potentially matching hashes sufficiently so we can do the startswith check on just a handful of hashes. Its very similar to the CSS 101 idea of a hash table managing collisions with a linked list >> Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:47 >> + value = result > > Can we invoke self.getitem(keyname, None) instead? We cannot, and it's this little bit that makes it fuzzy. This entire structure is optimized for hashes. The first 4 characters of a hash are mostly sufficient to avoid collisions in the webkit repository, but there are a significant minority of 4 character sequences which do point to multiple hashes. However, it's very unlikely that those 4 character sequences point to more than 2-3 hashes, which is what this function is counting on. Basically, this lets us have the fast look up of a hash table, while also allowing the user to specify partial hashes (hopefully 8-12 characters, but we do our best regardless) Comment on attachment 432425 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=432425&action=review >>> Tools/Scripts/libraries/webkitcorepy/webkitcorepy/nested_fuzzy_dict.py:47 >>> + value = result >> >> Can we invoke self.getitem(keyname, None) instead? > > We cannot, and it's this little bit that makes it fuzzy. > > This entire structure is optimized for hashes. The first 4 characters of a hash are mostly sufficient to avoid collisions in the webkit repository, but there are a significant minority of 4 character sequences which do point to multiple hashes. However, it's very unlikely that those 4 character sequences point to more than 2-3 hashes, which is what this function is counting on. Basically, this lets us have the fast look up of a hash table, while also allowing the user to specify partial hashes (hopefully 8-12 characters, but we do our best regardless) I mean, what if we do ``` def __getitem__(self, keyname): found, value = self.getitem(keyname) if found: return value raise KeyError(keyname) ``` Created attachment 432505 [details]
Patch
Comment on attachment 432505 [details]
Patch
r=me
Committed r279381 (239246@main): <https://commits.webkit.org/239246@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 432505 [details]. |