Summary: | [Microdata] Implement cache mechanism for HTMLPropertiesCollection. | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Arko Saha <arko> | ||||||||||
Component: | DOM | Assignee: | Nobody <webkit-unassigned> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | abarth, rniwa, webkit.review.bot | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Bug Depends on: | |||||||||||||
Bug Blocks: | 68609 | ||||||||||||
Attachments: |
|
Description
Arko Saha
2012-03-06 22:13:39 PST
Created attachment 130817 [details]
Patch
I'm going to defer to rniwa in reviewing this patch. As I discussee with arko on IRC, we should push back on the spec so that property item lists are in the document order instead of being sorted. Wait... the spec says it's in the tree order :( http://dev.w3.org/html5/md/#htmlpropertiescollection collection(index) Returns the element with index index from the collection. The items are sorted in tree order. We shouldn't need to do any of these sorting. DOM is in the tree order by default! (In reply to comment #4) > Wait... the spec says it's in the tree order :( > > http://dev.w3.org/html5/md/#htmlpropertiescollection > collection(index) > Returns the element with index index from the collection. The items are sorted in tree order. > > We shouldn't need to do any of these sorting. DOM is in the tree order by default! Let me try to explain the situation with below example: <div id="id1"> <p itemprop="foo" id="a"></p> </div> <section itemscope itemref="id1 id2"> <p itemprop="foo" id="b"></p> </section> <div id="id2"> <p itemprop="foo" id="c"></p> </div> Here <section> has itemscope attribute, its an microdata item. if we do item = document.getItems()[0], it will return the <section> element. first item in the document. item.properties.namedItem(foo) must return a NodeList object containing three <p> elements with id- a, b, c respectively. it should be in the tree order. With itemref attribute we can specify the additional properties of an item. Those properties will not be children of the item as they are added through itmref attribute. Now to find the properties of an item we do the below steps as per spec : 1. Find all children of the item with itemprop attribute. 2. Find the properties which are added through the itemref attribute. In the example: find element with id 'id1', 'id2'. Then find the properties of id1, id2 3. Sort the result in the tree order. Please note that the traversal is started from base element i.e <section> in this case and not from the document root so the node list we get will not be in tree order(without step 3). It will be b, a, c. So without sorting how to get the properties in tree order (a, b, c)?? Thanks for the clarification. (In reply to comment #5) > With itemref attribute we can specify the additional properties of an item. Those properties will not be children of the item as they are added through itmref attribute. > > Now to find the properties of an item we do the below steps as per spec : > 1. Find all children of the item with itemprop attribute. If we're currently in one of these children, then we can simply look for the next sibling. > 2. Find the properties which are added through the itemref attribute. In the example: find element with id 'id1', 'id2'. Then find the properties of id1, id2 We can probably cache a sorted (in tree order) list of these itemref ids. We can then use it to find the next element with the said id in the tree order. I have modified the patch so that it caches only item and itemref elements. And sort happens only on this list. Chlidren of item and itemref elements are exculded from sorting as they are already in tree order. Created attachment 131853 [details]
Updated patch
Hi Ryosuke, can you please review the updated patch. Thanks. Comment on attachment 131853 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=131853&action=review r- because you probably want to address some of my comments. > Source/WebCore/html/HTMLPropertiesCollection.cpp:67 > +static Node* nextNodeOrSibling(Node* base, Node* node) Could you give this function more descriptive name? It does certainly traverse next node or sibling but it doesn't explain why it does. > Source/WebCore/html/HTMLPropertiesCollection.cpp:69 > + return (node == base) || (node->isHTMLElement() && !toHTMLElement(node)->fastHasAttribute(itemscopeAttr)) No need for parenthesis around node == base. > Source/WebCore/html/HTMLPropertiesCollection.cpp:79 > + Node* current; > + if (!previous) > + current = base; > + else > + current = nextNodeOrSibling(base, previous); I think you should use the ternary operator here. > Source/WebCore/html/HTMLPropertiesCollection.cpp:84 > + if (!current->isHTMLElement()) > + continue; > + Element* element = toHTMLElement(current); It seems odd to check if current is a HTML element then store it to Element*. > Source/WebCore/html/HTMLPropertiesCollection.cpp:98 > + if (!base()->isHTMLElement() || !toHTMLElement(base())->fastHasAttribute(itemscopeAttr)) Can base() ever return non-Element node? Maybe we want to change the type there so that we don't have to check whether it's not an element or not. > Source/WebCore/html/HTMLPropertiesCollection.cpp:103 > + m_cache.itemRefElements.append(baseElement); > + m_cache.hasItemRefElements = true; We probably want to add a bunch of helper functions in m_cache so that we don't have to update these member variables manually like this. > Source/WebCore/html/HTMLPropertiesCollection.cpp:125 > + for (unsigned i = 0; i < m_cache.itemRefElements.size(); ++i) > + for (Element* element = itemAfter(m_cache.itemRefElements[i], 0); element; element = itemAfter(m_cache.itemRefElements[i], element)) > + ++length; You need curly brackets for the outer for loop. > Source/WebCore/html/HTMLPropertiesCollection.cpp:158 > + for (unsigned i = 0; i < m_cache.itemRefElements.size(); ++i) You need curly brackets for this for loop since the statement is more than one line. > Source/WebCore/html/HTMLPropertiesCollection.cpp:159 > + if ((m_cache.current = itemAfter(m_cache.itemRefElements[i], 0))) { No extra parenthesis needed. > Source/WebCore/html/HTMLPropertiesCollection.cpp:165 > + m_cache.itemRefElementPosition = i; > + break; > + } > + m_cache.position = 0; > + if (!m_cache.current) > + return 0; It seems like this entire block is redundant and can be merged into the for loop below. > Source/WebCore/html/HTMLPropertiesCollection.cpp:168 > + Element* e = m_cache.current; Please don't use one-letter variable name like e. > Source/WebCore/html/HTMLPropertiesCollection.cpp:174 > + e = itemAfter(m_cache.itemRefElements[i], e); We need to initialize e before each inner loop here (i.e. before for (unsigned pos = currentPosition;...), right? > Source/WebCore/html/HTMLPropertiesCollection.cpp:179 > + if (e) > + currentPosition++; > + else > + break; It's probably better to bail out early as in: if (!e) break; currentPosition++; > Source/WebCore/html/HTMLPropertiesCollection.cpp:204 > + if (m_cache.propertyNames->isEmpty() || !m_cache.propertyNames->contains(propertyName)) I think you can just check !m_cache.propertyNames->contains(propertyName). If m_cache.propertyNames->isEmpty(), then it certainly doesn't contain anything. > Source/WebCore/html/HTMLPropertiesCollection.cpp:233 > + invalidateCacheIfNeeded(); > + updateNameCache(); It seems like updateNameCache can call invalidateCacheIfNeeded. (In reply to comment #10) > It seems like this entire block is redundant and can be merged into the for loop below. > The algorithm of HTMLPropertiesCollection::item() method is as below: In if (!m_cache.current || m_cache.position > index) {.. } Here we reset the position to 0. set current to the 0th property in the collection. set the itemRefElementPosition to the position in itemRefElements where we find the 0th property. The first block is responsible for finding the 0th property in the itemRefElements list. Then searching required property in the collection starts in the second block. So we require this block to get started with the searching in the second block. In the second block i.e, for (unsigned i = m_cache.itemRefElementPosition..) : 1. The outer for loop picks an element from itemRefElements. 2. The inner for loop is responsible to find a property in that element i.e, itemRefElements[i]. using itemAfter() method. itemAfter() returns the 0th property if the previous element is null else returns the next property in the itemRefElements[i]. It will return null if it already processed all properties in itemRefElements[i]. 3. increment the currentPosition appropriately. 4. If the currentPosition is equal to index then return the property. else find the next property. 5. If the next property is null that means itemRefElements[i] has no more property to process. then control will come to the outer for loop and picks the next element in the itemRefElements. And executes above steps (repeat from step 1). So I think if we merge two blocks here then the algorithm would become little unclear and complex. In the following block, if (!m_cache.current || m_cache.position > index) {m_cache.position = 0; ... } we are resetting the position to 0, but if we do not set the m_cache.current to the first property in the collection, it might require some special handling for the 0th property to start the search with.. which might complicate the algo. Thanks for the clarification. (In reply to comment #11) > The first block is responsible for finding the 0th property in the itemRefElements list. Then searching required property in the collection starts in the second block. > So we require this block to get started with the searching in the second block. > > In the second block i.e, for (unsigned i = m_cache.itemRefElementPosition..) : > > 1. The outer for loop picks an element from itemRefElements. > 2. The inner for loop is responsible to find a property in that element i.e, itemRefElements[i]. using itemAfter() method. > itemAfter() returns the 0th property if the previous element is null else returns the next property in the itemRefElements[i]. It will return null if it already processed all properties in itemRefElements[i]. > 3. increment the currentPosition appropriately. > 4. If the currentPosition is equal to index then return the property. else find the next property. > 5. If the next property is null that means itemRefElements[i] has no more property to process. then control will come to the outer for loop and picks the next element in the itemRefElements. And executes above steps (repeat from step 1). > > So I think if we merge two blocks here then the algorithm would become little unclear and complex. In that case, can extract either or both parts as a helper function(s) with some descriptive name? That'll make the semantics much clearer :) Created attachment 134572 [details]
Updated patch
Incorporated review comments.
Comment on attachment 134572 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=134572&action=review Okay, the patch looks much better. Maybe we want one more iteration before we land this. > Source/WebCore/html/HTMLCollection.h:92 > +#if ENABLE(MICRODATA) > + NodeCacheMap propertyCache; > + Vector<Element*> itemRefElements; > + RefPtr<DOMStringList> propertyNames; > + unsigned itemRefElementPosition; > + bool hasItemRefElements; > +#endif On my second thought, it might be better to store this in HTMLPropertiesCollection or an inner class or HTMLPropertiesCollection because the change as is will increase the size of all HTMLCollection even though those collections would never use these variables. > Source/WebCore/html/HTMLPropertiesCollection.cpp:67 > +static Node* nextNodeToProcess(Node* base, Node* node) Maybe it's better to call this function nextNodeWithProperty? nextNodeToProcess sounds too generic. > Source/WebCore/html/HTMLPropertiesCollection.cpp:69 > + // An Microdata item may contain properties which in turn an item itself. Properties can also "which in turn" what? "which in turn are items themselves"? > Source/WebCore/html/HTMLPropertiesCollection.cpp:110 > + std::sort(itemRefElements.begin(), itemRefElements.end(), compareTreeOrder); This is very inefficient. I think it's probably better to walk the entire document and collect nodes in the document order since sorting nodes will end up doing the same work anyway. > Source/WebCore/html/HTMLPropertiesCollection.cpp:184 > + for (unsigned pos = currentPosition; pos < index; ++pos) { What's the point of pos local variable here? Can't we just increment currentPosition? Created attachment 136059 [details]
Updated patch
Incorporated review comments.
Comment on attachment 136059 [details] Updated patch Clearing flags on attachment: 136059 Committed r113862: <http://trac.webkit.org/changeset/113862> All reviewed patches have been landed. Closing bug. |