Bug 80490

Summary:

[Microdata] Implement cache mechanism for HTMLPropertiesCollection.

Product:

WebKit

Reporter:

Arko Saha <arko>

Component:

DOM

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, rniwa, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

Bug Blocks:

68609

Attachments:

Description	Flags
Patch	rniwa: review-
Updated patch	rniwa: review-
Updated patch	none
Updated patch	none

Arko Saha

Reported 2012-03-06 22:13:39 PST

Currently HTMLPropertiesCollection item, length does not use cache. Its recomputing item, length every time. Need to implement caching mechanism for HTMLPropertiesCollection.

Attachments
Patch (9.52 KB, patch) 2012-03-08 06:32 PST, Arko Saha	rniwa: review-	Details Formatted Diff Diff
Updated patch (16.06 KB, patch) 2012-03-14 08:19 PDT, Arko Saha	rniwa: review-	Details Formatted Diff Diff
Updated patch (17.71 KB, patch) 2012-03-29 07:03 PDT, Arko Saha	no flags	Details Formatted Diff Diff
Updated patch (18.59 KB, patch) 2012-04-06 13:40 PDT, Arko Saha	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Arko Saha

Comment 1 2012-03-08 06:32:53 PST

Created attachment 130817 [details] Patch

Adam Barth

Comment 2 2012-03-08 14:19:45 PST

I'm going to defer to rniwa in reviewing this patch.

Ryosuke Niwa

Comment 3 2012-03-09 08:54:12 PST

As I discussee with arko on IRC, we should push back on the spec so that property item lists are in the document order instead of being sorted.

Ryosuke Niwa

Comment 4 2012-03-09 11:30:38 PST

Wait... the spec says it's in the tree order :( http://dev.w3.org/html5/md/#htmlpropertiescollection collection(index) Returns the element with index index from the collection. The items are sorted in tree order. We shouldn't need to do any of these sorting. DOM is in the tree order by default!

Arko Saha

Comment 5 2012-03-12 00:22:50 PDT

(In reply to comment #4) > Wait... the spec says it's in the tree order :( > > http://dev.w3.org/html5/md/#htmlpropertiescollection > collection(index) > Returns the element with index index from the collection. The items are sorted in tree order. > > We shouldn't need to do any of these sorting. DOM is in the tree order by default! Let me try to explain the situation with below example: <div id="id1"> <p itemprop="foo" id="a"></p> </div> <section itemscope itemref="id1 id2"> <p itemprop="foo" id="b"></p> </section> <div id="id2"> <p itemprop="foo" id="c"></p> </div> Here <section> has itemscope attribute, its an microdata item. if we do item = document.getItems()[0], it will return the <section> element. first item in the document. item.properties.namedItem(foo) must return a NodeList object containing three <p> elements with id- a, b, c respectively. it should be in the tree order. With itemref attribute we can specify the additional properties of an item. Those properties will not be children of the item as they are added through itmref attribute. Now to find the properties of an item we do the below steps as per spec : 1. Find all children of the item with itemprop attribute. 2. Find the properties which are added through the itemref attribute. In the example: find element with id 'id1', 'id2'. Then find the properties of id1, id2 3. Sort the result in the tree order. Please note that the traversal is started from base element i.e <section> in this case and not from the document root so the node list we get will not be in tree order(without step 3). It will be b, a, c. So without sorting how to get the properties in tree order (a, b, c)??

Ryosuke Niwa

Comment 6 2012-03-12 00:42:19 PDT

Thanks for the clarification. (In reply to comment #5) > With itemref attribute we can specify the additional properties of an item. Those properties will not be children of the item as they are added through itmref attribute. > > Now to find the properties of an item we do the below steps as per spec : > 1. Find all children of the item with itemprop attribute. If we're currently in one of these children, then we can simply look for the next sibling. > 2. Find the properties which are added through the itemref attribute. In the example: find element with id 'id1', 'id2'. Then find the properties of id1, id2 We can probably cache a sorted (in tree order) list of these itemref ids. We can then use it to find the next element with the said id in the tree order.

Arko Saha

Comment 7 2012-03-14 08:17:34 PDT

I have modified the patch so that it caches only item and itemref elements. And sort happens only on this list. Chlidren of item and itemref elements are exculded from sorting as they are already in tree order.

Arko Saha

Comment 8 2012-03-14 08:19:13 PDT

Created attachment 131853 [details] Updated patch

Arko Saha

Comment 9 2012-03-20 07:07:27 PDT

Hi Ryosuke, can you please review the updated patch. Thanks.

Ryosuke Niwa

Comment 10 2012-03-21 23:31:27 PDT

Comment on attachment 131853 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=131853&action=review r- because you probably want to address some of my comments. > Source/WebCore/html/HTMLPropertiesCollection.cpp:67 > +static Node* nextNodeOrSibling(Node* base, Node* node) Could you give this function more descriptive name? It does certainly traverse next node or sibling but it doesn't explain why it does. > Source/WebCore/html/HTMLPropertiesCollection.cpp:69 > + return (node == base) || (node->isHTMLElement() && !toHTMLElement(node)->fastHasAttribute(itemscopeAttr)) No need for parenthesis around node == base. > Source/WebCore/html/HTMLPropertiesCollection.cpp:79 > + Node* current; > + if (!previous) > + current = base; > + else > + current = nextNodeOrSibling(base, previous); I think you should use the ternary operator here. > Source/WebCore/html/HTMLPropertiesCollection.cpp:84 > + if (!current->isHTMLElement()) > + continue; > + Element* element = toHTMLElement(current); It seems odd to check if current is a HTML element then store it to Element*. > Source/WebCore/html/HTMLPropertiesCollection.cpp:98 > + if (!base()->isHTMLElement() || !toHTMLElement(base())->fastHasAttribute(itemscopeAttr)) Can base() ever return non-Element node? Maybe we want to change the type there so that we don't have to check whether it's not an element or not. > Source/WebCore/html/HTMLPropertiesCollection.cpp:103 > + m_cache.itemRefElements.append(baseElement); > + m_cache.hasItemRefElements = true; We probably want to add a bunch of helper functions in m_cache so that we don't have to update these member variables manually like this. > Source/WebCore/html/HTMLPropertiesCollection.cpp:125 > + for (unsigned i = 0; i < m_cache.itemRefElements.size(); ++i) > + for (Element* element = itemAfter(m_cache.itemRefElements[i], 0); element; element = itemAfter(m_cache.itemRefElements[i], element)) > + ++length; You need curly brackets for the outer for loop. > Source/WebCore/html/HTMLPropertiesCollection.cpp:158 > + for (unsigned i = 0; i < m_cache.itemRefElements.size(); ++i) You need curly brackets for this for loop since the statement is more than one line. > Source/WebCore/html/HTMLPropertiesCollection.cpp:159 > + if ((m_cache.current = itemAfter(m_cache.itemRefElements[i], 0))) { No extra parenthesis needed. > Source/WebCore/html/HTMLPropertiesCollection.cpp:165 > + m_cache.itemRefElementPosition = i; > + break; > + } > + m_cache.position = 0; > + if (!m_cache.current) > + return 0; It seems like this entire block is redundant and can be merged into the for loop below. > Source/WebCore/html/HTMLPropertiesCollection.cpp:168 > + Element* e = m_cache.current; Please don't use one-letter variable name like e. > Source/WebCore/html/HTMLPropertiesCollection.cpp:174 > + e = itemAfter(m_cache.itemRefElements[i], e); We need to initialize e before each inner loop here (i.e. before for (unsigned pos = currentPosition;...), right? > Source/WebCore/html/HTMLPropertiesCollection.cpp:179 > + if (e) > + currentPosition++; > + else > + break; It's probably better to bail out early as in: if (!e) break; currentPosition++; > Source/WebCore/html/HTMLPropertiesCollection.cpp:204 > + if (m_cache.propertyNames->isEmpty() || !m_cache.propertyNames->contains(propertyName)) I think you can just check !m_cache.propertyNames->contains(propertyName). If m_cache.propertyNames->isEmpty(), then it certainly doesn't contain anything. > Source/WebCore/html/HTMLPropertiesCollection.cpp:233 > + invalidateCacheIfNeeded(); > + updateNameCache(); It seems like updateNameCache can call invalidateCacheIfNeeded.

Arko Saha

Comment 11 2012-03-27 06:31:34 PDT

(In reply to comment #10) > It seems like this entire block is redundant and can be merged into the for loop below. > The algorithm of HTMLPropertiesCollection::item() method is as below: In if (!m_cache.current || m_cache.position > index) {.. } Here we reset the position to 0. set current to the 0th property in the collection. set the itemRefElementPosition to the position in itemRefElements where we find the 0th property. The first block is responsible for finding the 0th property in the itemRefElements list. Then searching required property in the collection starts in the second block. So we require this block to get started with the searching in the second block. In the second block i.e, for (unsigned i = m_cache.itemRefElementPosition..) : 1. The outer for loop picks an element from itemRefElements. 2. The inner for loop is responsible to find a property in that element i.e, itemRefElements[i]. using itemAfter() method. itemAfter() returns the 0th property if the previous element is null else returns the next property in the itemRefElements[i]. It will return null if it already processed all properties in itemRefElements[i]. 3. increment the currentPosition appropriately. 4. If the currentPosition is equal to index then return the property. else find the next property. 5. If the next property is null that means itemRefElements[i] has no more property to process. then control will come to the outer for loop and picks the next element in the itemRefElements. And executes above steps (repeat from step 1). So I think if we merge two blocks here then the algorithm would become little unclear and complex. In the following block, if (!m_cache.current || m_cache.position > index) {m_cache.position = 0; ... } we are resetting the position to 0, but if we do not set the m_cache.current to the first property in the collection, it might require some special handling for the 0th property to start the search with.. which might complicate the algo.

Ryosuke Niwa

Comment 12 2012-03-27 22:59:30 PDT

Thanks for the clarification. (In reply to comment #11) > The first block is responsible for finding the 0th property in the itemRefElements list. Then searching required property in the collection starts in the second block. > So we require this block to get started with the searching in the second block. > > In the second block i.e, for (unsigned i = m_cache.itemRefElementPosition..) : > > 1. The outer for loop picks an element from itemRefElements. > 2. The inner for loop is responsible to find a property in that element i.e, itemRefElements[i]. using itemAfter() method. > itemAfter() returns the 0th property if the previous element is null else returns the next property in the itemRefElements[i]. It will return null if it already processed all properties in itemRefElements[i]. > 3. increment the currentPosition appropriately. > 4. If the currentPosition is equal to index then return the property. else find the next property. > 5. If the next property is null that means itemRefElements[i] has no more property to process. then control will come to the outer for loop and picks the next element in the itemRefElements. And executes above steps (repeat from step 1). > > So I think if we merge two blocks here then the algorithm would become little unclear and complex. In that case, can extract either or both parts as a helper function(s) with some descriptive name? That'll make the semantics much clearer :)

Arko Saha

Comment 13 2012-03-29 07:03:52 PDT

Created attachment 134572 [details] Updated patch Incorporated review comments.

Ryosuke Niwa

Comment 14 2012-03-29 15:23:33 PDT

Comment on attachment 134572 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=134572&action=review Okay, the patch looks much better. Maybe we want one more iteration before we land this. > Source/WebCore/html/HTMLCollection.h:92 > +#if ENABLE(MICRODATA) > + NodeCacheMap propertyCache; > + Vector<Element*> itemRefElements; > + RefPtr<DOMStringList> propertyNames; > + unsigned itemRefElementPosition; > + bool hasItemRefElements; > +#endif On my second thought, it might be better to store this in HTMLPropertiesCollection or an inner class or HTMLPropertiesCollection because the change as is will increase the size of all HTMLCollection even though those collections would never use these variables. > Source/WebCore/html/HTMLPropertiesCollection.cpp:67 > +static Node* nextNodeToProcess(Node* base, Node* node) Maybe it's better to call this function nextNodeWithProperty? nextNodeToProcess sounds too generic. > Source/WebCore/html/HTMLPropertiesCollection.cpp:69 > + // An Microdata item may contain properties which in turn an item itself. Properties can also "which in turn" what? "which in turn are items themselves"? > Source/WebCore/html/HTMLPropertiesCollection.cpp:110 > + std::sort(itemRefElements.begin(), itemRefElements.end(), compareTreeOrder); This is very inefficient. I think it's probably better to walk the entire document and collect nodes in the document order since sorting nodes will end up doing the same work anyway. > Source/WebCore/html/HTMLPropertiesCollection.cpp:184 > + for (unsigned pos = currentPosition; pos < index; ++pos) { What's the point of pos local variable here? Can't we just increment currentPosition?

Arko Saha

Comment 15 2012-04-06 13:40:16 PDT

Created attachment 136059 [details] Updated patch Incorporated review comments.

WebKit Review Bot

Comment 16 2012-04-11 08:20:23 PDT

Comment on attachment 136059 [details] Updated patch Clearing flags on attachment: 136059 Committed r113862: <http://trac.webkit.org/changeset/113862>

WebKit Review Bot

Comment 17 2012-04-11 08:20:28 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.