Bug 144310 - NodeList has issues with Symbol and empty string
Summary: NodeList has issues with Symbol and empty string
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Joseph Pecoraro
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2015-04-27 20:11 PDT by Joseph Pecoraro
Modified: 2015-04-29 18:14 PDT (History)
7 users (show)

See Also:


Attachments
[PATCH] Proposed Fix (29.12 KB, patch)
2015-04-28 17:53 PDT, Joseph Pecoraro
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joseph Pecoraro 2015-04-27 20:11:47 PDT
* SUMMARY
NodeList has issues with Symbol.

* STEPS TO REPRODUCE
1. Inspect <http://www.webkit.org>
2. js> document.querySelectorAll("nothing")[Symbol()]
  => expected undefined, actually got <p>

* NOTES
- This affects Array.from(NodeList), because apparently NodeList[Symbol.iterator] is returning a Node, instead of undefined.
Comment 1 Radar WebKit Bug Importer 2015-04-27 20:13:07 PDT
<rdar://problem/20721814>
Comment 2 Ryosuke Niwa 2015-04-27 22:43:29 PDT
I suspect what's happening is that propertyNameToAtomicString in JSDOMBinding.h is converting Symbol() to some valid string:

inline AtomicString propertyNameToAtomicString(JSC::PropertyName propertyName)
{
    return AtomicString(propertyName.publicName());
}
Comment 4 Joseph Pecoraro 2015-04-28 12:26:03 PDT
Yes, the issue here is:

   54  	bool JSNodeList::getOwnPropertySlotDelegate(ExecState* exec, PropertyName propertyName, PropertySlot& slot)
   55  	{
-> 56  	    if (Node* item = impl().namedItem(propertyNameToAtomicString(propertyName))) {
   57  	        slot.setValue(this, ReadOnly | DontDelete | DontEnum, toJS(exec, globalObject(), item));
   58  	        return true;
   59  	    }

The propertyNameToAtomicString converts the Symbol PropertyName to AtomicString(nullptr), which later matches a "namedItem" comparing to an element's non-existent idAttribute (line 72):

   68  	Node* StaticElementList::namedItem(const AtomicString& elementId) const
   69  	{
   70  	    for (unsigned i = 0, length = m_elements.size(); i < length; ++i) {
   71  	        Element& element = const_cast<Element&>(m_elements[i].get());
   72  	        if (element.getIdAttribute() == elementId)
   73  	            return &element;
   74  	    }
   75  	    return nullptr;
   76  	}
Comment 5 Ryosuke Niwa 2015-04-28 12:29:07 PDT
We should look at other places where propertyNameToAtomicString is called. I bet we're doing something wrong.
Comment 6 Yusuke Suzuki 2015-04-28 12:32:13 PDT
Seeing this, it seems that using `propertyName.uid()` solves this problem.
`propertyName.uid()` returns symbolized StringImpl* if it's symbol.
Comment 7 Ryosuke Niwa 2015-04-28 12:33:24 PDT
(In reply to comment #6)
> Seeing this, it seems that using `propertyName.uid()` solves this problem.
> `propertyName.uid()` returns symbolized StringImpl* if it's symbol.

Is that always different from any other string?
Comment 8 Joseph Pecoraro 2015-04-28 12:39:23 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Seeing this, it seems that using `propertyName.uid()` solves this problem.
> > `propertyName.uid()` returns symbolized StringImpl* if it's symbol.
> 
> Is that always different from any other string?

Yeah, I started with that approach and it should work, but we may want to do even less work and bail in the "named" item cases where a Symbol would never be a name.

I'll audit all propertyNameToString / propertyNameToAtomicString cases.
Comment 9 Yusuke Suzuki 2015-04-28 12:42:18 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Seeing this, it seems that using `propertyName.uid()` solves this problem.
> > > `propertyName.uid()` returns symbolized StringImpl* if it's symbol.
> > 
> > Is that always different from any other string?
> 
> Yeah, I started with that approach and it should work, but we may want to do
> even less work and bail in the "named" item cases where a Symbol would never
> be a name.
> 
> I'll audit all propertyNameToString / propertyNameToAtomicString cases.

Yay! You can check the identity of the given symbolized StringImpl* by comparing it in pointer values.

But now, we need to take care to use this AtomicStringImpl* because it holds a different hash value from the other normal StringImpl* which content is the same to this.
For example, a given Symbolized StringImpl* "Object" doesn't have a same hash value with the non symbolized StringImpl* "Object"
I think it easily causes a serious error. I've opened the issue to change the situation. https://bugs.webkit.org/show_bug.cgi?id=144347
Comment 10 Joseph Pecoraro 2015-04-28 15:00:18 PDT
Same issue exists for this test:

    <body>
    <div id=""></div>
    <script>
    alert(document.getElementsByTagName("div")[""]); // undefined.
    alert(document.querySelectorAll("div")[""]); // <div>. (expected undefined)
    </script>

NOTE: Firefox and Chrome have undefined for both.

I think I'll address both at the same time considering how related they are.
Comment 11 Joseph Pecoraro 2015-04-28 17:53:29 PDT
Created attachment 251904 [details]
[PATCH] Proposed Fix

Turns out we had a number of custom cases in the bindings generation. I searched around and tried to handle all of them.

Here we match Chrome's behavior for Symbol properties on special named objects, like sessionStorage/localStorage.

  // Treat as defining a private property on the storage.
  localStorage[symbol] = value;
  // Treat as setItem(name, value)
  localStorage[name] = value;

  // Treat as defining a private property on the map.
  element.dataset[symbol] = value;
  // Treat as adding a new "data-<name>" attribute.
  element.dataset[name] = value;

Likewise handle deleteProperty appropriately in such cases.
Comment 12 Darin Adler 2015-04-29 00:15:10 PDT
Comment on attachment 251904 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=251904&action=review

Is all this extra isSymbol checking OK from a performance point of view?

All this additional empty string checking needs some test coverage.

> Source/WebCore/Modules/mediastream/RTCStatsResponse.cpp:45
> +    if (name.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/Modules/mediastream/RTCStatsResponse.cpp:48
>      if (m_idmap.find(name) != m_idmap.end())
>          return m_result[m_idmap.get(name)];
>      return nullptr;

This is comically incorrect code. We should never do a find followed by a get! It should be:

    auto iterator = m_idmap.find(name);
    if (iterator == m_idmap.end())
        return nullptr;
    return m_result[*iterator];

> Source/WebCore/Modules/mediastream/RTCStatsResponse.cpp:53
> +    return !name.isEmpty() && m_idmap.contains(name);

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return false. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/dom/ChildNodeList.cpp:87
> +    if (name.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/dom/DOMNamedFlowCollection.cpp:59
> +    if (name.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/dom/LiveNodeList.cpp:57
> +    if (elementId.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/dom/StaticNodeList.cpp:49
> +    if (elementId.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/dom/StaticNodeList.cpp:73
> +    if (elementId.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

> Source/WebCore/html/HTMLFormControlsCollection.cpp:132
> +    if (name.isEmpty())
> +        return nullptr;

I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.
Comment 13 Joseph Pecoraro 2015-04-29 12:14:07 PDT
(In reply to comment #12)
> Comment on attachment 251904 [details]
> [PATCH] Proposed Fix
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=251904&action=review
> 
> Is all this extra isSymbol checking OK from a performance point of view?
> 
> All this additional empty string checking needs some test coverage.

Yeah, I'll just remove the empty string checks.

I added them but didn't go back and test them for each of the possible collection types. Each collection will need to be investigated/tested individually.
Comment 14 Joseph Pecoraro 2015-04-29 16:08:38 PDT
Comment on attachment 251904 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=251904&action=review

>> Source/WebCore/dom/StaticNodeList.cpp:49
>> +        return nullptr;
> 
> I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.

These empty checks in StaticNodeList and StaticElementList were important. Covered by this test:

    <body>
    <div id=""></div>
    <script>
    alert(document.getElementsByTagName("div")[""]); // undefined.
    alert(document.querySelectorAll("div")[""]); // <div>. (expected undefined)
    </script>

The named lookup compares this potentially empty string with a potentially empty id attribute:

  if (is<Element>(node) && downcast<Element>(node).getIdAttribute() == elementId)

I'm not entirely sure how LiveNodeList gets away with it. Its check is slightly different and includes a FIXME.

  // FIXME: This should probably be using getIdAttribute instead of idForStyleResolution.
  if (element.hasID() && element.idForStyleResolution() == elementId)

Maybe an empty string was avoided getting into idForStyleResolution.

Do you think there would be a better change then the isEmpty check here?
Comment 15 Joseph Pecoraro 2015-04-29 16:19:13 PDT
> Do you think there would be a better change then the isEmpty check here?

HTMLCollection had the isEmpty check which was the pattern I was following with these collections.
Comment 16 Joseph Pecoraro 2015-04-29 16:29:16 PDT
<http://trac.webkit.org/changeset/183589>
Comment 17 Joseph Pecoraro 2015-04-29 18:14:25 PDT
Comment on attachment 251904 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=251904&action=review

>>> Source/WebCore/dom/StaticNodeList.cpp:49
>>> +        return nullptr;
>> 
>> I don’t understand why this is needed. I believe that if this function was passed the empty string it would already return nullptr. If this is an optimization I’d like to understand why it’s needed. If this is here so that we can handle the null string, which the default hash function can’t handle, then maybe we should do the more efficient isNull check instead of the more expansive isEmpty one.
> 
> These empty checks in StaticNodeList and StaticElementList were important. Covered by this test:
> 
>     <body>
>     <div id=""></div>
>     <script>
>     alert(document.getElementsByTagName("div")[""]); // undefined.
>     alert(document.querySelectorAll("div")[""]); // <div>. (expected undefined)
>     </script>
> 
> The named lookup compares this potentially empty string with a potentially empty id attribute:
> 
>   if (is<Element>(node) && downcast<Element>(node).getIdAttribute() == elementId)
> 
> I'm not entirely sure how LiveNodeList gets away with it. Its check is slightly different and includes a FIXME.
> 
>   // FIXME: This should probably be using getIdAttribute instead of idForStyleResolution.
>   if (element.hasID() && element.idForStyleResolution() == elementId)
> 
> Maybe an empty string was avoided getting into idForStyleResolution.
> 
> Do you think there would be a better change then the isEmpty check here?

Darin walked over and talked with me on this. For the LiveNodeList case, the same issue exhibits for a node in a live node list that is not in the document.

Test:

    <script>
    var container = document.createElement("div");
    var div = document.createElement("div");
    div.id = "";
    container.appendChild(div);
    var list = container.getElementsByTagName("div");
    list[""]; // expected undefined, was <div>
    </script>

I'll address this in a new bug.