Bug 105713

Summary: getAttribute does not behave correctly for mixed-case attributes on HTML elements
Product: WebKit Reporter: Boris Zbarsky <bzbarsky>
Component: DOMAssignee: Arpita Bahuguna <arpitabahuguna>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, arpitabahuguna, cmarcelo, commit-queue, darin, esprehn+autocc, kling, ojan.autocc, ojan, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 111681    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Performance test results
none
Performance test results on Mac
none
Updated perf test results on mac none

Description Boris Zbarsky 2012-12-24 01:37:55 PST
Consider this testcase:

  var e = document.createElement("div");
  e.setAttributeNS(null, "SOMETHING", "FAIL");
  e.setAttributeNS(null, "something", "PASS");
  alert(e.getAttribute("SOMETHING"));

Per current spec at http://www.w3.org/TR/domcore/#dom-element-getattribute this should alert "PASS", which is what Gecko and Presto do.  WebKit alerts "FAIL".

Note that this causes WebKit to fail some of the tests at http://w3c-test.org/webapps/DOMCore/tests/submissions/Ms2ger/attributes.html
Comment 1 Boris Zbarsky 2012-12-24 01:39:23 PST
Er, with Presto it might depend on the version you test, apparently...
Comment 2 Alexey Proskuryakov 2013-01-02 10:19:46 PST
See also: bug 90341.
Comment 3 Arpita Bahuguna 2013-02-18 21:55:07 PST
For the given testcase, if we were to use setAttribute() instead of setAttributeNS(), there is no issue.

The specification for setAttribute() clearly states that the "name" parameter of the API should be converted to ASCII lowercase (before use) whereas, the specification for setAttributeNS() has no such mandate.
[http://www.w3.org/TR/domcore/#dom-element-setattribute]

This is perhaps what causes the setAttributeNS() method to store case sensitive attributes thereby failing the given testcase.

The question is whether WebKit too should follow Gecko and handle setAttributeNS() in a manner similar to setAttribute().
Comment 4 Boris Zbarsky 2013-02-19 05:13:44 PST
setAttributeNS always preserves the case of the attribute name; that's true in both WebKit and in Gecko.  You can see this trivially by just enumerating the attributes on the given testcase.

This bug is purely about the behavior of getAttribute.  In WebKit the argument to getAttribute is compared case-insensitively to all attribute names; if that fails to find a match further processing is done.  So in WebKit the "FAIL" value is found.

But per spec, the argument to getAttribute should be ASCII-lower-cased before doing any comparisons, so the "PASS" value should be found.
Comment 5 Boris Zbarsky 2013-02-19 05:14:10 PST
> compared case-insensitively to all attribute names

I meant "compared case-sensitively".
Comment 6 Arpita Bahuguna 2013-02-19 06:22:48 PST
(In reply to comment #4)
> setAttributeNS always preserves the case of the attribute name; that's true in both WebKit and in Gecko.  You can see this trivially by just enumerating the attributes on the given testcase.
> 
> This bug is purely about the behavior of getAttribute.  In WebKit the argument to getAttribute is compared case-insensitively to all attribute names; if that fails to find a match further processing is done.  So in WebKit the "FAIL" value is found.
> 
> But per spec, the argument to getAttribute should be ASCII-lower-cased before doing any comparisons, so the "PASS" value should be found.

Thank-you for the explanation Boris. I understand the issue now. I will check on this further.
Comment 7 Arpita Bahuguna 2013-02-22 06:24:47 PST
As I see it, there are two issues with the usage of getAttribute() and getAttributeNode() APIs in WebKit:

1. As exemplified by the given testcase, getAttribute() (and getAttributeNode()) should convert its argument to lowercase before comparing against existing attributes. This is as per the specification as well.

2. Say we have an existing attribute, stored via setAttributeNS(), in uppercase. getAttribute() (and getAttributeNode()) API will however still return the existing attribute even while comparing using lowercase.
This is what causes the testcases specified in http://w3c-test.org/webapps/DOMCore/tests/submissions/Ms2ger/attributes.html for getAttribute() to fail.

I shall try and address issue 1. in this bug and open another bug for the second issue.
Comment 8 Arpita Bahuguna 2013-03-04 04:13:22 PST
Created attachment 191201 [details]
Patch
Comment 9 Andreas Kling 2013-03-04 06:05:21 PST
Comment on attachment 191201 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=191201&action=review

r=me with one adjustment:

> Source/WebCore/dom/Element.h:944
> -            if (name == attribute->localName())
> +            if ((shouldIgnoreAttributeCase ? name.lower() : name) == attribute->localName())

You can avoid creating a temporary string from .lower() by using WTF::equalPossiblyIgnoringCase:
if (equalPossiblyIgnoringCase(name, attribute->localName(), shouldIgnoreAttributeCase)
Comment 10 Boris Zbarsky 2013-03-04 06:46:07 PST
> You can avoid creating a temporary string from .lower() by using
> WTF::equalPossiblyIgnoringCase:
> if (equalPossiblyIgnoringCase(name, attribute->localName(),
> shouldIgnoreAttributeCase)

Unless I'm totally missing something, that will give the wrong behavior, and in particular would fail the testcase this bug is filed on.  The whole point is that per spec "SOMETHING" passed in to getAttribute in HTML should NOT match the attribute named "SOMETHING".
Comment 11 Andreas Kling 2013-03-04 06:52:55 PST
(In reply to comment #10)
> > You can avoid creating a temporary string from .lower() by using
> > WTF::equalPossiblyIgnoringCase:
> > if (equalPossiblyIgnoringCase(name, attribute->localName(),
> > shouldIgnoreAttributeCase)
> 
> Unless I'm totally missing something, that will give the wrong behavior, and in particular would fail the testcase this bug is filed on.  The whole point is that per spec "SOMETHING" passed in to getAttribute in HTML should NOT match the attribute named "SOMETHING".

Well, duh. That's what I get for trying to infer the purpose of the change instead of reading the bug. You are right of course. Please disregard my comment, the patch is good as-is.
Comment 12 WebKit Review Bot 2013-03-04 22:59:56 PST
Comment on attachment 191201 [details]
Patch

Clearing flags on attachment: 191201

Committed r144726: <http://trac.webkit.org/changeset/144726>
Comment 13 WebKit Review Bot 2013-03-04 23:00:00 PST
All reviewed patches have been landed.  Closing bug.
Comment 14 Arpita Bahuguna 2013-03-05 04:30:15 PST
(In reply to comment #9)
> (From update of attachment 191201 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=191201&action=review
> 
> r=me with one adjustment:
> 
> > Source/WebCore/dom/Element.h:944
> > -            if (name == attribute->localName())
> > +            if ((shouldIgnoreAttributeCase ? name.lower() : name) == attribute->localName())
> 
> You can avoid creating a temporary string from .lower() by using WTF::equalPossiblyIgnoringCase:
> if (equalPossiblyIgnoringCase(name, attribute->localName(), shouldIgnoreAttributeCase)

Thanks for the r+ Kling.
Sorry, I left for the day after uploading the patch, but thanks to Boris for pointing out the issue which would be caused if equalPossiblyIgnoringCase() were to be used.
Comment 15 Darin Adler 2013-03-05 09:44:18 PST
Surely we can do better for performance here. For calls inside the engine it seems silly to be calling lower() on strings known to be lowercase at the call site. And when there are multiple attributes it seems the call to lower() should be outside the loop.
Comment 18 Ojan Vafai 2013-03-06 19:24:29 PST
Can we revert first and try to figure out a more performant way to do this after? Darin has some good suggestions to try.

As a matter of policy, we don't leave performance regressions in the tree for more than a few hours (or maybe a day) because it gets very difficult to make sure the regression is fully addressed when there are other performance affecting changes getting committed.
Comment 19 Arpita Bahuguna 2013-03-06 21:28:24 PST
(In reply to comment #18)
> Can we revert first and try to figure out a more performant way to do this after? Darin has some good suggestions to try.
> 
> As a matter of policy, we don't leave performance regressions in the tree for more than a few hours (or maybe a day) because it gets very difficult to make sure the regression is fully addressed when there are other performance affecting changes getting committed.

Hi Ojan, sure this can be reverted. Will take Darin's suggestions and verify performance regressions before submitting the new patch.
Comment 20 WebKit Review Bot 2013-03-06 21:39:50 PST
Re-opened since this is blocked by bug 111681
Comment 21 Ojan Vafai 2013-03-07 00:18:34 PST
(In reply to comment #19)
> Hi Ojan, sure this can be reverted. Will take Darin's suggestions and verify performance regressions before submitting the new patch.

Thanks! I know it's a pain, but reverting and reworking the patch is the only way we've found so far to consistently avoid leaving performance regressions in.
Comment 22 Arpita Bahuguna 2013-03-08 01:23:21 PST
Created attachment 192176 [details]
Patch
Comment 23 Arpita Bahuguna 2013-03-08 01:38:57 PST
(In reply to comment #15)
> Surely we can do better for performance here. For calls inside the engine it seems silly to be calling lower() on strings known to be lowercase at the call site. And when there are multiple attributes it seems the call to lower() should be outside the loop.

Hi Darin, Kling,
Have uploaded another patch, hopefully, with not as glaring a performance regression as before. :)

Have moved the conversion (to lower) outside the loop. I understand what you mean by "seems silly to be calling lower() on strings known to be lowercase at the call site" but at the places where we are converting to lowercase before calling on getAttributeItemIndex() we also pass the second param as false which would guarantee that we don't make another unnecessary conversion call within getAttributeItemIndex().

Also, I could either do the conversion within getAttributeItemIndex() or else, do it before calling this method (which is being done in some but not all the cases). These are the cases where they explicitly pass the second param as false intending it not to go through the slow check (namespace sensitive check).
Perhaps we can differentiate between these bools (slow check and case insensitive check), i.e. introduce another param, or maybe always carry out the lowercase conversion before calling on getAttributeItemIndex().
Comment 24 Andreas Kling 2013-04-04 02:53:17 PDT
(In reply to comment #23)
> (In reply to comment #15)
> > Surely we can do better for performance here. For calls inside the engine it seems silly to be calling lower() on strings known to be lowercase at the call site. And when there are multiple attributes it seems the call to lower() should be outside the loop.
> 
> Hi Darin, Kling,
> Have uploaded another patch, hopefully, with not as glaring a performance regression as before. :)
> 
> Have moved the conversion (to lower) outside the loop. I understand what you mean by "seems silly to be calling lower() on strings known to be lowercase at the call site" but at the places where we are converting to lowercase before calling on getAttributeItemIndex() we also pass the second param as false which would guarantee that we don't make another unnecessary conversion call within getAttributeItemIndex().
> 
> Also, I could either do the conversion within getAttributeItemIndex() or else, do it before calling this method (which is being done in some but not all the cases). These are the cases where they explicitly pass the second param as false intending it not to go through the slow check (namespace sensitive check).
> Perhaps we can differentiate between these bools (slow check and case insensitive check), i.e. introduce another param, or maybe always carry out the lowercase conversion before calling on getAttributeItemIndex().

Could you run the Dromaeo tests Ojan mentioned with run-perf-tests before and after this patch and report results? It'd be good to have numbers here.
Comment 25 Arpita Bahuguna 2013-04-10 06:54:06 PDT
Created attachment 197261 [details]
Performance test results
Comment 26 Arpita Bahuguna 2013-04-10 07:03:37 PDT
(In reply to comment #24)
> 
> Could you run the Dromaeo tests Ojan mentioned with run-perf-tests before and after this patch and report results? It'd be good to have numbers here.

Hi Kling, apologize for the mega delay in posting the performance test results. Have attached the results for the Dromeo tests (on Qt (r147518)), for both before and after the patch.
Comment 27 Arpita Bahuguna 2013-04-12 06:57:14 PDT
Created attachment 197814 [details]
Performance test results on Mac

Performance test results on mac r148010
Comment 28 Arpita Bahuguna 2013-04-12 07:10:49 PDT
Hi Kling,

Have posted another set of Dromeo perf test results (on Mac r148010).

The first set is for a clean build, the second set of results are with the patch and the third set of results are for a build with a slightly tweaked patch.

For the third set, instead of:
   const AtomicString caseAdjustedName = shouldIgnoreAttributeCase ? name.lower() : name;
have modified it as:
   AtomicString caseAdjustedName = name;
   if (shouldIgnoreAttributeCase)
        caseAdjustedName = name.lower();
(avoiding else)

Would appreciate your thoughts on the same.
Comment 29 Arpita Bahuguna 2013-04-12 07:43:17 PDT
Comment on attachment 197814 [details]
Performance test results on Mac

Incorrect first run, hence obsoleting.
Comment 30 Andreas Kling 2013-04-12 07:47:12 PDT
> (In reply to comment #24)
> Have attached the results for the Dromeo tests (on Qt (r147518)), for both before and after the patch.

So there's a ~2% regression on the attribute-related tests? Eep. Better than 27% though :)

(In reply to comment #28)
> For the third set, instead of:
>    const AtomicString caseAdjustedName = shouldIgnoreAttributeCase ? name.lower() : name;
> have modified it as:
>    AtomicString caseAdjustedName = name;
>    if (shouldIgnoreAttributeCase)
>         caseAdjustedName = name.lower();
> (avoiding else)
> 
> Would appreciate your thoughts on the same.

Can we use a const reference here, and avoid the extra ref()/deref()?
Comment 31 Arpita Bahuguna 2013-04-16 06:23:30 PDT
Created attachment 198323 [details]
Updated perf test results on mac
Comment 32 Arpita Bahuguna 2013-04-16 07:14:23 PDT
Hi Kling,

Have posted another set of results, finally (and hopefully) the correct ones this time. :)

The first set (r148010) corresponds to Dromaeo results for a clean build.

The second set (r148010-ternary) corresponds to the following change:
    const AtomicString caseAdjustedName = shouldIgnoreAttributeCase ? name.lower() : name;

The third set (r148010-withIf()) corresponds to the following change:
    AtomicString caseAdjustedName = name;
    if (shouldIgnoreAttributeCase)
         caseAdjustedName = name.lower();
[Considering branch prediction, this should perhaps be a little faster than the previous approach. (??)]

And the last set (r148010-ternaryWithRef) is for:
    const AtomicString& caseAdjustedName = shouldIgnoreAttributeCase ? name.lower() : name;
Surprisingly the last set (with the const ref) deteriorates the performance even more!

The second set (using if() with def initialization) shows an improved performance for most of the cases save for three:
/Dromaeo/jslib-attr-jquery
/Dromaeo/jslib-attr-prototype
/Dromaeo/jslib-modify-prototype
for which there is a marked performance deterioration.

Would appreciate your opinion for these results.
Comment 33 Andreas Kling 2013-04-17 07:38:44 PDT
Comment on attachment 192176 [details]
Patch

Okay, I think this will work as-is. I'm pretty sure that the variations in your test results are noise.
We'll get more stable numbers when this gets tested on the perf bots.
So r=me.
Comment 34 WebKit Commit Bot 2013-04-17 09:13:22 PDT
Comment on attachment 192176 [details]
Patch

Clearing flags on attachment: 192176

Committed r148614: <http://trac.webkit.org/changeset/148614>
Comment 35 WebKit Commit Bot 2013-04-17 09:13:25 PDT
All reviewed patches have been landed.  Closing bug.