Bug 63029 - Fix legacy color attribute parsing to match HTML spec
Summary: Fix legacy color attribute parsing to match HTML spec
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tab Atkins
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-20 17:05 PDT by Tab Atkins
Modified: 2012-08-26 16:14 PDT (History)
10 users (show)

See Also:


Attachments
Patch (13.50 KB, patch)
2011-06-20 17:17 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (16.94 KB, patch)
2011-06-20 18:36 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (15.27 KB, patch)
2011-06-20 18:52 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (14.65 KB, patch)
2011-06-21 11:42 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (15.03 KB, patch)
2011-06-23 10:56 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (16.02 KB, patch)
2011-06-23 16:41 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (18.19 KB, patch)
2011-06-24 09:16 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff
Patch (18.57 KB, patch)
2011-06-24 09:21 PDT, Tab Atkins
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tab Atkins 2011-06-20 17:05:32 PDT
Fix legacy color attribute parsing to match HTML spec
Comment 1 Tab Atkins 2011-06-20 17:17:20 PDT
Created attachment 97893 [details]
Patch
Comment 2 Tab Atkins 2011-06-20 17:20:10 PDT
This behavior matches Firefox and should match IE (don't have ability to quickly test).
Comment 3 Adam Barth 2011-06-20 17:37:49 PDT
Comment on attachment 97893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97893&action=review

> Source/WebCore/dom/StyledElement.cpp:321
> -/* color parsing that tries to match as close as possible IE 6. */
> +/* color parsing that matches HTML's "rules for parsing a legacy color value" */

C++ style comments pls

> Source/WebCore/dom/StyledElement.cpp:324
> +    // These two cases don't apply any color

Comments should be complete sentences (that end with a "." ).

> Source/WebCore/dom/StyledElement.cpp:326
> +    if (!c.length() || equalIgnoringCase(c, "transparent"))
>          return;

Can we use a more descriptive variable name than "c" ?

> Source/WebCore/dom/StyledElement.cpp:332
> +    color.stripWhiteSpace();

Does this actually modify the string or just return a new version with whitespace stripped?  I never remember.

> Source/WebCore/dom/StyledElement.cpp:355
> +            if (!isASCIIHexDigit(color[pos]))
> +                color.replace(pos, 1, "0");

For realz?  Amazing.

> Source/WebCore/dom/StyledElement.cpp:359
> +        if (color.length() % 3 == 2)
> +            color += "0";
> +        if (color.length() % 3 == 1)

else if, right?

> Source/WebCore/dom/StyledElement.cpp:373
> +            int redIndex = max(0, componentLength - 8);

8?  Can we name this constant.  It's not obvious to me where it comes from.

> Source/WebCore/dom/StyledElement.cpp:384
> +            colors[0] = toASCIIHexValue(color[redIndex]) * 16 + toASCIIHexValue(color[redIndex+1]);
> +            colors[1] = toASCIIHexValue(color[greenIndex]) * 16 + toASCIIHexValue(color[greenIndex+1]);
> +            colors[2] = toASCIIHexValue(color[blueIndex]) * 16 + toASCIIHexValue(color[blueIndex+1]);

Can you add some ASSERTs before this block to make sure all the indicies remain sane and in their proper bounds?

> Source/WebCore/dom/StyledElement.cpp:390
> +    color = String::format("#%02x%02x%02x", colors[0], colors[1], colors[2]);
> +    if (attr->decl()->setProperty(id, color, false))
> +        return;

How could this fail?  Should we ASSERT_NOT_REACHED?
Comment 4 Simon Fraser (smfr) 2011-06-20 17:59:14 PDT
Comment on attachment 97893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97893&action=review

Do we know of any colors that IE used to parse, that will now fail?

> LayoutTests/fast/dom/script-tests/attribute-legacy-colors.js:3
> +shouldBe("document.body.bgColor='';getComputedStyle(document.body).backgroundColor;", "'rgba(0, 0, 0, 0)'");

Wedging the getComputedStyle() in there is pretty ugly. There has to be a better way to write these shouldBe()s.
Would be nice to make this series of tests data driven with a loop.
Comment 5 Simon Fraser (smfr) 2011-06-20 17:59:45 PDT
Stupid bugzilla collisions.
Comment 6 Tab Atkins 2011-06-20 18:36:49 PDT
Created attachment 97906 [details]
Patch
Comment 7 Tab Atkins 2011-06-20 18:40:05 PDT
(In reply to comment #3)
> (From update of attachment 97893 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=97893&action=review
> 
> > Source/WebCore/dom/StyledElement.cpp:321
> > -/* color parsing that tries to match as close as possible IE 6. */
> > +/* color parsing that matches HTML's "rules for parsing a legacy color value" */
> 
> C++ style comments pls
> 
> > Source/WebCore/dom/StyledElement.cpp:324
> > +    // These two cases don't apply any color
> 
> Comments should be complete sentences (that end with a "." ).

Fixed.

> > Source/WebCore/dom/StyledElement.cpp:326
> > +    if (!c.length() || equalIgnoringCase(c, "transparent"))
> >          return;
> 
> Can we use a more descriptive variable name than "c" ?

Switched to "attrValue".

> > Source/WebCore/dom/StyledElement.cpp:332
> > +    color.stripWhiteSpace();
> 
> Does this actually modify the string or just return a new version with whitespace stripped?  I never remember.

The latter, as it turns out.  Fixed.  (Also, added a test that would have caught the fact that I was doing this wrong.)

> > Source/WebCore/dom/StyledElement.cpp:355
> > +            if (!isASCIIHexDigit(color[pos]))
> > +                color.replace(pos, 1, "0");
> 
> For realz?  Amazing.

Indeed.  Still missing the even crazier replacement where everything outside the BMP gets replaced with "00", because I'm not sure how to query the codepoint of a character in a String.

> > Source/WebCore/dom/StyledElement.cpp:359
> > +        if (color.length() % 3 == 2)
> > +            color += "0";
> > +        if (color.length() % 3 == 1)
> 
> else if, right?

Fixed.

> > Source/WebCore/dom/StyledElement.cpp:373
> > +            int redIndex = max(0, componentLength - 8);
> 
> 8?  Can we name this constant.  It's not obvious to me where it comes from.

The constant was explained in the preceding comment, but I've named it and slightly reshuffled the code there to hopefully be clearer.

> > Source/WebCore/dom/StyledElement.cpp:384
> > +            colors[0] = toASCIIHexValue(color[redIndex]) * 16 + toASCIIHexValue(color[redIndex+1]);
> > +            colors[1] = toASCIIHexValue(color[greenIndex]) * 16 + toASCIIHexValue(color[greenIndex+1]);
> > +            colors[2] = toASCIIHexValue(color[blueIndex]) * 16 + toASCIIHexValue(color[blueIndex+1]);
> 
> Can you add some ASSERTs before this block to make sure all the indicies remain sane and in their proper bounds?

Done.

> > Source/WebCore/dom/StyledElement.cpp:390
> > +    color = String::format("#%02x%02x%02x", colors[0], colors[1], colors[2]);
> > +    if (attr->decl()->setProperty(id, color, false))
> > +        return;
> 
> How could this fail?  Should we ASSERT_NOT_REACHED?

This was an artifact from the old code.  You're right that it can't ever fail.  I've just removed the if() entirely.
Comment 8 Tab Atkins 2011-06-20 18:42:27 PDT
(In reply to comment #4)
> (From update of attachment 97893 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=97893&action=review
> 
> Do we know of any colors that IE used to parse, that will now fail?

Do you mean "that we used to parse", or are you actually asking about IE here?

> > LayoutTests/fast/dom/script-tests/attribute-legacy-colors.js:3
> > +shouldBe("document.body.bgColor='';getComputedStyle(document.body).backgroundColor;", "'rgba(0, 0, 0, 0)'");
> 
> Wedging the getComputedStyle() in there is pretty ugly. There has to be a better way to write these shouldBe()s.

getComputedStyle() is necessary to actually pull the used color out.  This is my first patch in a year, so I'm not familiar with how these things "should" look.  If there's a better pattern, I'd be happy to switch to it.

> Would be nice to make this series of tests data driven with a loop.

Sounds good.  I'll switch over to that.
Comment 9 Simon Fraser (smfr) 2011-06-20 18:50:37 PDT
(In reply to comment #8)
> (In reply to comment #4)
> > (From update of attachment 97893 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=97893&action=review
> > 
> > Do we know of any colors that IE used to parse, that will now fail?
> 
> Do you mean "that we used to parse", or are you actually asking about IE here?

Sorry, brain fart. Do you know of any colors that we managed to parse by copying IE that will no longer parse with the new code?

> > > LayoutTests/fast/dom/script-tests/attribute-legacy-colors.js:3
> > > +shouldBe("document.body.bgColor='';getComputedStyle(document.body).backgroundColor;", "'rgba(0, 0, 0, 0)'");
> > 
> > Wedging the getComputedStyle() in there is pretty ugly. There has to be a better way to write these shouldBe()s.
> 
> getComputedStyle() is necessary to actually pull the used color out.  This is my first patch in a year, so I'm not familiar with how these things "should" look.  If there's a better pattern, I'd be happy to switch to it.

Not sure offhand, but I'd grep for tests using shouldBe and getComputedStyle.

> > Would be nice to make this series of tests data driven with a loop.
> 
> Sounds good.  I'll switch over to that.

Cool
Comment 10 Tab Atkins 2011-06-20 18:52:21 PDT
Created attachment 97909 [details]
Patch
Comment 11 Simon Fraser (smfr) 2011-06-20 20:28:30 PDT
Comment on attachment 97909 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97909&action=review

r- on the return after else, and you might want to do one final round.

> Source/WebCore/dom/StyledElement.cpp:332
> +    int colors[3] = { 0, 0, 0 };

This would be better called 'components'.

> Source/WebCore/dom/StyledElement.cpp:340
> +        colors[2] = foundColor.blue();

Can't we call setProperty() here directly, rather than doing a String::format() which we have to re-parse again later?

> Source/WebCore/dom/StyledElement.cpp:344
> +    } else { 

We avoid 'else' after return.

> Source/WebCore/dom/StyledElement.cpp:345
> +        // Otherwise, use the crazy legacy parsing rules.

Might be nice to put the legacy parsing code into its own method, then the code becomes self-documenting.

> Source/WebCore/dom/StyledElement.cpp:347
> +        // FIXME: Replace all non-BMP characters in color with "00".

You should file a follow-up bug for this FIXME, but is this not what the isASCIIHexDigit() code below is doing?

> Source/WebCore/dom/StyledElement.cpp:369
> +            // Otherwise, we have at least 6 digits.

You should ASSERT this

> Source/WebCore/dom/StyledElement.cpp:372
> +            int componentSearchWindowSize = min(componentLength, static_cast<size_t>(8));

You can avoid the ugly static_cast by using min<int>()

> Source/WebCore/dom/StyledElement.cpp:395
> +    color = String::format("#%02x%02x%02x", colors[0], colors[1], colors[2]);

It's a shame to do all this work to compute the color components, and then turn around and make a String out of them.
Comment 12 Alexey Proskuryakov 2011-06-20 20:39:05 PDT
(In reply to comment #2)
> This behavior matches Firefox and should match IE (don't have ability to quickly test).

We don't always test IE, but color parsing is quite fundamental. Maybe you could check with <http://browsershots.org/>?
Comment 13 Tab Atkins 2011-06-21 11:42:22 PDT
Created attachment 98026 [details]
Patch
Comment 14 Darin Adler 2011-06-21 11:45:13 PDT
Comment on attachment 98026 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98026&action=review

This change looks OK, but the same logic could be done with no memory allocation except for the allocation of the strings for the property values. I’d prefer to see that.

> Source/WebCore/dom/StyledElement.cpp:354
> +            color.replace(pos, 1, "0");

Calling replace on a string is an extremely inefficient way to modify a character at a time, and involves multiple allocations per call. I suggest using a UChar array on the stack instead.

> Source/WebCore/dom/StyledElement.cpp:357
> +        color += "0";

Appending a string like this is inefficient, requiring a memory allocation each time.

> Source/WebCore/dom/StyledElement.cpp:359
> +        color += "00";

Ditto.

> Source/WebCore/dom/StyledElement.cpp:361
> +        color = "000";

Ditto.
Comment 15 Tab Atkins 2011-06-21 11:48:25 PDT
(In reply to comment #11)
> (From update of attachment 97909 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=97909&action=review
> 
> r- on the return after else, and you might want to do one final round.
> 
> > Source/WebCore/dom/StyledElement.cpp:332
> > +    int colors[3] = { 0, 0, 0 };
> 
> This would be better called 'components'.

Good suggestion.  Done.


> > Source/WebCore/dom/StyledElement.cpp:340
> > +        colors[2] = foundColor.blue();
> 
> Can't we call setProperty() here directly, rather than doing a String::format() which we have to re-parse again later?

As far as I can tell, setProperty() only ever takes a string.  However, I *can* just call setProperty here with the original string, now that it's confirmed to be a valid color name.  I've done so, and changed the other cases to also directly call setProperty and immediately return.


> > Source/WebCore/dom/StyledElement.cpp:344
> > +    } else { 
> 
> We avoid 'else' after return.

Fixed as part of the above.


> > Source/WebCore/dom/StyledElement.cpp:345
> > +        // Otherwise, use the crazy legacy parsing rules.
> 
> Might be nice to put the legacy parsing code into its own method, then the code becomes self-documenting.

The legacy parsing is the entire point of this method, really.  The preceding stuff is just a prelude, testing for the couple of things that are acceptable in legacy colors.


> > Source/WebCore/dom/StyledElement.cpp:347
> > +        // FIXME: Replace all non-BMP characters in color with "00".
> 
> You should file a follow-up bug for this FIXME, but is this not what the isASCIIHexDigit() code below is doing?

Nope.  Non-BMP chars get replaced with "00".  Non-hex chars that are in the BMP get replaced with "0".


> > Source/WebCore/dom/StyledElement.cpp:369
> > +            // Otherwise, we have at least 6 digits.
> 
> You should ASSERT this

Done.

> > Source/WebCore/dom/StyledElement.cpp:372
> > +            int componentSearchWindowSize = min(componentLength, static_cast<size_t>(8));
> 
> You can avoid the ugly static_cast by using min<int>()

Done.  (I used min<size_t>.)

> > Source/WebCore/dom/StyledElement.cpp:395
> > +    color = String::format("#%02x%02x%02x", colors[0], colors[1], colors[2]);
> 
> It's a shame to do all this work to compute the color components, and then turn around and make a String out of them.

You're right.  I can just use the found components directly, instead of making them into a number and then back into a string.  Done.  (This part of the code was around before I started touching things.)
Comment 16 Tab Atkins 2011-06-21 11:54:20 PDT
(In reply to comment #14)
> (From update of attachment 98026 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98026&action=review
> 
> This change looks OK, but the same logic could be done with no memory allocation except for the allocation of the strings for the property values. I’d prefer to see that.

I don't believe I can get the allocations down quite that low, but I can definitely follow your suggestions below.

> > Source/WebCore/dom/StyledElement.cpp:354
> > +            color.replace(pos, 1, "0");
> 
> Calling replace on a string is an extremely inefficient way to modify a character at a time, and involves multiple allocations per call. I suggest using a UChar array on the stack instead.
> 
> > Source/WebCore/dom/StyledElement.cpp:357
> > +        color += "0";
> 
> Appending a string like this is inefficient, requiring a memory allocation each time.
> 
> > Source/WebCore/dom/StyledElement.cpp:359
> > +        color += "00";
> 
> Ditto.
> 
> > Source/WebCore/dom/StyledElement.cpp:361
> > +        color = "000";
> 
> Ditto.
Comment 17 Adam Barth 2011-06-21 12:21:09 PDT
> I don't believe I can get the allocations down quite that low, but I can definitely follow your suggestions below.

One benefit of using Vector<UChar> is that you can set an inline capacity that's allocated on the stack.  In most cases, you'll stay within the inline capacity and avoid heap allocations.
Comment 18 Tab Atkins 2011-06-23 10:56:15 PDT
Created attachment 98373 [details]
Patch
Comment 19 Tab Atkins 2011-06-23 11:07:56 PDT
Okay, switched to doing the fixup in a buffer.  I'm down to three allocations now - one String that's just a whitespace-stripped version of the attr value (necessary so I can pass it through to the named-color function), one Vector<UChar> that's preallocated to the maximum size I'll need, and a final String holding the CSS value we're going to set.

I'd appreciate a review of the changes to make sure what I'm doing is idiomatic.
Comment 20 Eric Seidel (no email) 2011-06-23 11:24:26 PDT
Comment on attachment 98373 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98373&action=review

> Source/WebCore/dom/StyledElement.cpp:331
> +    String color = attrValue.stripWhiteSpace();

Shouldn't we strip whitespace before checking "transparent" or length?  (And test that?)

> Source/WebCore/dom/StyledElement.cpp:346
> +    // First, fixup!

I'm not sure this comment is helpful.

> Source/WebCore/dom/StyledElement.cpp:347
> +    // Reserve 128 rounded to nearest multiple of 3

Why?  Saying why is more helpful than restating what the code does. :)

> Source/WebCore/dom/StyledElement.cpp:348
> +    Vector<UChar, 129> chars;

This could be defined as maxColorLength + 1, or maxColorLength + 3 if you wanted to avoid explaining why +1 was ok or avoid writing (maxColorLength / 3 + 1 * 3)

> Source/WebCore/dom/StyledElement.cpp:354
> +    for (; colorPos < color.length() && chars.size() < 128; colorPos++) {

Then this would use maxColorLength isntead of 128.

> Source/WebCore/dom/StyledElement.cpp:355
> +        // FIXME: Replace all non-BMP characters in color with "00".

Should be easy to do and test.  Why not just do that as part of this patch?

> Source/WebCore/dom/StyledElement.cpp:379
> +    // Split the digits into three components, then search the last 8 digits of each component.

Maybe this whole section should be a helper function.

> Source/WebCore/dom/StyledElement.cpp:398
> +    color = String::format("#%c%c%c%c%c%c", (char)chars[redIndex], (char)chars[redIndex+1], (char)chars[greenIndex], (char)chars[greenIndex+1], (char)chars[blueIndex], (char)chars[blueIndex+1]);

We don't generally use c-style casts.  Why are these char casts necessary?
Comment 21 Darin Adler 2011-06-23 12:41:15 PDT
Comment on attachment 98373 [details]
Patch

I’m almost certain the char casts are not needed. As long as both char and UChar are smaller than int, then they are both passed as int to varargs functions like String::format.
Comment 22 Tab Atkins 2011-06-23 16:41:23 PDT
Created attachment 98432 [details]
Patch
Comment 23 Tab Atkins 2011-06-23 16:50:10 PDT
(In reply to comment #20)
> (From update of attachment 98373 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98373&action=review
> 
> > Source/WebCore/dom/StyledElement.cpp:331
> > +    String color = attrValue.stripWhiteSpace();
> 
> Shouldn't we strip whitespace before checking "transparent" or length?  (And test that?)

Thanks for the catch; we *should* strip before testing for "transparent".  The empty string test is correct, though - a string full of whitespace gets turned into black, per the spec algorithm.

I've added testing coverage for both of these cases.

> > Source/WebCore/dom/StyledElement.cpp:346
> > +    // First, fixup!
> 
> I'm not sure this comment is helpful.

It wasn't.  Removed.

> > Source/WebCore/dom/StyledElement.cpp:347
> > +    // Reserve 128 rounded to nearest multiple of 3
> 
> Why?  Saying why is more helpful than restating what the code does. :)
> 
> > Source/WebCore/dom/StyledElement.cpp:348
> > +    Vector<UChar, 129> chars;
> 
> This could be defined as maxColorLength + 1, or maxColorLength + 3 if you wanted to avoid explaining why +1 was ok or avoid writing (maxColorLength / 3 + 1 * 3)
> 
> > Source/WebCore/dom/StyledElement.cpp:354
> > +    for (; colorPos < color.length() && chars.size() < 128; colorPos++) {
> 
> Then this would use maxColorLength isntead of 128.

Done, thanks.

> > Source/WebCore/dom/StyledElement.cpp:355
> > +        // FIXME: Replace all non-BMP characters in color with "00".
> 
> Should be easy to do and test.  Why not just do that as part of this patch?

Because I wasn't sure how WTF::String dealt with non-BMP chars, particularly since it was made out of UChars which are 16-bit.  After some digging and testing, turns out that WTF::String is just composed of utf16 code elements, so my "replace non-hex with '0'" part handled the non-BMP case for free, because non-BMP characters are automatically 2 "characters" in a WTF::String.

> > Source/WebCore/dom/StyledElement.cpp:379
> > +    // Split the digits into three components, then search the last 8 digits of each component.
> 
> Maybe this whole section should be a helper function.

I've done so.  I was instructed that the common idiom is to make it a file-static function; is this correct?
 
> > Source/WebCore/dom/StyledElement.cpp:398
> > +    color = String::format("#%c%c%c%c%c%c", (char)chars[redIndex], (char)chars[redIndex+1], (char)chars[greenIndex], (char)chars[greenIndex+1], (char)chars[blueIndex], (char)chars[blueIndex+1]);
> 
> We don't generally use c-style casts.  Why are these char casts necessary?

They were necessary because String::format didn't like receiving UChars when using the %c directive.  However, I've since realized I don't actually need UChars, so I replaced my Vector<UChar> with a Vector<char> and removed the casts.
Comment 24 Darin Adler 2011-06-23 17:04:33 PDT
Comment on attachment 98432 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98432&action=review

review- because of the failing test in the test results in the patch

> LayoutTests/fast/dom/attribute-legacy-colors-expected.txt:1
> +This test ensures that legacy color attributes are parsed properly.

I don’t think there are enough tests here to test the algorithm and edge cases given my reading of the code. I don’t see any test cases that get up to 128 characters, or pass 128 characters. I don’t see tests that cover the behavior of skipping up to 8 zeroes but not 9 in the individual components.

> LayoutTests/fast/dom/attribute-legacy-colors-expected.txt:8
> +FAIL document.body.bgColor=' transparent ';getComputedStyle(document.body).backgroundColor; should be rgba(0, 0, 0, 0). Was rgb(0, 0, 0).

Failing test!?

> Source/WebCore/dom/StyledElement.cpp:321
> +static String parseInvalidColorAttributeValue(const String attrValue)

This should be const String& attributeValue, not const String attrValue.

> Source/WebCore/dom/StyledElement.cpp:323
> +    const size_t maxColorLength = 128;

It would be good to say where this 128 comes from. I assume it’s from the HTML specification’s invalid color parsing rules.

> Source/WebCore/dom/StyledElement.cpp:325
> +    // We'll pad the buffer to a multiple of 3, so we need to reserve a little more for safety
> +    Vector<char, maxColorLength+3> chars;

A vector will automatically start using the heap instead of the stack once you go past the inline buffer, so there's no need to have the inline buffer size be large enough for the worst case. It just needs to be big enough to cover the normal case. I suppose this is OK as is, but this pays the cost of using the vector class without getting any benefit from it.

Since this buffer holds only hex digits, I'd suggest naming it digitBuffer or digits, rather than naming it characters.

Given the comment the code should say "+ 2" rather than "+ 3". Saying "a little more for safety" is unnecessarily vague; I think you’re saying that because there’s no easy way to write "round up to multiple of 3", so we are using "+ 2" to approximate the ceiling of that.

> Source/WebCore/dom/StyledElement.cpp:327
> +    size_t colorPos = 0;

I’d rather see this be named something like "i". It’s not the position of a color, so I don’t like the name “color position” for it, and I also do not care for abbreviations like “pos”.

> Source/WebCore/dom/StyledElement.cpp:333
> +    // Grab the first 128 chars, replacing non-hex chars with 0
> +    // Non-BMP chars are replaced with "00" due to them appearing as two UChars in the String

We put periods on the end of sentences or even sentence-like fragments in our comments in WebKit.

Also, I would say characters, not chars.

> Source/WebCore/dom/StyledElement.cpp:340
> +    // Pad the buffer to a multiple of 3

Again, period needed.

> Source/WebCore/dom/StyledElement.cpp:346
> +    if (chars.size() % 3 == 2)
> +        chars.append('0');
> +    else if (chars.size() % 3 == 1) {
> +        chars.append('0');
> +        chars.append('0');
> +    }

No real need to check the size here. We could just unconditionally append two "0" characters instead after the size zero check.

> Source/WebCore/dom/StyledElement.cpp:351
> +    if (chars.size() == 3)

If we unconditionally added two "0" characters, then this would be "< 6" rather than "== 3", but otherwise the code could be the same.

> Source/WebCore/dom/StyledElement.cpp:361
> +    // Skip digits until one of them is non-zero, or you've only got two left in the component.

We’d normally say “we’ve”.

> Source/WebCore/dom/StyledElement.cpp:373
> +    return String::format("#%c%c%c%c%c%c", chars[redIndex], chars[redIndex+1], chars[greenIndex], chars[greenIndex+1], chars[blueIndex], chars[blueIndex+1]);   

We normally would put spaces around the "+ 1". in this expression.

> Source/WebCore/dom/StyledElement.cpp:376
> +// color parsing that matches HTML's "rules for parsing a legacy color value"

WebKit style would be to capitalize and put a period at the end.

> Source/WebCore/dom/StyledElement.cpp:377
> +void StyledElement::addCSSColor(Attribute* attr, int id, const String& attrValue)

I think these should have the names attribute and value or attributeValue rather than attr and attrValue.

> Source/WebCore/dom/StyledElement.cpp:383
> +    const String color = attrValue.stripWhiteSpace();

Not our WebKit style to use const in a case like this, although there is at least one person debating this style point on webkit-dev for the future.

> Source/WebCore/dom/StyledElement.cpp:385
> +    // "transparent" doesn't apply a color either.

Period at the end here.

> Source/WebCore/dom/StyledElement.cpp:402
> +    // If the string is a 3 or 6-digit hex color, use that color.
> +    if (color[0] == '#' && (color.length() == 4 || color.length() == 7) && attr->decl()->setProperty(id, color, false))
> +        return;

The comment says this runs if the string is a "hex color", but there is nothing here that checks if the characters are hex digits. So either the comment or the code is wrong.

> Source/WebCore/dom/StyledElement.cpp:404
> +    // Otherwise, use the crazy legacy parsing rules.

Not sure this comment helps. If the right name for these rules are "crazy legacy parsing rules" then the function should just be named that so we don’t need the comment.
Comment 25 Tab Atkins 2011-06-24 09:16:49 PDT
Created attachment 98500 [details]
Patch
Comment 26 Tab Atkins 2011-06-24 09:21:51 PDT
Created attachment 98502 [details]
Patch
Comment 27 Tab Atkins 2011-06-24 09:28:42 PDT
(In reply to comment #24)
> (From update of attachment 98432 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98432&action=review
> 
> review- because of the failing test in the test results in the patch

Argh, sorry.  All the tests passed, but I didn't rebuild DRT before rebaselining, so it baselined with an old version.


> > LayoutTests/fast/dom/attribute-legacy-colors-expected.txt:1
> > +This test ensures that legacy color attributes are parsed properly.
> 
> I don’t think there are enough tests here to test the algorithm and edge cases given my reading of the code. I don’t see any test cases that get up to 128 characters, or pass 128 characters. I don’t see tests that cover the behavior of skipping up to 8 zeroes but not 9 in the individual components.

I've added two tests for the 128-characters case, one in the BMP and one outside.

I've also increased the coverage of tests in the "skipping zeros" section, and added one that ensures non-BMP characters are properly handled here.

> > Source/WebCore/dom/StyledElement.cpp:321
> > +static String parseInvalidColorAttributeValue(const String attrValue)
> 
> This should be const String& attributeValue, not const String attrValue.

Fixed.

> > Source/WebCore/dom/StyledElement.cpp:323
> > +    const size_t maxColorLength = 128;
> 
> It would be good to say where this 128 comes from. I assume it’s from the HTML specification’s invalid color parsing rules.

Fixed.

> > Source/WebCore/dom/StyledElement.cpp:325
> > +    // We'll pad the buffer to a multiple of 3, so we need to reserve a little more for safety
> > +    Vector<char, maxColorLength+3> chars;
> 
> A vector will automatically start using the heap instead of the stack once you go past the inline buffer, so there's no need to have the inline buffer size be large enough for the worst case. It just needs to be big enough to cover the normal case. I suppose this is OK as is, but this pays the cost of using the vector class without getting any benefit from it.

I chose a Vector based on Adam Barth's comments.

> Since this buffer holds only hex digits, I'd suggest naming it digitBuffer or digits, rather than naming it characters.

Named it digitBuffer.

> Given the comment the code should say "+ 2" rather than "+ 3". Saying "a little more for safety" is unnecessarily vague; I think you’re saying that because there’s no easy way to write "round up to multiple of 3", so we are using "+ 2" to approximate the ceiling of that.

Done, and reworded that comment.

> > Source/WebCore/dom/StyledElement.cpp:327
> > +    size_t colorPos = 0;
> 
> I’d rather see this be named something like "i". It’s not the position of a color, so I don’t like the name “color position” for it, and I also do not care for abbreviations like “pos”.

Changed to i.

> > Source/WebCore/dom/StyledElement.cpp:333
> > +    // Grab the first 128 chars, replacing non-hex chars with 0
> > +    // Non-BMP chars are replaced with "00" due to them appearing as two UChars in the String
> 
> We put periods on the end of sentences or even sentence-like fragments in our comments in WebKit.
> 
> Also, I would say characters, not chars.

Done.

> > Source/WebCore/dom/StyledElement.cpp:346
> > +    if (chars.size() % 3 == 2)
> > +        chars.append('0');
> > +    else if (chars.size() % 3 == 1) {
> > +        chars.append('0');
> > +        chars.append('0');
> > +    }
> 
> No real need to check the size here. We could just unconditionally append two "0" characters instead after the size zero check.

Not sure I completely agree, but done.

> > Source/WebCore/dom/StyledElement.cpp:351
> > +    if (chars.size() == 3)
> 
> If we unconditionally added two "0" characters, then this would be "< 6" rather than "== 3", but otherwise the code could be the same.

Done.

> > Source/WebCore/dom/StyledElement.cpp:361
> > +    // Skip digits until one of them is non-zero, or you've only got two left in the component.
> 
> We’d normally say “we’ve”.

Thanks, I wasn't sure what pronoun to use.  Done.

> > Source/WebCore/dom/StyledElement.cpp:373
> > +    return String::format("#%c%c%c%c%c%c", chars[redIndex], chars[redIndex+1], chars[greenIndex], chars[greenIndex+1], chars[blueIndex], chars[blueIndex+1]);   
> 
> We normally would put spaces around the "+ 1". in this expression.

Done.

> > Source/WebCore/dom/StyledElement.cpp:376
> > +// color parsing that matches HTML's "rules for parsing a legacy color value"
> 
> WebKit style would be to capitalize and put a period at the end.

Done.

> > Source/WebCore/dom/StyledElement.cpp:377
> > +void StyledElement::addCSSColor(Attribute* attr, int id, const String& attrValue)
> 
> I think these should have the names attribute and value or attributeValue rather than attr and attrValue.

Done.

> > Source/WebCore/dom/StyledElement.cpp:383
> > +    const String color = attrValue.stripWhiteSpace();
> 
> Not our WebKit style to use const in a case like this, although there is at least one person debating this style point on webkit-dev for the future.

I've removed the const for now.

> > Source/WebCore/dom/StyledElement.cpp:402
> > +    // If the string is a 3 or 6-digit hex color, use that color.
> > +    if (color[0] == '#' && (color.length() == 4 || color.length() == 7) && attr->decl()->setProperty(id, color, false))
> > +        return;
> 
> The comment says this runs if the string is a "hex color", but there is nothing here that checks if the characters are hex digits. So either the comment or the code is wrong.

The test is implicit, in that if it conforms to the tests I perform and successfully parses as a color, it must have been a valid hex color.  Would it be better to explicitly test the 3/6 digits with isASCIIHexDigit()?

> > Source/WebCore/dom/StyledElement.cpp:404
> > +    // Otherwise, use the crazy legacy parsing rules.
> 
> Not sure this comment helps. If the right name for these rules are "crazy legacy parsing rules" then the function should just be named that so we don’t need the comment.

Done.  Changed the helper function's name to "parseColorStringWithCrazyLegacyRules".  ^_^
Comment 28 Tab Atkins 2011-06-28 16:03:02 PDT
Bump for r+, since I submitted two patches in quick succession and the "review cancelled" message from the first may have confused you into thinking I didn't need a review.
Comment 29 Adam Barth 2011-06-30 10:59:29 PDT
Comment on attachment 98502 [details]
Patch

Looks nice.
Comment 30 Tab Atkins 2011-06-30 11:16:03 PDT
(In reply to comment #12)
> (In reply to comment #2)
> > This behavior matches Firefox and should match IE (don't have ability to quickly test).
> 
> We don't always test IE, but color parsing is quite fundamental. Maybe you could check with <http://browsershots.org/>?

I just tested in IE9, and we *mostly* match.  Other than the expected fails (due to them return "transparent" when we return "rgba(0, 0, 0, 0)", they fail the following tests:

1. "#f00"
2. "#555"
3. " red "
4. " #f00 "
5. " #ff0000 "
6. " ffffff "
7. "f"*128 + "0"*6
8. same as previous, but with 64 non-BMP chars instead of "f".
9. "rgb(255, 0, 0)"

1-2 fail because these cases aren't recognized as valid colors, and are passed down into their legacy codepath instead.  3-6 fail because they don't trim whitespace.  7-8 fail because they don't truncate to 128 chars before padding to a multiple of 3.  9 fails because they recognize the rgb() function as a CSS color and respect it (you're supposed to treat it like an arbitrary string and send it through legacy parsing).

They pass the rest of the testcases, though.  Firefox passes all of the testcases (besides the expected fails, for the same reason).  Apparently the spec algorithm was based directly off of what Firefox did to reverse-engineer IE.
Comment 31 WebKit Review Bot 2011-06-30 11:41:42 PDT
Comment on attachment 98502 [details]
Patch

Clearing flags on attachment: 98502

Committed r90139: <http://trac.webkit.org/changeset/90139>
Comment 32 WebKit Review Bot 2011-06-30 11:41:48 PDT
All reviewed patches have been landed.  Closing bug.
Comment 33 mitz 2012-03-22 16:11:13 PDT
(In reply to comment #31)
> (From update of attachment 98502 [details])
> Clearing flags on attachment: 98502
> 
> Committed r90139: <http://trac.webkit.org/changeset/90139>

This change caused text that was formerly hidden using <font color="transparent"> to become visible on the airprint.apple.com website.
Comment 34 Tab Atkins 2012-03-28 15:53:49 PDT
(In reply to comment #33)
> (In reply to comment #31)
> > (From update of attachment 98502 [details] [details])
> > Clearing flags on attachment: 98502
> > 
> > Committed r90139: <http://trac.webkit.org/changeset/90139>
> 
> This change caused text that was formerly hidden using <font color="transparent"> to become visible on the airprint.apple.com website.

...

That's possibly the *worst* way I've ever seen to hide text.  They deserve every bad thing that happened to them for committing that kind of sin.
Comment 35 mitz 2012-08-26 16:11:10 PDT
(In reply to comment #31)
> (From update of attachment 98502 [details])
> Clearing flags on attachment: 98502
> 
> Committed r90139: <http://trac.webkit.org/changeset/90139>

This change appears to have caused content that was using BGCOLOR=RGB(255,300,500) with a dark text color to become unreadable.
Comment 36 mitz 2012-08-26 16:14:14 PDT
(In reply to comment #31)
> (From update of attachment 98502 [details])
> Clearing flags on attachment: 98502
> 
> Committed r90139: <http://trac.webkit.org/changeset/90139>

This change caused some email messages authored in Microsoft Outlook for Mac to have unexpected colors.