Bug 119413 - srcset algorithm breaks base64 src attributes
Summary: srcset algorithm breaks base64 src attributes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Images (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dean Jackson
URL:
Keywords: InRadar
Depends on: 110252
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-01 15:05 PDT by Dean Jackson
Modified: 2013-08-01 17:04 PDT (History)
1 user (show)

See Also:


Attachments
Patch (114.76 KB, patch)
2013-08-01 15:21 PDT, Dean Jackson
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Jackson 2013-08-01 15:05:08 PDT
The patch in bug 110252 breaks image src attributes if they are base64 encoded, because the comma is seen as a separator.

Update the algorithm to:

- handle encoded attributes
- skip any candidate strings that we can't handle (such as modifiers other than scale)
- handle whitespace other than SPACE characters
Comment 1 Radar WebKit Bug Importer 2013-08-01 15:05:26 PDT
<rdar://problem/14625461>
Comment 2 Dean Jackson 2013-08-01 15:21:06 PDT
Created attachment 207963 [details]
Patch
Comment 3 Dean Jackson 2013-08-01 15:22:35 PDT
This will break on EWS since it relies on patches that have not landed.
Comment 4 Darin Adler 2013-08-01 16:27:15 PDT
Comment on attachment 207963 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review

EWS couldn't apply the patch

> Source/WebCore/html/HTMLImageElement.cpp:115
> +    return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL;

Should use fastGetAttribute, I think.

> Source/WebCore/html/HTMLImageElement.cpp:148
> +    const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace);

Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK.

> Source/WebCore/html/HTMLImageElement.cpp:151
> +    srcSetAttributeValue.split(',', srcSetTokens);

Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring.

> Source/WebCore/html/HTMLImageElement.cpp:172
> +        image.imageURL = decodeURLEscapeSequences(data[0]);

This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this?

> Source/WebCore/html/HTMLImageElement.cpp:196
> +        determineBestImageForScaleFactor();

It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes.
Comment 5 Dean Jackson 2013-08-01 16:54:51 PDT
Comment on attachment 207963 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review

>> Source/WebCore/html/HTMLImageElement.cpp:115
>> +    return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL;
> 
> Should use fastGetAttribute, I think.

Done.

>> Source/WebCore/html/HTMLImageElement.cpp:148
>> +    const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace);
> 
> Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK.

It is overkill, but something we'll remove once we get a better parser.

>> Source/WebCore/html/HTMLImageElement.cpp:151
>> +    srcSetAttributeValue.split(',', srcSetTokens);
> 
> Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring.

Yes, this is all a bit of overkill. If we can remove decodeURLEscapeSequences it might be ok to extract just the bits for the scale factor into a new string and run toFloat on that.

Ultimately we want a parser that walks through the string without splitting. At the moment these attributes are not expected to have more than two candidate strings though.

>> Source/WebCore/html/HTMLImageElement.cpp:172
>> +        image.imageURL = decodeURLEscapeSequences(data[0]);
> 
> This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this?

Yes, it's the test case fast/hidpi/image-srcset-data-srcset.html
Since srcset requires comma separated values, a base64 encoded url breaks the parsing unless it is also % encoded (which is a pain that I've reported to Ted to see if it can be avoided in the spec).

>> Source/WebCore/html/HTMLImageElement.cpp:196
>> +        determineBestImageForScaleFactor();
> 
> It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes.

Yeah. I originally tried to have it calculated only when you're asked for the image source but that is a const method. Unfortunately when either the src or the srcset attributes are set we need to run the algorithm, and I'm not sure how to determine when we're in the initial parse (nor how to know we've finished the parse).
Comment 6 Dean Jackson 2013-08-01 17:04:15 PDT
Committed r153627: <http://trac.webkit.org/changeset/153627>