119413 – srcset algorithm breaks base64 src attributes

RESOLVED FIXED 119413

srcset algorithm breaks base64 src attributes

https://bugs.webkit.org/show_bug.cgi?id=119413

Summary srcset algorithm breaks base64 src attributes

Dean Jackson

Reported 2013-08-01 15:05:08 PDT

The patch in bug 110252 breaks image src attributes if they are base64 encoded, because the comma is seen as a separator. Update the algorithm to: - handle encoded attributes - skip any candidate strings that we can't handle (such as modifiers other than scale) - handle whitespace other than SPACE characters

Attachments
Patch (114.76 KB, patch) 2013-08-01 15:21 PDT, Dean Jackson	darin: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2013-08-01 15:05:26 PDT

<rdar://problem/14625461>

Dean Jackson

Comment 2 2013-08-01 15:21:06 PDT

Created attachment 207963 [details] Patch

Dean Jackson

Comment 3 2013-08-01 15:22:35 PDT

This will break on EWS since it relies on patches that have not landed.

Darin Adler

Comment 4 2013-08-01 16:27:15 PDT

Comment on attachment 207963 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review EWS couldn't apply the patch > Source/WebCore/html/HTMLImageElement.cpp:115 > + return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL; Should use fastGetAttribute, I think. > Source/WebCore/html/HTMLImageElement.cpp:148 > + const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace); Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK. > Source/WebCore/html/HTMLImageElement.cpp:151 > + srcSetAttributeValue.split(',', srcSetTokens); Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring. > Source/WebCore/html/HTMLImageElement.cpp:172 > + image.imageURL = decodeURLEscapeSequences(data[0]); This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this? > Source/WebCore/html/HTMLImageElement.cpp:196 > + determineBestImageForScaleFactor(); It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes.

Dean Jackson

Comment 5 2013-08-01 16:54:51 PDT

Comment on attachment 207963 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review >> Source/WebCore/html/HTMLImageElement.cpp:115 >> + return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL; > > Should use fastGetAttribute, I think. Done. >> Source/WebCore/html/HTMLImageElement.cpp:148 >> + const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace); > > Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK. It is overkill, but something we'll remove once we get a better parser. >> Source/WebCore/html/HTMLImageElement.cpp:151 >> + srcSetAttributeValue.split(',', srcSetTokens); > > Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring. Yes, this is all a bit of overkill. If we can remove decodeURLEscapeSequences it might be ok to extract just the bits for the scale factor into a new string and run toFloat on that. Ultimately we want a parser that walks through the string without splitting. At the moment these attributes are not expected to have more than two candidate strings though. >> Source/WebCore/html/HTMLImageElement.cpp:172 >> + image.imageURL = decodeURLEscapeSequences(data[0]); > > This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this? Yes, it's the test case fast/hidpi/image-srcset-data-srcset.html Since srcset requires comma separated values, a base64 encoded url breaks the parsing unless it is also % encoded (which is a pain that I've reported to Ted to see if it can be avoided in the spec). >> Source/WebCore/html/HTMLImageElement.cpp:196 >> + determineBestImageForScaleFactor(); > > It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes. Yeah. I originally tried to have it calculated only when you're asked for the image source but that is a const method. Unfortunately when either the src or the srcset attributes are set we need to run the algorithm, and I'm not sure how to determine when we're in the initial parse (nor how to know we've finished the parse).

Dean Jackson

Comment 6 2013-08-01 17:04:15 PDT

Committed r153627: <http://trac.webkit.org/changeset/153627>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Images

Assignee

Dean Jackson

Reported

2013-08-01 15:05 PDT

Modified

2013-08-01 17:04 PDT History

CC List

1 user Show

URL

Keywords InRadar

Depends on

110252

Blocks

Dependencies

tree graph