RESOLVED FIXED 119413
srcset algorithm breaks base64 src attributes
https://bugs.webkit.org/show_bug.cgi?id=119413
Summary srcset algorithm breaks base64 src attributes
Dean Jackson
Reported 2013-08-01 15:05:08 PDT
The patch in bug 110252 breaks image src attributes if they are base64 encoded, because the comma is seen as a separator. Update the algorithm to: - handle encoded attributes - skip any candidate strings that we can't handle (such as modifiers other than scale) - handle whitespace other than SPACE characters
Attachments
Patch (114.76 KB, patch)
2013-08-01 15:21 PDT, Dean Jackson
darin: review+
Radar WebKit Bug Importer
Comment 1 2013-08-01 15:05:26 PDT
Dean Jackson
Comment 2 2013-08-01 15:21:06 PDT
Dean Jackson
Comment 3 2013-08-01 15:22:35 PDT
This will break on EWS since it relies on patches that have not landed.
Darin Adler
Comment 4 2013-08-01 16:27:15 PDT
Comment on attachment 207963 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review EWS couldn't apply the patch > Source/WebCore/html/HTMLImageElement.cpp:115 > + return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL; Should use fastGetAttribute, I think. > Source/WebCore/html/HTMLImageElement.cpp:148 > + const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace); Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK. > Source/WebCore/html/HTMLImageElement.cpp:151 > + srcSetAttributeValue.split(',', srcSetTokens); Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring. > Source/WebCore/html/HTMLImageElement.cpp:172 > + image.imageURL = decodeURLEscapeSequences(data[0]); This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this? > Source/WebCore/html/HTMLImageElement.cpp:196 > + determineBestImageForScaleFactor(); It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes.
Dean Jackson
Comment 5 2013-08-01 16:54:51 PDT
Comment on attachment 207963 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=207963&action=review >> Source/WebCore/html/HTMLImageElement.cpp:115 >> + return m_bestFitImageURL.isEmpty() ? getAttribute(srcAttr) : m_bestFitImageURL; > > Should use fastGetAttribute, I think. Done. >> Source/WebCore/html/HTMLImageElement.cpp:148 >> + const String& srcSetAttributeValue = getAttribute(srcsetAttr).string().simplifyWhiteSpace(isHTMLSpace); > > Should use fastGetAttribute. Calling simplifyWhiteSpace on the entire string seems a bit like overkill to me, but I guess it’s OK. It is overkill, but something we'll remove once we get a better parser. >> Source/WebCore/html/HTMLImageElement.cpp:151 >> + srcSetAttributeValue.split(',', srcSetTokens); > > Sure seems like a shame to allocate all these separate strings for the tokens. There must be some nicer way to write the parser. Ideally for good performance we wouldn't construct substrings at all, but that would require a version of toFloat and decodeURLEscapeSequences that operated on a substring. Yes, this is all a bit of overkill. If we can remove decodeURLEscapeSequences it might be ok to extract just the bits for the scale factor into a new string and run toFloat on that. Ultimately we want a parser that walks through the string without splitting. At the moment these attributes are not expected to have more than two candidate strings though. >> Source/WebCore/html/HTMLImageElement.cpp:172 >> + image.imageURL = decodeURLEscapeSequences(data[0]); > > This seems like a strange thing to do. Why would we decode URL escape sequences here? Is there a test case that demonstrates the benefit of this? Yes, it's the test case fast/hidpi/image-srcset-data-srcset.html Since srcset requires comma separated values, a base64 encoded url breaks the parsing unless it is also % encoded (which is a pain that I've reported to Ted to see if it can be avoided in the spec). >> Source/WebCore/html/HTMLImageElement.cpp:196 >> + determineBestImageForScaleFactor(); > > It seems a shame to do this twice during the initial parse of an <img> that has both src and srcset attributes. Yeah. I originally tried to have it calculated only when you're asked for the image source but that is a const method. Unfortunately when either the src or the srcset attributes are set we need to run the algorithm, and I'm not sure how to determine when we're in the initial parse (nor how to know we've finished the parse).
Dean Jackson
Comment 6 2013-08-01 17:04:15 PDT
Note You need to log in before you can comment on or make changes to this bug.