119423 – Improve srcset parser

Dean Jackson

Reported 2013-08-01 19:09:48 PDT

The srcset parser creates intermediate arrays of strings (multiple times). We should make a better parser than walks along the attribute value to extract the candidates.

Radar WebKit Bug Importer

Comment 1 2013-08-01 19:09:57 PDT

<rdar://problem/14628284>

Dean Jackson

Comment 2 2013-08-01 19:10:22 PDT

Note that the parser is duplicated in the preload scanner and the image element.

Dean Jackson

Comment 3 2013-08-12 14:14:40 PDT

(In reply to comment #2) > Note that the parser is duplicated in the preload scanner and the image element. It's not duplicated - ignore this. Still needs improvement though :)

Benjamin Poulain

Comment 4 2013-08-12 14:35:11 PDT

(In reply to comment #3) > (In reply to comment #2) > > Note that the parser is duplicated in the preload scanner and the image element. > > It's not duplicated - ignore this. Still needs improvement though :) Romain is getting started on this, he emailed me yesterday.

Romain Perier

Comment 5 2013-08-19 12:24:54 PDT

Created attachment 209108 [details] Patch

Romain Perier

Comment 6 2013-08-19 12:25:29 PDT

See a first proposal in attachment

Benjamin Poulain

Comment 7 2013-08-20 01:22:34 PDT

Comment on attachment 209108 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=209108&action=review I am going to bed, I'll finish the review on the next iteration :) General comments: -This looks very efficient allocation wise. (did you benchmark it?) -You should have more variables with better names instead of reusing index and from. -If you can avoid loading a character twice from memory, that is great. -We should probably extract the whole parsing code in a static function with a explanative name (parseImagesWithScaleFromSrcsetAttribute()?). That's usually how we document things in WebKit. > Source/WebCore/ChangeLog:14 > + when parsing. That is more efficient on ARM devices and avoids trashing the data cache. You can probably drop the "on ARM". I have the feeling this is gonna be faster everywhere. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > + size_t from = 0; "from" is a little generic. What about imageCandidateStringStart? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 > Extra space. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > + if (from >= srcSetAttribute.length()) You could keep srcSetAttribute.length() in a temporary variable outside the loop. This would avoid querying for the string impl for each iterations. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > + break; Since this is the only break condition for the loop, why not have while(from >= srcSetAttributeLength)? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:335 > + // Collect a sequence of characters that are not space, it's the url Missing period. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:340 > + size_t index = srcSetAttribute.find(isNotHTMLSpace, from); > + if (index >= separator) { > + from = separator + 1; > continue; > - if (!data.last().endsWith('x')) > + } I would start with this, then find the separator. The reason has to do with how you read the memory: Let's say you have " foo.jpg 2x". When executing "srcSetAttribute.find(',', from)", you will go over the spaces and find the separator. Then, you execute "srcSetAttribute.find(isNotHTMLSpace, from)" which will read the memory again from the beginning to find the character 'f'. If you inverses the operations. You first skip all the HTMLSpace. Then you never have to go over them again. You take the position you found, and you look for the separator: "srcSetAttribute.find(',', index). > Source/WebCore/html/parser/HTMLParserIdioms.cpp:341 > + from = index; index is too generic. The previous "index" is "startOfTheURLString" > Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > + index = srcSetAttribute.find(isHTMLSpace, index); index -> endOfTheURLString? You could also imagine doing this before finding the separator (to avoid loading the same characters again from memory). In that case, "foo,bar.jpg 2x, would parse "foo,bar.jpg" as the URL. I don't know if that is okay? (hum, do we have tests for that kind of input anyway?) > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > + size_t url = from, spaceSep = index; Each statement should get its own line. (WebKit coding style) > Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + // Collect a sequence of characters that are neither "," nor space, it's the descriptor Missing period.

Romain Perier

Comment 8 2013-08-27 10:23:40 PDT

Created attachment 209785 [details] Patch

Romain Perier

Comment 9 2013-08-27 10:29:25 PDT

This patch should fit your needs

Benjamin Poulain

Comment 10 2013-08-27 20:53:40 PDT

Comment on attachment 209785 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=209785&action=review I just had a quick look but this seems good. You need to rebaseline though. The patch does not apply on the bots. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 > You can remove this space. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:360 > + imageScaleEnd = (imageScaleEnd == notFound) ? srcSetLen: imageScaleEnd; missing space before : > Source/WebCore/html/parser/HTMLParserIdioms.cpp:365 > + separator = srcSetAttribute.find(",", separator + 1); "," -> ',' > Source/WebCore/html/parser/HTMLParserIdioms.cpp:384 > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart)); You could do without the StirngImpl::createWithoutCopying(). It is not gonna be a big deal here to create a single string, and it will be safer not to share the memory.

Dean Jackson

Comment 11 2013-08-28 14:27:49 PDT

Great stuff Romain. One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well.

Romain Perier

Comment 12 2013-08-29 05:22:40 PDT

(In reply to comment #11) > Great stuff Romain. > > One area I'm especially interested in is data: URLs inside srcset. > Mhhh right > Ted tells me the spec says that we shouldn't universally split on commas. Yes, but you can detect and skip invalid candidates (and switch to another one) > I wonder if we need more tests of this. Similarly, the inline URL might be a >data: SVG which has inline commas as well. It's an issue that I need to resolve, good catch. I will look at it.

Theresa O'Connor

Comment 13 2013-08-29 09:41:08 PDT

If your implementation follows the "Processing the image candidates" algorithm as specified, you won't have an issue with commas in data URLs. http://www.whatwg.org/specs/web-apps/current-work/multipage/embedded-content-1.html#processing-the-image-candidates http://www.w3.org/html/wg/drafts/srcset/w3c-srcset/#processing-the-image-candidates

Romain Perier

Comment 14 2013-09-02 03:59:37 PDT

Created attachment 210274 [details] Patch

Romain Perier

Comment 15 2013-09-02 04:17:57 PDT

(In reply to comment #11) > Great stuff Romain. > > One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well. Btw, as discussed with Benjamin an SVG might contain commas but also spaces... (the URL delimiter). The parser can handle several commas in the data but not space as it's the end of the url... (how do you make difference between a normal space and the space delimiter ? And yes I have fallowed the specification)

Romain Perier

Comment 16 2013-09-02 07:51:11 PDT

Created attachment 210285 [details] Patch

Dean Jackson

Comment 17 2013-09-03 18:19:04 PDT

(In reply to comment #15) > (In reply to comment #11) > > Great stuff Romain. > > > > One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well. > > Btw, as discussed with Benjamin an SVG might contain commas but also spaces... (the URL delimiter). The parser can handle several commas in the data but not space as it's the end of the url... (how do you make difference between a normal space and the space delimiter ? And yes I have fallowed the specification) I guess there are limits to what data: content we can put in a srcset attribute without having to escape things. I don' t think we should worry too much about it.

Dean Jackson

Comment 18 2013-09-03 18:28:03 PDT

Comment on attachment 210285 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=210285&action=review I think this looks almost ready - just some minor comments. However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input? It would be nice if we had some stress tests for really bad input (e.g. malformed data:, encoded commas, etc). > Source/WebCore/html/parser/HTMLParserIdioms.cpp:322 > +static void parseImagesWithScaleFromSrcsetAttribute(const String& srcSetAttribute, ImageCandidates& imageCandidates) I guess for consistency "FromSrcsetAttribute" should be "FromSrcSetAttribute" > Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > + // Collect a sequence of characters that are not space, it's the url. I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > Source/WebCore/html/parser/HTMLParserIdioms.cpp:350 > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor. > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1); The comment doesn't match the code. isNotHTMLSpace doesn't check for ','.

Romain Perier

Comment 19 2013-09-04 01:09:26 PDT

(In reply to comment #18) > (From update of attachment 210285 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=210285&action=review > > I think this looks almost ready - just some minor comments. > > However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input? See fast/hidpi/image-srcset-invalid-inputs.html which contains a various range of invalid cases. However It does not include data: , perhaps I should write a separated unit test for data: (including valid and invalid cases). For valid inputs, we already have several unit tests. I could improve them by adding some weird valid inputs (containing a lot of spaces for example). What do you think ? My implementation follows the specification , the only difference is that it handles better invalid inputs not covered by the high level algorithm in the spec, nothing more, really :) > It would be nice if we had some stress tests for really bad input (e.g. malformed data:, encoded commas, etc). > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:322 > > +static void parseImagesWithScaleFromSrcsetAttribute(const String& srcSetAttribute, ImageCandidates& imageCandidates) > > I guess for consistency "FromSrcsetAttribute" should be "FromSrcSetAttribute" > sure > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > > + // Collect a sequence of characters that are not space, it's the url. > > I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > This comment is for the block of code which extracts the url, not the for the line > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:350 > > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor. > > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1); > > The comment doesn't match the code. isNotHTMLSpace doesn't check for ','. Same thing as the previous comment, it is for the block of code which extracts the descriptor, not the for the line. However if these comments are really useless, I can remove them, sure.

Romain Perier

Comment 20 2013-09-04 08:47:02 PDT

For really bad inputs I have something like the following test in mind: <html> <head> <script src="resources/srcset-helper.js"></script> </head> <body id="body"> <div>This test passes if this img tag below is empty and displays nothing. It ensures that the srcset attribute supports invalid inputs</div> <img height="100" width="100" src="" srcset="1x,, , x ,2x , foo.jpg, 3x, bar.jpg 4x 100h, foo.jpg 5, bar.jpg dx,foo.jpg,bar.jpg, data:image/png;base64,ThisIsAMalFormedData%/,withForbidden,Characters:!?~&#') 6x, data: 7x, data:;charset= 8x, data:;charset=us-ASCII 9x, data:,BadData 10x, data:foo/bar,foobar 11x, data:foo/bar, 12x"></img> </body> </html> The test passes. I can improve tests for valid inputs by adding some weird cases for data:URL , but nothing more (I have no other ideas)

Benjamin Poulain

Comment 21 2013-09-04 15:15:50 PDT

(In reply to comment #19) > (In reply to comment #18) > > (From update of attachment 210285 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=210285&action=review > > > > I think this looks almost ready - just some minor comments. > > > > However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input? > > See fast/hidpi/image-srcset-invalid-inputs.html which contains a various range of invalid cases. However It does not include data: , perhaps I should write a separated unit test for data: (including valid and invalid cases). For valid inputs, we already have several unit tests. I could improve them by adding some weird valid inputs (containing a lot of spaces for example). > > What do you think ? New separate test for data is a good idea. Probably at least 2 tests, one for various valid input, and one for bad input.

Benjamin Poulain

Comment 22 2013-09-04 15:44:13 PDT

Comment on attachment 210285 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=210285&action=review > Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > + unsigned srcSetLen = srcSetAttribute.length(); Len -> Length > Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 > Blank line here. >>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 >>> + // Collect a sequence of characters that are not space, it's the url. >> >> I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > > This comment is for the block of code which extracts the url, not the for the line I think it is great to use spec text as comments. You should have "5." in front to make it more obvious. You could also have a comment linking the spec before the algorithm. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart); "4. Splitting loop: Skip whitespace."? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:333 > + imageCandidateStart = srcSetLen; > continue; Shouldn't you just break here and be done with it? Here all you will have is imageCandidateStart = srcSetLen, go back to the branch (imageCandidateStart < srcSetLen) and exit the loop. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:338 > + if (srcSetAttribute[imageUrlStart] == ',') { > + imageCandidateStart = imageUrlStart + 1; > continue; > + } Which cases does this cover? The spec says "Collect a sequence of characters that are not space characters, and let that be url.", so it is not clear why the initial comma would not be part of the URL. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > + imageCandidateStart = srcSetLen; > + continue; "break"? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > + imageCandidateStart = imageUrlEnd; > + continue; > + } An empty descriptor should still get the URL in the raw candidates. The density would default to 1. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor. 7. Gotta go, I'll finish the review on the next iteration.

Romain Perier

Comment 23 2013-09-05 04:15:48 PDT

Rapid reply (In reply to comment #22) > (From update of attachment 210285 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=210285&action=review > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > > + unsigned srcSetLen = srcSetAttribute.length(); > > Len -> Length > OK > > >>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > >>> + // Collect a sequence of characters that are not space, it's the url. > >> > >> I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > > > > This comment is for the block of code which extracts the url, not the for the line > > I think it is great to use spec text as comments. You should have "5." in front to make it more obvious. > > You could also have a comment linking the spec before the algorithm. Yep, I agree. Good idea. > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart); > > "4. Splitting loop: Skip whitespace."? +1 > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:333 > > + imageCandidateStart = srcSetLen; > > continue; > Right, good catch ! > Shouldn't you just break here and be done with it? > Here all you will have is imageCandidateStart = srcSetLen, go back to the branch (imageCandidateStart < srcSetLen) and exit the loop. > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:338 > > + if (srcSetAttribute[imageUrlStart] == ',') { > > + imageCandidateStart = imageUrlStart + 1; > > continue; > > + } > > Which cases does this cover? > The spec says "Collect a sequence of characters that are not space characters, and let that be url.", so it is not clear why the initial comma would not be part of the URL. It covers this case srcset="foo.jpg 1x,, bar.jpg 2x", or this one srcset="foo.jpg, , bar.jpg 2x" If the current candidate does not contain letters (!= space) , the first match will be a comma... if we don't do this step, the next url might be "," or even weird ",bar.jpg" (if the next element does not begin with a space). > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > > + imageCandidateStart = srcSetLen; > > + continue; > > "break"? +1 > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > > + imageCandidateStart = imageUrlEnd; > > + continue; > > + } > > An empty descriptor should still get the URL in the raw candidates. > The density would default to 1. Have a look at the current parsing algorithm in the repository. It skips the current candidate if this one does not include a descriptor. I just wanted to avoid a regression. What do we do for this case ? 1. We follow the spec, 2. We suppose that implicit descriptor is a bad idea because it might be misunderstood by the web developer, it would be better to only accept explicit descriptors. As srcset might be a cross-browser feature, this is probably better to choose 1, imho. Benjamin ? Dean ? opinion ?

Benjamin Poulain

Comment 24 2013-09-05 10:27:50 PDT

(In reply to comment #23) > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > > > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > > > + imageCandidateStart = imageUrlEnd; > > > + continue; > > > + } > > > > An empty descriptor should still get the URL in the raw candidates. > > The density would default to 1. > > Have a look at the current parsing algorithm in the repository. It skips the current candidate if this one does not include a descriptor. I just wanted to avoid a regression. > What do we do for this case ? 1. We follow the spec, 2. We suppose that implicit descriptor is a bad idea because it might be misunderstood by the web developer, it would be better to only accept explicit descriptors. > As srcset might be a cross-browser feature, this is probably better to choose 1, imho. Benjamin ? Dean ? opinion ? It is better to follow the spec. This is still an experimental feature. It is not a problem changing the behavior and updating the tests accordingly. Supporting candidates without descriptor seems like a sensible choice. I could see people ignoring "src" entirely and writing this kind of declaration: <img srcset="regular.png, highres.png 2x">

Romain Perier

Comment 25 2013-09-10 04:20:49 PDT

Created attachment 211189 [details] Patch

Romain Perier

Comment 26 2013-09-10 04:23:47 PDT

See the new patch in attachment: 1. I have improved tests 2. I have added new ones (for data uri schemes) 3. I have fixed what was wrong

Romain Perier

Comment 27 2013-09-10 13:05:41 PDT

Created attachment 211229 [details] Patch

Romain Perier

Comment 28 2013-09-10 13:07:18 PDT

Rebased with latest commits

Benjamin Poulain

Comment 29 2013-09-11 20:52:22 PDT

Comment on attachment 211229 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=211229&action=review This looks great. Some nitpick bellow. Let's iterate on this quickly so you can move on. > Source/WebCore/ChangeLog:7 > + You also updated the behaviors to follow the spec regarding data URLs. You should mention the behavior change in the description. > Source/WebCore/ChangeLog:8 > + No new tests, covered by existing tests. This is no longer true. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:323 > +// http://www.w3.org/html/wg/drafts/srcset/w3c-srcset/Overview.html#processing-the-image-candidates Draft change by definition. IMHO, it is good to link the published version with a date: http://www.w3.org/TR/2013/WD-html-srcset-20130228/#processing-the-image-candidates > Source/WebCore/html/parser/HTMLParserIdioms.cpp:324 > +// Note: Some items in comments below make reference to the spec (prefixed by numbers). You don't need this line, that is common enough in WebKit. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:334 > + // 5. Collect a sequence of characters that are not space characters, and let that be url. This should be moved to #344 now. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:336 > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart); // 4. Splitting loop: Skip whitespace. Move the comment above the block. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:343 > + // If The current candidate is either totally empty or only contains space, skipping. > + if (srcSetAttribute[imageUrlStart] == ',') { > + imageCandidateStart = imageUrlStart + 1; > continue; > + } I kind of agree with your reasoning here. It is probably worth mailing the standard group to have this clarified in the spec. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:344 > + size_t imageUrlEnd = srcSetAttribute.find(isHTMLSpace, imageUrlStart + 1); The comment 5. would be good here. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:346 > + if (imageUrlEnd == notFound) > + break; Not sure I agree with this. if you have srcset=" foobar.jpg", you would fail here. Is there a test covering such case? (*) > Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + imageUrlEnd--; --imageUrlEnd; usually in WebKit. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:354 > + > + // 7. Collect a sequence of characters that are not "," (U+002C) characters, and let that be descriptors. > + You can remove the blank lines above and above and bellow the comment. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:357 > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1); > + if (imageScaleStart == notFound) > + break; Same as for (*), I am not sure about this. You could have srcset=" foobar.jpg " Again, do we have tests for this? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:365 > + // Be sure that there are no other descriptors. Be sure that -> Make sure there are no... > Source/WebCore/html/parser/HTMLParserIdioms.cpp:367 > + while ((separator < srcSetLength - 1) && isHTMLSpace(srcSetAttribute[separator])) > + separator++; I think this may be abusing the separator variable. I would use an other variable and assign it to separator once you found a comma. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:368 > + if ((separator < srcSetLength - 1) && srcSetAttribute[separator] != ',') { I would add a comment explaining this failure case. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:371 > + separator = srcSetLength; break? > Source/WebCore/html/parser/HTMLParserIdioms.cpp:381 > + imgScaleFactor = charactersToFloat(srcSetAttribute.characters() + imageScaleStart, imageScaleEnd - imageScaleStart - 1, &validScaleFactor); The -1 is not obvious here. To explain it, you can just add more temporary variables. E.g.: size_t scaleFactorStringLength = imageScaleEnd - imageScaleStart; size_t scaleFactorStringLengthWithoutUnit = scaleFactorStringLength - 1; imgScaleFactor = charactersToFloat(srcSetAttribute.characters() + imageScaleStart, scaleFactorStringLengthWithoutUnit, &validScaleFactor); > Source/WebCore/html/parser/HTMLParserIdioms.cpp:390 > - image.imageURL = decodeURLEscapeSequences(data[0]); > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart)); I feel like you mentioned why createWithoutCopying is safe here but I forgot. Could you explain again?

Romain Perier

Comment 30 2013-09-13 04:27:49 PDT

Created attachment 211535 [details] Patch

Romain Perier

Comment 31 2013-09-13 05:19:15 PDT

Everything should be fixed. > I feel like you mentioned why createWithoutCopying is safe here but I forgot. Could you explain again? That's simple: - StringImpl:createWithoutCopying() creates a StringImpl which shares the same buffer as srcsetAttribute - String is created on the fly from the StringImpl object, it will delegate string features to it and share the same buffer, as well - decodeURLEscapeSequences returns a copy of the input String in all cases. So the resulting image.imageURL is a copy and does not share memory :)

Yoav Weiss

Comment 32 2013-09-13 13:51:00 PDT

There's no need to decodeURLEscapeSequences() the output URL. Data URIs work fine with the new spec-compliant parser (which splits the attribute value according to white spaces, rather than commas)

Romain Perier

Comment 33 2013-09-13 13:53:49 PDT

The question being do we need to support escaped data:URLs or not , I would say yes

Yoav Weiss

Comment 34 2013-09-13 14:13:05 PDT

> The question being do we need to support escaped data:URLs or not , I would say yes Escaped data URIs don't seem to be supported on src on neither Blink nor Gecko.

Benjamin Poulain

Comment 35 2013-09-16 21:07:31 PDT

Comment on attachment 211535 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=211535&action=review This looks great. Some minor comments bellow. Can you please also add tests for a srcset descriptor with spaces around (if there is no such test yet). Something like srcset=" foobar.jpg", srcset="fobar.jpg " and srcset=" foobar.jpg ". (+ the variations with a scale factor if needed). > Source/WebCore/html/parser/HTMLParserIdioms.cpp:361 > + size_t comma = imageScaleEnd; comma -> commaPosition > Source/WebCore/html/parser/HTMLParserIdioms.cpp:362 > + // Make sure there are no other descriptors Missing period. > Source/WebCore/html/parser/HTMLParserIdioms.cpp:391 > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart)); Please check decodeURLEscapeSequences and do a follow up if the current behavior is erroneous.

Romain Perier

Comment 36 2013-09-17 08:37:26 PDT

Created attachment 211910 [details] Patch

Romain Perier

Comment 37 2013-09-17 08:44:40 PDT

As discussed together, see the spec for data:URLs with escaped characters http://tools.ietf.org/html/rfc2397 (sorry this is the only document I found...). Also I tested the following example with different browsers: " <html> <img src="data:image/svg+xml,<svg%20xmlns='http://www.w3.org/2000/svg'%20width='100'%20height='100'><linearGradient%20id='gradient'><stop%20offset='10%'%20stop-color='#F00'/><stop%20offset='90%'%20stop-color='#fcc'/></linearGradient> <rect%20fill='url(#gradient)'%20x='0'%20y='0'%20width='100%'%20height='100%'/></svg>"/> </html> " Results: 1. It works with chromium/blink 29 2. It does not work with firefox 23/24 3. It does not work with IE 10 I think we should support this feature for src and srcset attributes. Also, as it's not syntaxically possible to support plain/text svgs in data:URLs (a svg contains space, and space is a delimiter for URLs), escaping spaces would be a good solution, imho.

WebKit Commit Bot

Comment 38 2013-09-17 13:05:43 PDT

Comment on attachment 211910 [details] Patch Clearing flags on attachment: 211910 Committed r155988: <http://trac.webkit.org/changeset/155988>

WebKit Commit Bot

Comment 39 2013-09-17 13:05:49 PDT

All reviewed patches have been landed. Closing bug.

Attachments
Patch (4.66 KB, patch) 2013-08-19 12:24 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (7.82 KB, patch) 2013-08-27 10:23 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (9.99 KB, patch) 2013-09-02 03:59 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (10.12 KB, patch) 2013-09-02 07:51 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (17.71 KB, patch) 2013-09-10 04:20 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (17.94 KB, patch) 2013-09-10 13:05 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (23.92 KB, patch) 2013-09-13 04:27 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Patch (30.14 KB, patch) 2013-09-17 08:37 PDT, Romain Perier	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.