WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
119423
Improve srcset parser
https://bugs.webkit.org/show_bug.cgi?id=119423
Summary
Improve srcset parser
Dean Jackson
Reported
2013-08-01 19:09:48 PDT
The srcset parser creates intermediate arrays of strings (multiple times). We should make a better parser than walks along the attribute value to extract the candidates.
Attachments
Patch
(4.66 KB, patch)
2013-08-19 12:24 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(7.82 KB, patch)
2013-08-27 10:23 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(9.99 KB, patch)
2013-09-02 03:59 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(10.12 KB, patch)
2013-09-02 07:51 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(17.71 KB, patch)
2013-09-10 04:20 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(17.94 KB, patch)
2013-09-10 13:05 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(23.92 KB, patch)
2013-09-13 04:27 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Patch
(30.14 KB, patch)
2013-09-17 08:37 PDT
,
Romain Perier
no flags
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2013-08-01 19:09:57 PDT
<
rdar://problem/14628284
>
Dean Jackson
Comment 2
2013-08-01 19:10:22 PDT
Note that the parser is duplicated in the preload scanner and the image element.
Dean Jackson
Comment 3
2013-08-12 14:14:40 PDT
(In reply to
comment #2
)
> Note that the parser is duplicated in the preload scanner and the image element.
It's not duplicated - ignore this. Still needs improvement though :)
Benjamin Poulain
Comment 4
2013-08-12 14:35:11 PDT
(In reply to
comment #3
)
> (In reply to
comment #2
) > > Note that the parser is duplicated in the preload scanner and the image element. > > It's not duplicated - ignore this. Still needs improvement though :)
Romain is getting started on this, he emailed me yesterday.
Romain Perier
Comment 5
2013-08-19 12:24:54 PDT
Created
attachment 209108
[details]
Patch
Romain Perier
Comment 6
2013-08-19 12:25:29 PDT
See a first proposal in attachment
Benjamin Poulain
Comment 7
2013-08-20 01:22:34 PDT
Comment on
attachment 209108
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=209108&action=review
I am going to bed, I'll finish the review on the next iteration :) General comments: -This looks very efficient allocation wise. (did you benchmark it?) -You should have more variables with better names instead of reusing index and from. -If you can avoid loading a character twice from memory, that is great. -We should probably extract the whole parsing code in a static function with a explanative name (parseImagesWithScaleFromSrcsetAttribute()?). That's usually how we document things in WebKit.
> Source/WebCore/ChangeLog:14 > + when parsing. That is more efficient on ARM devices and avoids trashing the data cache.
You can probably drop the "on ARM". I have the feeling this is gonna be faster everywhere.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > + size_t from = 0;
"from" is a little generic. What about imageCandidateStringStart?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 >
Extra space.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > + if (from >= srcSetAttribute.length())
You could keep srcSetAttribute.length() in a temporary variable outside the loop. This would avoid querying for the string impl for each iterations.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > + break;
Since this is the only break condition for the loop, why not have while(from >= srcSetAttributeLength)?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:335 > + // Collect a sequence of characters that are not space, it's the url
Missing period.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:340 > + size_t index = srcSetAttribute.find(isNotHTMLSpace, from); > + if (index >= separator) { > + from = separator + 1; > continue; > - if (!data.last().endsWith('x')) > + }
I would start with this, then find the separator. The reason has to do with how you read the memory: Let's say you have " foo.jpg 2x". When executing "srcSetAttribute.find(',', from)", you will go over the spaces and find the separator. Then, you execute "srcSetAttribute.find(isNotHTMLSpace, from)" which will read the memory again from the beginning to find the character 'f'. If you inverses the operations. You first skip all the HTMLSpace. Then you never have to go over them again. You take the position you found, and you look for the separator: "srcSetAttribute.find(',', index).
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:341 > + from = index;
index is too generic. The previous "index" is "startOfTheURLString"
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > + index = srcSetAttribute.find(isHTMLSpace, index);
index -> endOfTheURLString? You could also imagine doing this before finding the separator (to avoid loading the same characters again from memory). In that case, "foo,bar.jpg 2x, would parse "foo,bar.jpg" as the URL. I don't know if that is okay? (hum, do we have tests for that kind of input anyway?)
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > + size_t url = from, spaceSep = index;
Each statement should get its own line. (WebKit coding style)
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + // Collect a sequence of characters that are neither "," nor space, it's the descriptor
Missing period.
Romain Perier
Comment 8
2013-08-27 10:23:40 PDT
Created
attachment 209785
[details]
Patch
Romain Perier
Comment 9
2013-08-27 10:29:25 PDT
This patch should fit your needs
Benjamin Poulain
Comment 10
2013-08-27 20:53:40 PDT
Comment on
attachment 209785
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=209785&action=review
I just had a quick look but this seems good. You need to rebaseline though. The patch does not apply on the bots.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 >
You can remove this space.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:360 > + imageScaleEnd = (imageScaleEnd == notFound) ? srcSetLen: imageScaleEnd;
missing space before :
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:365 > + separator = srcSetAttribute.find(",", separator + 1);
"," -> ','
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:384 > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart));
You could do without the StirngImpl::createWithoutCopying(). It is not gonna be a big deal here to create a single string, and it will be safer not to share the memory.
Dean Jackson
Comment 11
2013-08-28 14:27:49 PDT
Great stuff Romain. One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well.
Romain Perier
Comment 12
2013-08-29 05:22:40 PDT
(In reply to
comment #11
)
> Great stuff Romain. > > One area I'm especially interested in is data: URLs inside srcset.
> Mhhh right
> Ted tells me the spec says that we shouldn't universally split on commas.
Yes, but you can detect and skip invalid candidates (and switch to another one)
> I wonder if we need more tests of this. Similarly, the inline URL might be a >data: SVG which has inline commas as well.
It's an issue that I need to resolve, good catch. I will look at it.
Theresa O'Connor
Comment 13
2013-08-29 09:41:08 PDT
If your implementation follows the "Processing the image candidates" algorithm as specified, you won't have an issue with commas in data URLs.
http://www.whatwg.org/specs/web-apps/current-work/multipage/embedded-content-1.html#processing-the-image-candidates
http://www.w3.org/html/wg/drafts/srcset/w3c-srcset/#processing-the-image-candidates
Romain Perier
Comment 14
2013-09-02 03:59:37 PDT
Created
attachment 210274
[details]
Patch
Romain Perier
Comment 15
2013-09-02 04:17:57 PDT
(In reply to
comment #11
)
> Great stuff Romain. > > One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well.
Btw, as discussed with Benjamin an SVG might contain commas but also spaces... (the URL delimiter). The parser can handle several commas in the data but not space as it's the end of the url... (how do you make difference between a normal space and the space delimiter ? And yes I have fallowed the specification)
Romain Perier
Comment 16
2013-09-02 07:51:11 PDT
Created
attachment 210285
[details]
Patch
Dean Jackson
Comment 17
2013-09-03 18:19:04 PDT
(In reply to
comment #15
)
> (In reply to
comment #11
) > > Great stuff Romain. > > > > One area I'm especially interested in is data: URLs inside srcset. Ted tells me the spec says that we shouldn't universally split on commas. I wonder if we need more tests of this. Similarly, the inline URL might be a data: SVG which has inline commas as well. > > Btw, as discussed with Benjamin an SVG might contain commas but also spaces... (the URL delimiter). The parser can handle several commas in the data but not space as it's the end of the url... (how do you make difference between a normal space and the space delimiter ? And yes I have fallowed the specification)
I guess there are limits to what data: content we can put in a srcset attribute without having to escape things. I don' t think we should worry too much about it.
Dean Jackson
Comment 18
2013-09-03 18:28:03 PDT
Comment on
attachment 210285
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=210285&action=review
I think this looks almost ready - just some minor comments. However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input? It would be nice if we had some stress tests for really bad input (e.g. malformed data:, encoded commas, etc).
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:322 > +static void parseImagesWithScaleFromSrcsetAttribute(const String& srcSetAttribute, ImageCandidates& imageCandidates)
I guess for consistency "FromSrcsetAttribute" should be "FromSrcSetAttribute"
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > + // Collect a sequence of characters that are not space, it's the url.
I don't think we need this comment. Or it should be "Find the next non-space character in the attribute."
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:350 > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor. > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1);
The comment doesn't match the code. isNotHTMLSpace doesn't check for ','.
Romain Perier
Comment 19
2013-09-04 01:09:26 PDT
(In reply to
comment #18
)
> (From update of
attachment 210285
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=210285&action=review
> > I think this looks almost ready - just some minor comments. > > However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input?
See fast/hidpi/image-srcset-invalid-inputs.html which contains a various range of invalid cases. However It does not include data: , perhaps I should write a separated unit test for data: (including valid and invalid cases). For valid inputs, we already have several unit tests. I could improve them by adding some weird valid inputs (containing a lot of spaces for example). What do you think ? My implementation follows the specification , the only difference is that it handles better invalid inputs not covered by the high level algorithm in the spec, nothing more, really :)
> It would be nice if we had some stress tests for really bad input (e.g. malformed data:, encoded commas, etc). > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:322 > > +static void parseImagesWithScaleFromSrcsetAttribute(const String& srcSetAttribute, ImageCandidates& imageCandidates) > > I guess for consistency "FromSrcsetAttribute" should be "FromSrcSetAttribute"
> sure
> > Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > > + // Collect a sequence of characters that are not space, it's the url. > > I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." >
This comment is for the block of code which extracts the url, not the for the line
> > Source/WebCore/html/parser/HTMLParserIdioms.cpp:350 > > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor. > > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1); > > The comment doesn't match the code. isNotHTMLSpace doesn't check for ','.
Same thing as the previous comment, it is for the block of code which extracts the descriptor, not the for the line. However if these comments are really useless, I can remove them, sure.
Romain Perier
Comment 20
2013-09-04 08:47:02 PDT
For really bad inputs I have something like the following test in mind: <html> <head> <script src="resources/srcset-helper.js"></script> </head> <body id="body"> <div>This test passes if this img tag below is empty and displays nothing. It ensures that the srcset attribute supports invalid inputs</div> <img height="100" width="100" src="" srcset="1x,, , x ,2x , foo.jpg, 3x, bar.jpg 4x 100h, foo.jpg 5, bar.jpg dx,foo.jpg,bar.jpg, data:image/png;base64,ThisIsAMalFormedData%/,withForbidden,Characters:!?~&#') 6x, data: 7x, data:;charset= 8x, data:;charset=us-ASCII 9x, data:,BadData 10x, data:foo/bar,foobar 11x, data:foo/bar, 12x"></img> </body> </html> The test passes. I can improve tests for valid inputs by adding some weird cases for data:URL , but nothing more (I have no other ideas)
Benjamin Poulain
Comment 21
2013-09-04 15:15:50 PDT
(In reply to
comment #19
)
> (In reply to
comment #18
) > > (From update of
attachment 210285
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=210285&action=review
> > > > I think this looks almost ready - just some minor comments. > > > > However, the fact that this is custom parsing code makes me nervous. How confident are we that our tests cover all ranges of correct and incorrect input? > > See fast/hidpi/image-srcset-invalid-inputs.html which contains a various range of invalid cases. However It does not include data: , perhaps I should write a separated unit test for data: (including valid and invalid cases). For valid inputs, we already have several unit tests. I could improve them by adding some weird valid inputs (containing a lot of spaces for example). > > What do you think ?
New separate test for data is a good idea. Probably at least 2 tests, one for various valid input, and one for bad input.
Benjamin Poulain
Comment 22
2013-09-04 15:44:13 PDT
Comment on
attachment 210285
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=210285&action=review
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > + unsigned srcSetLen = srcSetAttribute.length();
Len -> Length
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:328 >
Blank line here.
>>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 >>> + // Collect a sequence of characters that are not space, it's the url. >> >> I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > > This comment is for the block of code which extracts the url, not the for the line
I think it is great to use spec text as comments. You should have "5." in front to make it more obvious. You could also have a comment linking the spec before the algorithm.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart);
"4. Splitting loop: Skip whitespace."?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:333 > + imageCandidateStart = srcSetLen; > continue;
Shouldn't you just break here and be done with it? Here all you will have is imageCandidateStart = srcSetLen, go back to the branch (imageCandidateStart < srcSetLen) and exit the loop.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:338 > + if (srcSetAttribute[imageUrlStart] == ',') { > + imageCandidateStart = imageUrlStart + 1; > continue; > + }
Which cases does this cover? The spec says "Collect a sequence of characters that are not space characters, and let that be url.", so it is not clear why the initial comma would not be part of the URL.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > + imageCandidateStart = srcSetLen; > + continue;
"break"?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > + imageCandidateStart = imageUrlEnd; > + continue; > + }
An empty descriptor should still get the URL in the raw candidates. The density would default to 1.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + // Collect a sequence of characters that are neither ',' nor space, it's the descriptor.
7. Gotta go, I'll finish the review on the next iteration.
Romain Perier
Comment 23
2013-09-05 04:15:48 PDT
Rapid reply (In reply to
comment #22
)
> (From update of
attachment 210285
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=210285&action=review
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:325 > > + unsigned srcSetLen = srcSetAttribute.length(); > > Len -> Length >
OK
> > >>> Source/WebCore/html/parser/HTMLParserIdioms.cpp:329 > >>> + // Collect a sequence of characters that are not space, it's the url. > >> > >> I don't think we need this comment. Or it should be "Find the next non-space character in the attribute." > > > > This comment is for the block of code which extracts the url, not the for the line > > I think it is great to use spec text as comments. You should have "5." in front to make it more obvious. > > You could also have a comment linking the spec before the algorithm.
Yep, I agree. Good idea.
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:330 > > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart); > > "4. Splitting loop: Skip whitespace."?
+1
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:333 > > + imageCandidateStart = srcSetLen; > > continue; >
Right, good catch !
> Shouldn't you just break here and be done with it? > Here all you will have is imageCandidateStart = srcSetLen, go back to the branch (imageCandidateStart < srcSetLen) and exit the loop. > > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:338 > > + if (srcSetAttribute[imageUrlStart] == ',') { > > + imageCandidateStart = imageUrlStart + 1; > > continue; > > + } > > Which cases does this cover? > The spec says "Collect a sequence of characters that are not space characters, and let that be url.", so it is not clear why the initial comma would not be part of the URL.
It covers this case srcset="foo.jpg 1x,, bar.jpg 2x", or this one srcset="foo.jpg, , bar.jpg 2x" If the current candidate does not contain letters (!= space) , the first match will be a comma... if we don't do this step, the next url might be "," or even weird ",bar.jpg" (if the next element does not begin with a space).
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:342 > > + imageCandidateStart = srcSetLen; > > + continue; > > "break"?
+1
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > > + imageCandidateStart = imageUrlEnd; > > + continue; > > + } > > An empty descriptor should still get the URL in the raw candidates. > The density would default to 1.
Have a look at the current parsing algorithm in the repository. It skips the current candidate if this one does not include a descriptor. I just wanted to avoid a regression. What do we do for this case ? 1. We follow the spec, 2. We suppose that implicit descriptor is a bad idea because it might be misunderstood by the web developer, it would be better to only accept explicit descriptors. As srcset might be a cross-browser feature, this is probably better to choose 1, imho. Benjamin ? Dean ? opinion ?
Benjamin Poulain
Comment 24
2013-09-05 10:27:50 PDT
(In reply to
comment #23
)
> > > Source/WebCore/html/parser/HTMLParserIdioms.cpp:347 > > > + if (srcSetAttribute[imageUrlEnd - 1] == ',') { > > > + imageCandidateStart = imageUrlEnd; > > > + continue; > > > + } > > > > An empty descriptor should still get the URL in the raw candidates. > > The density would default to 1. > > Have a look at the current parsing algorithm in the repository. It skips the current candidate if this one does not include a descriptor. I just wanted to avoid a regression. > What do we do for this case ? 1. We follow the spec, 2. We suppose that implicit descriptor is a bad idea because it might be misunderstood by the web developer, it would be better to only accept explicit descriptors. > As srcset might be a cross-browser feature, this is probably better to choose 1, imho. Benjamin ? Dean ? opinion ?
It is better to follow the spec. This is still an experimental feature. It is not a problem changing the behavior and updating the tests accordingly. Supporting candidates without descriptor seems like a sensible choice. I could see people ignoring "src" entirely and writing this kind of declaration: <img srcset="regular.png, highres.png 2x">
Romain Perier
Comment 25
2013-09-10 04:20:49 PDT
Created
attachment 211189
[details]
Patch
Romain Perier
Comment 26
2013-09-10 04:23:47 PDT
See the new patch in attachment: 1. I have improved tests 2. I have added new ones (for data uri schemes) 3. I have fixed what was wrong
Romain Perier
Comment 27
2013-09-10 13:05:41 PDT
Created
attachment 211229
[details]
Patch
Romain Perier
Comment 28
2013-09-10 13:07:18 PDT
Rebased with latest commits
Benjamin Poulain
Comment 29
2013-09-11 20:52:22 PDT
Comment on
attachment 211229
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=211229&action=review
This looks great. Some nitpick bellow. Let's iterate on this quickly so you can move on.
> Source/WebCore/ChangeLog:7 > +
You also updated the behaviors to follow the spec regarding data URLs. You should mention the behavior change in the description.
> Source/WebCore/ChangeLog:8 > + No new tests, covered by existing tests.
This is no longer true.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:323 > +//
http://www.w3.org/html/wg/drafts/srcset/w3c-srcset/Overview.html#processing-the-image-candidates
Draft change by definition. IMHO, it is good to link the published version with a date:
http://www.w3.org/TR/2013/WD-html-srcset-20130228/#processing-the-image-candidates
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:324 > +// Note: Some items in comments below make reference to the spec (prefixed by numbers).
You don't need this line, that is common enough in WebKit.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:334 > + // 5. Collect a sequence of characters that are not space characters, and let that be url.
This should be moved to #344 now.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:336 > + size_t imageUrlStart = srcSetAttribute.find(isNotHTMLSpace, imageCandidateStart); // 4. Splitting loop: Skip whitespace.
Move the comment above the block.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:343 > + // If The current candidate is either totally empty or only contains space, skipping. > + if (srcSetAttribute[imageUrlStart] == ',') { > + imageCandidateStart = imageUrlStart + 1; > continue; > + }
I kind of agree with your reasoning here. It is probably worth mailing the standard group to have this clarified in the spec.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:344 > + size_t imageUrlEnd = srcSetAttribute.find(isHTMLSpace, imageUrlStart + 1);
The
comment 5
. would be good here.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:346 > + if (imageUrlEnd == notFound) > + break;
Not sure I agree with this. if you have srcset=" foobar.jpg", you would fail here. Is there a test covering such case? (*)
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:349 > + imageUrlEnd--;
--imageUrlEnd; usually in WebKit.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:354 > + > + // 7. Collect a sequence of characters that are not "," (U+002C) characters, and let that be descriptors. > +
You can remove the blank lines above and above and bellow the comment.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:357 > + size_t imageScaleStart = srcSetAttribute.find(isNotHTMLSpace, imageUrlEnd + 1); > + if (imageScaleStart == notFound) > + break;
Same as for (*), I am not sure about this. You could have srcset=" foobar.jpg " Again, do we have tests for this?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:365 > + // Be sure that there are no other descriptors.
Be sure that -> Make sure there are no...
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:367 > + while ((separator < srcSetLength - 1) && isHTMLSpace(srcSetAttribute[separator])) > + separator++;
I think this may be abusing the separator variable. I would use an other variable and assign it to separator once you found a comma.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:368 > + if ((separator < srcSetLength - 1) && srcSetAttribute[separator] != ',') {
I would add a comment explaining this failure case.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:371 > + separator = srcSetLength;
break?
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:381 > + imgScaleFactor = charactersToFloat(srcSetAttribute.characters() + imageScaleStart, imageScaleEnd - imageScaleStart - 1, &validScaleFactor);
The -1 is not obvious here. To explain it, you can just add more temporary variables. E.g.: size_t scaleFactorStringLength = imageScaleEnd - imageScaleStart; size_t scaleFactorStringLengthWithoutUnit = scaleFactorStringLength - 1; imgScaleFactor = charactersToFloat(srcSetAttribute.characters() + imageScaleStart, scaleFactorStringLengthWithoutUnit, &validScaleFactor);
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:390 > - image.imageURL = decodeURLEscapeSequences(data[0]); > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart));
I feel like you mentioned why createWithoutCopying is safe here but I forgot. Could you explain again?
Romain Perier
Comment 30
2013-09-13 04:27:49 PDT
Created
attachment 211535
[details]
Patch
Romain Perier
Comment 31
2013-09-13 05:19:15 PDT
Everything should be fixed.
> I feel like you mentioned why createWithoutCopying is safe here but I forgot. Could you explain again?
That's simple: - StringImpl:createWithoutCopying() creates a StringImpl which shares the same buffer as srcsetAttribute - String is created on the fly from the StringImpl object, it will delegate string features to it and share the same buffer, as well - decodeURLEscapeSequences returns a copy of the input String in all cases. So the resulting image.imageURL is a copy and does not share memory :)
Yoav Weiss
Comment 32
2013-09-13 13:51:00 PDT
There's no need to decodeURLEscapeSequences() the output URL. Data URIs work fine with the new spec-compliant parser (which splits the attribute value according to white spaces, rather than commas)
Romain Perier
Comment 33
2013-09-13 13:53:49 PDT
The question being do we need to support escaped data:URLs or not , I would say yes
Yoav Weiss
Comment 34
2013-09-13 14:13:05 PDT
> The question being do we need to support escaped data:URLs or not , I would say yes
Escaped data URIs don't seem to be supported on src on neither Blink nor Gecko.
Benjamin Poulain
Comment 35
2013-09-16 21:07:31 PDT
Comment on
attachment 211535
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=211535&action=review
This looks great. Some minor comments bellow. Can you please also add tests for a srcset descriptor with spaces around (if there is no such test yet). Something like srcset=" foobar.jpg", srcset="fobar.jpg " and srcset=" foobar.jpg ". (+ the variations with a scale factor if needed).
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:361 > + size_t comma = imageScaleEnd;
comma -> commaPosition
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:362 > + // Make sure there are no other descriptors
Missing period.
> Source/WebCore/html/parser/HTMLParserIdioms.cpp:391 > + image.imageURL = decodeURLEscapeSequences(StringImpl::createWithoutCopying(srcSetAttribute.characters() + imageUrlStart, imageUrlEnd - imageUrlStart));
Please check decodeURLEscapeSequences and do a follow up if the current behavior is erroneous.
Romain Perier
Comment 36
2013-09-17 08:37:26 PDT
Created
attachment 211910
[details]
Patch
Romain Perier
Comment 37
2013-09-17 08:44:40 PDT
As discussed together, see the spec for data:URLs with escaped characters
http://tools.ietf.org/html/rfc2397
(sorry this is the only document I found...). Also I tested the following example with different browsers: " <html> <img src="data:image/svg+xml,<svg%20xmlns='
http://www.w3.org/2000/svg'%20width='100'%20height='100
'><linearGradient%20id='gradient'><stop%20offset='10%'%20stop-color='#F00'/><stop%20offset='90%'%20stop-color='#fcc'/></linearGradient> <rect%20fill='url(#gradient)'%20x='0'%20y='0'%20width='100%'%20height='100%'/></svg>"/> </html> " Results: 1. It works with chromium/blink 29 2. It does not work with firefox 23/24 3. It does not work with IE 10 I think we should support this feature for src and srcset attributes. Also, as it's not syntaxically possible to support plain/text svgs in data:URLs (a svg contains space, and space is a delimiter for URLs), escaping spaces would be a good solution, imho.
WebKit Commit Bot
Comment 38
2013-09-17 13:05:43 PDT
Comment on
attachment 211910
[details]
Patch Clearing flags on attachment: 211910 Committed
r155988
: <
http://trac.webkit.org/changeset/155988
>
WebKit Commit Bot
Comment 39
2013-09-17 13:05:49 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug