Bug 11108

Summary: Replace usage of split by proper parsers
Product: WebKit Reporter: Rob Buis <rwlbuis>
Component: SVGAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 11908    
Attachments:
Description Flags
First attempt
none
Probably better long-term solution
none
Using UChar
none
Improved patch
none
Slightly cleaned up patch
mitz: review-
Compiles with ToT
none
Completely remove split usage on ToT
none
Smaller patch
none
Better diff eric: review+

Rob Buis
Reported 2006-10-01 04:48:58 PDT
See http://www.kevlindev.com/gui/utilities/viewbox/viewBox.svg. When the aspect ratio params change to empty, it crashes.
Attachments
First attempt (8.04 KB, patch)
2006-10-01 05:05 PDT, Rob Buis
no flags
Probably better long-term solution (24.57 KB, patch)
2006-10-02 13:49 PDT, Rob Buis
no flags
Using UChar (30.72 KB, patch)
2006-10-03 06:03 PDT, Rob Buis
no flags
Improved patch (134.37 KB, patch)
2006-10-04 12:23 PDT, Rob Buis
no flags
Slightly cleaned up patch (177.32 KB, patch)
2006-10-06 01:10 PDT, Rob Buis
mitz: review-
Compiles with ToT (121.51 KB, patch)
2006-12-22 13:45 PST, Rob Buis
no flags
Completely remove split usage on ToT (134.34 KB, patch)
2006-12-24 05:46 PST, Rob Buis
no flags
Smaller patch (50.73 KB, patch)
2006-12-27 12:26 PST, Rob Buis
no flags
Better diff (44.48 KB, patch)
2006-12-27 12:54 PST, Rob Buis
eric: review+
Rob Buis
Comment 1 2006-10-01 05:05:26 PDT
Created attachment 10854 [details] First attempt A simple fix... Cheers, Rob.
Eric Seidel (no email)
Comment 2 2006-10-01 05:10:33 PDT
Comment on attachment 10854 [details] First attempt Looks fine. You should probably search for all other uses of DeprecatedString::split. There may be other crashing cases. Do we adequately protect against having only 1 or 2 or 3 params? Using DeprecatedStringList::split is really bad practice, since it copies the string for parsing, instead of just walking it. But at least this fixes the crash for now.
Rob Buis
Comment 3 2006-10-02 13:47:21 PDT
Hi, (In reply to comment #2) > (From update of attachment 10854 [details] [edit]) > Looks fine. > > You should probably search for all other uses of DeprecatedString::split. > There may be other crashing cases. Do we adequately protect against having > only 1 or 2 or 3 params? > > Using DeprecatedStringList::split is really bad practice, since it copies the > string for parsing, instead of just walking it. But at least this fixes the > crash for now. Nice to see the r+, however I think we may as well go for the real fix, which is removing use of split() and do proper parsers. I'll attach a patch for this, then maybe we can discuss the proper action. I think it is good that we have this bug reminding us of one of the current (split()) problems though. cheers, Rob.
Rob Buis
Comment 4 2006-10-02 13:49:56 PDT
Created attachment 10872 [details] Probably better long-term solution Here is the current state of my patch to get rid of split(). It needs speed testing, but it should be quicker since the copies are gone and less methods (like endsWith()) are called. Cheers, Rob.
Darin Adler
Comment 5 2006-10-02 19:39:29 PDT
Comment on attachment 10872 [details] Probably better long-term solution Why the conversion to char? The other functions could just as well take UChar* as char*, and converting to char just means you have to do additional memory allocation.
Rob Buis
Comment 6 2006-10-03 06:03:46 PDT
Created attachment 10880 [details] Using UChar Now I use UChar * instead of char* as suggested by Darin. Cheers, Rob.
Eric Seidel (no email)
Comment 7 2006-10-03 06:18:25 PDT
Comment on attachment 10880 [details] Using UChar These: + if (*currParam == 'd') { + if (!((eoString - currParam) > 4) && + currParam[1] == 'e' && + currParam[2] == 'f' && + currParam[3] == 'e' && + currParam[4] == 'r') + goto bail_out; + currParam += 5; + skipOptionalSpaces(currParam); } or at least part, should be abstracted into a static inline. maybe just this part: + if (!((eoString - currParam) > 4) && + currParam[1] == 'e' && + currParam[2] == 'f' && + currParam[3] == 'e' && + currParam[4] == 'r') Sleep now. I can look at the rest later.
Rob Buis
Comment 8 2006-10-04 12:23:53 PDT
Created attachment 10908 [details] Improved patch This patch is much more complete and removes split from use in ksvg. Also added and changed some testcases. Let me know whether there should be more tests. Cheers, Rob.
Rob Buis
Comment 9 2006-10-06 01:10:08 PDT
Created attachment 10944 [details] Slightly cleaned up patch This one has some nicer code in some parsing sections. Also I fixed obvious code style bugs. Cheers, Rob.
Maciej Stachowiak
Comment 10 2006-10-09 05:26:03 PDT
Don't forget to remove the conflict marker: +>>>>>>> .r16800
mitz
Comment 11 2006-10-20 07:30:53 PDT
Comment on attachment 10944 [details] Slightly cleaned up patch I didn't read the entire patch, but here are some observations: Conflict markers: +>>>>>>> .r16800 Typo: + - use functions from SVGParserUtils instead of spli Some cleanup opportunities: else if (attr->name() == SVGNames::valuesAttr) { This: String parse = m_rgbColor.stripWhiteSpace(); followed by: + const UChar* ptr = reinterpret_cast<const UChar*>(parse.charactersWithNullTermination()) + 4; always copies the buffer once (I think in many cases it copies it twice). I think you can avoid one copy using a combination of skipOptionalSpaces and String::find() (or a single condition that checks the next 4 characters) starting after the run of spaces. I can't say if it's worth it. Is it necessary to cast the result of charactersWithNullTermination()? Shouldn't you set ec before returning in these cases? + if (!tryParseCoord(ptr, r)) + return; The use of skipOptionalSpaces in the non-percentage case is redundant, as parseCoord always ends with skipOptionalSpacesOrDelimiter which ends with skipOptionalSpaces: + if (*ptr == '%') { + percentages++; + ptr++; + skipOptionalSpacesOrDelimiter(ptr); + } else { + normal++; + skipOptionalSpaces(ptr); It also seems that this will successfully parse things like "rgb(25 , % , 50%, 75%)". tryParseNumberList is only used in this pattern: + int nr = tryParseNumberList(value, x, y); + if (nr > 0) { + setKernelUnitLengthXBaseValue(x); + if (nr == 1) + setKernelUnitLengthYBaseValue(x); + else + setKernelUnitLengthYBaseValue(y); + } I think you can factor out the common logic so that you'll have a function bool tryParseNumberOptionalNumber(const String&, double& h, double& v) that sets h and v to the same value if there is only one value, and returns false if there is no value. I don't know how it should behave for more than 2 values (or any junk after the second value). SVGNumberList::parse() allows things like " , 1, 2, , 3 ," (since tryParseCoord allows one comma, and then skipOptionalSpacesOrDelimiter allows another one; it is also what allows the leading comma). This is a funny variable name: + const UChar* eoString = currSegment + points.length(); This can be changed to use tryParseCoord: currSegment = parseCoord(currSegment, xPos); if (currSegment == prevSegment) Some redundant parameter names: + const UChar* parseCoord(const UChar* ptr, double& number); + void parsePoints(const String& points) const; Extra spaces inside the parentheses; "d" is unnecessary: + void parseSVG( const String &d, bool process = false ); skipOptionalSpaces(" ") tests the first character twice. Does the compiler optimize it? skipOptionalSpacesOrDelimiter(" ") tests the first character three times. Same question. This should be FIXME: + // TODO : more error checking/reporting
mitz
Comment 12 2006-10-20 13:47:06 PDT
Comment on attachment 10944 [details] Slightly cleaned up patch Additional observations: In SVGColor::setRGBColor(), normal == 3 - percentages, so you don't need two variables. Brace should be indented: + static inline bool skipOptionalSpaces(const UChar*& ptr) // true means "found space" +{ Brace should be on the switch line: + switch (type) + {
Darin Adler
Comment 13 2006-11-09 08:37:42 PST
Comment on attachment 10854 [details] First attempt Clear review flag on this obsolete patch.
Rob Buis
Comment 14 2006-12-22 13:45:30 PST
Created attachment 11972 [details] Compiles with ToT This patch is done sort of from scratch. It is not complete, but does a lot of useful things already. Although it is not complete I am attaching it already since I am away tomorrow and someone may want to shark it (Eric?) :) I should be able to easily finish it sunday. Also note that it should help with bug 11908. Cheers, Rob.
Rob Buis
Comment 15 2006-12-24 05:46:29 PST
Created attachment 11993 [details] Completely remove split usage on ToT This one should be complete and quite clean. Cheers, Rob.
Rob Buis
Comment 16 2006-12-24 07:26:41 PST
After discussing with Eric, it is clear that there are two important issues with the patch. One is performance, other are tests. Here are some ways we can improve tests for transform lists and preserve aspect ratio: - multiple comma's - missing (, ) etc. - having too many/few params in the transform lists - maybe wrong ordering (defer as last one for example) - entering letter where number is expected Cheers, Rob. (In reply to comment #15) > Created an attachment (id=11993) [edit] > Completely remove split usage on ToT > > This one should be complete and quite clean. > Cheers, > > Rob. >
Rob Buis
Comment 17 2006-12-27 12:26:53 PST
Created attachment 12069 [details] Smaller patch This patch solves just the transform passing and should be easier to review. Also note that it gives a big speed improvement for the svg PLT. Cheers, Rob.
Rob Buis
Comment 18 2006-12-27 12:54:34 PST
Created attachment 12070 [details] Better diff As requested by Eric. Cheers, Rob.
Eric Seidel (no email)
Comment 19 2006-12-27 13:24:00 PST
Comment on attachment 12070 [details] Better diff This looks great. We definitely need more invalid path tests. I think you could rename SVGPathUtils to SVGPathUtilities for clarity, but that's not a huge deal. r=me
Eric Seidel (no email)
Comment 20 2006-12-27 13:26:11 PST
Oh, I almost forgot, a couple comments made over IRC: 1. skip = true, so the last parseNumber in parseNumberOptionalNumber might need to force that to false. 2. the x array may need to be cleared every time through the loop, otherwise optional parameters (like those for ROTATE) might end up being re-used from a previous parse.
David Kilzer (:ddkilzer)
Comment 21 2006-12-28 15:36:42 PST
Landed in r18440 by rwlbuis.
Note You need to log in before you can comment on or make changes to this bug.