Summary: | Replace usage of split by proper parsers | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Rob Buis <rwlbuis> | ||||||||||||||||||||
Component: | SVG | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||
Severity: | Normal | ||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||
Version: | 420+ | ||||||||||||||||||||||
Hardware: | Mac | ||||||||||||||||||||||
OS: | OS X 10.4 | ||||||||||||||||||||||
Bug Depends on: | |||||||||||||||||||||||
Bug Blocks: | 11908 | ||||||||||||||||||||||
Attachments: |
|
Description
Rob Buis
2006-10-01 04:48:58 PDT
Created attachment 10854 [details]
First attempt
A simple fix...
Cheers,
Rob.
Comment on attachment 10854 [details]
First attempt
Looks fine.
You should probably search for all other uses of DeprecatedString::split. There may be other crashing cases. Do we adequately protect against having only 1 or 2 or 3 params?
Using DeprecatedStringList::split is really bad practice, since it copies the string for parsing, instead of just walking it. But at least this fixes the crash for now.
Hi, (In reply to comment #2) > (From update of attachment 10854 [details] [edit]) > Looks fine. > > You should probably search for all other uses of DeprecatedString::split. > There may be other crashing cases. Do we adequately protect against having > only 1 or 2 or 3 params? > > Using DeprecatedStringList::split is really bad practice, since it copies the > string for parsing, instead of just walking it. But at least this fixes the > crash for now. Nice to see the r+, however I think we may as well go for the real fix, which is removing use of split() and do proper parsers. I'll attach a patch for this, then maybe we can discuss the proper action. I think it is good that we have this bug reminding us of one of the current (split()) problems though. cheers, Rob. Created attachment 10872 [details]
Probably better long-term solution
Here is the current state of my patch to get rid of split(). It needs speed testing, but it
should be quicker since the copies are gone and less methods (like endsWith()) are called.
Cheers,
Rob.
Comment on attachment 10872 [details]
Probably better long-term solution
Why the conversion to char? The other functions could just as well take UChar* as char*, and converting to char just means you have to do additional memory allocation.
Created attachment 10880 [details]
Using UChar
Now I use UChar * instead of char* as suggested by Darin.
Cheers,
Rob.
Comment on attachment 10880 [details]
Using UChar
These:
+ if (*currParam == 'd') {
+ if (!((eoString - currParam) > 4) &&
+ currParam[1] == 'e' &&
+ currParam[2] == 'f' &&
+ currParam[3] == 'e' &&
+ currParam[4] == 'r')
+ goto bail_out;
+ currParam += 5;
+ skipOptionalSpaces(currParam);
}
or at least part, should be abstracted into a static inline. maybe just this part:
+ if (!((eoString - currParam) > 4) &&
+ currParam[1] == 'e' &&
+ currParam[2] == 'f' &&
+ currParam[3] == 'e' &&
+ currParam[4] == 'r')
Sleep now. I can look at the rest later.
Created attachment 10908 [details]
Improved patch
This patch is much more complete and removes split from use in ksvg. Also
added and changed some testcases. Let me know whether there should be more tests.
Cheers,
Rob.
Created attachment 10944 [details]
Slightly cleaned up patch
This one has some nicer code in some parsing sections. Also I fixed obvious code style bugs.
Cheers,
Rob.
Don't forget to remove the conflict marker: +>>>>>>> .r16800 Comment on attachment 10944 [details] Slightly cleaned up patch I didn't read the entire patch, but here are some observations: Conflict markers: +>>>>>>> .r16800 Typo: + - use functions from SVGParserUtils instead of spli Some cleanup opportunities: else if (attr->name() == SVGNames::valuesAttr) { This: String parse = m_rgbColor.stripWhiteSpace(); followed by: + const UChar* ptr = reinterpret_cast<const UChar*>(parse.charactersWithNullTermination()) + 4; always copies the buffer once (I think in many cases it copies it twice). I think you can avoid one copy using a combination of skipOptionalSpaces and String::find() (or a single condition that checks the next 4 characters) starting after the run of spaces. I can't say if it's worth it. Is it necessary to cast the result of charactersWithNullTermination()? Shouldn't you set ec before returning in these cases? + if (!tryParseCoord(ptr, r)) + return; The use of skipOptionalSpaces in the non-percentage case is redundant, as parseCoord always ends with skipOptionalSpacesOrDelimiter which ends with skipOptionalSpaces: + if (*ptr == '%') { + percentages++; + ptr++; + skipOptionalSpacesOrDelimiter(ptr); + } else { + normal++; + skipOptionalSpaces(ptr); It also seems that this will successfully parse things like "rgb(25 , % , 50%, 75%)". tryParseNumberList is only used in this pattern: + int nr = tryParseNumberList(value, x, y); + if (nr > 0) { + setKernelUnitLengthXBaseValue(x); + if (nr == 1) + setKernelUnitLengthYBaseValue(x); + else + setKernelUnitLengthYBaseValue(y); + } I think you can factor out the common logic so that you'll have a function bool tryParseNumberOptionalNumber(const String&, double& h, double& v) that sets h and v to the same value if there is only one value, and returns false if there is no value. I don't know how it should behave for more than 2 values (or any junk after the second value). SVGNumberList::parse() allows things like " , 1, 2, , 3 ," (since tryParseCoord allows one comma, and then skipOptionalSpacesOrDelimiter allows another one; it is also what allows the leading comma). This is a funny variable name: + const UChar* eoString = currSegment + points.length(); This can be changed to use tryParseCoord: currSegment = parseCoord(currSegment, xPos); if (currSegment == prevSegment) Some redundant parameter names: + const UChar* parseCoord(const UChar* ptr, double& number); + void parsePoints(const String& points) const; Extra spaces inside the parentheses; "d" is unnecessary: + void parseSVG( const String &d, bool process = false ); skipOptionalSpaces(" ") tests the first character twice. Does the compiler optimize it? skipOptionalSpacesOrDelimiter(" ") tests the first character three times. Same question. This should be FIXME: + // TODO : more error checking/reporting Comment on attachment 10944 [details]
Slightly cleaned up patch
Additional observations:
In SVGColor::setRGBColor(), normal == 3 - percentages, so you don't need two variables.
Brace should be indented:
+ static inline bool skipOptionalSpaces(const UChar*& ptr) // true means "found space"
+{
Brace should be on the switch line:
+ switch (type)
+ {
Comment on attachment 10854 [details]
First attempt
Clear review flag on this obsolete patch.
Created attachment 11972 [details] Compiles with ToT This patch is done sort of from scratch. It is not complete, but does a lot of useful things already. Although it is not complete I am attaching it already since I am away tomorrow and someone may want to shark it (Eric?) :) I should be able to easily finish it sunday. Also note that it should help with bug 11908. Cheers, Rob. Created attachment 11993 [details]
Completely remove split usage on ToT
This one should be complete and quite clean.
Cheers,
Rob.
After discussing with Eric, it is clear that there are two important issues with the patch. One is performance, other are tests. Here are some ways we can improve tests for transform lists and preserve aspect ratio: - multiple comma's - missing (, ) etc. - having too many/few params in the transform lists - maybe wrong ordering (defer as last one for example) - entering letter where number is expected Cheers, Rob. (In reply to comment #15) > Created an attachment (id=11993) [edit] > Completely remove split usage on ToT > > This one should be complete and quite clean. > Cheers, > > Rob. > Created attachment 12069 [details]
Smaller patch
This patch solves just the transform passing and should be easier to review. Also note that it gives a big speed improvement for the svg PLT.
Cheers,
Rob.
Created attachment 12070 [details]
Better diff
As requested by Eric.
Cheers,
Rob.
Comment on attachment 12070 [details]
Better diff
This looks great.
We definitely need more invalid path tests.
I think you could rename SVGPathUtils to SVGPathUtilities for clarity, but that's not a huge deal.
r=me
Oh, I almost forgot, a couple comments made over IRC: 1. skip = true, so the last parseNumber in parseNumberOptionalNumber might need to force that to false. 2. the x array may need to be cleared every time through the loop, otherwise optional parameters (like those for ROTATE) might end up being re-used from a previous parse. Landed in r18440 by rwlbuis. |