Bug 11108

Summary: Replace usage of split by proper parsers
Product: WebKit Reporter: Rob Buis <rwlbuis>
Component: SVGAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Bug Depends on:    
Bug Blocks: 11908    
Attachments:
Description Flags
First attempt
none
Probably better long-term solution
none
Using UChar
none
Improved patch
none
Slightly cleaned up patch
mitz: review-
Compiles with ToT
none
Completely remove split usage on ToT
none
Smaller patch
none
Better diff eric: review+

Description Rob Buis 2006-10-01 04:48:58 PDT
See http://www.kevlindev.com/gui/utilities/viewbox/viewBox.svg. When the aspect
ratio params change to empty, it crashes.
Comment 1 Rob Buis 2006-10-01 05:05:26 PDT
Created attachment 10854 [details]
First attempt

A simple fix...
Cheers,

Rob.
Comment 2 Eric Seidel (no email) 2006-10-01 05:10:33 PDT
Comment on attachment 10854 [details]
First attempt

Looks fine.

You should probably search for all other uses of DeprecatedString::split.  There may be other crashing cases.  Do we adequately protect against having only 1 or 2 or 3 params?

Using DeprecatedStringList::split is really bad practice, since it copies the string for parsing, instead of just walking it.  But at least this fixes the crash for now.
Comment 3 Rob Buis 2006-10-02 13:47:21 PDT
Hi,

(In reply to comment #2)
> (From update of attachment 10854 [details] [edit])
> Looks fine.
> 
> You should probably search for all other uses of DeprecatedString::split. 
> There may be other crashing cases.  Do we adequately protect against having
> only 1 or 2 or 3 params?
> 
> Using DeprecatedStringList::split is really bad practice, since it copies the
> string for parsing, instead of just walking it.  But at least this fixes the
> crash for now.

Nice to see the r+, however I think we may as well go for the real fix, which is removing use
of split() and do proper parsers. I'll attach a patch for this, then maybe we can discuss the proper
action. I think it is good that we have this bug reminding us of one of the current (split()) problems
though.
cheers,

Rob.
Comment 4 Rob Buis 2006-10-02 13:49:56 PDT
Created attachment 10872 [details]
Probably better long-term solution

Here is the current state of my patch to get rid of split(). It needs speed testing, but it
should be quicker since the copies are gone and less methods (like endsWith()) are called.
Cheers,

Rob.
Comment 5 Darin Adler 2006-10-02 19:39:29 PDT
Comment on attachment 10872 [details]
Probably better long-term solution

Why the conversion to char? The other functions could just as well take UChar* as char*, and converting to char just means you have to do additional memory allocation.
Comment 6 Rob Buis 2006-10-03 06:03:46 PDT
Created attachment 10880 [details]
Using UChar

Now I use UChar * instead of char* as suggested by Darin.
Cheers,

Rob.
Comment 7 Eric Seidel (no email) 2006-10-03 06:18:25 PDT
Comment on attachment 10880 [details]
Using UChar

These:
+    if (*currParam == 'd') {
+        if (!((eoString - currParam) > 4) &&
+              currParam[1] == 'e' &&
+              currParam[2] == 'f' &&
+              currParam[3] == 'e' &&
+              currParam[4] == 'r')
+            goto bail_out;
+        currParam += 5;
+        skipOptionalSpaces(currParam);
     }

or at least part, should be abstracted into a static inline.  maybe just this part:

+        if (!((eoString - currParam) > 4) &&
+              currParam[1] == 'e' &&
+              currParam[2] == 'f' &&
+              currParam[3] == 'e' &&
+              currParam[4] == 'r')

Sleep now.  I can look at the rest later.
Comment 8 Rob Buis 2006-10-04 12:23:53 PDT
Created attachment 10908 [details]
Improved patch

This patch is much more complete and removes split from use in ksvg. Also
added and changed some testcases. Let me know whether there should be more tests.
Cheers,

Rob.
Comment 9 Rob Buis 2006-10-06 01:10:08 PDT
Created attachment 10944 [details]
Slightly cleaned up patch

This one has some nicer code in some parsing sections. Also I fixed obvious code style bugs.
Cheers,

Rob.
Comment 10 Maciej Stachowiak 2006-10-09 05:26:03 PDT
Don't forget to remove the conflict marker:

+>>>>>>> .r16800
Comment 11 mitz 2006-10-20 07:30:53 PDT
Comment on attachment 10944 [details]
Slightly cleaned up patch

I didn't read the entire patch, but here are some observations:

Conflict markers:
+>>>>>>> .r16800

Typo:
+        - use functions from SVGParserUtils instead of spli

Some cleanup opportunities:
     else if (attr->name() == SVGNames::valuesAttr)
     {

This:
     String parse = m_rgbColor.stripWhiteSpace();
followed by:
+        const UChar* ptr = reinterpret_cast<const UChar*>(parse.charactersWithNullTermination()) + 4;
always copies the buffer once (I think in many cases it copies it twice). I think you can avoid one copy using a combination of skipOptionalSpaces and String::find() (or a single condition that checks the next 4 characters) starting after the run of spaces. I can't say if it's worth it.

Is it necessary to cast the result of charactersWithNullTermination()?

Shouldn't you set ec before returning in these cases?
+        if (!tryParseCoord(ptr, r))
+            return;

The use of skipOptionalSpaces in the non-percentage case is redundant, as parseCoord always ends with skipOptionalSpacesOrDelimiter which ends with skipOptionalSpaces:
+        if (*ptr == '%') {
+            percentages++;
+            ptr++;
+            skipOptionalSpacesOrDelimiter(ptr);
+        } else {
+            normal++;
+            skipOptionalSpaces(ptr);

It also seems that this will successfully parse things like "rgb(25 , % , 50%, 75%)".

tryParseNumberList is only used in this pattern:
+        int nr = tryParseNumberList(value, x, y);
+        if (nr > 0) {
+            setKernelUnitLengthXBaseValue(x);
+            if (nr == 1)
+                setKernelUnitLengthYBaseValue(x);
+            else
+                setKernelUnitLengthYBaseValue(y);
+        }

I think you can factor out the common logic so that you'll have a function
bool tryParseNumberOptionalNumber(const String&, double& h, double& v)
that sets h and v to the same value if there is only one value, and returns false if there is no value. I don't know how it should behave for more than 2 values (or any junk after the second value).

SVGNumberList::parse() allows things like "  , 1, 2, , 3 ," (since tryParseCoord allows one comma, and then skipOptionalSpacesOrDelimiter allows another one; it is also what allows the leading comma).

This is a funny variable name:
+    const UChar* eoString = currSegment + points.length();

This can be changed to use tryParseCoord:
         currSegment = parseCoord(currSegment, xPos); 
         if (currSegment == prevSegment)

Some redundant parameter names:
+    const UChar* parseCoord(const UChar* ptr, double& number);
+        void parsePoints(const String& points) const;

Extra spaces inside the parentheses; "d" is unnecessary:
+        void parseSVG( const String &d, bool process = false );

skipOptionalSpaces(" ") tests the first character twice. Does the compiler optimize it?

skipOptionalSpacesOrDelimiter(" ") tests the first character three times. Same question.

This should be FIXME:
+    // TODO : more error checking/reporting
Comment 12 mitz 2006-10-20 13:47:06 PDT
Comment on attachment 10944 [details]
Slightly cleaned up patch

Additional observations:

In SVGColor::setRGBColor(), normal == 3 - percentages, so you don't need two variables.

Brace should be indented:
+    static inline bool skipOptionalSpaces(const UChar*& ptr) // true means "found space"
+{

Brace should be on the switch line:
+        switch (type)
+        {
Comment 13 Darin Adler 2006-11-09 08:37:42 PST
Comment on attachment 10854 [details]
First attempt

Clear review flag on this obsolete patch.
Comment 14 Rob Buis 2006-12-22 13:45:30 PST
Created attachment 11972 [details]
Compiles with ToT

This patch is done sort of from scratch. It is not complete, but does a lot of useful things already.
Although it is not complete I am attaching it already since I am away tomorrow and someone may want to shark it (Eric?) :) I should be able to easily finish it sunday. Also note that it should help with bug 11908.
Cheers,

Rob.
Comment 15 Rob Buis 2006-12-24 05:46:29 PST
Created attachment 11993 [details]
Completely remove split usage on ToT

This one should be complete and quite clean.
Cheers,

Rob.
Comment 16 Rob Buis 2006-12-24 07:26:41 PST
After discussing with Eric, it is clear that there are two important issues with the patch. One is performance, other are tests. Here are some ways we can improve tests for transform lists and preserve aspect ratio:

- multiple comma's
- missing (, ) etc.
- having too many/few params in the transform lists
- maybe wrong ordering (defer as last one for example)
- entering letter where number is expected

Cheers,

Rob.
(In reply to comment #15)
> Created an attachment (id=11993) [edit]
> Completely remove split usage on ToT
> 
> This one should be complete and quite clean.
> Cheers,
> 
> Rob.
> 

Comment 17 Rob Buis 2006-12-27 12:26:53 PST
Created attachment 12069 [details]
Smaller patch

This patch solves just the transform passing and should be easier to review. Also note that it gives a big speed improvement for the svg PLT.
Cheers,

Rob.
Comment 18 Rob Buis 2006-12-27 12:54:34 PST
Created attachment 12070 [details]
Better diff

As requested by Eric.
Cheers,

Rob.
Comment 19 Eric Seidel (no email) 2006-12-27 13:24:00 PST
Comment on attachment 12070 [details]
Better diff

This looks great.

We definitely need more invalid path tests.

I think you could rename SVGPathUtils to SVGPathUtilities for clarity, but that's not a huge deal.

r=me
Comment 20 Eric Seidel (no email) 2006-12-27 13:26:11 PST
Oh, I almost forgot, a couple comments made over IRC:

1.  skip = true, so the last parseNumber in parseNumberOptionalNumber might need to force that to false.

2.  the x array may need to be cleared every time through the loop, otherwise optional parameters (like those for ROTATE) might end up being re-used from a previous parse.
Comment 21 David Kilzer (:ddkilzer) 2006-12-28 15:36:42 PST
Landed in r18440 by rwlbuis.