Bug 39966 - Crash in qsort() because compareBorders() is not reliable
Summary: Crash in qsort() because compareBorders() is not reliable
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Yong Li
URL:
Keywords:
: 13147 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-05-31 11:28 PDT by Yong Li
Modified: 2011-01-12 11:13 PST (History)
11 users (show)

See Also:


Attachments
the patch (2.98 KB, patch)
2010-05-31 11:30 PDT, Yong Li
ap: review-
Details | Formatted Diff | Diff
the test case (1.94 KB, text/html)
2010-06-06 19:50 PDT, Yong Li
no flags Details
Test case (2.71 KB, patch)
2010-06-07 07:19 PDT, Yong Li
ap: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff
the patch that fixes the problem (3.09 KB, patch)
2010-06-07 07:19 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Fixed bugs. use same function for both qsort and other cases (20.49 KB, patch)
2010-06-07 10:03 PDT, Yong Li
no flags Details | Formatted Diff | Diff
updated patch (20.49 KB, patch)
2010-06-07 10:08 PDT, Yong Li
no flags Details | Formatted Diff | Diff
Fix comments & keep old behavior for BNONE style (20.44 KB, patch)
2010-07-23 08:49 PDT, Yong Li
no flags Details | Formatted Diff | Diff
test case (tab removed) (2.73 KB, patch)
2010-09-20 14:20 PDT, Yong Li
commit-queue: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yong Li 2010-05-31 11:28:23 PDT
In RenderTableCell.cpp, compareBorders() is used to implement compareBorderStylesForQSort().

But it's not stable. When compareBorderStylesForQSort(a, b) returns 1, compareBorderStylesForQSort(b, a) can return 1 or 0.

We should pass a stable compare function to qsort().
Comment 1 Yong Li 2010-05-31 11:30:49 PDT
Created attachment 57485 [details]
the patch
Comment 2 Alexey Proskuryakov 2010-06-01 16:18:16 PDT
Comment on attachment 57485 [details]
the patch

Please add a test case, or explain why making one isn't possible (sounds like it should be possible).
Comment 3 Yong Li 2010-06-01 20:46:03 PDT
(In reply to comment #2)
> (From update of attachment 57485 [details])
> Please add a test case, or explain why making one isn't possible (sounds like it should be possible).

it's a hard job, because it relies on the implementation of qsort()
Comment 4 Alexey Proskuryakov 2010-06-01 22:22:20 PDT
Can a test be made that crashes on any common implementation? The bug title says "crash", so surely this was observed in practice?
Comment 5 Yong Li 2010-06-02 07:22:57 PDT
(In reply to comment #4)
> Can a test be made that crashes on any common implementation? The bug title says "crash", so surely this was observed in practice?

Sorry, the title is probably misleading. I have seen it crashed in armcc's qsort(), and using stable compare function fixed it. I cannot reproduce it on Safari win32.

But in general, the compare functions passed to sorting algorithm (including qsort, stable_sort, and sorted containers like std::set/map) should be written in a reliable way and should give consistent result. The compare function passed to qsort should always make this true: compare(a, b) == -compare(b, a). The old code uses a function that can make compare(a, b) == 1 and compare(b, a) == 0. The behavior of running qsort() with it is undetermined.
Comment 6 Alexey Proskuryakov 2010-06-02 08:03:41 PDT
Can you add a test that crashes on armcc without this fix?
Comment 7 Yong Li 2010-06-02 08:06:11 PDT
(In reply to comment #6)
> Can you add a test that crashes on armcc without this fix?

I'll give a try.
Comment 8 Yong Li 2010-06-06 19:50:55 PDT
Created attachment 57990 [details]
the test case
Comment 9 Yong Li 2010-06-06 19:51:57 PDT
(In reply to comment #6)
> Can you add a test that crashes on armcc without this fix?

Finally got a test case that can reproduce the crash. Do you think it should go into LayoutTest?
Comment 10 Yong Li 2010-06-07 07:19:07 PDT
Created attachment 58026 [details]
Test case
Comment 11 Yong Li 2010-06-07 07:19:44 PDT
Created attachment 58027 [details]
the patch that fixes the problem
Comment 12 Darin Adler 2010-06-07 07:25:23 PDT
Comment on attachment 58027 [details]
the patch that fixes the problem

Is there a way to fix this without writing a whole new copy of the compareBorders function? That's quite undesirable, because we can have real problems if the two functions get out of sync. What's the particular thing in the original function that makes it inconsistent?

The word "stable" has a specific meaning in sorting contexts. It means a sorting algorithm that does not reorder equal items in the array being sorted. Using the term stable in the title of the new version of the compareBorders function is imprecise and confusing since it already has a specific meaning for sorting.
Comment 13 Yong Li 2010-06-07 07:58:06 PDT
(In reply to comment #12)
> (From update of attachment 58027 [details])
> Is there a way to fix this without writing a whole new copy of the compareBorders function? That's quite undesirable, because we can have real problems if the two functions get out of sync. What's the particular thing in the original function that makes it inconsistent?
> The word "stable" has a specific meaning in sorting contexts. It means a sorting algorithm that does not reorder equal items in the array being sorted. Using the term stable in the title of the new version of the compareBorders function is imprecise and confusing since it already has a specific meaning for sorting.

1) The existing compareBorders function is mainly used to choose a border style from 2. 

It looks buggy for example the santiy check ignore the BHIDDEN check. If border 1 doesn't exist and border 2 is hidden, it doesn't return an empty border value as it's supposed to do. In fact, !border1.exists() rarely happens. 

Also, this function returns border1 when 2 border styles are equal, and the old compareBorderStylesForQSort implementation only checks the return value once and assumes the returned value is the bigger one. so it can get compareBorderStylesForQSort(a, b) == 1 and compare...ForQSort(b, a) == 1.

> return border1.precedence() >= border2.precedence() ? border1 : border2;

When both borders are BHIDDEN, the result is compareBorderStylesForQSort(a, b) == -1 and compare...ForQSort(b, a) == -1.

I'll try to make them use same code.

2) I thought about that, too, but couldn't find a good name. how about consistentCompare?

or? rename stableCompareBorders to compareBorderStyles, but rename compareBorders to chooseBorderBetween?
Comment 14 Yong Li 2010-06-07 08:33:58 PDT
Comment on attachment 58027 [details]
the patch that fixes the problem

there's a bug in there. HIDDEN border should take precedence.
Comment 15 Yong Li 2010-06-07 10:03:56 PDT
Created attachment 58041 [details]
Fixed bugs. use same function for both qsort and other cases

Also fixed a bug in the old function: when both borders have NONE style, it didn't continue comparing any more. But it should continue checking other properties in this case according to the comment:

// (2) Borders with a style of 'none' have the lowest priority. Only if the border properties of all 
// the elements meeting at this edge are 'none' will the border be omitted (but note that 'none' is 
// the default value for the border style.)
// (3) If none of the styles are 'hidden' and at least one of them is not 'none', then narrow borders
Comment 16 Yong Li 2010-06-07 10:04:35 PDT
Comment on attachment 58041 [details]
Fixed bugs. use same function for both qsort and other cases

oops wrong patch.
Comment 17 Yong Li 2010-06-07 10:08:26 PDT
Created attachment 58042 [details]
updated patch
Comment 18 Early Warning System Bot 2010-06-07 10:08:57 PDT
Attachment 58041 [details] did not build on qt:
Build output: http://webkit-commit-queue.appspot.com/results/3086190
Comment 19 WebKit Review Bot 2010-06-07 10:13:55 PDT
Attachment 58041 [details] did not build on chromium:
Build output: http://webkit-commit-queue.appspot.com/results/3095208
Comment 20 Yong Li 2010-06-07 10:25:26 PDT
Also, FireFox renders the test case in the way different from webkit browsers. FireFox renders some borders around a few elements. WebKit browsers like Safari and Qt port don't render any border. My patch makes webkit do the similar thing as FireFox. I'm not sure if FireFox is doing the right thing. Any expert?
Comment 21 Yong Li 2010-06-07 13:33:06 PDT
(In reply to comment #20)
> Also, FireFox renders the test case in the way different from webkit browsers. FireFox renders some borders around a few elements. WebKit browsers like Safari and Qt port don't render any border. My patch makes webkit do the similar thing as FireFox. I'm not sure if FireFox is doing the right thing. Any expert?

Please ignore this
Comment 22 Yong Li 2010-06-10 10:45:18 PDT
FYI, Daniel Bates has tested this patch on mac, and it passed all tests. He also ran pixels test for LayoutTests/table and LayoutTest/fast/table and did not notice any significant differences. Thanks, Dan.
Comment 23 Kent Hansen 2010-07-23 06:32:18 PDT
Comment on attachment 58042 [details]
updated patch

> +        Make compareBorders() a consistent compare function which can used be by qsort().

"which can be used by..."

> +        Also fixed a bug in previous implementation that it didn't continue comparing when
> +        both borders have NONE style.

If that's a separate bug it should have a separate patch and test case, and be fixed before this one. (See http://trac.webkit.org/wiki/CodeReview)
Alternatively, leave that part out of this patch (keeping the behavior of the old implementation) and create a follow-up bug after this has landed.

>      // Sanity check the values passed in.  If either is null, return the other.

You might want to reword comments like the above one, since the function now returns an int, not one of its arguments.

>      // Rule #2 above.  A style of 'none' has lowest priority and always loses to any other border.
> -    if (border2.style() == BNONE)
> -        return border1;
> +    if (border2.style() == BNONE) {
> +        if (border1.style() != BNONE)
> +            return 1;
> +    }
>      if (border1.style() == BNONE)
> -        return border2;
> +        return -1;

It looks like -1 is always returned when both have style BNONE. Shouldn't it be 0, like for BHIDDEN?

> @@ -767,10 +785,7 @@ static int compareBorderStylesForQSort(const void* pa, const void* pb)
>      const CollapsedBorderValue* b = static_cast<const CollapsedBorderValue*>(pb);
>      if (*a == *b)
>          return 0;
> -    CollapsedBorderValue borderWithHigherPrecedence = compareBorders(*a, *b);
> -    if (*a == borderWithHigherPrecedence)
> -        return 1;
> -    return -1;
> +    return compareBorders(*a, *b);
>  }

Nice!

In general I like that compareBorders() returns an int rather than a border; the name is better suited now than before. :)
As for chooseBorder(), I'm not able to come up with a better name.
Comment 24 Yong Li 2010-07-23 07:55:53 PDT
(In reply to comment #23)
> (From update of attachment 58042 [details])

> > -    if (border2.style() == BNONE)
> > -        return border1;
> > +    if (border2.style() == BNONE) {
> > +        if (border1.style() != BNONE)
> > +            return 1;
> > +    }
> >      if (border1.style() == BNONE)
> > -        return border2;
> > +        return -1;
> It looks like -1 is always returned when both have style BNONE. Shouldn't it be 0, like for BHIDDEN?

Nice catch! Actuall when both borders have style BNONE, we should continue comparing with other criteria. so it should


> >      "else" if (border1.style() == BNONE)
Comment 25 Yong Li 2010-07-23 07:56:35 PDT
Comment on attachment 58042 [details]
updated patch

Review cancelled because of the bug Kent found.
Comment 26 Yong Li 2010-07-23 08:34:40 PDT
(In reply to comment #24)
> (In reply to comment #23)
> > (From update of attachment 58042 [details] [details])
> 
> > > -    if (border2.style() == BNONE)
> > > -        return border1;
> > > +    if (border2.style() == BNONE) {
> > > +        if (border1.style() != BNONE)
> > > +            return 1;
> > > +    }
> > >      if (border1.style() == BNONE)
> > > -        return border2;
> > > +        return -1;
> > It looks like -1 is always returned when both have style BNONE. Shouldn't it be 0, like for BHIDDEN?
> 
> Nice catch! Actuall when both borders have style BNONE, we should continue comparing with other criteria. so it should
> 
> 
> > >      "else" if (border1.style() == BNONE)

I'm not very sure. I'll keep the old behavior (just return 0) and open another bug for that.
Comment 27 Yong Li 2010-07-23 08:49:03 PDT
Created attachment 62429 [details]
Fix comments & keep old behavior for BNONE style
Comment 28 Alexey Proskuryakov 2010-09-20 10:42:40 PDT
Comment on attachment 58026 [details]
Test case

Commit-queue won't land this, please land manually.
Comment 29 Yong Li 2010-09-20 10:44:21 PDT
(In reply to comment #28)
> (From update of attachment 58026 [details])
> Commit-queue won't land this, please land manually.

ha. thanks!
Comment 30 Eric Seidel (no email) 2010-09-20 10:48:23 PDT
Why won't it land it?
Comment 31 WebKit Commit Bot 2010-09-20 10:55:57 PDT
Comment on attachment 62429 [details]
Fix comments & keep old behavior for BNONE style

Clearing flags on attachment: 62429

Committed r67862: <http://trac.webkit.org/changeset/67862>
Comment 32 Yong Li 2010-09-20 10:56:52 PDT
Comment on attachment 58026 [details]
Test case

give a try...
Comment 33 Alexey Proskuryakov 2010-09-20 11:01:49 PDT
"This" was the other patch, the one with test. I thought commit queue didn't like multiple patches per bug.
Comment 34 Yong Li 2010-09-20 11:15:31 PDT
Comment on attachment 58026 [details]
Test case

commit queue says: Building and testing without the patch as a sanity check 5 minutes ago Unable to build and test patch 

I'll manually commit it
Comment 35 Eric Seidel (no email) 2010-09-20 11:18:03 PDT
(In reply to comment #34)
> (From update of attachment 58026 [details])
> commit queue says: Building and testing without the patch as a sanity check 5 minutes ago Unable to build and test patch 
> 
> I'll manually commit it

This just means the cq had trouble building.  (likely due to tree redness).  It's a non-fatal error.
Comment 36 Eric Seidel (no email) 2010-09-20 11:18:27 PDT
(In reply to comment #35)
> This just means the cq had trouble building.  (likely due to tree redness).  It's a non-fatal error.

The commit-queue will always comment in the bug if it hits a fatal error.
Comment 37 Eric Seidel (no email) 2010-09-20 11:19:34 PDT
(In reply to comment #33)
> "This" was the other patch, the one with test. I thought commit queue didn't like multiple patches per bug.

The commit-queue can handle multiple patches per bug.  It didn't used to like them, no. :)

I wouldn't say they're recommended, but the cq will try and process patches in attachment order per-bug.  it also won't close a bug after landing if there are still non-obsolete patches on that bug.
Comment 38 WebKit Commit Bot 2010-09-20 13:05:57 PDT
Comment on attachment 58026 [details]
Test case

Rejecting patch 58026 from commit-queue.

Failed to run "[u'git', u'svn', u'dcommit']" exit_code: 1
Last 500 characters of output:
repository/webkit/trunk ...
	M	LayoutTests/ChangeLog
	A	LayoutTests/tables/sort-collapsed-border-styles.html
A repository hook failed: Commit blocked by pre-commit hook (exit code 1) with output:

    The following files contain tab characters:

        trunk/LayoutTests/tables/sort-collapsed-border-styles.html

    Please use spaces instead to indent.
    If you must commit a file with tabs, use svn propset to set the "allow-tabs" property.
 at /usr/local/git/libexec/git-core/git-svn line 572


Full output: http://queues.webkit.org/results/4026079
Comment 39 Yong Li 2010-09-20 14:20:26 PDT
Created attachment 68135 [details]
test case (tab removed)
Comment 40 Yong Li 2010-09-20 14:21:16 PDT
Comment on attachment 68135 [details]
test case (tab removed)

with the tab removed
Comment 41 Yong Li 2010-09-20 14:23:35 PDT
When I tried manually committing it. I got this:

RA layer request failed: MKACTIVITY of '/repository/webkit/!svn/act/9569dd30-c4f
c-11df-b9b0-c0cdfa48311f': 403 Forbidden (http://svn.webkit.org) at C:\Git/libex
ec/git-core/git-svn line 3945
Comment 42 Eric Seidel (no email) 2010-09-20 15:06:45 PDT
This appears to have hung the commit-cluster again.  I'm investigating.  It's not your fault, it should not be possible for a patch to hang the cluster. :)
Comment 43 Yong Li 2010-09-20 15:18:56 PDT
(In reply to comment #42)
> This appears to have hung the commit-cluster again.  I'm investigating.  It's not your fault, it should not be possible for a patch to hang the cluster. :)

I did type wrong user/password more than one time. I haven't pushed a commit by myself for very long time. I cannot live without CQ :)
Comment 44 Andrew Wilson 2010-09-20 17:36:54 PDT
After this revision landed, fast/repaint/table-cell-collapsed-border.html started generating a different result and so is causing failures on the Chromium build bots. I am not certain whether the new result is still valid or not, but I suspect it's now drawing the borders in a different order and generating invalid results.

Please run this test and validate whether the results are still valid - I don't want to revert your CL when it's possible we just need to rebaseline the test expectations.
Comment 45 Eric Seidel (no email) 2010-09-20 17:41:07 PDT
Did this get landed manually and the bug not updated?
Comment 46 Andrew Wilson 2010-09-20 17:51:00 PDT
Looks like it landed as r67862.
Comment 47 Eric Seidel (no email) 2010-09-20 18:03:57 PDT
Closing, per above.
Comment 48 Yong Li 2010-09-21 07:28:15 PDT
(In reply to comment #47)
> Closing, per above.

The test case is still not landed...
Comment 49 Yong Li 2010-09-21 07:30:29 PDT
(In reply to comment #44)
> After this revision landed, fast/repaint/table-cell-collapsed-border.html started generating a different result and so is causing failures on the Chromium build bots. I am not certain whether the new result is still valid or not, but I suspect it's now drawing the borders in a different order and generating invalid results.
> 
> Please run this test and validate whether the results are still valid - I don't want to revert your CL when it's possible we just need to rebaseline the test expectations.

The patch fixes the compare function passed to qsort. So it is not surprising that it invalidates some old test result. I will check with fast/repaint/table-cell-collapsed-border.html.
Comment 50 Yong Li 2010-09-21 08:22:17 PDT
(In reply to comment #44)
> After this revision landed, fast/repaint/table-cell-collapsed-border.html started generating a different result and so is causing failures on the Chromium build bots. I am not certain whether the new result is still valid or not, but I suspect it's now drawing the borders in a different order and generating invalid results.
> 
> Please run this test and validate whether the results are still valid - I don't want to revert your CL when it's possible we just need to rebaseline the test expectations.

I noticed even the 2 table-cell-collapsed-border-expected.png files in chromium-linux and chromium-win are different. The new result is reverse of the old result: the actual result in chromium-win build is expected result in chromium-linux, and the actual result in chromium-linux is the expected result in chromium-win. I guess it must be due to the difference between 2 qsort algorithms in handling equal items.
Comment 51 Andrew Wilson 2010-09-21 09:56:34 PDT
OK, so I'll rebaseline the tests then. Thanks for verifying that they are correct.
Comment 52 Darin Adler 2010-09-21 09:59:46 PDT
Ideally we don’t want test results that are different on different platforms because of qsort. Maybe we can change things around so the sort is stable?
Comment 53 Yong Li 2010-09-21 10:29:19 PDT
(In reply to comment #52)
> Ideally we don’t want test results that are different on different platforms because of qsort. Maybe we can change things around so the sort is stable?

2 ways I can see:

1. use stable_sort instead of qsort to avoid changing the order of 2 equal items
2. make compare function never return 0

Seems 2) is very hard to be done gracefully. Also in most cases there are not too many collapsed border styles to sort. So probably 1) is acceptable?
Comment 54 Steve Block 2010-09-22 03:16:41 PDT
*** Bug 13147 has been marked as a duplicate of this bug. ***
Comment 55 Darin Adler 2010-09-22 08:20:22 PDT
(In reply to comment #53)
> 2 ways I can see:
> 
> 1. use stable_sort instead of qsort to avoid changing the order of 2 equal items
> 2. make compare function never return 0
> 
> Seems 2) is very hard to be done gracefully. Also in most cases there are not too many collapsed border styles to sort. So probably 1) is acceptable?

I’m not sure what order the borders are in beforehand. It seems fine to use stable_sort instead of qsort if the order beforehand is something well defined.

In other words, what is the ordering we want to apply to otherwise equal border styles? Some sort of document order?

I guess this bug may not be the right place to discuss this, but it would be nice to get this tidy.
Comment 56 Yong Li 2010-09-22 17:11:55 PDT
(In reply to comment #55)
> (In reply to comment #53)
> > 2 ways I can see:
> > 
> > 1. use stable_sort instead of qsort to avoid changing the order of 2 equal items
> > 2. make compare function never return 0
> > 
> > Seems 2) is very hard to be done gracefully. Also in most cases there are not too many collapsed border styles to sort. So probably 1) is acceptable?
> 
> I’m not sure what order the borders are in beforehand. It seems fine to use stable_sort instead of qsort if the order beforehand is something well defined.
> 
> In other words, what is the ordering we want to apply to otherwise equal border styles? Some sort of document order?
> 

Seems the pre-sort order is:

1) order of children nodes
2) for each child node: left border, right, top and then bottom
Comment 57 Yong Li 2010-12-22 07:54:25 PST
try the test case
Comment 58 WebKit Commit Bot 2010-12-22 12:23:55 PST
Comment on attachment 68135 [details]
test case (tab removed)

Rejecting attachment 68135 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'build-and-test', '--no-clean', '--no-update', '--test', '--non-interactive']" exit_code: 2
Last 500 characters of output:
sts/websocket/tests/workers ......
http/tests/workers .....
http/tests/xhtmlmp .
http/tests/xmlhttprequest ............................................................................................................................................................................
http/tests/xmlhttprequest/web-apps ...............
http/tests/xmlhttprequest/workers .........
635.73s total testing time

22228 test cases (99%) succeeded
1 test case (<1%) was new
13 test cases (<1%) had stderr output

Full output: http://queues.webkit.org/results/7325106
Comment 59 Eric Seidel (no email) 2010-12-22 13:36:41 PST
The commit-queue should give you a better message here, but your patch is missing expectation files for your new tests.
Comment 60 Eric Seidel (no email) 2010-12-22 13:37:42 PST
This test shoudl be dump as text.

Are you sure this is a reduced test case?  it looks like it might be way bigger than necessary.
Comment 61 Yong Li 2010-12-22 13:58:37 PST
(In reply to comment #60)
> This test shoudl be dump as text.
> Are you sure this is a reduced test case?  it looks like it might be way bigger than necessary.

Probably bigger than necessary. I used it to reprodce a crash in qsort with the old compare function. I didn't see rvct's qsort code, and just blindly tried different combinations and got a crash. Also, to test the sorting algorithm, problably the longer the better? Anyway the code has been changed from using qsort to using std::sort. Not sure if this test case is still important to us. Maybe leaving it to this bug is good enough?
Comment 62 Eric Seidel (no email) 2011-01-11 03:11:05 PST
I'm confused as to the status of this bug.