Bug 14453 - REGRESSION: www.nzherald.co.nz almost all the formatting is gone
: REGRESSION: www.nzherald.co.nz almost all the formatting is gone
Status: RESOLVED FIXED
: WebKit
Layout and Rendering
: 523.x (Safari 3)
: PC Windows XP
: P1 Normal
Assigned To:
: http://www.nzherald.co.nz
: HasReduction, InRadar, Regression
:
:
  Show dependency treegraph
 
Reported: 2007-06-28 20:23 PST by
Modified: 2007-07-12 02:03 PST (History)


Attachments
Broken formatting (132.09 KB, image/png)
2007-06-28 20:28 PST, Darren Collins
no flags Details
Formatted layout (nightly r23841) (406.43 KB, image/png)
2007-06-28 20:29 PST, Darren Collins
no flags Details
Reduction (120 bytes, text/html)
2007-06-29 03:25 PST, mitz@webkit.org
no flags Details
Fix for regression caused by my previous patch (1.11 KB, patch)
2007-06-30 15:21 PST, Rob Buis
no flags Review Patch | Details | Formatted Diff | Diff
Now with testcase (7.80 KB, patch)
2007-06-30 15:47 PST, Rob Buis
darin: review-
Review Patch | Details | Formatted Diff | Diff
Reduction: Bogus span color definition (126 bytes, text/html)
2007-06-30 17:25 PST, Darren Collins
no flags Details
Another attempt (119.25 KB, patch)
2007-07-02 06:23 PST, Rob Buis
no flags Review Patch | Details | Formatted Diff | Diff
This time with testcase and against ToT (7.49 KB, patch)
2007-07-04 06:50 PST, Rob Buis
darin: review-
Review Patch | Details | Formatted Diff | Diff
Improved patch and testcase (10.37 KB, patch)
2007-07-06 04:25 PST, Rob Buis
darin: review-
Review Patch | Details | Formatted Diff | Diff
Small fixes (10.36 KB, patch)
2007-07-08 09:29 PST, Rob Buis
darin: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2007-06-28 20:23:25 PST
Almost all of the formating is gone from http://www.nzherald.co.nz when viewed with local debug build of r23865.

The formatting is present when using nightly build of r23841
------- Comment #1 From 2007-06-28 20:28:12 PST -------
Created an attachment (id=15304) [details]
Broken formatting
------- Comment #2 From 2007-06-28 20:29:34 PST -------
Created an attachment (id=15305) [details]
Formatted layout (nightly r23841)
------- Comment #3 From 2007-06-29 01:33:50 PST -------
Perhaps <http://trac.webkit.org/projects/webkit/changeset/23854>.
------- Comment #4 From 2007-06-29 03:25:56 PST -------
Created an attachment (id=15312) [details]
Reduction
------- Comment #5 From 2007-06-30 15:21:12 PST -------
Created an attachment (id=15333) [details]
Fix for regression caused by my previous patch

I hope this is enough to get all the semi-legal hex color values. I say we go for that for now, we can always revert the original patch later if worst comes to worst.
Cheers,

Rob.
------- Comment #6 From 2007-06-30 15:34:46 PST -------
(From update of attachment 15333 [details])
I don't understand the language so I can't review the code, but a regression test is required.
------- Comment #7 From 2007-06-30 15:47:06 PST -------
Created an attachment (id=15334) [details]
Now with testcase

Mitz pointed out a testcase was needed, there you go.
Cheers,

Rob.
------- Comment #8 From 2007-06-30 16:27:03 PST -------
If I am reading it correctly, the patch seems to now allow CSS colors of 2 to 6 hex values, rather than just 3 or 6.  The test case is for 2 hex values, which will fix the defect reduction.  But what will happen with 4 or 5 hex values?  Is that really valid?  Shouldn't there be a testcase for those too?
------- Comment #9 From 2007-06-30 17:25:07 PST -------
Created an attachment (id=15335) [details]
Reduction: Bogus span color definition

This new reduction shows that the defect is due to a bogus color attribute for the span element.  In Safari 3.0.2 and Firefox/IE the bogus value is ignored, and subsequent CSS elements are processed.  However, with r23841, the bogus value stops further CSS elements from being applied.  So I think the fix isn't to allow invalid shorthand color values like #f0, but instead gracefully ignore the bogus value and continue as per 3.0.2.
------- Comment #10 From 2007-06-30 18:00:20 PST -------
(From update of attachment 15334 [details])
r=me
------- Comment #11 From 2007-07-01 16:49:00 PST -------
Committed revision 23919.

Oops, only after landing this fix do I see Darren Collins's comment.

Perhaps we should roll the change out and do a better fix?
------- Comment #12 From 2007-07-01 19:24:47 PST -------
Layout test was failing. Rolling out the change.
------- Comment #13 From 2007-07-01 19:25:11 PST -------
(From update of attachment 15334 [details])
When I checked this in, the layout test didn't pass.
------- Comment #14 From 2007-07-02 06:23:29 PST -------
Created an attachment (id=15351) [details]
Another attempt

Wow, this one is hard to get right. I think this one is better, it fixes the new testcase. Unfortunately I am not able to make a new patch against ToT (this one is against the feature branch) for at least one more day.
Cheers,

Rob.
------- Comment #15 From 2007-07-03 05:06:12 PST -------
*** Bug 14509 has been marked as a duplicate of this bug. ***
------- Comment #16 From 2007-07-04 04:00:20 PST -------
Rob, it looks like you omitted the html layout test files from your most recent patch?
------- Comment #17 From 2007-07-04 06:50:03 PST -------
Created an attachment (id=15385) [details]
This time with testcase and against ToT

Should be clear from the description :) This one fixes the regression and doesnt regress otherwise.
Cheers,

Rob.
------- Comment #18 From 2007-07-05 00:25:40 PST -------
(From update of attachment 15385 [details])
This looks good, but not quite right. With this patch, every case in the parser that allows HEX_OR_IDENT also allows IDSEL, so it doesn't seem good to have two different tokens for these.

For the "specifier" production we allow characters that start with a numeric character, as long as they happen to be 3 or 6 hex digits. That doesn't seem right to me. Is our handling of such selectors consistent with other CSS parsers?

I think that certain types of illegal colors would still malfunction with our engine. For example, "#2bogus".

HEX_OR_IDENT is a strange name for a token that is # followed by either 3 or 6 hexadecimal digits. So if we keep it we should probably rename it.

Would you be willing to clean this up a bit more and straighten it out?
------- Comment #19 From 2007-07-05 22:31:10 PST -------
<rdar://problem/5316348>
------- Comment #20 From 2007-07-06 04:25:16 PST -------
Created an attachment (id=15416) [details]
Improved patch and testcase

This one has some better names, a check for numbers as first character of an id selector and an extended test.
Cheers,

Rob.
------- Comment #21 From 2007-07-08 05:10:04 PST -------
(From update of attachment 15416 [details])
It's not portable to call isdigit on a 16-bit character and count on it returning false for non-ASCII values. In particular I think it either doesn't compile at all or warns on Windows. Instead the code should just compare the UChar to check if it's >= '0' and <= '9'. We should probably add some helpful ASCII versions of ctype.h functions too, because the ctype.h functions change their behavior based on the current locale and we really don't want that. I think Sam Weinig is working on that.

Sorry to keep nitpicking this, but I still don't think that HASH is a clear name for one of two different tokens that both start with "#". My original comment still stands -- every code path for IDSEL and HASH is identical and it's not helpful that the lexer gives two different tokens for these two different cases. They should be combined into a single token. I think HASH *would* be a reasonable name for that single token.
------- Comment #22 From 2007-07-08 09:27:45 PST -------
Hi Darin,

(In reply to comment #21)
> (From update of attachment 15416 [details] [edit])
> It's not portable to call isdigit on a 16-bit character and count on it
> returning false for non-ASCII values. In particular I think it either doesn't
> compile at all or warns on Windows. Instead the code should just compare the
> UChar to check if it's >= '0' and <= '9'. We should probably add some helpful
> ASCII versions of ctype.h functions too, because the ctype.h functions change
> their behavior based on the current locale and we really don't want that. I
> think Sam Weinig is working on that.

Right, not using isdigit anymore in my upcoming patch.

> Sorry to keep nitpicking this, but I still don't think that HASH is a clear
> name for one of two different tokens that both start with "#". My original

I am going for the name HEX for now.

> comment still stands -- every code path for IDSEL and HASH is identical and
> it's not helpful that the lexer gives two different tokens for these two
> different cases. They should be combined into a single token. I think HASH
> *would* be a reasonable name for that single token.

I would like a single token too, the problem I have is that things like \0031 in an id selector get translated to simply '1', at which point I can't distinguish it from an id sel of #1, which also becomes '1'.
Unfortunately I only see the two labels as a solution for this. I am open for suggestions though :)
Cheers,

Rob.
------- Comment #23 From 2007-07-08 09:29:18 PST -------
Created an attachment (id=15444) [details]
Small fixes

Here I avoid using isdigit and rename HASH to HEX. Still not sure about the two label issue...
Cheers,

Rob.
------- Comment #24 From 2007-07-09 01:21:09 PST -------
I think you need to have two labels due to the escapes issue you describe. They are allowed in id selectors but not hex colors, but can result in the same effective token.
------- Comment #25 From 2007-07-11 00:16:36 PST -------
(From update of attachment 15444 [details])
r=me
------- Comment #26 From 2007-07-11 21:09:24 PST -------
Fixed by rwlbuis in r24185.
------- Comment #27 From 2007-07-11 21:58:48 PST -------
Since these color changes I've had lots of problems with color parsing.  
<td style="background-color: green"> works while <td style="background-color: #00ff00"> doesn't.  It fails on Linux/Qt, but not Mac/Qt.  I think it has to be bison/flex related and the versions are:

Linux
Bison: 2.1
Flex: 2.5.31

OS X
Bison: 1.28
Flex: 2.5.4
------- Comment #28 From 2007-07-12 01:46:47 PST -------
It is definitely a bison / flex problem. Using Bison 2.1 and Flex 2.5.31 causes the bug to occur on Mac OS X 10.4 as well. I am going to try and see exactly which versions of Bison and Flex break it.
------- Comment #29 From 2007-07-12 02:03:51 PST -------
Switching to the latest version of flex (2.5.33) fixes the bug. With the prior version (2.5.27), the CSS parser doesn't even compile. The flex SourceForge page <http://flex.sourceforge.net/> notes that 2.5.31 is rather buggy and discourages its use. Therefore, it is almost surely a bug in that older version of flex that causes the problem in Linux.