Bug 14453 - REGRESSION: www.nzherald.co.nz almost all the formatting is gone
Summary: REGRESSION: www.nzherald.co.nz almost all the formatting is gone
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 523.x (Safari 3)
Hardware: PC Windows XP
: P1 Normal
Assignee: Nobody
URL: http://www.nzherald.co.nz
Keywords: HasReduction, InRadar, Regression
: 14509 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-06-28 20:23 PDT by Darren Collins
Modified: 2007-07-12 02:03 PDT (History)
3 users (show)

See Also:


Attachments
Broken formatting (132.09 KB, image/png)
2007-06-28 20:28 PDT, Darren Collins
no flags Details
Formatted layout (nightly r23841) (406.43 KB, image/png)
2007-06-28 20:29 PDT, Darren Collins
no flags Details
Reduction (120 bytes, text/html)
2007-06-29 03:25 PDT, mitz
no flags Details
Fix for regression caused by my previous patch (1.11 KB, patch)
2007-06-30 15:21 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
Now with testcase (7.80 KB, patch)
2007-06-30 15:47 PDT, Rob Buis
darin: review-
Details | Formatted Diff | Diff
Reduction: Bogus span color definition (126 bytes, text/html)
2007-06-30 17:25 PDT, Darren Collins
no flags Details
Another attempt (119.25 KB, patch)
2007-07-02 06:23 PDT, Rob Buis
no flags Details | Formatted Diff | Diff
This time with testcase and against ToT (7.49 KB, patch)
2007-07-04 06:50 PDT, Rob Buis
darin: review-
Details | Formatted Diff | Diff
Improved patch and testcase (10.37 KB, patch)
2007-07-06 04:25 PDT, Rob Buis
darin: review-
Details | Formatted Diff | Diff
Small fixes (10.36 KB, patch)
2007-07-08 09:29 PDT, Rob Buis
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darren Collins 2007-06-28 20:23:25 PDT
Almost all of the formating is gone from http://www.nzherald.co.nz when viewed with local debug build of r23865.

The formatting is present when using nightly build of r23841
Comment 1 Darren Collins 2007-06-28 20:28:12 PDT
Created attachment 15304 [details]
Broken formatting
Comment 2 Darren Collins 2007-06-28 20:29:34 PDT
Created attachment 15305 [details]
Formatted layout (nightly r23841)
Comment 3 mitz 2007-06-29 01:33:50 PDT
Perhaps <http://trac.webkit.org/projects/webkit/changeset/23854>.
Comment 4 mitz 2007-06-29 03:25:56 PDT
Created attachment 15312 [details]
Reduction
Comment 5 Rob Buis 2007-06-30 15:21:12 PDT
Created attachment 15333 [details]
Fix for regression caused by my previous patch

I hope this is enough to get all the semi-legal hex color values. I say we go for that for now, we can always revert the original patch later if worst comes to worst.
Cheers,

Rob.
Comment 6 mitz 2007-06-30 15:34:46 PDT
Comment on attachment 15333 [details]
Fix for regression caused by my previous patch

I don't understand the language so I can't review the code, but a regression test is required.
Comment 7 Rob Buis 2007-06-30 15:47:06 PDT
Created attachment 15334 [details]
Now with testcase

Mitz pointed out a testcase was needed, there you go.
Cheers,

Rob.
Comment 8 Darren Collins 2007-06-30 16:27:03 PDT
If I am reading it correctly, the patch seems to now allow CSS colors of 2 to 6 hex values, rather than just 3 or 6.  The test case is for 2 hex values, which will fix the defect reduction.  But what will happen with 4 or 5 hex values?  Is that really valid?  Shouldn't there be a testcase for those too?



Comment 9 Darren Collins 2007-06-30 17:25:07 PDT
Created attachment 15335 [details]
Reduction: Bogus span color definition

This new reduction shows that the defect is due to a bogus color attribute for the span element.  In Safari 3.0.2 and Firefox/IE the bogus value is ignored, and subsequent CSS elements are processed.  However, with r23841, the bogus value stops further CSS elements from being applied.  So I think the fix isn't to allow invalid shorthand color values like #f0, but instead gracefully ignore the bogus value and continue as per 3.0.2.
Comment 10 Darin Adler 2007-06-30 18:00:20 PDT
Comment on attachment 15334 [details]
Now with testcase

r=me
Comment 11 Darin Adler 2007-07-01 16:49:00 PDT
Committed revision 23919.

Oops, only after landing this fix do I see Darren Collins's comment.

Perhaps we should roll the change out and do a better fix?
Comment 12 Darin Adler 2007-07-01 19:24:47 PDT
Layout test was failing. Rolling out the change.
Comment 13 Darin Adler 2007-07-01 19:25:11 PDT
Comment on attachment 15334 [details]
Now with testcase

When I checked this in, the layout test didn't pass.
Comment 14 Rob Buis 2007-07-02 06:23:29 PDT
Created attachment 15351 [details]
Another attempt

Wow, this one is hard to get right. I think this one is better, it fixes the new testcase. Unfortunately I am not able to make a new patch against ToT (this one is against the feature branch) for at least one more day.
Cheers,

Rob.
Comment 15 Darren Collins 2007-07-03 05:06:12 PDT
*** Bug 14509 has been marked as a duplicate of this bug. ***
Comment 16 Mark Rowe (bdash) 2007-07-04 04:00:20 PDT
Rob, it looks like you omitted the html layout test files from your most recent patch?
Comment 17 Rob Buis 2007-07-04 06:50:03 PDT
Created attachment 15385 [details]
This time with testcase and against ToT

Should be clear from the description :) This one fixes the regression and doesnt regress otherwise.
Cheers,

Rob.
Comment 18 Darin Adler 2007-07-05 00:25:40 PDT
Comment on attachment 15385 [details]
This time with testcase and against ToT

This looks good, but not quite right. With this patch, every case in the parser that allows HEX_OR_IDENT also allows IDSEL, so it doesn't seem good to have two different tokens for these.

For the "specifier" production we allow characters that start with a numeric character, as long as they happen to be 3 or 6 hex digits. That doesn't seem right to me. Is our handling of such selectors consistent with other CSS parsers?

I think that certain types of illegal colors would still malfunction with our engine. For example, "#2bogus".

HEX_OR_IDENT is a strange name for a token that is # followed by either 3 or 6 hexadecimal digits. So if we keep it we should probably rename it.

Would you be willing to clean this up a bit more and straighten it out?
Comment 19 David Kilzer (:ddkilzer) 2007-07-05 22:31:10 PDT
<rdar://problem/5316348>
Comment 20 Rob Buis 2007-07-06 04:25:16 PDT
Created attachment 15416 [details]
Improved patch and testcase

This one has some better names, a check for numbers as first character of an id selector and an extended test.
Cheers,

Rob.
Comment 21 Darin Adler 2007-07-08 05:10:04 PDT
Comment on attachment 15416 [details]
Improved patch and testcase

It's not portable to call isdigit on a 16-bit character and count on it returning false for non-ASCII values. In particular I think it either doesn't compile at all or warns on Windows. Instead the code should just compare the UChar to check if it's >= '0' and <= '9'. We should probably add some helpful ASCII versions of ctype.h functions too, because the ctype.h functions change their behavior based on the current locale and we really don't want that. I think Sam Weinig is working on that.

Sorry to keep nitpicking this, but I still don't think that HASH is a clear name for one of two different tokens that both start with "#". My original comment still stands -- every code path for IDSEL and HASH is identical and it's not helpful that the lexer gives two different tokens for these two different cases. They should be combined into a single token. I think HASH *would* be a reasonable name for that single token.
Comment 22 Rob Buis 2007-07-08 09:27:45 PDT
Hi Darin,

(In reply to comment #21)
> (From update of attachment 15416 [details] [edit])
> It's not portable to call isdigit on a 16-bit character and count on it
> returning false for non-ASCII values. In particular I think it either doesn't
> compile at all or warns on Windows. Instead the code should just compare the
> UChar to check if it's >= '0' and <= '9'. We should probably add some helpful
> ASCII versions of ctype.h functions too, because the ctype.h functions change
> their behavior based on the current locale and we really don't want that. I
> think Sam Weinig is working on that.

Right, not using isdigit anymore in my upcoming patch.

> Sorry to keep nitpicking this, but I still don't think that HASH is a clear
> name for one of two different tokens that both start with "#". My original

I am going for the name HEX for now.

> comment still stands -- every code path for IDSEL and HASH is identical and
> it's not helpful that the lexer gives two different tokens for these two
> different cases. They should be combined into a single token. I think HASH
> *would* be a reasonable name for that single token.

I would like a single token too, the problem I have is that things like \0031 in an id selector get translated to simply '1', at which point I can't distinguish it from an id sel of #1, which also becomes '1'.
Unfortunately I only see the two labels as a solution for this. I am open for suggestions though :)
Cheers,

Rob.
Comment 23 Rob Buis 2007-07-08 09:29:18 PDT
Created attachment 15444 [details]
Small fixes

Here I avoid using isdigit and rename HASH to HEX. Still not sure about the two label issue...
Cheers,

Rob.
Comment 24 Maciej Stachowiak 2007-07-09 01:21:09 PDT
I think you need to have two labels due to the escapes issue you describe. They are allowed in id selectors but not hex colors, but can result in the same effective token.

Comment 25 Darin Adler 2007-07-11 00:16:36 PDT
Comment on attachment 15444 [details]
Small fixes

r=me
Comment 26 David Kilzer (:ddkilzer) 2007-07-11 21:09:24 PDT
Fixed by rwlbuis in r24185.
Comment 27 George Staikos 2007-07-11 21:58:48 PDT
Since these color changes I've had lots of problems with color parsing.  
<td style="background-color: green"> works while <td style="background-color: #00ff00"> doesn't.  It fails on Linux/Qt, but not Mac/Qt.  I think it has to be bison/flex related and the versions are:

Linux
Bison: 2.1
Flex: 2.5.31

OS X
Bison: 1.28
Flex: 2.5.4
Comment 28 Cameron Zwarich (cpst) 2007-07-12 01:46:47 PDT
It is definitely a bison / flex problem. Using Bison 2.1 and Flex 2.5.31 causes the bug to occur on Mac OS X 10.4 as well. I am going to try and see exactly which versions of Bison and Flex break it.
Comment 29 Cameron Zwarich (cpst) 2007-07-12 02:03:51 PDT
Switching to the latest version of flex (2.5.33) fixes the bug. With the prior version (2.5.27), the CSS parser doesn't even compile. The flex SourceForge page <http://flex.sourceforge.net/> notes that 2.5.31 is rather buggy and discourages its use. Therefore, it is almost surely a bug in that older version of flex that causes the problem in Linux.