Bug 11572 - REGRESSION: U+0000 is no longer treated as whitespace, causing rendering trouble
Summary: REGRESSION: U+0000 is no longer treated as whitespace, causing rendering trouble
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering
Version: 420+
Hardware: Mac (PowerPC) OS X 10.4
Importance: P1 Normal
Assignee: Nobody
URL: http://www.denkor.com/en_US/ALL/about...
Keywords: HasReduction, Regression
Depends on:
Blocks:
 
Reported: 2006-11-11 10:17 PST by Tyson Safavi
Modified: 2007-01-22 13:14 PST
See Also:


Attachments
Testcase for the left side of the menu (65 bytes, text/html)
2007-01-19 06:08 PST, Joost de Valk (AlthA)
Reduced testcase (16 bytes, text/html)
2007-01-19 06:25 PST, Joost de Valk (AlthA)

Description Tyson Safavi 2006-11-11 10:17:42 PST
Menu Icons are duplicated and navigation on right side is overlapping
Comment 1 Alexey Proskuryakov 2006-11-11 12:27:31 PST
Confirmed as a regression (for me, the navigation on the right side appeared unstyled on the left).
Comment 2 Joost de Valk (AlthA) 2007-01-19 06:08:50 PST
Created attachment 12559
Testcase for the left side of the menu

OK, the rendering on the left side of the menu goes wrong because of a weird character right after the ; in the style block.
Comment 3 Joost de Valk (AlthA) 2007-01-19 06:25:17 PST
Created attachment 12561
Reduced testcase

OK, both wrong renderings boil down to the fact that the first character in this testcase causes ToT to render a margin before the text in the div, where release does not.
Comment 4 Joost de Valk (AlthA) 2007-01-19 06:32:43 PST
U+0001 used to be treated as whitespace in release, and now isn't anymore, causing the rendering here to break.
Comment 5 Mark Rowe (bdash) 2007-01-21 15:59:40 PST
The test case looks identical to me in WebKit 418.9.1 and ToT.  What am I missing?
Comment 6 Sam Weinig 2007-01-21 16:25:40 PST
There was a difference a few days ago.  Perhaps r18988 (http://trac.webkit.org/projects/webkit/changeset/18988) fixed it. 
Comment 7 Ian 'Hixie' Hickson 2007-01-21 23:16:16 PST
I don't understand exactly what this bug is about, but U+0001 shouldn't be treated as whitespace according to either CSS or HTML. Could you elaborate? Does IE do this too? Should the specs be fixed?
Comment 8 Joost de Valk (AlthA) 2007-01-21 23:55:42 PST
The bug does indeed seem fixed; both the testcase and the URL work fine now. Hixie: it seems it wasn't being treated as whitespace but ignored, and it seems to be ignored again now. What it did in the regression was unclear to me, but it did cause rendering errors. What SHOULD a browser be doing? I expect it should ignore it?
Comment 9 Mark Rowe (bdash) 2007-01-22 00:26:14 PST
I think the title is incorrect, and that the character that is present and causing the problem is U+0000 rather than U+0001.  The following commands show that U+0001 isn't even present in the page at the URL linked in the bug description:

atlas:~ mrowe$ curl -s http://www.denkor.com/en_US/ALL/aboutus/cu/index.shtml | xxd | cut -d ' ' -f 2-9 | egrep '(^00|00$| 00|00 )'
2f2d 2d3e 0a3c 2f73 6372 6970 743e 0000
723b 0000 0a20 2020 0a20 2020 626f 7264
743a 2031 3770 783b 0000 0a0a 2020 2066
3d22 5374 6172 7422 3e3c 2f61 3e00 000a
3c2f 7472 3e0a 3c2f 7461 626c 653e 0000
atlas:~ mrowe$ curl -s http://www.denkor.com/en_US/ALL/aboutus/cu/index.shtml | xxd | cut -d ' ' -f 2-9 | egrep '(^01|01$| 01|01 )'
atlas:~ mrowe$ 

I'm not sure whether this changes anything about the expected behaviour, but I'm mentioning it for the sake of correctness.
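
The byte-level check in Comment 9 can also be sketched in Python. This is only a minimal illustration, not part of the original investigation: the helper name find_byte_offsets is hypothetical, and the sample bytes are modelled on the "723b 0000 0a20 2020" row of the hex dump rather than fetched from the live page.

```python
from typing import List


def find_byte_offsets(data: bytes, needle: int = 0x00) -> List[int]:
    """Return every offset at which the byte value `needle` occurs in `data`."""
    return [offset for offset, value in enumerate(data) if value == needle]


# Hypothetical sample modelled on one row of the hex dump:
# "r;" followed by two NUL bytes, a newline and three spaces.
sample = b"r;\x00\x00\n   "

nul_offsets = find_byte_offsets(sample)        # positions of 0x00 bytes
soh_offsets = find_byte_offsets(sample, 0x01)  # positions of 0x01 bytes

print("0x00 at offsets:", nul_offsets)
print("0x01 at offsets:", soh_offsets)
```

Scanning the raw bytes directly avoids depending on how xxd groups bytes into columns, at the cost of having to download the page into memory first.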
Comment 10 Ian 'Hixie' Hickson 2007-01-22 00:27:43 PST
Where exactly are we talking about? As normal character data? As normal character data, U+0001 should, according to the HTML5 draft spec, be treated as normal character data like "x". Or do you mean elsewhere?

Assuming you do mean in HTML parsing, I wouldn't worry about the spec yet. Once the spec is stable, we can get browsers to look at implementing it but until then there's not really any point.
Comment 11 Ian 'Hixie' Hickson 2007-01-22 00:33:15 PST
Ah, per HTML5, all U+0000 NULL characters in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs before the parser sees the input stream, even in the middle of a tagname. But my above caveat still applies.
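
The preprocessing step described in Comment 11 can be sketched roughly as follows. This is an assumption-laden illustration of the rule as stated in that comment (every U+0000 is replaced by U+FFFD before parsing), not WebKit's implementation and not a reading of any particular spec revision; the function name replace_nulls and the sample string are hypothetical.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def replace_nulls(text: str) -> str:
    """Replace every U+0000 in `text` with U+FFFD, as the comment describes."""
    return text.replace("\x00", REPLACEMENT_CHARACTER)


# Hypothetical input modelled on the "17px;" row of the hex dump:
# a CSS declaration followed by two NUL characters and a newline.
sample = "17px;\x00\x00\n"
cleaned = replace_nulls(sample)

# Print with escapes so the otherwise invisible characters can be inspected.
print(cleaned.encode("unicode_escape").decode("ascii"))
```

Under this rule the characters are not dropped: the string keeps its length, and whatever consumes it afterwards sees two ordinary, if unusual, characters after the semicolon.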
Comment 12 Joost de Valk (AlthA) 2007-01-22 00:38:06 PST
Well, the fact was that it was behaving differently from release; that now seems solved :)

These replacement characters mean that it gets ignored by the parser, I assume?
Comment 13 Ian 'Hixie' Hickson 2007-01-22 13:14:26 PST
No, they just get treated like unknown characters, like "%" or something.