RESOLVED FIXED 3230
CSS1: Words with inline elements get extra capital letters
https://bugs.webkit.org/show_bug.cgi?id=3230
Summary CSS1: Words with inline elements get extra capital letters
Dave Hyatt
Reported 2005-06-01 14:35:05 PDT
Words with inline elements should only have one capital letter when text-transform: capitalize is applied. this is in the w3c test, and is mentioned on diveintomark. test case: <html> <head> <style type="text/css"> .cap {text-transform: capitalize;} </style> </head> <body> <P>Words with inline elements inside them should only capitalize the first letter of the word. Therefore, the last word in this sentence should have one, and only one, capital <SPAN CLASS=cap>le<SPAN>tt</SPAN>er</SPAN>. </P> </body> </html>
Attachments
test case as attachement (343 bytes, text/html)
2005-12-28 01:56 PST, Eric Seidel (no email)
no flags
Proposed patch (14.35 KB, patch)
2006-02-13 04:44 PST, Andrew Wellington
hyatt: review-
Updated LayoutTests/css2.1/support/css1test545.png (1.76 KB, image/png)
2006-02-13 04:46 PST, Andrew Wellington
no flags
Proposed patch 2 (18.59 KB, patch)
2006-02-13 16:21 PST, Andrew Wellington
darin: review+
greenbox.png for testcase (95 bytes, image/png)
2006-02-13 16:22 PST, Andrew Wellington
no flags
Nicholas Shanks
Comment 1 2005-06-15 07:33:21 PDT
See my testcase attached to other capitalise bug (#3406) for this, specifically the words 'earthquake' 'earthworm' and 'cheeseburger' in the english and 3eme in the french.
Eric Seidel (no email)
Comment 2 2005-12-28 01:56:12 PST
Created attachment 5328 [details] test case as attachement
Andrew Wellington
Comment 3 2006-02-13 04:44:28 PST
Created attachment 6458 [details] Proposed patch While this doesn't use UBreakIterator, it does fix the problems we have displaying the CSS1 test suite and should make all capitalisation obey the same rules instead of inlines being able to break rendering (essentially we capitalise the first letter after a space and nothing else). Also added Copyright to StringImpl because I wrote the last fix for capitalisation in here, otherwise it's hardly worth it :) To be attached in a second after this, updated LayoutTests/css2.1/support/css1test545.png for the new capitalisation behaviour to draw the green box correctly.
Andrew Wellington
Comment 4 2006-02-13 04:46:57 PST
Created attachment 6459 [details] Updated LayoutTests/css2.1/support/css1test545.png Update to go with patch for background image from test case css2.1/t1605-c545-txttrans-00-b-ag
Dave Hyatt
Comment 5 2006-02-13 11:37:14 PST
Comment on attachment 6458 [details] Proposed patch + for (o = previousRenderer(); o && o->isInline() && !o->isText(); o = o->previousRenderer()) This will include images that occur between words, e.g., foo<img>goo I'm not sure that's right. You might want to see what other browsers do. + ; + if (o && o->isText()) { + DOMStringImpl* prevStr = static_cast<RenderText*>(o)->string(); + QChar c = (*prevStr)[prevStr->length() - 1]; + if (!c.isSpace()) + runOnString = true; + } Not sure about this from an RTL perspective. Maybe Mitz could comment, but that seems dangerous to me.
Andrew Wellington
Comment 6 2006-02-13 16:21:02 PST
Created attachment 6471 [details] Proposed patch 2 Fix the img case hyatt mentioned. Works correctly with RTL text as discussed with mitzpettel. Added a new LayoutTest to check for img case and RTL case. Will add the greenbox.png to this bug. Will require updated pixel test image for updated tests as well.
Andrew Wellington
Comment 7 2006-02-13 16:22:28 PST
Created attachment 6472 [details] greenbox.png for testcase Required for added test case in patch 2, located in LayoutTests/css1/text_properties/support/greenbox.png
Darin Adler
Comment 8 2006-02-18 06:40:42 PST
Comment on attachment 6471 [details] Proposed patch 2 I think it's OK to land this now. I don't like the name "runOnString" for a boolean that means "has a previous character is not a space", but those are nitpicks and this seems fine. Dave had told me this is a difficult area to get right, but on his previous review he only mentioned the two issues that were investigated and fixed so I'm going to assume all is well. r=me
Nicholas Shanks
Comment 9 2006-02-18 11:17:27 PST
To Andrew: Does this handle non?breaking spaces, zero?width space, thin spaces, em spaces etc. too? Does it NOT capitalise after zero width joiner, zero width non?joiner?
Andrew Wellington
Comment 10 2006-02-18 18:27:53 PST
Nick: This follows the same rules as capitalisation within a single span. From my tests this means that the following "unusual" characters are considered to break a work: Thin Space Em-space En-space Not breaking on: non-breaking space (not sure if we should or not on this one) zero-width non-joiner zero-width joiner The best way to resolve these is probably to use UBreakIterator, or to decide if they should be considered spaces for the isSpace() call.
Nicholas Shanks
Comment 11 2006-02-19 01:50:43 PST
Sorry about my previous comment, i didn't realise bugzilla couldn't handle unicode. (I have remapped U+2010 HYPHEN to the key between zero and equals.) Word breaking isn't the same thing though, so you shouldn't be using UBreakIterator really. For example hyphenated words like Semi?circle (that's "Semi-circle" using a HYPHEN-MINUS) or Meso­potamia (Mesopotamia with a SOFT HYPHEN after the second syllable) can break after the hyphen but should not capitalise. All of the spaces from U+2002 to U+200B should be followed by a capital. As for the non?breaking space, which as best I can tell is mostly used in cases such as "20 kg" and "Rameses II" should probably not be capitalised due to usage, where for example "100 ?" ? "100 ?".
Nicholas Shanks
Comment 12 2006-02-19 01:53:07 PST
(In reply to comment #11) > where for example "100 ?" ? "100 ?". Grr. That was supposed to say "100 microns" does not equal "100 capital Mu".
Darin Adler
Comment 13 2006-02-27 22:01:09 PST
(In reply to comment #11) > Word breaking isn't the same thing though, so you shouldn't be using > UBreakIterator really. That's the reason to not use UBreakIterator's "word boundary analysis". But according to the ICU documentation "title boundary analysis locates all positions, typically starts of words, that should be set to title case when title casing the text." It seems that *is* what we want to use. Anyway, since this bug is fixed, lets file new bug reports about further enhancements and cases that are behaving incorrectly instead of putting more comments here where no one is going to read them.
Darin Adler
Comment 14 2006-02-27 22:14:11 PST
(In reply to comment #3) > To be attached in a second after this, updated > LayoutTests/css2.1/support/css1test545.png for the new capitalisation behaviour > to draw the green box correctly. I wanted to land this patch, but I don't understand why we have to modify the CSS 1 test suite itself. That PNG came with the test suite -- why do we want to change it?
Beth Dakin
Comment 15 2006-03-02 17:51:00 PST
I committed this fix.
Note You need to log in before you can comment on or make changes to this bug.