Bug 3230 - CSS1: Words with inline elements get extra capital letters
Summary: CSS1: Words with inline elements get extra capital letters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 412
Hardware: All OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords: HasReduction
Depends on:
Blocks:
 
Reported: 2005-06-01 14:35 PDT by Dave Hyatt
Modified: 2006-03-02 17:51 PST (History)
4 users (show)

See Also:


Attachments
test case as attachement (343 bytes, text/html)
2005-12-28 01:56 PST, Eric Seidel (no email)
no flags Details
Proposed patch (14.35 KB, patch)
2006-02-13 04:44 PST, Andrew Wellington
hyatt: review-
Details | Formatted Diff | Diff
Updated LayoutTests/css2.1/support/css1test545.png (1.76 KB, image/png)
2006-02-13 04:46 PST, Andrew Wellington
no flags Details
Proposed patch 2 (18.59 KB, patch)
2006-02-13 16:21 PST, Andrew Wellington
darin: review+
Details | Formatted Diff | Diff
greenbox.png for testcase (95 bytes, image/png)
2006-02-13 16:22 PST, Andrew Wellington
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Hyatt 2005-06-01 14:35:05 PDT
Words with inline elements should only have one capital letter when text-transform: capitalize is 
applied.

this is in the w3c test, and is mentioned on diveintomark.

test case:
<html>
<head>
<style type="text/css">
.cap {text-transform: capitalize;}
</style>
</head>
<body>

<P>Words with inline elements inside them should only capitalize the first letter of the word. 
Therefore, the last word in this sentence should have one, and only one, capital <SPAN 
CLASS=cap>le<SPAN>tt</SPAN>er</SPAN>. 
</P>

</body>
</html>
Comment 1 Nicholas Shanks 2005-06-15 07:33:21 PDT
See my testcase attached to other capitalise bug (#3406) for this, specifically the words 'earthquake' 
'earthworm' and 'cheeseburger' in the english and 3eme in the french.
Comment 2 Eric Seidel (no email) 2005-12-28 01:56:12 PST
Created attachment 5328 [details]
test case as attachement
Comment 3 Andrew Wellington 2006-02-13 04:44:28 PST
Created attachment 6458 [details]
Proposed patch

While this doesn't use UBreakIterator, it does fix the problems we have displaying the CSS1 test suite and should make all capitalisation obey the same rules instead of inlines being able to break rendering (essentially we capitalise the first letter after a space and nothing else).

Also added Copyright to StringImpl because I wrote the last fix for capitalisation in here, otherwise it's hardly worth it :)

To be attached in a second after this, updated LayoutTests/css2.1/support/css1test545.png for the new capitalisation behaviour to draw the green box correctly.
Comment 4 Andrew Wellington 2006-02-13 04:46:57 PST
Created attachment 6459 [details]
Updated LayoutTests/css2.1/support/css1test545.png

Update to go with patch for background image from test case css2.1/t1605-c545-txttrans-00-b-ag
Comment 5 Dave Hyatt 2006-02-13 11:37:14 PST
Comment on attachment 6458 [details]
Proposed patch

+                    for (o = previousRenderer(); o && o->isInline() && !o->isText(); o = o->previousRenderer())

This will include images that occur between words, e.g.,

foo<img>goo

I'm not sure that's right.  You might want to see what other browsers do.


+                        ;
+                    if (o && o->isText()) {
+                        DOMStringImpl* prevStr = static_cast<RenderText*>(o)->string();
+                        QChar c = (*prevStr)[prevStr->length() - 1];
+                        if (!c.isSpace())
+                            runOnString = true;
+                    }

Not sure about this from an RTL perspective.  Maybe Mitz could comment, but that seems dangerous to me.
Comment 6 Andrew Wellington 2006-02-13 16:21:02 PST
Created attachment 6471 [details]
Proposed patch 2

Fix the img case hyatt mentioned. Works correctly with RTL text as discussed with mitzpettel. Added a new LayoutTest to check for img case and RTL case. Will add the greenbox.png to this bug. Will require updated pixel test image for updated tests as well.
Comment 7 Andrew Wellington 2006-02-13 16:22:28 PST
Created attachment 6472 [details]
greenbox.png for testcase

Required for added test case in patch 2, located in LayoutTests/css1/text_properties/support/greenbox.png
Comment 8 Darin Adler 2006-02-18 06:40:42 PST
Comment on attachment 6471 [details]
Proposed patch 2

I think it's OK to land this now. I don't like the name "runOnString" for a boolean that means "has a previous character is not a space", but those are nitpicks and this seems fine.

Dave had told me this is a difficult area to get right, but on his previous review he only mentioned the two issues that were investigated and fixed so I'm going to assume all is well.

r=me
Comment 9 Nicholas Shanks 2006-02-18 11:17:27 PST
To Andrew: Does this handle non?breaking spaces, zero?width space, thin spaces, em spaces etc. too?
Does it NOT capitalise after zero width joiner, zero width non?joiner?
Comment 10 Andrew Wellington 2006-02-18 18:27:53 PST
Nick: This follows the same rules as capitalisation within a single span. From my tests this means that the following "unusual" characters are considered to break a work:
Thin Space
Em-space
En-space

Not breaking on:
non-breaking space (not sure if we should or not on this one)
zero-width non-joiner
zero-width joiner

The best way to resolve these is probably to use UBreakIterator, or to decide if they should be considered spaces for the isSpace() call.
Comment 11 Nicholas Shanks 2006-02-19 01:50:43 PST
Sorry about my previous comment, i didn't realise bugzilla couldn't handle unicode. (I have remapped U+2010 HYPHEN to the key between zero and equals.)

Word breaking isn't the same thing though, so you shouldn't be using UBreakIterator really. For example hyphenated words like Semi?circle (that's "Semi-circle" using a HYPHEN-MINUS) or Meso­potamia (Mesopotamia with a SOFT HYPHEN after the second syllable) can break after the hyphen but should not capitalise. All of the spaces from U+2002 to U+200B should be followed by a capital. As for the non?breaking space, which as best I can tell is mostly used in cases such as "20 kg" and "Rameses II" should probably not be capitalised due to usage, where for example "100 ?" ? "100 ?".
Comment 12 Nicholas Shanks 2006-02-19 01:53:07 PST
(In reply to comment #11)
> where for example "100 ?" ? "100 ?".

Grr. That was supposed to say "100 microns" does not equal "100 capital Mu".

Comment 13 Darin Adler 2006-02-27 22:01:09 PST
(In reply to comment #11)
> Word breaking isn't the same thing though, so you shouldn't be using
> UBreakIterator really.

That's the reason to not use UBreakIterator's "word boundary analysis". But according to the ICU documentation "title boundary analysis locates all positions, typically starts of words, that should be set to title case when title casing the text."

It seems that *is* what we want to use.

Anyway, since this bug is fixed, lets file new bug reports about further enhancements and cases that are behaving incorrectly instead of putting more comments here where no one is going to read them.
Comment 14 Darin Adler 2006-02-27 22:14:11 PST
(In reply to comment #3)
> To be attached in a second after this, updated
> LayoutTests/css2.1/support/css1test545.png for the new capitalisation behaviour
> to draw the green box correctly.

I wanted to land this patch, but I don't understand why we have to modify the CSS 1 test suite itself. That PNG came with the test suite -- why do we want to change it?
Comment 15 Beth Dakin 2006-03-02 17:51:00 PST
I committed this fix.