Bug 3406 - CSS1: letters following apostrophe are wrongly capitalized when text-transform:capitalize applied
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS
Version: 412
Hardware: Macintosh OS X 10.4
Importance: P2 Normal
Assignee: Beth Dakin
Reported: 2005-06-10 00:11 PDT by Dave Hyatt
Modified: 2005-08-08 07:38 PDT

Attachments
test case: rsquo, combining diacritic and apos (389 bytes, text/html)
2005-06-10 02:51 PDT, Nicholas Shanks
no flags
improved test case, added word beginning with apos ('cept) (581 bytes, text/html)
2005-06-10 03:49 PDT, Nicholas Shanks
no flags
proposed patch (845 bytes, patch)
2005-06-10 03:50 PDT, Nicholas Shanks
no flags
proposed patch (1.08 KB, patch)
2005-06-10 04:05 PDT, Andrew Wellington
darin: review+
Regression Test (481 bytes, text/html)
2005-06-10 04:06 PDT, Andrew Wellington
no flags
Regression Test (568 bytes, text/html)
2005-06-11 02:35 PDT, Andrew Wellington
no flags
a nasty little test >:-D (4.34 KB, text/html)
2005-06-11 10:59 PDT, Nicholas Shanks
no flags
nasty test amended (4.37 KB, text/html)
2005-06-11 11:07 PDT, Nicholas Shanks
no flags
added UTF-8 BOM to test (4.38 KB, text/html)
2005-06-15 07:28 PDT, Nicholas Shanks
no flags
correction from french-speaker (4.38 KB, text/html)
2005-06-16 00:09 PDT, Nicholas Shanks
no flags

Description Dave Hyatt 2005-06-10 00:11:00 PDT
3/20/03 11:46 AM Vicki Murley:
In IE 5, 5.5, and 6 under Windows, as well as Mozilla / Camino, the following treatment is applied 
correctly -->

<p style="text-transform:capitalize;">todd's bargain basement</p>

Output: Todd's Bargain Basement

In Safari (and IE 5 Mac for that matter) the same line renders as...

Todd'S Bargain Basement

Any character immediately following a quote or apostrophe wrongly receives a capitalization transform.
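
The two renderings can be reproduced outside a browser. The Python sketch below is only an illustration, not WebKit's code: `capitalize_naive` and `capitalize_words` are hypothetical names, and the rule "uppercase only the first character of each whitespace-delimited word" is an assumption about the intended behaviour, not something this report specifies.

```python
def capitalize_naive(text: str) -> str:
    """Uppercase every character that follows a non-letter (reproduces the bug)."""
    out = []
    prev_is_letter = False
    for ch in text:
        out.append(ch if prev_is_letter else ch.upper())
        prev_is_letter = ch.isalpha()
    return "".join(out)


def capitalize_words(text: str) -> str:
    """Uppercase only the first character of each whitespace-delimited word.

    Assumed rule for illustration; an apostrophe no longer starts a new word.
    """
    out = []
    at_word_start = True
    for ch in text:
        out.append(ch.upper() if at_word_start else ch)
        at_word_start = ch.isspace()
    return "".join(out)


sample = "todd's bargain basement"
print(capitalize_naive(sample))  # Todd'S Bargain Basement
print(capitalize_words(sample))  # Todd's Bargain Basement
```

The naive version treats the apostrophe as a word boundary, which is what produces the stray capital S.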
Comment 1 Dave Hyatt 2005-06-10 00:12:49 PDT
Apple Bug: 3204011
Comment 2 Andrew Wellington 2005-06-10 02:35:34 PDT
Patch and regression test posted to webkit-reviews
Comment 3 Nicholas Shanks 2005-06-10 02:51:54 PDT
Created attachment 2211
test case: rsquo, combining diacritic and apos

Correct rendering would be "Safari’s Naïve Nut'in"
Comment 4 Nicholas Shanks 2005-06-10 03:49:37 PDT
Created attachment 2212
improved test case, added word beginning with apos ('cept)

Note that this test fails because of a bug introduced between Safari 2.0 and
the current ToT. I have yet to work out the cause, but it's someone else's fault :-)
Comment 5 Nicholas Shanks 2005-06-10 03:50:09 PDT
Created attachment 2213
proposed patch
Comment 6 Andrew Wellington 2005-06-10 04:05:25 PDT
Created attachment 2214
proposed patch

This is a more generalised patch.
Comment 7 Andrew Wellington 2005-06-10 04:06:49 PDT
Created attachment 2215
Regression Test
Comment 8 Dave Hyatt 2005-06-10 23:12:45 PDT
More tests will help.  Make sure you don't capitalize letters that occur after soft hyphens or after regular 
hyphens.
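
One way to express these cases is a set of characters that never start a new word. The Python sketch below is an assumption-laden illustration, not the patch attached to this bug: `JOINERS`, `continues_word` and `capitalize` are hypothetical names, and the choice of which characters count as joiners is mine.

```python
import unicodedata

# Characters assumed NOT to start a new word:
# ASCII apostrophe, right single quote, hyphen-minus, soft hyphen.
JOINERS = {"'", "\u2019", "-", "\u00ad"}


def continues_word(ch: str) -> bool:
    """Return True if a letter directly after `ch` is still inside the same word."""
    return (
        ch.isalnum()
        or ch in JOINERS
        or unicodedata.category(ch).startswith("M")  # combining marks
    )


def capitalize(text: str) -> str:
    """Uppercase a character only when the previous one did not continue a word."""
    out = []
    in_word = False
    for ch in text:
        out.append(ch if in_word else ch.upper())
        in_word = continues_word(ch)
    return "".join(out)


for case in ["well-known", "co\u00adoperate now", "todd's (bargain) basement"]:
    print(capitalize(case))
```

Unlike a whitespace-only rule, this still capitalizes a word that follows other punctuation such as an opening parenthesis.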
Comment 9 Andrew Wellington 2005-06-11 02:35:33 PDT
Created attachment 2244
Regression Test

Now includes hyphenated and soft hyphenated words and an abbreviation "e.g."
Comment 10 Nicholas Shanks 2005-06-11 10:59:14 PDT
Created attachment 2248
a nasty little test >:-D
Comment 11 Nicholas Shanks 2005-06-11 11:07:35 PDT
Created attachment 2249
nasty test amended
Comment 12 Darin Adler 2005-06-12 17:02:59 PDT
Test looks great. Needs a UTF-8 encoding meta tag.
Comment 13 Nicholas Shanks 2005-06-15 07:28:50 PDT
Created attachment 2360
added UTF-8 BOM to test
Comment 14 Nicholas Shanks 2005-06-16 00:09:33 PDT
Created attachment 2381
correction from french-speaker
Comment 15 Darin Adler 2005-07-27 14:16:11 PDT
I think a UTF-8 meta tag would be better than a UTF-8 BOM in the test case.
Comment 16 Darin Adler 2005-07-27 14:26:00 PDT
Beth and I just researched this a bit.

To do a good job of capitalizing words, we really want to use the ICU library. ICU specifically suggests 
using the break iterator in this way -- they call it "title boundary analysis".

We're thinking of landing the patch attached to this bug to make things a little better, then eventually 
following up with a much better implementation that uses UBreakIterator (from <unicode/ubrk.h>).

We also think that some of the items in this test case are beyond the scope of what should be expected 
from text-transform: capitalize. Specifically, we don't think the browser should be required to do 
linguistic analysis to tell the difference between words that should be capitalized in title case and words 
that should not. We can't find any other browser that does this.
Comment 17 Darin Adler 2005-07-27 14:26:38 PDT
Comment on attachment 2214
proposed patch

While this patch is not perfect, it does seem to make things better. So let's
land this, and write another bug about doing even better later on.
Comment 18 Nicholas Shanks 2005-08-08 07:38:40 PDT
I used a BOM in the test case because Safari first checks for a BOM, then goes to the Content-Encoding 
HTTP header. The bugzilla.opendarwin.org server seems to be sending incorrect Content-Encoding header 
information, as can be seen when viewing any page with non-ASCII characters in it (e.g. θης ις συμ γρεεκ) 
when your default/current page encoding is set to something other than ISO-8859-1.
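
The lookup order described above (BOM first, then the charset from the HTTP headers) can be sketched in Python. This is a rough illustration only, not Safari's implementation: `sniff_encoding` and `decode_page` are hypothetical names, and the final fallback to ISO-8859-1 is an assumption standing in for the user's default page encoding.

```python
import codecs
from typing import Optional


def sniff_encoding(body: bytes, header_charset: Optional[str],
                   default: str = "iso-8859-1") -> str:
    """Pick an encoding: a UTF-8 BOM wins, then the header charset, then a default."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if header_charset:
        return header_charset.lower()
    return default  # assumed fallback, standing in for the browser default


def decode_page(body: bytes, header_charset: Optional[str]) -> str:
    """Decode a response body using the order implemented in sniff_encoding."""
    encoding = sniff_encoding(body, header_charset)
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]  # drop the BOM before decoding
    return body.decode(encoding, errors="replace")


text = "θης ις συμ γρεεκ"
raw = text.encode("utf-8")

# A wrong header charset garbles the text unless a BOM is present.
print(decode_page(raw, "iso-8859-1") == text)
print(decode_page(codecs.BOM_UTF8 + raw, "iso-8859-1") == text)
```

Under this ordering a BOM protects the test case from a misconfigured server, whereas a meta tag is not modelled here at all.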