Bug 3406

Summary: CSS1: letters following apostrophe are wrongly capitalized when text-transform:capitalize applied
Product: WebKit
Reporter: Dave Hyatt <hyatt>
Component: CSS
Assignee: Beth Dakin <bdakin>
Status: RESOLVED FIXED
Severity: Normal
CC: nickshanks
Priority: P2
Version: 412
Hardware: Mac
OS: OS X 10.4

Dave Hyatt
Reported 2005-06-10 00:11:00 PDT
3/20/03 11:46 AM Vicki Murley:
In IE 5, 5.5, and 6 under Windows, as well as Mozilla / Camino, the following treatment is applied correctly:
<p style="text-transform:capitalize;">todd's bargain basement</p>
Output: Todd's Bargain Basement
In Safari (and IE 5 Mac for that matter) the same line renders as:
Todd'S Bargain Basement
Any character immediately following a quote or apostrophe wrongly receives a capitalization transform.
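For illustration only, here is a minimal Python sketch of one rule that would produce the reported output, next to a variant that treats an apostrophe inside a word as part of that word. Both function names are hypothetical, and whether WebKit's code at the time followed the naive rule is an assumption; this is not the patch attached to the bug.

```python
# Hypothetical helpers for illustration; not WebKit code.
APOSTROPHES = {"'", "\u2019"}  # ASCII apostrophe and right single quotation mark


def capitalize_naive(text: str) -> str:
    """Uppercase every letter that follows a non-letter (assumed buggy rule)."""
    out = []
    prev_is_letter = False
    for ch in text:
        out.append(ch.upper() if ch.isalpha() and not prev_is_letter else ch)
        prev_is_letter = ch.isalpha()
    return "".join(out)


def capitalize_fixed(text: str) -> str:
    """Uppercase only the first letter of a word; an apostrophe inside a
    word does not start a new word."""
    out = []
    in_word = False
    for ch in text:
        if ch.isalpha():
            out.append(ch if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            # Stay in the word only when an apostrophe follows a letter.
            if not (in_word and ch in APOSTROPHES):
                in_word = False
    return "".join(out)


print(capitalize_naive("todd's bargain basement"))  # Todd'S Bargain Basement
print(capitalize_fixed("todd's bargain basement"))  # Todd's Bargain Basement
```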
Attachments
test case: rsquo, combining diacritic and apos (389 bytes, text/html)
2005-06-10 02:51 PDT, Nicholas Shanks
no flags
improved test case, added word beginning with apos ('cept) (581 bytes, text/html)
2005-06-10 03:49 PDT, Nicholas Shanks
no flags
proposed patch (845 bytes, patch)
2005-06-10 03:50 PDT, Nicholas Shanks
no flags
proposed patch (1.08 KB, patch)
2005-06-10 04:05 PDT, Andrew Wellington
darin: review+
Regression Test (481 bytes, text/html)
2005-06-10 04:06 PDT, Andrew Wellington
no flags
Regression Test (568 bytes, text/html)
2005-06-11 02:35 PDT, Andrew Wellington
no flags
a nasty little test >:-D (4.34 KB, text/html)
2005-06-11 10:59 PDT, Nicholas Shanks
no flags
nasty test amended (4.37 KB, text/html)
2005-06-11 11:07 PDT, Nicholas Shanks
no flags
added UTF-8 BOM to test (4.38 KB, text/html)
2005-06-15 07:28 PDT, Nicholas Shanks
no flags
correction from french-speaker (4.38 KB, text/html)
2005-06-16 00:09 PDT, Nicholas Shanks
no flags
Dave Hyatt
Comment 1 2005-06-10 00:12:49 PDT
Apple Bug: 3204011
Andrew Wellington
Comment 2 2005-06-10 02:35:34 PDT
Patch and regression test posted to webkit-reviews
Nicholas Shanks
Comment 3 2005-06-10 02:51:54 PDT
Created attachment 2211 [details] test case: rsquo, combining diacritic and apos
Correct rendering would be "Safari’s Naïve Nut'in"
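The three cases in this test (right single quote, combining diacritic, ASCII apostrophe) can be sketched in Python by treating Unicode combining marks and in-word apostrophes as part of the current word. This is a rough illustration under those assumptions, not the attached test or patch; `capitalize` and `APOSTROPHES` are hypothetical names.

```python
import unicodedata

# Hypothetical names for illustration; not WebKit code.
APOSTROPHES = {"'", "\u2019"}  # ASCII apostrophe and right single quotation mark


def capitalize(text: str) -> str:
    """Uppercase the first letter of each word, keeping combining marks
    and in-word apostrophes inside the current word."""
    out = []
    in_word = False
    for ch in text:
        if ch.isalpha():
            out.append(ch if in_word else ch.upper())
            in_word = True
        elif in_word and (
            ch in APOSTROPHES or unicodedata.category(ch).startswith("M")
        ):
            # Mark categories (Mn, Mc, Me) attach to the preceding letter,
            # so they must not start a new word.
            out.append(ch)
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


# "naive" is written here with a combining diaeresis (U+0308) after the "i".
sample = "safari\u2019s nai\u0308ve nut'in"
print(capitalize(sample))
```

With the decomposed input above, the "v" after the combining diaeresis stays lowercase, which matches the rendering given in this comment.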
Nicholas Shanks
Comment 4 2005-06-10 03:49:37 PDT
Created attachment 2212 [details] improved test case, added word beginning with apos ('cept)
Note that due to a bug introduced between Safari 2.0 and the current ToT, this test fails. I have yet to work out the cause, but it's someone else's fault :-)
Nicholas Shanks
Comment 5 2005-06-10 03:50:09 PDT
Created attachment 2213 [details] proposed patch
Andrew Wellington
Comment 6 2005-06-10 04:05:25 PDT
Created attachment 2214 [details] proposed patch
This is a more generalised patch.
Andrew Wellington
Comment 7 2005-06-10 04:06:49 PDT
Created attachment 2215 [details] Regression Test
Dave Hyatt
Comment 8 2005-06-10 23:12:45 PDT
More tests will help. Make sure you don't capitalize letters that occur after soft hyphens or after regular hyphens.
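The hyphen requirement can be illustrated with a regex-based sketch in Python, where hyphens, soft hyphens (U+00AD), and apostrophes join letters into a single word so that only the first letter is uppercased. The set of joiner characters and the name `capitalize_words` are assumptions made for this example; the thread does not specify how the patch tokenizes words.

```python
import re

# Characters assumed to join letters into one word (illustrative choice):
# ASCII apostrophe, right single quote, soft hyphen, hyphen-minus.
JOINERS = "'\u2019\u00ad-"

# A word is a run of letters, optionally continued by joiner + letters.
# [^\W\d_] matches any Unicode letter.
WORD = re.compile(rf"[^\W\d_]+(?:[{re.escape(JOINERS)}][^\W\d_]+)*")


def capitalize_words(text: str) -> str:
    """Uppercase only the first letter of each joined word."""
    return WORD.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:], text)


print(capitalize_words("a well-known co\u00adoperative"))
```

Keeping the joiners in one string makes it easy to experiment with other candidates, though each addition changes what counts as a word boundary.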
Andrew Wellington
Comment 9 2005-06-11 02:35:33 PDT
Created attachment 2244 [details] Regression Test
Now includes hyphenated and soft hyphenated words and an abbreviation "e.g."
Nicholas Shanks
Comment 10 2005-06-11 10:59:14 PDT
Created attachment 2248 [details] a nasty little test >:-D
Nicholas Shanks
Comment 11 2005-06-11 11:07:35 PDT
Created attachment 2249 [details] nasty test amended
Darin Adler
Comment 12 2005-06-12 17:02:59 PDT
Test looks great. Needs a UTF-8 encoding meta tag.
Nicholas Shanks
Comment 13 2005-06-15 07:28:50 PDT
Created attachment 2360 [details] added UTF-8 BOM to test
Nicholas Shanks
Comment 14 2005-06-16 00:09:33 PDT
Created attachment 2381 [details] correction from french-speaker
Darin Adler
Comment 15 2005-07-27 14:16:11 PDT
I think a UTF-8 meta tag would be better than a UTF-8 BOM in the test case.
Darin Adler
Comment 16 2005-07-27 14:26:00 PDT
Beth and I just researched this a bit. To do a good job of capitalizing words, we really want to use the ICU library. ICU specifically suggests using the break iterator in this way -- they call it "title boundary analysis". We're thinking of landing the patch attached to this bug to make things a little better, then eventually following up with a much better implementation that uses UBreakIterator (from <unicode/ubrk.h>). We also think that some of the items in this test case are beyond the scope of what should be expected from text-transform: capitalize. Specifically, we don't think the browser should be required to do linguistic analysis to tell the difference between words that should be capitalized in title case and words that should not. We can't find any other browser that does this.
Darin Adler
Comment 17 2005-07-27 14:26:38 PDT
Comment on attachment 2214 [details] proposed patch
While this patch is not perfect, it does seem to make things better. So let's land this, and write another bug about doing even better later on.
Nicholas Shanks
Comment 18 2005-08-08 07:38:40 PDT
I used a BOM in the test case because Safari first checks for a BOM, then goes to the Content-Encoding HTTP header. The bugzilla.opendarwin.org server seems to be sending incorrect Content-Encoding header information, as can be seen when viewing any page with non-ASCII characters in it (e.g. θης ις συμ γρεεκ) when your default/current page encoding is set to something other than ISO-8859-1.
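The lookup order described in this comment (BOM first, then whatever charset the HTTP headers declare, then the user's default) can be sketched in Python as follows. The function name, its parameters, and the ISO-8859-1 default are hypothetical, and the sketch ignores UTF-16 BOMs and meta tags; it is not Safari's actual detection code.

```python
import codecs
from typing import Optional


def pick_encoding(
    body: bytes,
    header_charset: Optional[str],
    default: str = "iso-8859-1",
) -> str:
    """Choose a text encoding: BOM first, then header, then default."""
    # A UTF-8 BOM at the start of the body wins over everything else.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Otherwise trust the charset declared by the server, if any.
    if header_charset:
        return header_charset.lower()
    # Fall back to the reader's default page encoding.
    return default


page = codecs.BOM_UTF8 + "naïve".encode("utf-8")
print(pick_encoding(page, "ISO-8859-1"))  # utf-8
```

Under this ordering, a BOM in the test file overrides a wrong header from the server, which is the reason given above for using one.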