| Summary: | CSS1: letters following apostrophe are wrongly capitalized when text-transform:capitalize applied | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Dave Hyatt <hyatt> |
| Component: | CSS | Assignee: | Beth Dakin <bdakin> |
| Status: | RESOLVED FIXED | | |
| Severity: | Normal | CC: | nickshanks |
| Priority: | P2 | | |
| Version: | 412 | | |
| Hardware: | Mac | | |
| OS: | OS X 10.4 | | |
| Attachments: | | | |
Description
Dave Hyatt
2005-06-10 00:11:00 PDT
Apple Bug: 3204011
Patch and regression test posted to webkit-reviews.
Created attachment 2211
test case: rsquo, combining diacritic and apos
Correct rendering would be "Safari’s Naïve Nut'in"
Created attachment 2212
improved test case, added word beginning with apos ('cept)
Note that due to a bug introduced between Safari 2.0 and the current ToT (tip of tree), this
test fails. I have yet to work out the cause, but it's someone else's fault :-)
Created attachment 2213
proposed patch
Created attachment 2214
proposed patch
This is a more generalised patch.
Created attachment 2215
Regression Test
More tests will help. Make sure you don't capitalize letters that occur after soft hyphens or after regular hyphens.
Created attachment 2244
Regression Test
Now includes hyphenated and soft-hyphenated words and an abbreviation, "e.g."
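The behaviour these test cases expect can be summarized as one rule: only whitespace starts a new word, and within a word only the first letter is title-cased, so apostrophes, hyphens, soft hyphens, periods and combining marks never trigger capitalization. The Python sketch below illustrates that rule under the simplifying assumption that words are delimited by whitespace alone. `capitalize_words` is a hypothetical helper; it is not the patch attached to this bug and not ICU's title boundary analysis.

```python
import unicodedata


def capitalize_words(text: str) -> str:
    """Title-case the first letter of each whitespace-delimited word.

    Hypothetical sketch: apostrophes (U+0027, U+2019), hyphens, soft
    hyphens (U+00AD), periods and combining marks never start a new word,
    so they are copied through unchanged. Leading punctuation such as the
    apostrophe in "'cept" is skipped until the word's first letter.
    """
    out = []
    at_word_start = True  # still waiting for the current word's first letter or digit
    for ch in text:
        if ch.isspace():
            # Only whitespace begins a new word in this simplified model.
            at_word_start = True
        elif at_word_start:
            category = unicodedata.category(ch)
            if category.startswith("L"):
                # str.title() uses the Unicode titlecase mapping for the letter.
                ch = ch.title()
                at_word_start = False
            elif category.startswith("N"):
                # A word that starts with a digit ("1st") is left alone.
                at_word_start = False
        out.append(ch)
    return "".join(out)


# "nai\u0308ve" spells naïve with a combining diaeresis, as in the test case.
print(capitalize_words("safari\u2019s nai\u0308ve nut'in"))
print(capitalize_words("'cept e.g. well-known"))
```

Because the combining diaeresis and the apostrophes fall in the "copy unchanged" branch, the first call yields "Safari’s Naïve Nut'in", matching the correct rendering given for the first test case.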
Created attachment 2248
a nasty little test >:-D
Created attachment 2249
nasty test amended
Test looks great. Needs a UTF-8 encoding meta tag.
Created attachment 2360
added UTF-8 BOM to test
Created attachment 2381
correction from a French speaker
I think a UTF-8 meta tag would be better than a UTF-8 BOM in the test case.
Beth and I just researched this a bit. To do a good job of capitalizing words, we really want to use the ICU library. ICU specifically suggests using the break iterator in this way; they call it "title boundary analysis". We're thinking of landing the patch attached to this bug to make things a little better, then eventually following up with a much better implementation that uses UBreakIterator (from <unicode/ubrk.h>).
We also think that some of the items in this test case are beyond the scope of what should be expected from text-transform: capitalize. Specifically, we don't think the browser should be required to do linguistic analysis to tell the difference between words that should be capitalized in title case and words that should not. We can't find any other browser that does this.
Comment on attachment 2214
proposed patch
While this patch is not perfect, it does seem to make things better. So let's
land this, and write another bug about doing even better later on.
I used a BOM in the test case because Safari first checks for a BOM, then goes to the charset parameter of the Content-Type HTTP header. The bugzilla.opendarwin.org server seems to be sending incorrect charset information in that header, as can be seen when viewing any page with non-ASCII characters in it (e.g. θης ις συμ γρεεκ) when your default/current page encoding is set to something other than ISO-8859-1.
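The precedence described in the last comment (byte order mark first, then the HTTP charset, then the user's default) explains why a BOM protects the test from a misconfigured server. The Python sketch below models only that ordering. `sniff_encoding` is a hypothetical helper, the ISO-8859-1 default is an assumption, and `<meta>` charset declarations, UTF-32 BOMs and content sniffing are deliberately not modeled; this is not WebKit's actual decoder logic.

```python
import codecs
from typing import Optional


def sniff_encoding(body: bytes, content_type: Optional[str],
                   default: str = "iso-8859-1") -> str:
    """Hypothetical sketch: pick an encoding from a BOM, else the HTTP charset."""
    # 1. A byte order mark at the start of the body wins outright.
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if body.startswith(bom):
            return name
    # 2. Otherwise use the charset parameter of the Content-Type header,
    #    e.g. "text/html; charset=UTF-8".
    if content_type:
        for param in content_type.split(";")[1:]:
            key, _, value = param.strip().partition("=")
            value = value.strip().strip('"')
            if key.strip().lower() == "charset" and value:
                return value.lower()
    # 3. Fall back to the user's default encoding (assumed ISO-8859-1 here).
    return default


# A BOM-prefixed UTF-8 page is decoded as UTF-8 even if the server claims Latin-1.
page = codecs.BOM_UTF8 + "γρεεκ".encode("utf-8")
print(sniff_encoding(page, "text/html; charset=ISO-8859-1"))  # utf-8
```

A UTF-8 meta tag, by contrast, is only consulted after the HTTP header in real browsers, which is one reason a BOM was the more robust choice against a server sending the wrong charset.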