3/20/03 11:46 AM Vicki Murley:
In IE 5, 5.5, and 6 under Windows, as well as in Mozilla and Camino, the following markup is rendered as expected:
<p style="text-transform:capitalize;">todd's bargain basement</p>
Output: Todd's Bargain Basement
In Safari (and IE 5 Mac, for that matter), the same line renders as:
Todd'S Bargain Basement
Any character immediately following a quote or apostrophe wrongly receives a capitalization transform.
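For comparison, Python's built-in `str.title()` makes exactly the same mistake, because it treats any non-letter as the start of a new word. Below is a minimal sketch of the expected behaviour. It assumes a simplified rule in which a word begins only at the first letter after whitespace or the start of the text. `capitalize_words` is a hypothetical helper, not WebKit's code:

```python
def capitalize_words(text: str) -> str:
    """Uppercase the first letter of each whitespace-delimited word.

    Assumed simplification: only whitespace separates words, so an
    apostrophe inside a word never starts a new one. Other letters are
    left as-is, since text-transform: capitalize does not lowercase.
    """
    out = []
    at_word_start = True
    for ch in text:
        if ch.isspace():
            at_word_start = True
            out.append(ch)
        elif at_word_start and ch.isalpha():
            out.append(ch.upper())  # first letter of the word
            at_word_start = False
        else:
            out.append(ch)  # punctuation before the first letter keeps waiting
    return "".join(out)


print("todd's bargain basement".title())            # Todd'S Bargain Basement
print(capitalize_words("todd's bargain basement"))  # Todd's Bargain Basement
```

Because leading punctuation does not consume the "word start", a word that begins with an apostrophe, such as 'cept, still gets its first letter capitalized ('Cept).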
Apple Bug: 3204011
Patch and regression test posted to webkit-reviews
Created attachment 2211 [details]
test case: rsquo, combining diacritic and apos
Correct rendering would be "Safari’s Naïve Nut'in"
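This test case adds two more characters that should not split a word: U+2019 RIGHT SINGLE QUOTATION MARK and a combining diacritic. The combining diacritic is assumed here to be the decomposed form, `i` followed by U+0308 COMBINING DIAERESIS. The rough sketch below contrasts the buggy rule with one that keeps letters, combining marks, and apostrophes in the same word. Both function names are hypothetical:

```python
import unicodedata

# ASCII apostrophe and U+2019 RIGHT SINGLE QUOTATION MARK.
APOSTROPHES = {"'", "\u2019"}


def capitalize_naive(text: str) -> str:
    """Buggy rule: every letter that follows a non-letter starts a word."""
    out, prev_is_letter = [], False
    for ch in text:
        is_letter = unicodedata.category(ch).startswith("L")
        out.append(ch.upper() if is_letter and not prev_is_letter else ch)
        prev_is_letter = is_letter
    return "".join(out)


def capitalize_fixed(text: str) -> str:
    """Letters, combining marks (category M*) and apostrophes stay in the word."""
    out, in_word = [], False
    for ch in text:
        category = unicodedata.category(ch)
        if not in_word and category.startswith("L"):
            out.append(ch.upper())  # first letter of a new word
            in_word = True
        else:
            out.append(ch)
            in_word = in_word and (category[0] in "LM" or ch in APOSTROPHES)
    return "".join(out)


sample = "safari\u2019s nai\u0308ve nut'in"  # "i" + U+0308 COMBINING DIAERESIS
print(capitalize_naive(sample))  # buggy: letters after U+2019, U+0308 and ' are capitalized
print(capitalize_fixed(sample))  # matches the expected rendering above
```

Under this rule a hyphen still ends the word, so "semi-final" would become "Semi-Final".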
Created attachment 2212 [details]
improved test case, added word beginning with apos ('cept)
Note that this test fails because of a bug introduced between Safari 2.0 and the current ToT (tip of tree).
I have yet to work out the cause, but it's someone else's fault :-)
Created attachment 2213 [details]
Created attachment 2214 [details]
This is a more generalised patch.
Created attachment 2215 [details]
More tests will help. Make sure you don't capitalize letters that occur after soft hyphens or after regular hyphens.
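One way to honour this request is sketched below. It assumes that soft hyphen (U+00AD) and hyphen-minus (U+002D) are simply added to the set of characters that keep a word going. Treating the regular hyphen this way is a policy choice taken from this comment, not a Unicode rule. `capitalize` and `WORD_JOINERS` are hypothetical names:

```python
import unicodedata

# Assumed set of non-letter characters that do not end a word: apostrophes,
# SOFT HYPHEN (U+00AD) and HYPHEN-MINUS (U+002D).
WORD_JOINERS = {"'", "\u2019", "\u00ad", "-"}


def capitalize(text: str) -> str:
    """Uppercase the first letter of each word; joiners keep a word open."""
    out, in_word = [], False
    for ch in text:
        category = unicodedata.category(ch)
        if not in_word and category.startswith("L"):
            out.append(ch.upper())
            in_word = True
        else:
            out.append(ch)
            # Letters, combining marks and joiners continue the current word.
            in_word = in_word and (category[0] in "LM" or ch in WORD_JOINERS)
    return "".join(out)


print(capitalize("semi-final"))                 # Semi-final
print(capitalize("soft\u00adhyphenated word"))  # the letter after U+00AD stays lowercase
```

Dropping "-" from `WORD_JOINERS` would give "Semi-Final" instead, so the set is the single place where this policy lives.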
Created attachment 2244 [details]
Now includes hyphenated and soft-hyphenated words and the abbreviation "e.g."
Created attachment 2248 [details]
a nasty little test >:-D
Created attachment 2249 [details]
nasty test amended
Test looks great. Needs a UTF-8 encoding meta tag.
Created attachment 2360 [details]
added UTF-8 BOM to test
Created attachment 2381 [details]
correction from a French speaker
I think a UTF-8 meta tag would be better than a UTF-8 BOM in the test case.
Beth and I just researched this a bit.
To do a good job of capitalizing words, we really want to use the ICU library. ICU specifically suggests
using the break iterator in this way; they call it "title boundary analysis".
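ICU exposes this as `u_strToTitle()` in `<unicode/ustring.h>`. That function titlecases the start of each word found by a `UBreakIterator`, and unlike CSS capitalize it also lowercases the rest of each word. The Python sketch below only mirrors the shape of that approach: first a boundary pass, then a casing pass. `word_boundaries` is a hypothetical and heavily simplified stand-in for a real word break iterator, not ICU's rule set. It keeps combining marks with the preceding letter and treats an apostrophe as word-internal only when a letter follows it:

```python
import unicodedata
from typing import Iterator, Tuple

# Assumed stand-ins for UAX #29 MidLetter-style characters.
MID_LETTER = {"'", "\u2019"}


def _letterish(ch: str) -> bool:
    """Letters plus combining marks, which attach to the preceding letter."""
    return unicodedata.category(ch)[0] in "LM"


def word_boundaries(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans covering text; letter runs form one span."""
    i, n = 0, len(text)
    while i < n:
        j = i + 1
        if _letterish(text[i]):
            while j < n and (
                _letterish(text[j])
                # An apostrophe stays in the word only if a letter follows it.
                or (text[j] in MID_LETTER and j + 1 < n and _letterish(text[j + 1]))
            ):
                j += 1
        yield i, j
        i = j


def to_title(text: str) -> str:
    """Uppercase the first letter of every span; leave the rest untouched."""
    pieces = []
    for start, end in word_boundaries(text):
        segment = text[start:end]
        for k, ch in enumerate(segment):
            if unicodedata.category(ch).startswith("L"):
                segment = segment[:k] + ch.upper() + segment[k + 1:]
                break
        pieces.append(segment)
    return "".join(pieces)


print(to_title("todd's bargain basement"))  # Todd's Bargain Basement
print(to_title("'cept"))                    # 'Cept
```

Keeping segmentation separate from casing is what makes swapping in a real break iterator later a local change.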
We're thinking of landing the patch attached to this bug to make things a little better, then eventually
following up with a much better implementation that uses UBreakIterator (from <unicode/ubrk.h>).
We also think that some of the items in this test case are beyond the scope of what should be expected
from text-transform: capitalize. Specifically, we don't think the browser should be required to do
linguistic analysis to tell the difference between words that should be capitalized in title case and words
that should not. We can't find any other browser that does this.
Comment on attachment 2214 [details]
While this patch is not perfect, it does seem to make things better. So let's
land this and file another bug about doing even better later on.
I used a BOM in the test case because Safari checks for a BOM first and only then looks at the charset in the
HTTP Content-Type header. The bugzilla.opendarwin.org server seems to be sending incorrect charset information
in that header, as can be seen when viewing any page with non-ASCII characters in it (e.g. θης ις συμ γρεεκ)
when your default/current page encoding is set to something other than ISO-8859-1.
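The lookup order described here can be sketched as follows: a byte order mark wins, then the charset parameter of the HTTP Content-Type header, then a default. This is a simplified ordering taken from this comment, not Safari's actual loader code. `sniff_encoding` and `charset_from_content_type` are hypothetical names, and the meta-tag check proposed earlier for the test is only marked as a comment:

```python
import codecs
from typing import Optional

# Byte order marks checked first. UTF-32 is left out on purpose: the UTF-32 LE
# BOM starts with the UTF-16 LE BOM and would need to be tested before it.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased."""
    if not header:
        return None
    for param in header.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return None


def sniff_encoding(body: bytes, content_type: Optional[str],
                   default: str = "iso-8859-1") -> str:
    """Pick a decoder: BOM first, then the HTTP charset, then a default."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    charset = charset_from_content_type(content_type)
    if charset:
        return charset
    # A <meta> charset prescan would go here; it is omitted in this sketch.
    return default
```

For example, `sniff_encoding(codecs.BOM_UTF8 + "θης".encode("utf-8"), "text/html; charset=ISO-8859-1")` returns `"utf-8"`, because the BOM overrides a wrong header. This is why the BOM works around the server's charset problem.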