Summary: | text-transform: uppercase/lowercase don't handle cases one character becomes two | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alexey Proskuryakov <ap> | ||||||||
Component: | CSS | Assignee: | Darin Adler <darin> | ||||||||
Status: | VERIFIED FIXED | ||||||||||
Severity: | Enhancement | CC: | darin, julian | ||||||||
Priority: | P2 | Keywords: | HasReduction | ||||||||
Version: | 420+ | ||||||||||
Hardware: | Mac | ||||||||||
OS: | OS X 10.4 | ||||||||||
Attachments: |
|
Description
Alexey Proskuryakov
2005-12-31 07:12:04 PST
Created attachment 5396 [details]
test case
Created attachment 5397 [details]
test case
Added Latin ligatures ffi and IJ.
I don't think lower() and upper() in DOMString are really to blame here. We won't usually have the entire string we want to upper or lower-case in a DOMString -- words can be split over many separate DOM nodes and have a mix of styles. So proper implementation of a more sophisticated upper and lowercasing algorithm would probably not affect lower() and upper() at all. Hence this bug should be retitled to describe the symptom. I suggest listing the Turkish "i" in a separate bug, since the issue is different. As for DOMString::lower()/upper(), one possible solution could be to dumb it down even further, to handle only ASCII text (since it's mostly used for tag canonicalization). Then, it could be renamed to something like asciiLower(), as a warning. I like the idea of dumbing lower/upper down and changing their names. Then we can create functions that take a language parameter and do the right thing, even for the Turkish, and handle cases where the number of characters changes when you change case. The ICU functions for this are: u_strToLower u_strToUpper u_strToTitle I think we can fix a lot of problems if we change things so that text transforms use these functions, either directly or through DOMString functions. Created attachment 6071 [details]
dumb down lower() and upper()
I have tried dumbing down lower() and upper() (and equalIgnoringCase(), for that matter), but there are too many cases where I'm not sure what to do. The attached patch has changes that seemed safe to me (although I woudn't be able to prove even their correctness for functions like getNamedItemNS() or parseMappedAttribute()).
KXMLCore::Unicode has some smarter functions for upper/lowercasing now, perhaps they can be reused in WebCore (no locale support yet, though). *** Bug 8805 has been marked as a duplicate of this bug. *** My QChar -> UChar patch is going to fix this "as a side effect". The test case attached here did not match the bug report. The test case was testing "capitalize" but the bug report mentions "uppercase" and "lowercase". I fixed uppercase and lowercase in a recent patch, and I added a test case based on the one here. Landed in r14273. (In reply to comment #12) > The test case attached here did not match the bug report. The test case was > testing "capitalize" but the bug report mentions "uppercase" and "lowercase". Mea culpa. Capitalize also appears to be broken; I'll file a separate bug for it. > I fixed uppercase and lowercase in a recent patch, and I added a test case > based on the one here. There's still a minor bug in the landed test case - the expected result for "IJ" is incorrect, although it looks the same. The actual/correct result is a ligature "ij", not two separate characters. As a reminder to myself, I'm seeing an assertion failure when selecting text in the test, should investigate and file a bug: /Users/ap/WebKit/WebCore/dom/Position.cpp:341: failed assertion `currentOffset >= renderer->caretMaxOffset()' |