Summary: | Searching for "し" also finds "じ" | ||
---|---|---|---|
Product: | WebKit | Reporter: | Justin Garcia <justin.garcia> |
Component: | HTML Editing | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Normal | CC: | ap, darin, justin.garcia, yutak |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | OS X 10.5 | ||
URL: | http://en.wikipedia.org/wiki/Hiragana |
Description: Justin Garcia, 2009-07-22 17:12:49 PDT
Some Unicode normalization is going on somewhere. It seems reasonable to expect n and ñ to be treated as equivalent, but not し and じ.

I found that Safari also matches "シ" (Katakana letter shi), "ジ" (Katakana letter ji) and "ｼ" (Halfwidth katakana letter shi). Interestingly, Safari (and Chrome) also matches "㋛" (U+32DB, CIRCLED KATAKANA SI). I bet the text find engine uses Unicode Normalization Form KD (NFKD; compatibility decomposition). In fact, when I search for "アパート", Safari matches "㌀" (U+3300, SQUARE APAATO).

I'm not sure whether this behavior should be fixed. The matching is based on ICU, and the folding is done by code in ICU. In the released Safari 4 the behavior is the same regardless of the user's locale, but top-of-tree (TOT) WebKit respects the user's locale choice, so it will behave differently for different users. If we don't like the behavior, we can ask ICU to change it (which in some cases would mean changes to the Unicode specification), or we can add workarounds to WebKit's TextIterator class.

I'm going to close this given that it's behaving as expected. If someone wants to make the case for different behavior, feel free to open a new bug, or even an ICU or Unicode standard bug.
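To test the NFKD guess, here is a minimal Python sketch of a matcher that compares strings after NFKD decomposition with combining marks removed. The helpers `nfkd_fold` and `nfkd_matches` are hypothetical. They are not WebKit or ICU code, and the sketch does not try to reproduce ICU's behavior. It only checks which of the reported matches normalization alone would explain.

```python
import unicodedata


def nfkd_fold(text: str) -> str:
    """Hypothetical search key: NFKD-decompose, then drop combining marks.

    NFKD splits じ into し + U+3099 (combining voiced sound mark) and maps
    compatibility characters such as ｼ, ㋛ and ㌀ to ordinary katakana.
    Dropping every character with a nonzero canonical combining class then
    discards voicing marks and accents.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def nfkd_matches(needle: str, haystack: str) -> bool:
    """Substring test on folded strings (offsets are not mapped back)."""
    return nfkd_fold(needle) in nfkd_fold(haystack)


if __name__ == "__main__":
    cases = [
        ("し", "じ"),        # hiragana shi vs ji: True
        ("n", "ñ"),          # True
        ("シ", "ｼ"),         # katakana vs halfwidth katakana: True
        ("シ", "㋛"),        # U+32DB CIRCLED KATAKANA SI: True
        ("アパート", "㌀"),  # U+3300 SQUARE APAATO: True
        ("し", "シ"),        # hiragana vs katakana: False
    ]
    for needle, haystack in cases:
        print(needle, haystack, nfkd_matches(needle, haystack))
```

The last case is the telling one. NFKD keeps hiragana し and katakana シ distinct, so NFKD plus mark stripping covers じ, ｼ, ㋛ and ㌀ but not the cross-kana match Safari shows. That points to the kana folding coming from somewhere other than normalization alone, which fits the observation that the folding is done in ICU.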