27587 – Searching for "し" also finds "じ"

RESOLVED DUPLICATE of bug 30437

https://bugs.webkit.org/show_bug.cgi?id=27587

Summary Searching for "し" also finds "じ"

Justin Garcia

Reported 2009-07-22 17:12:49 PDT

Go to http://en.wikipedia.org/wiki/Hiragana and search for a character. Safari will also highlight the same character with a dakuten. For instance, if I search for shi (し), Safari will also find ji (じ).

Justin Garcia

Comment 1 2009-07-22 17:24:38 PDT

Some Unicode normalization is going on somewhere. Seems like n and ñ might reasonably be expected to be equivalent, but not し and じ.
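To make the normalization hypothesis in Comment 1 concrete, here is a minimal Python sketch using the standard `unicodedata` module. It is not WebKit's or ICU's code. The helpers `strip_marks` and `folded_find` are hypothetical names introduced only for illustration, and the sketch assumes a search that decomposes text and ignores combining marks. Under that assumption, "ñ" folds to "n" and "じ" folds to "し" by exactly the same mechanism.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical folding step: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def folded_find(haystack: str, needle: str) -> bool:
    """Hypothetical mark-insensitive substring search."""
    return strip_marks(needle) in strip_marks(haystack)


if __name__ == "__main__":
    # Show the canonical decomposition of each character as code points.
    for ch in ("じ", "ñ"):
        nfd = unicodedata.normalize("NFD", ch)
        print(ch, [f"U+{ord(c):04X}" for c in nfd])
    # じ decomposes to し (U+3057) + combining dakuten (U+3099),
    # so a search that ignores marks finds じ when asked for し.
    print(folded_find("じ", "し"))
```

The dakuten is encoded as a combining mark, so any folding that treats all combining marks as ignorable cannot tell a Latin diacritic from a kana voicing mark.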

Yuta Kitamura

Comment 2 2009-07-22 18:43:20 PDT

I found that Safari also matches "シ" (Katakana letter shi), "ジ" (Katakana letter ji) and "ｼ" (Halfwidth katakana letter shi).

Yuta Kitamura

Comment 3 2009-07-22 19:06:53 PDT

Interestingly, Safari (and Chrome) also matches "㋛" (U+32DB, CIRCLED KATAKANA SI). I bet the text find engine uses Unicode Normalization Form KD (NFKD; compatibility decomposition). In fact, when I search for "アパート", Safari matches "㌀" (U+3300, SQUARE APAATO). I'm not sure whether this behavior should be fixed.
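The NFKD guess in Comment 3 can be checked against the Unicode data with a short Python sketch. It only shows what compatibility decomposition does to the characters named in this thread and says nothing about how the find engine is actually implemented. The helper names `nfkd` and `nfkd_equal` are hypothetical.

```python
import unicodedata


def nfkd(text: str) -> str:
    """Return the NFKD (compatibility decomposition) form of text."""
    return unicodedata.normalize("NFKD", text)


def nfkd_equal(a: str, b: str) -> bool:
    """Hypothetical comparison: equal after NFKD on both sides."""
    return nfkd(a) == nfkd(b)


if __name__ == "__main__":
    # Circled and halfwidth forms both decompose to plain katakana シ.
    print(nfkd_equal("㋛", "シ"))
    print(nfkd_equal("ｼ", "シ"))
    # The square ligature decomposes to the same sequence as the word itself.
    print(nfkd_equal("㌀", "アパート"))
    # NFKD alone keeps hiragana and katakana distinct, and keeps the dakuten.
    print(nfkd_equal("し", "シ"))
    print(nfkd_equal("し", "じ"))
```

NFKD accounts for the "㋛", "ｼ", and "㌀" matches. It does not map hiragana し to katakana シ, and it keeps the dakuten as a separate combining mark, so it cannot by itself explain the matches reported in Comment 2 or in the original report.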

Darin Adler

Comment 4 2009-08-11 11:56:46 PDT

The matching is based on ICU, and the folding is due to code in ICU. In the released Safari 4 the behavior is the same no matter what the user’s locale is, but in TOT WebKit it respects the user’s locale choice, so it will work differently for different users. If we don’t like the behavior we can ask ICU to change it, which in some cases would mean changes to the Unicode specification, or we can add workarounds to WebKit’s TextIterator class.

Darin Adler

Comment 5 2009-10-01 12:25:07 PDT

I'm going to close this given that it's behaving as expected. If someone wants to make the case for different behavior, feel free to open a new bug or even open an ICU or Unicode standard bug.

Darin Adler

Comment 6 2010-01-11 10:41:43 PST

I learned more about this topic, and have now decided I was wrong. And I believe I fixed it. See the duplicate bug for details.

*** This bug has been marked as a duplicate of bug 30437 ***

Status RESOLVED
Resolution DUPLICATE of bug 30437
Priority P2
Severity Normal
Classification Unclassified
Version 528+ (Nightly build)
Hardware All
OS OS X 10.5
Product WebKit
Component HTML Editing
Assignee Nobody
Reported 2009-07-22 17:12 PDT
Modified 2010-01-11 10:41 PST
CC List 4 users
URL http://en.wikipedia.org/wiki/Hiragana