WebKit Bugzilla
RESOLVED WORKSFORME
22247
Find in a page does not normalize
https://bugs.webkit.org/show_bug.cgi?id=22247
Jungshik Shin
Reported
2008-11-13 16:21:11 PST
1. Go to http://fr.wikipedia.org
2. In the 'Find in a page' box, type U+0065 U+0301 (é) [1]

Expected: A lot of matches are found.
Actual: No match is found.

All the instances of e with acute accent in the page are in composed form (U+00E9) and do not match the decomposed representation.

A short-term fix: Convert the input ('needle') to NFC. This will take care of the majority of cases because most web pages tend to use composed forms when available.

In the long run: NFC might not be the best choice. The 'hay' may have to be normalized as well. At least on Windows, some African-language keyboards produce decomposed forms even for accented letters that have a composed form. This may also be an issue for Japanese voicing marks. I vaguely remember some hard-coded normalization for them in WebKit, but I haven't checked whether it is used in 'Find in a page'. If they're not taken care of, it's a rather serious issue.

Reported against Chrome: http://crbug.com/1100

[1] Go to http://rishida.net/scripts/uniview/conversion.php, type 'U+0065U+0301' in the second box on the left, and copy and paste the result into the top-left box.
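To make the short-term fix concrete, here is a minimal sketch in Python using the standard `unicodedata` module. It is only an illustration of the idea, not WebKit code; the function name `count_matches_nfc_needle` and the sample strings are made up for this example.

```python
import unicodedata


def count_matches_nfc_needle(needle: str, haystack: str) -> int:
    """Count occurrences after converting only the needle to NFC.

    Hypothetical helper for illustration; the haystack is left untouched.
    """
    composed = unicodedata.normalize("NFC", needle)
    return haystack.count(composed)


# A page that uses the composed form U+00E9, as most web pages do.
page = "caf\u00e9 et th\u00e9"
# What the user types: U+0065 followed by U+0301.
typed = "e\u0301"

print(page.count(typed))                      # 0: the reported behavior
print(count_matches_nfc_needle(typed, page))  # 2: needle converted to NFC

# The limitation mentioned above: a page that itself uses decomposed text.
decomposed_page = "cafe\u0301"
print(count_matches_nfc_needle(typed, decomposed_page))  # 0
```

The last call shows why normalizing only the needle is a stopgap: when the page text is decomposed, the NFC needle no longer matches, so the 'hay' has to be normalized too.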
Jungshik Shin
Comment 1
2009-06-11 13:23:25 PDT
This was fixed a while ago using the usearch APIs of ICU (in TextIterator.cpp). It's still not locale-specific, but that's a separate issue.
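For readers who want to see what matching under canonical equivalence looks like, here is a rough Python sketch that normalizes both the needle and the 'hay' and maps matches back to positions in the original text. It is a standard-library stand-in and does not use ICU or reproduce what TextIterator.cpp does; the function name `find_canonical` is hypothetical, and decomposing character by character skips the reordering of combining marks that full normalization would perform.

```python
import unicodedata


def find_canonical(needle: str, haystack: str) -> list[tuple[int, int]]:
    """Return (start, end) spans in the ORIGINAL haystack whose text is
    canonically equivalent to the needle. Simplified, not locale-aware."""
    nfd_needle = unicodedata.normalize("NFD", needle)
    if not nfd_needle:
        return []

    # Decompose the haystack one character at a time and remember which
    # original index each decomposed character came from.
    nfd_chars: list[str] = []
    origin: list[int] = []
    for i, ch in enumerate(haystack):
        for d in unicodedata.normalize("NFD", ch):
            nfd_chars.append(d)
            origin.append(i)
    nfd_hay = "".join(nfd_chars)

    spans = []
    pos = nfd_hay.find(nfd_needle)
    while pos != -1:
        end = pos + len(nfd_needle)
        # The match must not start or end inside a decomposed character.
        starts_ok = pos == 0 or origin[pos - 1] != origin[pos]
        ends_ok = end == len(origin) or origin[end] != origin[end - 1]
        # Reject matches followed by a combining mark, so that a plain
        # "e" does not match the base letter of a decomposed "é".
        mark_follows = end < len(nfd_hay) and unicodedata.combining(nfd_hay[end]) != 0
        if starts_ok and ends_ok and not mark_follows:
            spans.append((origin[pos], origin[end - 1] + 1))
        pos = nfd_hay.find(nfd_needle, pos + 1)
    return spans


# One composed and one decomposed é in the same text.
text = "caf\u00e9 the\u0301"
print(find_canonical("e\u0301", text))  # [(3, 4), (7, 9)]
print(find_canonical("\u00e9", text))   # [(3, 4), (7, 9)]
print(find_canonical("e", text))        # []
```

Both spellings of the needle find both occurrences, and the spans index into the original string, so highlighting would not need the normalized copy.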