Bug 22247
| Summary: | Find in a page does not normalize | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Jungshik Shin <jshin> |
| Component: | New Bugs | Assignee: | Nobody <webkit-unassigned> |
| Status: | RESOLVED WORKSFORME | ||
| Severity: | Normal | ||
| Priority: | P2 | ||
| Version: | 528+ (Nightly build) | ||
| Hardware: | All | ||
| OS: | All | ||
Jungshik Shin
1. Go to http://fr.wikipedia.org
2. In the 'Find in a page' box, type U+0065 U+0301 ( é ) [1]
Expected : A lot of matches are found
Actual: No match is found
All the e's with an acute accent in the page are in composed form (U+00E9) and do not match the decomposed representation.
A short-term fix : Convert the input ('needle') to NFC. This will take care of the majority of cases because most web pages tend to use composed forms when available.
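The short-term fix can be illustrated with a minimal Python sketch. This is not WebKit code: `find_all` is a hypothetical helper, and the standard `unicodedata` module stands in for whatever normalizer the engine would use. It only shows why converting the needle to NFC makes a decomposed query match composed page text.

```python
import unicodedata


def find_all(needle: str, hay: str) -> list[int]:
    """Return start offsets of needle in hay, converting only the needle to NFC.

    Hypothetical helper for illustration; the hay is assumed to be
    in composed form already, as on most web pages.
    """
    needle_nfc = unicodedata.normalize("NFC", needle)
    if not needle_nfc:
        return []
    offsets = []
    start = hay.find(needle_nfc)
    while start != -1:
        offsets.append(start)
        start = hay.find(needle_nfc, start + 1)
    return offsets


page = "L'\u00e9t\u00e9 dernier"  # composed U+00E9, as on fr.wikipedia.org
typed = "e\u0301"                 # decomposed U+0065 U+0301

raw_hit = page.find(typed)        # plain search without normalization
hits = find_all(typed, page)      # search with the needle converted to NFC
```

Here the plain search finds nothing, while the NFC needle matches both occurrences of é. The sketch still fails if the page itself contains decomposed text, which is the long-run concern.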
In the long run : NFC might not be the best choice. The 'hay' (the page text being searched) may have to be normalized as well.
At least on Windows, some African-language keyboards produce decomposed forms even for accented letters that have a composed representation.
This may also be an issue for Japanese voicing marks. I vaguely remember some hard-coded normalization for them in WebKit, but I haven't checked whether it is used in 'Find in a page'. If they're not taken care of, it's a rather serious issue.
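To make the voicing-mark concern concrete, here is a small Python sketch, again using the standard `unicodedata` module rather than anything from WebKit. The helper `same_after` is hypothetical. It compares a kana written with a combining voicing mark against its precomposed form, and also a half-width katakana pair, which only matches under compatibility normalization (NFKC). Whether find-in-page should treat half-width and full-width kana as equal is a policy assumption made here, not something the report states.

```python
import unicodedata


def same_after(form: str, a: str, b: str) -> bool:
    """True if a and b are identical once both are normalized to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u304c"          # HIRAGANA LETTER GA
decomposed = "\u304b\u3099"  # HIRAGANA LETTER KA + combining voiced sound mark
halfwidth = "\uff76\uff9e"   # half-width KA + half-width voiced sound mark
fullwidth = "\u30ac"         # KATAKANA LETTER GA

raw_equal = composed == decomposed                     # no normalization
nfc_equal = same_after("NFC", composed, decomposed)    # both sides to NFC
half_nfc = same_after("NFC", halfwidth, fullwidth)     # NFC leaves half-width forms alone
half_nfkc = same_after("NFKC", halfwidth, fullwidth)   # NFKC folds them together
```

Normalizing both the needle and the hay to the same form handles the combining mark, but the half-width case shows that the choice of form matters.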
Reported against chrome: http://crbug.com/1100
[1] Go to http://rishida.net/scripts/uniview/conversion.php and type 'U+0065U+0301' in the second box on the left and copy'n'paste the result in the top-left box.
Jungshik Shin
This was fixed a while ago using the usearch APIs of ICU (in TextIterator.cpp). It's still not locale-specific, but that's a separate issue.