RESOLVED FIXED 117353
window.find() case-insensitive search doesn't match diacritical marks
https://bugs.webkit.org/show_bug.cgi?id=117353
Summary window.find() case-insensitive search doesn't match diacritical marks
Daniel Bates
Reported 2013-06-07 10:33:53 PDT
Consider a web page that contains the spanish word "que". Then window.find("qué", false /* Case sensitive */) returns true. That is, we found a match since we consider the word "qué" to be equivalent to the word "que". Instead we should consider the accented e as being more specific and hence window.find() should return false (i.e. not found).
Attachments
Patch (31.25 KB, patch)
2013-06-07 11:41 PDT, Daniel Bates
no flags
Patch (31.26 KB, patch)
2013-06-07 13:11 PDT, Daniel Bates
darin: review+
Daniel Bates
Comment 1 2013-06-07 10:34:18 PDT
Daniel Bates
Comment 2 2013-06-07 11:41:14 PDT
Daniel Bates
Comment 3 2013-06-07 11:50:45 PDT
(In reply to comment #2) > Created an attachment (id=204058) [details] > Patch As a side effect of this change our case-insensitive search behavior for window.find() will more closely match the same behavior in Firefox.
Daniel Bates
Comment 4 2013-06-07 13:11:25 PDT
Created attachment 204062 [details] Patch Remove tab characters that I inadvertently added to file LayoutTests/fast/text/test-find.js. Also, write testFind() in terms of variable CaseInsensitive.
Darin Adler
Comment 5 2013-06-07 14:01:17 PDT
Comment on attachment 204058 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review > LayoutTests/fast/text/find-case-folding.html:40 > - if (!canFind("Stra" + smallSharpS + "e", "Strasse")) { > + if (canFind("Stra" + smallSharpS + "e", "Strasse")) { > success = false; > - message += " Cannot find ss when searching for small sharp s."; > + message += " Found ss when searching for small sharp s."; > } Is this change an improvement? I think it was good that searching for ß found ss.
Alexey Proskuryakov
Comment 6 2013-06-07 14:18:14 PDT
I asked Dan on IRC to review some of older bugs we had about diacritics (in)sensitive search, I think that user expectations may be more nuanced.
Daniel Bates
Comment 7 2013-06-07 15:15:19 PDT
(In reply to comment #6) > I asked Dan on IRC to review some of older bugs we had about diacritics (in)sensitive search, I think that user expectations may be more nuanced. From looking at the related bugs, the proposed change will make our search behavior match people's expectations when searching for accented characters. Moreover, it turned out that the Radar bug for this issue, <rdar://problem/13510672>, is a duplicate of <rdar://problem/8535227>.
Daniel Bates
Comment 8 2013-06-07 15:58:40 PDT
(In reply to comment #5) > (From update of attachment 204058 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review > > > LayoutTests/fast/text/find-case-folding.html:40 > > - if (!canFind("Stra" + smallSharpS + "e", "Strasse")) { > > + if (canFind("Stra" + smallSharpS + "e", "Strasse")) { > > success = false; > > - message += " Cannot find ss when searching for small sharp s."; > > + message += " Found ss when searching for small sharp s."; > > } > > Is this change an improvement? I think it was good that searching for ß found ss. Disregarding the case with ß for the moment (I talk about it below), the proposed patch is an improvement because it makes the behavior of window.find() more closely match a person's expectation when searching for special characters based on our in-person discussion this morning (06/07). This behavior change has also been requested by various people as seen in the bug reports related to <rdar://problem/8535227>. Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY.
Daniel Bates
Comment 9 2013-06-11 16:31:18 PDT
(In reply to comment #8) > (In reply to comment #5) > > (From update of attachment 204058 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review > > > > > LayoutTests/fast/text/find-case-folding.html:40 > > > - if (!canFind("Stra" + smallSharpS + "e", "Strasse")) { > > > + if (canFind("Stra" + smallSharpS + "e", "Strasse")) { > > > success = false; > > > - message += " Cannot find ss when searching for small sharp s."; > > > + message += " Found ss when searching for small sharp s."; > > > } > > > [...] > Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY. From talking with Darin today (06/11) in person, we should look to map ß to "ss". For now, we should land expected failure results for test LayoutTests/fast/text/find-case-folding.html instead of updating it to pass.
Daniel Bates
Comment 10 2013-06-12 09:04:00 PDT
(In reply to comment #9) > (In reply to comment #8) > > (In reply to comment #5) > > > (From update of attachment 204058 [details] [details] [details]) > > > View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review > > > > > > > LayoutTests/fast/text/find-case-folding.html:40 > > > > - if (!canFind("Stra" + smallSharpS + "e", "Strasse")) { > > > > + if (canFind("Stra" + smallSharpS + "e", "Strasse")) { > > > > success = false; > > > > - message += " Cannot find ss when searching for small sharp s."; > > > > + message += " Found ss when searching for small sharp s."; > > > > } > > > > > [...] > > Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY. > > From talking with Darin today (06/11) in person, we should look to map ß to "ss". For now, we should land expected failure results for test LayoutTests/fast/text/find-case-folding.html instead of updating it to pass. Filed bug #117548 to map ß to "ss".
Daniel Bates
Comment 11 2013-06-12 09:46:29 PDT
Allan Sandfeld Jensen
Comment 12 2013-09-13 05:43:44 PDT
This patch raised the minimum required libicu to 4.4, was that intentional?
Note You need to log in before you can comment on or make changes to this bug.