Bug 117353

Summary: window.find() case-insensitive search doesn't match diacritical marks
Product: WebKit Reporter: Daniel Bates <dbates>
Component: HTML EditingAssignee: Daniel Bates <dbates>
Status: RESOLVED FIXED    
Severity: Normal CC: allan.jensen, ap, darin, rniwa
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch darin: review+

Description Daniel Bates 2013-06-07 10:33:53 PDT
Consider a web page that contains the spanish word "que". Then window.find("qué", false /* Case sensitive */) returns true. That is, we found a match since we consider the word "qué" to be equivalent to the word "que". Instead we should consider the accented e as being more specific and hence window.find() should return false (i.e. not found).
Comment 1 Daniel Bates 2013-06-07 10:34:18 PDT
<rdar://problem/13510672>
Comment 2 Daniel Bates 2013-06-07 11:41:14 PDT
Created attachment 204058 [details]
Patch
Comment 3 Daniel Bates 2013-06-07 11:50:45 PDT
(In reply to comment #2)
> Created an attachment (id=204058) [details]
> Patch

As a side effect of this change our case-insensitive search behavior for window.find()
will more closely match the same behavior in Firefox.
Comment 4 Daniel Bates 2013-06-07 13:11:25 PDT
Created attachment 204062 [details]
Patch

Remove tab characters that I inadvertently added to file LayoutTests/fast/text/test-find.js. Also, write testFind() in terms of variable CaseInsensitive.
Comment 5 Darin Adler 2013-06-07 14:01:17 PDT
Comment on attachment 204058 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review

> LayoutTests/fast/text/find-case-folding.html:40
> -            if (!canFind("Stra" + smallSharpS + "e", "Strasse")) {
> +            if (canFind("Stra" + smallSharpS + "e", "Strasse")) {
>                  success = false;
> -                message += " Cannot find ss when searching for small sharp s.";
> +                message += " Found ss when searching for small sharp s.";
>              }

Is this change an improvement? I think it was good that searching for ß found ss.
Comment 6 Alexey Proskuryakov 2013-06-07 14:18:14 PDT
I asked Dan on IRC to review some of older bugs we had about diacritics (in)sensitive search, I think that user expectations may be more nuanced.
Comment 7 Daniel Bates 2013-06-07 15:15:19 PDT
(In reply to comment #6)
> I asked Dan on IRC to review some of older bugs we had about diacritics (in)sensitive search, I think that user expectations may be more nuanced.

From looking at the related bugs, the proposed change will make our search behavior match people's expectations when searching for accented characters.

Moreover, it turned out that the Radar bug for this issue, <rdar://problem/13510672>, is a duplicate of <rdar://problem/8535227>.
Comment 8 Daniel Bates 2013-06-07 15:58:40 PDT
(In reply to comment #5)
> (From update of attachment 204058 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review
> 
> > LayoutTests/fast/text/find-case-folding.html:40
> > -            if (!canFind("Stra" + smallSharpS + "e", "Strasse")) {
> > +            if (canFind("Stra" + smallSharpS + "e", "Strasse")) {
> >                  success = false;
> > -                message += " Cannot find ss when searching for small sharp s.";
> > +                message += " Found ss when searching for small sharp s.";
> >              }
> 
> Is this change an improvement? I think it was good that searching for ß found ss.

Disregarding the case with ß for the moment (I talk about it below), the proposed patch is an improvement because it makes the behavior of window.find() more closely match a person's expectation when searching for special characters based on our in-person discussion this morning (06/07). This behavior change has also been requested by various people as seen in the bug reports related to <rdar://problem/8535227>.

Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY.
Comment 9 Daniel Bates 2013-06-11 16:31:18 PDT
(In reply to comment #8)
> (In reply to comment #5)
> > (From update of attachment 204058 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review
> > 
> > > LayoutTests/fast/text/find-case-folding.html:40
> > > -            if (!canFind("Stra" + smallSharpS + "e", "Strasse")) {
> > > +            if (canFind("Stra" + smallSharpS + "e", "Strasse")) {
> > >                  success = false;
> > > -                message += " Cannot find ss when searching for small sharp s.";
> > > +                message += " Found ss when searching for small sharp s.";
> > >              }
> > 
> [...]
> Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY.

From talking with Darin today (06/11) in person,  we should look to map ß to "ss". For now, we should land expected failure results for test LayoutTests/fast/text/find-case-folding.html instead of updating it to pass.
Comment 10 Daniel Bates 2013-06-12 09:04:00 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #5)
> > > (From update of attachment 204058 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=204058&action=review
> > > 
> > > > LayoutTests/fast/text/find-case-folding.html:40
> > > > -            if (!canFind("Stra" + smallSharpS + "e", "Strasse")) {
> > > > +            if (canFind("Stra" + smallSharpS + "e", "Strasse")) {
> > > >                  success = false;
> > > > -                message += " Cannot find ss when searching for small sharp s.";
> > > > +                message += " Found ss when searching for small sharp s.";
> > > >              }
> > > 
> > [...]
> > Should we look to special case our handling of ß? ICU regards ß similar to an accented character (é) when using strength UCOL_SECONDARY.
> 
> From talking with Darin today (06/11) in person,  we should look to map ß to "ss". For now, we should land expected failure results for test LayoutTests/fast/text/find-case-folding.html instead of updating it to pass.

Filed bug #117548 to map ß to "ss".
Comment 11 Daniel Bates 2013-06-12 09:46:29 PDT
Committed r151505: <http://trac.webkit.org/changeset/151505>
Comment 12 Allan Sandfeld Jensen 2013-09-13 05:43:44 PDT
This patch raised the minimum required libicu to 4.4, was that intentional?