Bug 29103

Summary: typeAheadFind in select element is case-sensitive for Cyrillic and Greek
Product: WebKit Reporter: Jungshik Shin <jshin>
Component: FormsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, aroben, commit-queue, darin, hbono
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
Attachments:
Description Flags
a simple test case
none
a quick fix
aroben: review-
a quick fix for SelectElement::typeAheadFind()
darin: review-
The second quick fix for SelectElement::typeAheadFind()
darin: review+
The third quick fix for SelectElement::typeAheadFind() none

Description Jungshik Shin 2009-09-09 12:13:30 PDT
Created attachment 39291 [details]
a simple test case

Reported in the chromium bug tracker : http://crbug.com/21357#c3 and http://crbug.com/21389.

1. Load the attachment into Safari
2. With the focus on the drop-down menu, press 'p' or 'P' multiple times. Three items starting with 'P' are selected in turn.
3. Switch your keyboard to Russian
4. Press 'g' (to get Cyrillic 'п'). 
5. The first item starting with capital 'п' (i.e. 'П') has to be selected, but it's not selected.
6. Press 'G' (to get Cyrillic capital letter 'П'). Now the first item starting with 'П' is selected.


typeAheadFind calls startsWith with the 2nd parameter set to false (case-insensitve), which in turn uses 'reverseFind' (StringImpl). It appears to do the right thing (I didn't check the way equalsIgnoreCase and toLowerCaseASCII used in reverseFind/Find are implemented, yet. I'll), but Safari trunk build has this issue.
Comment 1 Mark Rowe (bdash) 2009-09-09 13:40:37 PDT
<rdar://problem/7209689>
Comment 2 Hironori Bono 2009-09-11 01:26:28 PDT
Created attachment 39415 [details]
a quick fix

This change just replaces WTF::toASCIILower() calls in StringImpl::reverseFind() with WTF::Unicode::toLower() for non-ASCII characters.
I'm not sure why StringImpl::reverseFind() uses WTF::toASCIILower(), though. (Performance issue?)
Comment 3 Darin Adler 2009-09-11 10:41:23 PDT
It's not good to change reverseFind without also changing find.

Some callers may want the ASCII-only case folding, so we need to look at all the callers before changing the behavior of the function.
Comment 4 Adam Roben (:aroben) 2009-09-16 08:38:56 PDT
Comment on attachment 39415 [details]
a quick fix

This seems like an awfully broad fix for such a specific symptom. Are there other bugs caused by reverseFind using toASCIILower? If the new way is better/more correct, wouldn't we want find to work the same way? What performance impact does this have?

r- so we can figure out the above issues.
Comment 5 Hironori Bono 2009-09-17 00:30:33 PDT
Thank you for your comments.
We should have had a discussion about how to fix this issue before sending a review request. Sorry for sending such a confusing review request.

(In reply to comment #4)
> (From update of attachment 39415 [details])
> This seems like an awfully broad fix for such a specific symptom.

In my honest opinion, I'm also wondering if it is a good option to change WTF::toASCIILower() with WTF::Unicode::toLower() in StringImpl::reverseFind() 
even though it is pretty simple.

> Are there other bugs caused by reverseFind using toASCIILower?

As far as I have tested, SelectElement::typeAheadFind() is the only possible place which calls StringImpl::reverseFind() with non-ASCII string (via WebCore::String::startsWith()).
That is, it may be an overkill to replace WTF::toASCIILower() with WTF::Unicode::toLower() just for SelectElement::typeAheadFind().

> If the new way is better/more correct, wouldn't we want find to work the same way?

WTF::Unicode::toLower() calls an ICU function u_tolower(). So, it works not only ASCII characters but also non-ASCII characters, such as Latin, Greek, Cyrillic, etc. But, as I noted above, SelectElement::typeAheadFind() is the only possible place that needs this feature.

> What performance impact does this have?

Even though I haven't measured any data. I'm sure this change decreases the performance of parsers more or less, because StringImpl::startsWith() is called from many parsers, such as CSS parsers, HTML parsers, etc.

Regards,

Hironori Bono
Comment 6 Adam Roben (:aroben) 2009-09-17 07:42:05 PDT
(In reply to comment #5)
It sounds like it would be better to find a more targeted fix for this bug, given your comments. Thanks for the detailed responses!
Comment 7 Hironori Bono 2009-10-12 21:59:38 PDT
Created attachment 41084 [details]
a quick fix for SelectElement::typeAheadFind()

Thank you for your comments and sorry for my slow updates.
I updated my change to add String::lower() calls in SelectElement::typeAheadFind() to fix this bug.
Would it be possible to review the updated change?

Regards,

Hironori Bono
Comment 8 Darin Adler 2009-10-13 14:14:08 PDT
Comment on attachment 41084 [details]
a quick fix for SelectElement::typeAheadFind()

Thanks for keeping at it!

> +    WebCore::String prefix_lower(prefix.lower());

There's no need for an explicit WebCore prefix here -- this code is already inside the WebCore namespace. We don't use underscores in variable names in WebKit code.

> +        // String::startsWith() doesn't convert non-ASCII characters to lowercase since converting a non-ASCII
> +        // string to lowercase is slow.

I think I steered you wrong here. It's strange that String::startsWith is not the same as other functions in StringImpl in this respect and arguable it *is* a bug in startsWith.

On the other hand, it seems safer to work around String::startsWith by calling lower() here than to change String::startsWith, so I think we can live with this as a fix to this bug. We can always revisit later if we fix startsWith. And this fix avoids the need for you to stdy all the call sites of startsWith.

Unfortunately, the comment makes it sound like this String::startsWith behavior is intentional and a good thing, and I do not think that is true. Really we're working around a bug in String::startsWith here, and someone should fix that bug later. Or alternatively we should change the design of the other String member functions. For many we need ASCII-only versions, not full-Unicode versions.

> +        // So, we convert each option string (which may contain non-ASCII characters) to lowercase before calling
> +        // String::startsWith().
>          String text = optionElement->textIndentedToRespectGroupLabel();
> -        if (stripLeadingWhiteSpace(text).startsWith(prefix, false)) {
> +        if (stripLeadingWhiteSpace(text).lower().startsWith(prefix_lower)) {

The correct function to use in both cases above is actually String::foldCase(), not String::lower().

review- because of the style mistakes.
Comment 9 Hironori Bono 2009-10-21 02:16:51 PDT
Created attachment 41558 [details]
The second quick fix for SelectElement::typeAheadFind()

Thank you for your review and comments.

> > +    WebCore::String prefix_lower(prefix.lower());
>
> There's no need for an explicit WebCore prefix here -- this code is already
> inside the WebCore namespace. We don't use underscores in variable names in
> WebKit code.

Thank you for noticing this. I have removed the "String" scope.

> > +        // String::startsWith() doesn't convert non-ASCII characters to lowercase since converting a non-ASCII
> > +        // string to lowercase is slow.
>
> I think I steered you wrong here. It's strange that String::startsWith is not
> the same as other functions in StringImpl in this respect and arguable it *is*
> a bug in startsWith.
>
> On the other hand, it seems safer to work around String::startsWith by calling
> lower() here than to change String::startsWith, so I think we can live with
> this as a fix to this bug. We can always revisit later if we fix startsWith.
> And this fix avoids the need for you to stdy all the call sites of startsWith.
>
> Unfortunately, the comment makes it sound like this String::startsWith behavior
> is intentional and a good thing, and I do not think that is true. Really we're
> working around a bug in String::startsWith here, and someone should fix that
> bug later. Or alternatively we should change the design of the other String
> member functions. For many we need ASCII-only versions, not full-Unicode
> versions.

Thank you for noticing this.
I totally misunderstood the comment #4.
I have updated this comment to "to be removed when startWith() supports non-ascii characters".

> > +        // So, we convert each option string (which may contain non-ASCII characters) to lowercase before calling
> > +        // String::startsWith().
> >          String text = optionElement->textIndentedToRespectGroupLabel();
> > -        if (stripLeadingWhiteSpace(text).startsWith(prefix, false)) {
> > +        if (stripLeadingWhiteSpace(text).lower().startsWith(prefix_lower)) {
>
> The correct function to use in both cases above is actually String::foldCase(),
> not String::lower().

Thank you. I forgot String::foldCase() and agree it is better than String::lower().
I changed String::lower() with String::foldCase().

Regards,

Hironori Bono
Comment 10 Darin Adler 2009-10-21 16:16:29 PDT
Comment on attachment 41558 [details]
The second quick fix for SelectElement::typeAheadFind()

> +    // Retrive the folding equivalent of this prefix string before finding a matching element.

Spelling error or typo here: "Retrieve".

I would call it the "case-folded equivalent"

> +    // This is a workaround that String::startWith() cannot fold non-ASCII characters, so
> +    // this workaround is to be removed when the above problem is fixed.

My comment wording would be more like this:

    // Compute a case-folded copy of the prefix string before beginning the search for
    // a matching element. This code uses foldCase to work around the fact that
    // String::startWith does not fold non-ASCII characters. This code can be changed
    // to use startWith once that is fixed.

> +    String prefixFoldCase(prefix.foldCase());

I would call this prefixWithCaseFolded.

> +        // Fold the option string and check its prefix is equal to the folded prefix.

"check its prefix" -> "check if its prefix"

review+ since these are only comments on the comments -- you could land it with your existing comments or fix them as I suggested.
Comment 11 Hironori Bono 2009-10-22 04:06:22 PDT
Created attachment 41652 [details]
The third quick fix for SelectElement::typeAheadFind()

Thank you for your quick review and comments.
I have updated my change to apply your comments. (Thank you for correcting my funny English.)
Unfortunately, I'm not a WebKit committer and cannot land changes by myself. So, would it be possible to land this change on my behalf?

Regards,

Hironori Bono
Comment 12 WebKit Commit Bot 2009-10-25 13:16:04 PDT
Comment on attachment 41652 [details]
The third quick fix for SelectElement::typeAheadFind()

Clearing flags on attachment: 41652

Committed r50046: <http://trac.webkit.org/changeset/50046>
Comment 13 WebKit Commit Bot 2009-10-25 13:16:09 PDT
All reviewed patches have been landed.  Closing bug.