Bug 13227 - StringImpl::isLower incorrectly assumes islower returns 1 (it can return any non-0)
Summary: StringImpl::isLower incorrectly assumes islower returns 1 (it can return any ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 523.x (Safari 3)
Hardware: All All
: P3 Minor
Assignee: Darin Adler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-29 16:55 PDT by Jungshik Shin
Modified: 2007-04-16 08:33 PDT (History)
1 user (show)

See Also:


Attachments
patch : 2-liner (1.22 KB, patch)
2007-04-05 13:21 PDT, Jungshik Shin
no flags Details | Formatted Diff | Diff
benchmark source file (1.90 KB, text/plain)
2007-04-10 11:58 PDT, Kirby White
no flags Details
updated benchmark source file (2.14 KB, text/plain)
2007-04-11 01:26 PDT, Darin Adler
no flags Details
patch that uses early-exit instead of &= (5.96 KB, patch)
2007-04-11 01:46 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
patch that uses && instead of & (5.49 KB, patch)
2007-04-11 01:56 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
patch that uses && instead of & and fixes non-BMP issue (5.49 KB, patch)
2007-04-11 02:03 PDT, Darin Adler
sullivan: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jungshik Shin 2007-03-29 16:55:12 PDT
We'd better make 'allLower' bool and use '!= 0' before using '&='
Comment 1 Jungshik Shin 2007-04-05 13:21:29 PDT
Created attachment 13972 [details]
patch : 2-liner

a simple patch.
Comment 2 Darin Adler 2007-04-05 13:45:41 PDT
Comment on attachment 13972 [details]
patch : 2-liner

What does "not very kosher" mean? Does this actually cause a problem? Does the fix have any performance impact?
Comment 3 Kirby White 2007-04-05 14:14:25 PDT
islower() from the C lib only claims to return nonzero if the character is lower-case, not necessarily 1. If it happened to return 1 in some situation and 2 in some other, the current test would fail.

An alternative to 
  allLower &= islower(c & 0x7F) != 0;
would be
  allLower = allLower && islower(c & 0x7F);
if that's faster.
Comment 4 Darin Adler 2007-04-09 15:47:22 PDT
(In reply to comment #3)
> An alternative to 
>   allLower &= islower(c & 0x7F) != 0;
> would be
>   allLower = allLower && islower(c & 0x7F);
> if that's faster.

No, that's way slower.
Comment 5 Darin Adler 2007-04-09 15:54:51 PDT
Comment on attachment 13972 [details]
patch : 2-liner

r=me; I think this might slow things down a bit; if so we might want to implement our own islower instead.
Comment 6 Jungshik Shin 2007-04-09 22:50:34 PDT
Thanks for r. Can you land it? 
Comment 7 Kirby White 2007-04-10 11:58:32 PDT
Created attachment 13998 [details]
benchmark source file

(In reply to comment #4)
> (In reply to comment #3)
> > An alternative to 
> >   allLower &= islower(c & 0x7F) != 0;
> > would be
> >   allLower = allLower && islower(c & 0x7F);
> > if that's faster.
> 
> No, that's way slower.

Are you sure?  I ran a simple (and completely unoptimized) benchmark that indicates it's noticeably faster than either the original version or the new boolean one, even for an all-lower-case string.  And for one with any upper-case, it's faster still, since logical && short-circuits the islower() call as soon as the first upper-case letter is encountered.

Lower-case string:
Original: 17.819 seconds (wall time)
Boolean: 19.257
Logical: 10.185

Upper-case string:
Original: 17.229
Boolean: 18.081
Logical: 7.698

User and kernel times from tcsh 'time' are analogous.  The benchmark source is attached.
Comment 8 Darin Adler 2007-04-11 01:25:54 PDT
My bad. Looks like I was completely wrong!

And in fact, the more obvious "early exit" version that breaks the loop when it finds something that's not lower is faster than any of the others in all cases I tested.

The "all lowercase" test was flawed because spaces are not lowercase.

All lowercase:

Original: 2.225
Boolean: 2.455
Logical: 2.226
Early Exit: 2.088

No lowercase:

Original: 2.521
Boolean: 2.782
Logical: 1.066
Early Exit: 0.107
Comment 9 Darin Adler 2007-04-11 01:26:54 PDT
Created attachment 14003 [details]
updated benchmark source file
Comment 10 Darin Adler 2007-04-11 01:46:00 PDT
Created attachment 14004 [details]
patch that uses early-exit instead of &=
Comment 11 Darin Adler 2007-04-11 01:46:26 PDT
Comment on attachment 13972 [details]
patch : 2-liner

Clearing the review flag from this older patch.
Comment 12 Darin Adler 2007-04-11 01:50:21 PDT
Uh oh! It's not correct to test unoptimized. When I optimize I get *very* different results for the all-lowercase version:

    Original: 0.210
    Boolean: 0.211
    Logical: 0.207
    Early Exit: 1.564

This says that we should stick with the logical version, not the early-exit version.
Comment 13 Darin Adler 2007-04-11 01:56:24 PDT
Created attachment 14005 [details]
patch that uses && instead of &
Comment 14 Darin Adler 2007-04-11 02:03:07 PDT
Created attachment 14006 [details]
patch that uses && instead of & and fixes non-BMP issue
Comment 15 Kirby White 2007-04-11 08:56:39 PDT
(In reply to comment #12)
> When I optimize I get *very* different results for the all-lowercase version:
> 
>     Original: 0.210
>     Boolean: 0.211
>     Logical: 0.207
>     Early Exit: 1.564

1) I wonder if the compiler is unrolling the loops, with constant test strings?  It might be worth testing with a user-input string.

2) I need a faster computer. :)
Comment 16 Darin Adler 2007-04-11 10:37:20 PDT
(In reply to comment #15)
> 1) I wonder if the compiler is unrolling the loops, with constant test strings?
>  It might be worth testing with a user-input string.

I don't think it's doing anything special because the test strings are constant, but I've been wrong before.
Comment 17 Kirby White 2007-04-11 10:39:11 PDT
Okay, I ran the test again with a user-input string and -O3 optimization and got the same conclusion you did, only more so.  (Using only -O doesn't give the same answer.)

thisisastringwithalllowercase (29)
Original: 0.522
Boolean: 0.473
Logical: 0.600
Early Exit: 12.309

HERE'S A STRING WITH NO LOWERCASE (33)
Original: 0.529
Boolean: 0.531
Logical: 0.523
Early Exit: 0.454
Comment 18 Darin Adler 2007-04-11 10:43:23 PDT
(In reply to comment #17)
> thisisastringwithalllowercase (29)
> Original: 0.522
> Boolean: 0.473
> Logical: 0.600
> Early Exit: 12.309

That's interesting. Logical is worse than my old code here. But anyway, lets just land the logical version.
Comment 19 John Sullivan 2007-04-15 21:17:52 PDT
Comment on attachment 14006 [details]
patch that uses && instead of & and fixes non-BMP issue

r=me
Comment 20 Darin Adler 2007-04-16 08:33:52 PDT
Sending        WebCore/ChangeLog
Sending        WebCore/platform/StringImpl.cpp
Transmitting file data ..
Committed revision 20900.