WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 15519
eliminate use of <ctype.h> for processing ASCII
https://bugs.webkit.org/show_bug.cgi?id=15519
Summary
eliminate use of <ctype.h> for processing ASCII
Darin Adler
Reported
2007-10-15 11:18:44 PDT
Since <ctype.h> can give locale-specific results, it's not good to use it in WebKit. Also, it's not necessarily fast or efficient enough. We'll use our own instead.
Attachments
patch
(80.44 KB, patch)
2007-10-15 18:13 PDT
,
Darin Adler
mjs
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Darin Adler
Comment 1
2007-10-15 17:04:43 PDT
Committed revision 26638.
Darin Adler
Comment 2
2007-10-15 17:59:50 PDT
(In reply to
comment #1
)
> Committed revision 26638.
Oops. That was for another bug.
Darin Adler
Comment 3
2007-10-15 18:13:10 PDT
Created
attachment 16683
[details]
patch
Eric Seidel (no email)
Comment 4
2007-10-15 21:45:18 PDT
Comment on
attachment 16683
[details]
patch I'm a big fan of this change. Those method names are so much cleaner than the ctype ones, and it's nice to get rid of the ctype include hack. I was a little surprised you didn't use template functions to implement the various copy/paste char variants. I would mark this r+ but I didn't stare at the individual is* functions long enough to determine if their actually 100% correct. I expect they are, but I don't trust my brain at this hour.
Geoffrey Garen
Comment 5
2007-10-16 10:59:10 PDT
- if (isalpha(*p) && toupper(*p) != *p) + if (isASCIIAlpha(*p) && toASCIILower(*p) != *p) lstyle = 0; As an EBay reviewer once wrote, "BAD BAD FRAUD."
Darin Adler
Comment 6
2007-10-16 11:15:22 PDT
(In reply to
comment #5
)
> - if (isalpha(*p) && toupper(*p) != *p) > + if (isASCIIAlpha(*p) && toASCIILower(*p) != *p) > lstyle = 0; > > As an EBay reviewer once wrote, "BAD BAD FRAUD."
Oops! I wonder why this one didn't show up as a test failure. I'll look into that.
Maciej Stachowiak
Comment 7
2007-10-16 11:28:56 PDT
Comment on
attachment 16683
[details]
patch Would it be reasonable to make these template functions instead of overloads: +inline bool isASCIIAlpha(char c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } +inline bool isASCIIAlpha(unsigned short c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } +inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } Perhaps DisallowCType.h or ASCIICType.h should have a comment explaining why ASCIICType functions are preferred to regular ctype functions. r=me anyway but please consider these suggestions.
Geoffrey Garen
Comment 8
2007-10-16 11:40:40 PDT
I verified with a table that the ASCII functions are correct. I agree with Maciej that a better explanation / compiler error would help a developer who run into it down the line. 5948 } else if (c == ' ' || ispunct(c)) { 5949 } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { 59495950 Why not provide isASCIIPunct instead? For ASCIICType.h, I think the preferred style is to indent inside a namespace in header files. Also, why so much duplicated code? For instance, why doesn't isASCIIAlphanumeric call isASCIIAlpha? r=me
Geoffrey Garen
Comment 9
2007-10-16 11:40:52 PDT
I verified with a table that the ASCII functions are correct. I agree with Maciej that a better explanation / compiler error would help a developer who run into it down the line. 5948 } else if (c == ' ' || ispunct(c)) { 5949 } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { 59495950 Why not provide isASCIIPunct instead? For ASCIICType.h, I think the preferred style is to indent inside a namespace in header files. Also, why so much duplicated code? For instance, why doesn't isASCIIAlphanumeric call isASCIIAlpha? r=me
Darin Adler
Comment 10
2007-10-16 12:06:57 PDT
(In reply to
comment #9
)
> - } else if (c == ' ' || ispunct(c)) { > + } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c > <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { > > Why not provide isASCIIPunct instead?
To me, this category seems far less definitive than the others. And the rule is specifically about Mac OS X behavior. I suspect that ispunct isn't even the correct rule; at some point we could visit this code and get the rule exactly right for all sorts of keyboards. I don't think it's correct to assume that the only punctuation you could type on a keyboard is ASCII punctuation. I didn't want to dignify the existing rule with a name, but I also didn't want to change behavior.
> For ASCIICType.h, I think the preferred style is to indent inside a namespace > in header files.
Hyatt and Maciej have been debating this point. I could go either way.
> Also, why so much duplicated code? For instance, why doesn't > isASCIIAlphanumeric call isASCIIAlpha?
Hard to say exactly why. I like the fact that each of these lines stands alone. I expect that any future changes to this file would primarily be for performance optimization, perhaps to remove some of these functions if we find they aren't really useful, or maybe to add a few more.
Darin Adler
Comment 11
2007-10-16 12:07:59 PDT
(In reply to
comment #7
)
> (From update of
attachment 16683
[details]
[edit]) > Would it be reasonable to make these template functions instead of overloads: > > +inline bool isASCIIAlpha(char c) { return (c | 0x20) >= 'a' && (c | 0x20) <= > 'z'; } > +inline bool isASCIIAlpha(unsigned short c) { return (c | 0x20) >= 'a' && (c | > 0x20) <= 'z'; } > +inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) > <= 'z'; }
Maybe. Template functions are more annoying when debugging. I'm feeling slightly conflicted about it, and I think I'll leave it as-is for now.
Darin Adler
Comment 12
2007-10-16 13:26:56 PDT
Committed revision 26676.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug