RESOLVED FIXED Bug 15519
eliminate use of <ctype.h> for processing ASCII
https://bugs.webkit.org/show_bug.cgi?id=15519
Summary eliminate use of <ctype.h> for processing ASCII
Darin Adler
Reported 2007-10-15 11:18:44 PDT
Since <ctype.h> can give locale-specific results, it's not good to use it in WebKit. Also, it's not necessarily fast or efficient enough. We'll use our own instead.
Attachments
patch (80.44 KB, patch)
2007-10-15 18:13 PDT, Darin Adler
mjs: review+
Darin Adler
Comment 1 2007-10-15 17:04:43 PDT
Committed revision 26638.
Darin Adler
Comment 2 2007-10-15 17:59:50 PDT
(In reply to comment #1) > Committed revision 26638. Oops. That was for another bug.
Darin Adler
Comment 3 2007-10-15 18:13:10 PDT
Eric Seidel (no email)
Comment 4 2007-10-15 21:45:18 PDT
Comment on attachment 16683 [details] patch I'm a big fan of this change. Those method names are so much cleaner than the ctype ones, and it's nice to get rid of the ctype include hack. I was a little surprised you didn't use template functions to implement the various copy/paste char variants. I would mark this r+ but I didn't stare at the individual is* functions long enough to determine if their actually 100% correct. I expect they are, but I don't trust my brain at this hour.
Geoffrey Garen
Comment 5 2007-10-16 10:59:10 PDT
- if (isalpha(*p) && toupper(*p) != *p) + if (isASCIIAlpha(*p) && toASCIILower(*p) != *p) lstyle = 0; As an EBay reviewer once wrote, "BAD BAD FRAUD."
Darin Adler
Comment 6 2007-10-16 11:15:22 PDT
(In reply to comment #5) > - if (isalpha(*p) && toupper(*p) != *p) > + if (isASCIIAlpha(*p) && toASCIILower(*p) != *p) > lstyle = 0; > > As an EBay reviewer once wrote, "BAD BAD FRAUD." Oops! I wonder why this one didn't show up as a test failure. I'll look into that.
Maciej Stachowiak
Comment 7 2007-10-16 11:28:56 PDT
Comment on attachment 16683 [details] patch Would it be reasonable to make these template functions instead of overloads: +inline bool isASCIIAlpha(char c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } +inline bool isASCIIAlpha(unsigned short c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } +inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; } Perhaps DisallowCType.h or ASCIICType.h should have a comment explaining why ASCIICType functions are preferred to regular ctype functions. r=me anyway but please consider these suggestions.
Geoffrey Garen
Comment 8 2007-10-16 11:40:40 PDT
I verified with a table that the ASCII functions are correct. I agree with Maciej that a better explanation / compiler error would help a developer who run into it down the line. 5948 } else if (c == ' ' || ispunct(c)) { 5949 } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { 59495950 Why not provide isASCIIPunct instead? For ASCIICType.h, I think the preferred style is to indent inside a namespace in header files. Also, why so much duplicated code? For instance, why doesn't isASCIIAlphanumeric call isASCIIAlpha? r=me
Geoffrey Garen
Comment 9 2007-10-16 11:40:52 PDT
I verified with a table that the ASCII functions are correct. I agree with Maciej that a better explanation / compiler error would help a developer who run into it down the line. 5948 } else if (c == ' ' || ispunct(c)) { 5949 } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { 59495950 Why not provide isASCIIPunct instead? For ASCIICType.h, I think the preferred style is to indent inside a namespace in header files. Also, why so much duplicated code? For instance, why doesn't isASCIIAlphanumeric call isASCIIAlpha? r=me
Darin Adler
Comment 10 2007-10-16 12:06:57 PDT
(In reply to comment #9) > - } else if (c == ' ' || ispunct(c)) { > + } else if (c == ' ' || c >= 0x21 && c <= 0x2F || c >= 0x3A && c > <= 0x40 || c >= 0x5B && c <= 0x60 || c >= 0x7B && c <= 0x7D) { > > Why not provide isASCIIPunct instead? To me, this category seems far less definitive than the others. And the rule is specifically about Mac OS X behavior. I suspect that ispunct isn't even the correct rule; at some point we could visit this code and get the rule exactly right for all sorts of keyboards. I don't think it's correct to assume that the only punctuation you could type on a keyboard is ASCII punctuation. I didn't want to dignify the existing rule with a name, but I also didn't want to change behavior. > For ASCIICType.h, I think the preferred style is to indent inside a namespace > in header files. Hyatt and Maciej have been debating this point. I could go either way. > Also, why so much duplicated code? For instance, why doesn't > isASCIIAlphanumeric call isASCIIAlpha? Hard to say exactly why. I like the fact that each of these lines stands alone. I expect that any future changes to this file would primarily be for performance optimization, perhaps to remove some of these functions if we find they aren't really useful, or maybe to add a few more.
Darin Adler
Comment 11 2007-10-16 12:07:59 PDT
(In reply to comment #7) > (From update of attachment 16683 [details] [edit]) > Would it be reasonable to make these template functions instead of overloads: > > +inline bool isASCIIAlpha(char c) { return (c | 0x20) >= 'a' && (c | 0x20) <= > 'z'; } > +inline bool isASCIIAlpha(unsigned short c) { return (c | 0x20) >= 'a' && (c | > 0x20) <= 'z'; } > +inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) > <= 'z'; } Maybe. Template functions are more annoying when debugging. I'm feeling slightly conflicted about it, and I think I'll leave it as-is for now.
Darin Adler
Comment 12 2007-10-16 13:26:56 PDT
Committed revision 26676.
Note You need to log in before you can comment on or make changes to this bug.