The new TextCodecUTF8 is using the 0x80808080 trick from TextCodecLatin1. We should put the helper functions in a single place instead of duplicating it in both implementations.
Created attachment 82435 [details] Proposed patch
Comment on attachment 82435 [details] Proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=82435&action=review > Source/WebCore/ChangeLog:10 > + Move MachineWord, UCharByteFiller and the other helper functions > + into TextCodecHelpers.h where they can be used by both TextCodecUTF8 > + and TextCodecLatin1. Very good idea--I should have done this--but I do not like the name “helpers” because it’s so vague. I would call this TextCodecASCIIFastPath.h or something along those lines. > Source/WebCore/platform/text/TextCodecLatin1.cpp:130 > + if (*source < 0x80) { Should use isASCII instead of hardcoding 0x80.
Comment on attachment 82435 [details] Proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=82435&action=review > Source/WebCore/platform/text/TextCodecHelpers.h:34 > +// Assuming that a pointer is the size of a "machine word", then > +// uintptr_t is an integer type that is also a machine word. > +typedef uintptr_t MachineWord; Oops, this header needs to be in the WebCore namespace, not the global namespace.
Created attachment 82469 [details] Proposed patch v2 Updated patch addressing all comments.
Comment on attachment 82469 [details] Proposed patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=82469&action=review r=me -- also had some ideas for further cleanup > Source/WebCore/platform/text/TextCodecLatin1.cpp:137 > - if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value()) > + if (chunk & NonASCIIMask<sizeof(MachineWord)>::value()) > goto useLookupTable; I realized we can make this even cleaner by defining a function in the header: inline MachineWord NonASCIIMachineWordMask() { return NonASCIIMask<sizeof(MachineWord)>::value(); } Or: inline bool isAllASCII(MachineWord word) { return word & NonASCIIMask<sizeof(MachineWord)>::value(); } And use one of these functions in the files. Then the template magic stays in the header. > Source/WebCore/platform/text/TextCodecLatin1.cpp:139 > + UCharByteFiller<sizeof(MachineWord)>::copy(destination, source); And here: inline void copyASCIIMachineWord(UChar* destination, const uint8_t* source) { UCharByteFiller<sizeof(MachineWord)>::copy(destination, source); } Or the type of source could be const MachineWord* and we could cast it. It’s nice to keep the template goop in the header.
(In reply to comment #5) > (From update of attachment 82469 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=82469&action=review > > r=me -- also had some ideas for further cleanup And good ideas they are, will incorporate them when landing. > inline bool isAllASCII(MachineWord word) > { > return word & NonASCIIMask<sizeof(MachineWord)>::value(); > } This should probably be called isNotAllASCII(). ;)
(In reply to comment #6) > And good ideas they are, will incorporate them when landing. > > > inline bool isAllASCII(MachineWord word) > > { > > return word & NonASCIIMask<sizeof(MachineWord)>::value(); > > } > > This should probably be called isNotAllASCII(). ;) Or we could put a ! in the function, which is probably what I would do.
Committed r78580: <http://trac.webkit.org/changeset/78580>