Bug 54446

Summary: Share the helper functions used by Latin-1 and UTF-8 text codecs
Product: WebKit Reporter: Andreas Kling <kling>
Component: TextAssignee: Andreas Kling <kling>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, koivisto
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Proposed patch
darin: review-
Proposed patch v2 darin: review+

Description Andreas Kling 2011-02-15 04:13:27 PST
The new TextCodecUTF8 is using the 0x80808080 trick from TextCodecLatin1.
We should put the helper functions in a single place instead of duplicating it in both implementations.
Comment 1 Andreas Kling 2011-02-15 04:19:15 PST
Created attachment 82435 [details]
Proposed patch
Comment 2 Darin Adler 2011-02-15 08:52:57 PST
Comment on attachment 82435 [details]
Proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=82435&action=review

> Source/WebCore/ChangeLog:10
> +        Move MachineWord, UCharByteFiller and the other helper functions
> +        into TextCodecHelpers.h where they can be used by both TextCodecUTF8
> +        and TextCodecLatin1.

Very good idea--I should have done this--but I do not like the name “helpers” because it’s so vague.

I would call this TextCodecASCIIFastPath.h or something along those lines.

> Source/WebCore/platform/text/TextCodecLatin1.cpp:130
> +        if (*source < 0x80) {

Should use isASCII instead of hardcoding 0x80.
Comment 3 Darin Adler 2011-02-15 08:53:59 PST
Comment on attachment 82435 [details]
Proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=82435&action=review

> Source/WebCore/platform/text/TextCodecHelpers.h:34
> +// Assuming that a pointer is the size of a "machine word", then
> +// uintptr_t is an integer type that is also a machine word.
> +typedef uintptr_t MachineWord;

Oops, this header needs to be in the WebCore namespace, not the global namespace.
Comment 4 Andreas Kling 2011-02-15 09:14:46 PST
Created attachment 82469 [details]
Proposed patch v2

Updated patch addressing all comments.
Comment 5 Darin Adler 2011-02-15 09:34:50 PST
Comment on attachment 82469 [details]
Proposed patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=82469&action=review

r=me -- also had some ideas for further cleanup

> Source/WebCore/platform/text/TextCodecLatin1.cpp:137
> -                    if (chunk & NonASCIIMask<sizeof(uintptr_t)>::value())
> +                    if (chunk & NonASCIIMask<sizeof(MachineWord)>::value())
>                          goto useLookupTable;

I realized we can make this even cleaner by defining a function in the header:

    inline MachineWord NonASCIIMachineWordMask()
    {
        return NonASCIIMask<sizeof(MachineWord)>::value();
    }

Or:

    inline bool isAllASCII(MachineWord word)
    {
        return word & NonASCIIMask<sizeof(MachineWord)>::value();
    }

And use one of these functions in the files. Then the template magic stays in the header.

> Source/WebCore/platform/text/TextCodecLatin1.cpp:139
> +                    UCharByteFiller<sizeof(MachineWord)>::copy(destination, source);

And here:

    inline void copyASCIIMachineWord(UChar* destination, const uint8_t* source)
    {
        UCharByteFiller<sizeof(MachineWord)>::copy(destination, source);
    }

Or the type of source could be const MachineWord* and we could cast it.

It’s nice to keep the template goop in the header.
Comment 6 Andreas Kling 2011-02-15 09:43:30 PST
(In reply to comment #5)
> (From update of attachment 82469 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=82469&action=review
> 
> r=me -- also had some ideas for further cleanup

And good ideas they are, will incorporate them when landing.

>     inline bool isAllASCII(MachineWord word)
>     {
>         return word & NonASCIIMask<sizeof(MachineWord)>::value();
>     }

This should probably be called isNotAllASCII(). ;)
Comment 7 Darin Adler 2011-02-15 09:45:11 PST
(In reply to comment #6)
> And good ideas they are, will incorporate them when landing.
> 
> >     inline bool isAllASCII(MachineWord word)
> >     {
> >         return word & NonASCIIMask<sizeof(MachineWord)>::value();
> >     }
> 
> This should probably be called isNotAllASCII(). ;)

Or we could put a ! in the function, which is probably what I would do.
Comment 8 Andreas Kling 2011-02-15 10:22:07 PST
Committed r78580: <http://trac.webkit.org/changeset/78580>