Bug 73882

Summary: Add 8 bit paths for StringTypeAdapter classes
Product: WebKit Reporter: Michael Saboff <msaboff>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 73236    
Attachments:
Description Flags
Patch darin: review+

Description Michael Saboff 2011-12-05 18:00:45 PST
The StringTypeAdapter class family processes strings using 16 bit characters, possibly requiring up conversion for 8 bit strings.

This impacts Kraken audio-oscillator and stanford-crypto-aes.
Comment 1 Michael Saboff 2011-12-08 12:35:48 PST
Created attachment 118450 [details]
Patch

This patch has minimal performance improvement on the tests that us the paths due likely to the relative time spent in those paths.  It does seem to benefit date-format-xparb by 3-4%.
Comment 2 Darin Adler 2011-12-08 12:57:35 PST
Comment on attachment 118450 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118450&action=review

> Source/JavaScriptCore/runtime/UStringConcatenate.h:45
> +    bool is8Bit() { return m_string.isNull() || m_string.is8Bit(); }

I wish we would be more consistent about these functions. For immutable objects like String I think that is8Bit and is16Bit should both return true for null and empty strings, characters8 and characters16 could both return a suitable pointer (if not null terminated it could even be the null pointer), and the guideline would be that callers check for the form they prefer first. But maybe there’s an argument for having is8Bit return false for null strings?

> Source/JavaScriptCore/runtime/UStringConcatenate.h:50
> +        ASSERT(is8Bit());
> +        const LChar* characters = m_string.characters8();

Why is this assertion needed? Doesn’t characters8 already do the assertion?

> Source/JavaScriptCore/runtime/UStringConcatenate.h:59
> +        const UChar* characters = m_string.characters();
>          for (unsigned i = 0; i < m_length; ++i)
> -            destination[i] = m_data[i];
> +            destination[i] = characters[i];

I think it might be better to use character8 and characters16 for this. Why expand the original string just to write it one time?

> Source/JavaScriptCore/runtime/UStringConcatenate.h:64
> -    const UChar* m_data;
> +    const JSC::UString& m_string;
>      unsigned m_length;

Seems wrong to have m_length if we also have m_string.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:61
> +    void writeTo(LChar* destination)
> +    {
> +        ASSERT(is8Bit());
> +        *destination = m_buffer;
> +    }

The assertion here seems pointless.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:84
> +        ASSERT(is8Bit());
> +        *destination = m_buffer;

The assertion here seems pointless.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:135
> +    void writeTo(LChar* destination)
> +    {
> +        ASSERT(is8Bit());
> +        for (unsigned i = 0; i < m_length; ++i)
> +            destination[i] = static_cast<LChar>(m_buffer[i]);
> +    }

Assertion here seems pointless.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:165
> +        memcpy(destination, m_buffer, static_cast<size_t>(m_length) * sizeof(LChar));

Why do we need that static_cast?

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:204
> +    void writeTo(LChar* destination)
> +    {
> +        ASSERT(is8Bit());
> +        for (unsigned i = 0; i < m_length; ++i)
> +            destination[i] = m_buffer[i];
> +    }

How can this ever work? Seems the assertion will always fire.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:231
> +        memcpy(destination, m_buffer, static_cast<size_t>(m_length) * sizeof(LChar));

Again, why the cast?

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:292
> +            unsigned char c = m_buffer[i];
> +            destination[i] = c;

Other places in this same patch we used static_cast for this. Why should we use assignment instead here? Why not the other places too?

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:416
> +    bool is8Bit = adapter1.is8Bit() && adapter2.is8Bit();
> +
> +    if (is8Bit) {

I don’t think this local variable is helpful.
Comment 3 Michael Saboff 2011-12-08 13:47:05 PST
(In reply to comment #2)
> (From update of attachment 118450 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=118450&action=review
> 
> > Source/JavaScriptCore/runtime/UStringConcatenate.h:45
> > +    bool is8Bit() { return m_string.isNull() || m_string.is8Bit(); }
> 
> I wish we would be more consistent about these functions. For immutable objects like String I think that is8Bit and is16Bit should both return true for null and empty strings, characters8 and characters16 could both return a suitable pointer (if not null terminated it could even be the null pointer), and the guideline would be that callers check for the form they prefer first. But maybe there’s an argument for having is8Bit return false for null strings?

The reason for the isNull() check is for the case that we are creating empty strings in most cases and the String version of is8Bit() just calls m_impl->is8Bit(), even if m_impl is null.  Instead of putting a null check in String::is8Bit the check is placed here to reduce the overhead on most other calls.

There is no is16bit, that is implemented as !is8Bit() and we want null and empty strings to be considered 8 bit.  characters8, characters16 and characters all return 0 on null strings.

Adopted all other comments including changing the ASSERT that never works to a CRASH().
Comment 4 Michael Saboff 2011-12-08 13:47:57 PST
Committed r102380: <http://trac.webkit.org/changeset/102380>