Bug 90319 - Make TextCodecLatin1 handle 8 bit data without converting to UChar's
Summary: Make TextCodecLatin1 handle 8 bit data without converting to UChar's
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks: 90320 90321
  Show dependency treegraph
 
Reported: 2012-06-29 15:50 PDT by Michael Saboff
Modified: 2012-07-18 13:22 PDT (History)
1 user (show)

See Also:


Attachments
Patch for Review (8.25 KB, patch)
2012-07-11 18:18 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch with updates from review (8.01 KB, patch)
2012-07-18 10:26 PDT, Michael Saboff
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-06-29 15:50:35 PDT
Most of the Latin-1 tagged resources on the web can be processed as 8-bit data using 8-bit strings.  The task is to modify TextCodecLatin1 and related code to return strings appropriate for the source data.
Comment 1 Michael Saboff 2012-07-11 18:18:46 PDT
Created attachment 151833 [details]
Patch for Review
Comment 2 Oliver Hunt 2012-07-17 11:08:04 PDT
Comment on attachment 151833 [details]
Patch for Review

View in context: https://bugs.webkit.org/attachment.cgi?id=151833&action=review

> Source/WebCore/platform/text/TextCodecASCIIFastPath.h:43
> +    static void copy(LChar* destination, const uint8_t* source)
> +    {
> +        destination[0] = source[0];
> +        destination[1] = source[1];
> +        destination[2] = source[2];
> +        destination[3] = source[3];
> +    }
> +    

What does the assembly for this look like?

> Source/WebCore/platform/text/TextCodecASCIIFastPath.h:62
> +        destination[0] = source[0];
> +        destination[1] = source[1];
> +        destination[2] = source[2];
> +        destination[3] = source[3];
> +        destination[4] = source[4];
> +        destination[5] = source[5];
> +        destination[6] = source[6];
> +        destination[7] = source[7];

ditto

> Source/WebCore/platform/text/TextCodecLatin1.cpp:172
> +    while (characters < destination)

This condition is really confusing to me, what on earth is destination?
Comment 3 Michael Saboff 2012-07-18 10:26:25 PDT
Created attachment 153039 [details]
Patch with updates from review

(In reply to comment #2)
> (From update of attachment 151833 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=151833&action=review
> 
> > Source/WebCore/platform/text/TextCodecASCIIFastPath.h:43
> > +    static void copy(LChar* destination, const uint8_t* source)
> > +    {
> > +        destination[0] = source[0];
> > +        destination[1] = source[1];
> > +        destination[2] = source[2];
> > +        destination[3] = source[3];
> > +    }
> > +    
> 
> What does the assembly for this look like?
> 
> > Source/WebCore/platform/text/TextCodecASCIIFastPath.h:62
> > +        destination[0] = source[0];
> > +        destination[1] = source[1];
> > +        destination[2] = source[2];
> > +        destination[3] = source[3];
> > +        destination[4] = source[4];
> > +        destination[5] = source[5];
> > +        destination[6] = source[6];
> > +        destination[7] = source[7];
> 
> ditto

Changed both of these explicit copies to memcpy(destination, source, <4 or 8>)

For Mac 64 bit with these changes, the compiler generated:
    mov    (%r12),%rdx
    ...
    mov    %rdx,(%r15)

> 
> > Source/WebCore/platform/text/TextCodecLatin1.cpp:172
> > +    while (characters < destination)
> 
> This condition is really confusing to me, what on earth is destination?

Changed this to:

    // Zero extend and copy already processed 8 bit data
    LChar* ptr8 = characters;
    LChar* endPtr8 = destination;

    while (ptr8 < endPtr8)
        *destination16++ = *ptr8++;
Comment 4 Michael Saboff 2012-07-18 13:22:54 PDT
Committed r123008: <http://trac.webkit.org/changeset/123008>