Bug 99739 - convertUTF8ToUTF16() Should Check for ASCII Input
Summary: convertUTF8ToUTF16() Should Check for ASCII Input
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: All
OS: All
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on: 99770
Blocks:
Reported: 2012-10-18 11:37 PDT by Michael Saboff
Modified: 2012-10-19 03:09 PDT
CC List: 3 users

See Also:


Attachments
Patch (7.41 KB, patch)
2012-10-18 14:14 PDT, Michael Saboff
ggaren: review+
buildbot: commit-queue-

Description Michael Saboff 2012-10-18 11:37:10 PDT
If convertUTF8ToUTF16() checked whether the source was all ASCII during the conversion and reported that to the caller, the caller could create an 8-bit String directly from the source instead of a 16-bit String from the converted result.
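
A minimal, self-contained sketch of that idea (illustrative only; the function and types below are stand-ins, not WebKit's actual convertUTF8ToUTF16() or String classes): the converter reports whether every source byte was ASCII, and the caller reuses the untouched bytes as an 8-bit string when it was, falling back to the 16-bit conversion otherwise.

#include <cstddef>
#include <cstdint>
#include <string>

// Converts UTF-8 to UTF-16 and reports whether the source was pure ASCII.
// Only one- and two-byte UTF-8 sequences are handled, to keep the sketch short.
static bool convertUTF8ToUTF16Sketch(const char* source, std::size_t length,
                                     std::u16string& out, bool& sourceIsAllASCII)
{
    sourceIsAllASCII = true;
    out.clear();
    std::size_t i = 0;
    while (i < length) {
        uint8_t byte = static_cast<uint8_t>(source[i]);
        if (byte < 0x80) {
            // ASCII: one source byte becomes one UTF-16 code unit, unchanged.
            out.push_back(static_cast<char16_t>(byte));
            ++i;
            continue;
        }
        sourceIsAllASCII = false;
        if ((byte & 0xE0) != 0xC0 || i + 1 >= length)
            return false; // unsupported or truncated sequence
        uint8_t trail = static_cast<uint8_t>(source[i + 1]);
        if ((trail & 0xC0) != 0x80)
            return false; // malformed trailing byte
        out.push_back(static_cast<char16_t>(((byte & 0x1F) << 6) | (trail & 0x3F)));
        i += 2;
    }
    return true;
}

struct EightOrSixteenBitString {
    std::string latin1;    // holds the untouched source bytes when is8Bit is true
    std::u16string utf16;  // holds the converted result otherwise
    bool is8Bit = false;
};

// Caller pattern: prefer an 8-bit string when the converter says the source
// was all ASCII, since those bytes are already valid 8-bit character data.
static EightOrSixteenBitString makeStringFromUTF8(const char* source, std::size_t length)
{
    EightOrSixteenBitString result;
    bool sourceIsAllASCII = false;
    if (convertUTF8ToUTF16Sketch(source, length, result.utf16, sourceIsAllASCII)
        && sourceIsAllASCII) {
        result.latin1.assign(source, length); // reuse the original bytes
        result.utf16.clear();
        result.is8Bit = true;
    }
    return result;
}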
Comment 1 Michael Saboff 2012-10-18 14:14:08 PDT
Created attachment 169473
Patch
Comment 2 Build Bot 2012-10-18 14:20:59 PDT
Comment on attachment 169473
Patch

Attachment 169473 did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14412711
Comment 3 Geoffrey Garen 2012-10-18 14:23:16 PDT
Comment on attachment 169473
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=169473&action=review

r=me

> Source/JavaScriptCore/API/JSStringRef.cpp:52
> +        if (conversionOK == convertUTF8ToUTF16(&string, string + length, &p, p + length, &sourceIsAllASCII)) {
> +            if (sourceIsAllASCII)
> +                return OpaqueJSString::create(reinterpret_cast<const LChar*>(string), length).leakRef();

Our 8bit strings can handle Latin1 in addition to ASCII. Is there a particular reason to exclude Latin1 characters from becoming 8bit strings here?

> Source/WTF/wtf/unicode/UTF8.cpp:298
>  ConversionResult convertUTF8ToUTF16(

Looks like you need to export this function in order to build.
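
(If the export really were missing, the conventional fix in WTF would be to tag the declaration with WTF_EXPORT_PRIVATE. A hypothetical sketch of what that could look like, with parameter names inferred from the call site in the patch rather than copied from the actual header:

WTF_EXPORT_PRIVATE ConversionResult convertUTF8ToUTF16(
    const char** sourceStart, const char* sourceEnd,
    UChar** targetStart, UChar* targetEnd,
    bool* sourceAllASCII);
)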
Comment 4 Michael Saboff 2012-10-18 14:29:48 PDT
(In reply to comment #3)
> (From update of attachment 169473)
> View in context: https://bugs.webkit.org/attachment.cgi?id=169473&action=review
> 
> r=me
> 
> > Source/JavaScriptCore/API/JSStringRef.cpp:52
> > +        if (conversionOK == convertUTF8ToUTF16(&string, string + length, &p, p + length, &sourceIsAllASCII)) {
> > +            if (sourceIsAllASCII)
> > +                return OpaqueJSString::create(reinterpret_cast<const LChar*>(string), length).leakRef();
> 
> Our 8bit strings can handle Latin1 in addition to ASCII. Is there a particular reason to exclude Latin1 characters from becoming 8bit strings here?

A Latin-1 character between 0x80 and 0xFF takes two UTF-8 bytes and requires some conversion; ASCII doesn't.
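
As a concrete illustration (the character and byte values are mine, not from the patch): U+00E9 is the single byte 0xE9 in Latin-1 but two bytes in UTF-8, so the source bytes could not back an 8-bit string without conversion, whereas any ASCII character is a single identical byte in both encodings.

// Illustrative byte sequences only (not part of the patch).
const char utf8EAcute[]   = "\xC3\xA9"; // U+00E9 encoded as UTF-8: two bytes
const char latin1EAcute[] = "\xE9";     // U+00E9 encoded as Latin-1: one byte, different from the UTF-8 bytes
const char asciiA[]       = "A";        // 0x41: identical byte in UTF-8 and Latin-1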

> > Source/WTF/wtf/unicode/UTF8.cpp:298
> >  ConversionResult convertUTF8ToUTF16(
> 
> Looks like you need to export this function in order to build.

Looking into this.
Comment 5 Michael Saboff 2012-10-18 15:56:49 PDT
(In reply to comment #4)

> > > Source/WTF/wtf/unicode/UTF8.cpp:298
> > >  ConversionResult convertUTF8ToUTF16(
> > 
> > Looks like you need to export this function in order to build.
> 
> Looking into this.

The build failure was due to a problem in the Mac build process for WTF.  This is tracked in https://bugs.webkit.org/show_bug.cgi?id=99770.
Comment 6 Michael Saboff 2012-10-18 18:22:26 PDT
Committed r131836: <http://trac.webkit.org/changeset/131836>
Comment 7 Mark Lam 2012-10-19 02:44:37 PDT
(In reply to comment #6)
> Committed r131836: <http://trac.webkit.org/changeset/131836>

This commit broke the Windows build. An attempted fix landed in r131877 <http://trac.webkit.org/changeset/131877>.
Comment 8 Mark Lam 2012-10-19 03:09:41 PDT
The next step of the fix landed in r131882 <http://trac.webkit.org/changeset/131882>.