Bug 72323 - Towards 8 bit Strings - Update utf8() and ascii() methods for 8 bit strings
Summary: Towards 8 bit Strings - Update utf8() and ascii() methods for 8 bit strings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks: 71337
  Show dependency treegraph
 
Reported: 2011-11-14 16:02 PST by Michael Saboff
Modified: 2011-11-15 22:07 PST (History)
2 users (show)

See Also:


Attachments
Patch (12.59 KB, patch)
2011-11-14 16:46 PST, Michael Saboff
ggaren: review+
Details | Formatted Diff | Diff
Updated patch with speculative Windows fix (13.51 KB, patch)
2011-11-15 14:22 PST, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch - Windows Build Fix - This time for sure! (14.00 KB, patch)
2011-11-15 17:02 PST, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch updated with missing export (15.08 KB, patch)
2011-11-15 18:53 PST, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2011-11-14 16:02:49 PST
The UString and String classes have ascii() and utf8() methods that can handle 8 bit strings more efficiently.
Comment 1 Michael Saboff 2011-11-14 16:46:38 PST
Created attachment 115059 [details]
Patch
Comment 2 Geoffrey Garen 2011-11-15 11:12:29 PST
I think "#include <wtf/unicode/Unicode.h>" vs "#include Unicode.h" might fix the Windows build.
Comment 3 Geoffrey Garen 2011-11-15 11:50:31 PST
Comment on attachment 115059 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115059&action=review

r=me

> Source/JavaScriptCore/runtime/UString.cpp:425
> +    if (is8Bit()) {
> +        const LChar* characters = this->characters8();
> +
> +        ConversionResult result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
> +        ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion

I think it's confusing to pretend that our underlying buffer is UTF16, when our convention is to say that it's Latin1. So, to produce UTF8 data from an LChar buffer, let's use a function called convertLatin1ToUTF8.

> Source/JavaScriptCore/wtf/text/WTFString.cpp:732
> +        ConversionResult result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
> +        ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion

Same comment here.
Comment 4 Michael Saboff 2011-11-15 14:22:30 PST
Created attachment 115243 [details]
Updated patch with speculative Windows fix
Comment 5 WebKit Review Bot 2011-11-15 14:25:20 PST
Attachment 115243 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/wtf/unicode/UTF8.cpp:157:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Michael Saboff 2011-11-15 17:02:48 PST
Created attachment 115283 [details]
Patch - Windows Build Fix - This time for sure!
Comment 7 Michael Saboff 2011-11-15 18:53:57 PST
Created attachment 115294 [details]
Patch updated with missing export
Comment 8 Michael Saboff 2011-11-15 22:07:18 PST
Committed r100405: <http://trac.webkit.org/changeset/100405>