Bug 99392 - Creating a String from an NSString should check for all 8 bit strings
Summary: Creating a String from an NSString should check for all 8 bit strings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-15 17:17 PDT by Michael Saboff
Modified: 2012-10-17 15:17 PDT (History)
3 users (show)

See Also:


Attachments
Patch (6.13 KB, patch)
2012-10-15 17:34 PDT, Michael Saboff
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff
Updated Patch (4.08 KB, patch)
2012-10-16 15:15 PDT, Michael Saboff
buildbot: commit-queue-
Details | Formatted Diff | Diff
Patch with Test Failure Fix (4.20 KB, patch)
2012-10-17 10:14 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Fixed WTF/ChangeLog (3.55 KB, patch)
2012-10-17 13:52 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-10-15 17:17:27 PDT
In most cases NSString's that are used to create String's are all 8 bit (typically ASCII).  In these cases we should create 8 bit Strings.
Comment 1 Michael Saboff 2012-10-15 17:34:26 PDT
Created attachment 168818 [details]
Patch

During simple testing, I never saw a 16 bit NSString.
Comment 2 Darin Adler 2012-10-15 20:24:14 PDT
Comment on attachment 168818 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=168818&action=review

Please look into putting the fast check for 8-bit characters in ASCIIFastPath.h instead of here.

> Source/WTF/wtf/text/StringImpl.cpp:1579
> +#if CPU(X86_64)
> +bool containsOnly8BitCharacters(const UChar* characters, unsigned length)

This looks like code that should be in the ASCIIFastPath.h, rather than in this source file. The same techniques used in charactersAreAllASCII are appropriate and that function is even better optimized than this code, I believe.

With a bit of work we could probably make the code for charactersAreAllASCII and charactersAreAll8Bit be generated from a single function template.
Comment 3 Benjamin Poulain 2012-10-16 12:04:37 PDT
(In reply to comment #0)
> In most cases NSString's that are used to create String's are all 8 bit (typically ASCII).  In these cases we should create 8 bit Strings.

How common is this?
Can we fix those at the source, when the NSString/CFString are created? CoreFoundation also have shortcuts for 8bits, it would be nice if we could enjoy those.

> This looks like code that should be in the ASCIIFastPath.h, rather than in this source file. The same techniques used in charactersAreAllASCII are appropriate and that function is even better optimized than this code, I believe.
> 
> With a bit of work we could probably make the code for charactersAreAllASCII and charactersAreAll8Bit be generated from a single function template.

I would prefer a new header (Latin1FastPath.h?).
We are already abusing ASCIIFastPath.h with non-ASCII functions.
Comment 4 Michael Saboff 2012-10-16 15:15:44 PDT
Created attachment 169039 [details]
Updated Patch

It appears that these NSStrings are used in network responses among other places.  They appear to be created by Core Foundation code.

Found a way for CFStringGetBytes() to return an 8 bit Latin1 string that we can use directly.  No need to check for 8 bit characters.  Again in my simple browsing test this worked every time.

If I need to check for all 8 bit characters in the future, I think modifying and renaming ASCIIFastPath.h to CharacterFastPath.h is the way to go.
Comment 5 Build Bot 2012-10-16 16:31:05 PDT
Comment on attachment 169039 [details]
Updated Patch

Attachment 169039 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14396335

New failing tests:
platform/mac/editing/pasteboard/text-precomposed.html
Comment 6 Michael Saboff 2012-10-17 10:14:10 PDT
Created attachment 169206 [details]
Patch with Test Failure Fix

Added code to check that all characters in the CFString were used in the conversion to Latin1.
Comment 7 Michael Saboff 2012-10-17 13:52:33 PDT
Created attachment 169251 [details]
Fixed WTF/ChangeLog

No other change from prior patch.
Comment 8 Geoffrey Garen 2012-10-17 14:46:52 PDT
Comment on attachment 169251 [details]
Fixed WTF/ChangeLog

r=me
Comment 9 WebKit Review Bot 2012-10-17 15:17:53 PDT
Comment on attachment 169251 [details]
Fixed WTF/ChangeLog

Clearing flags on attachment: 169251

Committed r131656: <http://trac.webkit.org/changeset/131656>
Comment 10 WebKit Review Bot 2012-10-17 15:17:57 PDT
All reviewed patches have been landed.  Closing bug.