Bug 66161 - Investigate storing strings in 8-bit buffers when possible
: Investigate storing strings in 8-bit buffers when possible
Status: RESOLVED FIXED
Product: WebKit
Classification: Unclassified
Component: Web Template Framework
: 528+ (Nightly build)
: Unspecified Unspecified
: P2 Normal
Assigned To: Michael Saboff
:
Depends on: 66286 66661 67022 66399 66706 67337 70937
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-12 13:41 PDT by Annie Sullivan
Modified: 2011-10-27 13:16 PDT (History)
43 users (show)

See Also:


Attachments
Proposed StringImpl patch that supports 8 bit strings. (76.71 KB, patch)
2011-10-09 22:04 PDT, Michael Saboff
webkit-ews: commit‑queue-
Details | Formatted Diff | Diff
Updated patch (76.85 KB, patch)
2011-10-10 17:43 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch (77.88 KB, patch)
2011-10-11 09:43 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch with tested export fix for Windows (78.40 KB, patch)
2011-10-13 13:59 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch refactored to use LChar (111.03 KB, patch)
2011-10-25 17:50 PDT, Michael Saboff
ggaren: review-
Details | Formatted Diff | Diff
Updated patch addressing comments #21 (118.58 KB, patch)
2011-10-26 14:50 PDT, Michael Saboff
ggaren: review-
webkit-ews: commit‑queue-
Details | Formatted Diff | Diff
Patch with updates suggested in comment #26, stylebot and build failures (117.21 KB, patch)
2011-10-26 17:32 PDT, Michael Saboff
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Annie Sullivan 2011-08-12 13:41:49 PDT
Currently WTF::String always uses 16-bit buffers. This can be wasteful, because many strings only contain characters which are 8-bit.

I did some testing on various sites, and found there would be savings of 500k-2MB (3-5% total memory usage) if we managed to store all the strings that don't contain double-byte characters in 8-bit buffers. Getting this right would definitely be difficult, but I think it's worth investigating.
Comment 1 Gavin Barraclough 2011-08-12 15:56:52 PDT
Hi Annie,

Great! - this is definitely something we're keen to see happen.

What are your current plans?  I'm assuming any approach will involve a flag in the StringImpl indicating the format of the character data.  Clearly in many cases where we currently access characters in the string we could be agnostic as to how the data is stored (e.g. change code that calls to characters() to get a const UChar* to instead access one character at a time), however doing so in all cases will be tricky.  One question would be, what do you plan on doing for more difficult cases where we have code expecting to be able to read UFT16 data?  A set of obvious options would seem to be:

(1) Fix them all, make the entire codebase 8-bit string safe, never need to convert or copy.
(2) Create a temporary UTF16 copy as necessary (i.e. when a method that needs a const UChar* is called).
(3) Convert 8-bit strings up to UTF16 when necessary, retaining only the UTF16 copy and freeing the 8-bit source.
(4) Convert 8-bit strings up to UTF16 when necessary, retaining both the UTF16 and 8-bit representations.

(1) Is probably the best option, but is also probably the most work.
(2) Could be a performance regression, if a string may be repeatedly converted.
(3) May be difficult since the underlying buffers of StringImpls may be shared when substrings are created (one StringImpl may have a pointer into another's buffer).  [This may also be fixable, e.g. we could make strings we an offset relative to their buffer's base, or give buffers access to the list of StringImpls referencing them].
(4) In a pathological case would mean that instead of saving 50% of memory used for string data you would start using 150%!

What are your current plans?  Did you have another course of action in mind?

One more complexity is that the contents of the StringImpl are directly accessed by JIT generated code in JSC (used by all ports, since WebCore uses JSC's regexp engine).  This will be a bit of work to fix, but should be very doable, and the JSC team will be happy to help.  There are very few places that access the string data from the regular JavaScript language JIT, but unsurprisingly more in the regular expression JIT.  The regular expression JIT should be a pretty easy fix, it'll mostly just mean switching some 16-bit loads to 8-bit, but we'll probably need restructure some regexp objects to be able to hold onto two copies of JIT generated code, one for use with 16-bit source data & one for use with 8-bit source data.  Again, we'll be happy to help make this happen.

cheers,
G.
Comment 2 Annie Sullivan 2011-08-15 21:40:54 PDT
Thanks so much, Gavin! We'd really like the help, especially since none of us are very familiar with JSC.

Our current plan is to start out by aiming for option 1--we made a spreadsheet of all the characters() usages and we're going to try and change as many of them as possible to use String APIs that don't expose the internal format of the character data. It's located here, if anyone cc-ed on this bug would like to take a task off of it, please feel free: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&hl=en_US#gid=9

Once we've gotten to the point where we're down to just the usages that really make option 1 impractical, we hope that having many fewer issues to look at will make the choice between options 2-4 easier. I know this is a lot of work, but we can easily spread it across several people.
Comment 3 Erik Corry 2011-08-16 00:10:00 PDT
Since this is likely to have a huge impact on V8 I would appreciate this bug being kept up to date on which of the 4 ideas you are thinking of going with.

FWIW V8 has has always stored strings in both 1 and 2-byte formats.  The 2-byte strings can be 'external' which means that they are backed by a WebKit string.  V8's one byte strings are strict ASCII ie only the characters from 0 to 127 are allowed.  Latin1 characters from 128 to 255 are not allowed in 1-byte strings.

One of the reasons for this is that you can just memcpy an ASCII string into a UTF-8 output buffer, but you have to inspect  every character if your source is Latin1.  It also simplifies the 1-byte version of the regexp engine if you know there are no Latin1 characters.

The ASCII restriction in V8 can be changed, obviously, but that's the way it is currently.
Comment 4 Annie Sullivan 2011-08-16 09:20:52 PDT
(In reply to comment #3)
> Since this is likely to have a huge impact on V8 I would appreciate this bug being kept up to date on which of the 4 ideas you are thinking of going with.

We definitely will.

> FWIW V8 has has always stored strings in both 1 and 2-byte formats.  The 2-byte strings can be 'external' which means that they are backed by a WebKit string.  V8's one byte strings are strict ASCII ie only the characters from 0 to 127 are allowed.  Latin1 characters from 128 to 255 are not allowed in 1-byte strings.
> 
> One of the reasons for this is that you can just memcpy an ASCII string into a UTF-8 output buffer, but you have to inspect  every character if your source is Latin1.  It also simplifies the 1-byte version of the regexp engine if you know there are no Latin1 characters.
> 
> The ASCII restriction in V8 can be changed, obviously, but that's the way it is currently.

I did some preliminary investigation into how many bytes are taken up by strings containing all-ASCII vs some Latin1 chars vs some double-byte chars on different websites:

CNN.com (english): 1M ASCII, 2K Latin1, 55K 2byte
ESPN.com (english): 1.5M ASCII, 16K Latin1, 54K 2byte
Baidu.com (chinese): 92K ASCII, 1.5K Latin1, 64K 2byte
news.baidu.com (chinese): 535K ASCII, 2.6K Latin1, 148K 2byte
yahoo.fr (french): 892K ASCII, 84K Latin1, 55K 2byte
news.google.de (german): 480K ASCII, 46K Latin1, 61K 2byte

Based on that, I think v8's approach of having only ASCII characters in 1-byte strings seems quite reasonable. Gavin, what do you think?
Comment 5 Gavin Barraclough 2011-08-16 14:52:21 PDT
Agreed - given those stats, targeting ASCII only seems like the right choice.

If we were to ever find evidence that LATIN1 support would be of benefit, then we could look at expanding the character set supported in 8-bit strings from ASCII to LATIN1 (and coordinate with V8 in doing so), but ASCII seems a simpler starting point, and by the look of it should be sufficient.
Comment 6 Jungshik Shin 2011-08-25 13:13:50 PDT
(In reply to comment #5)
> Agreed - given those stats, targeting ASCII only seems like the right choice.
> 
> If we were to ever find evidence that LATIN1 support would be of benefit, then we could look at expanding the character set supported in 8-bit strings from ASCII to LATIN1 (and coordinate with V8 in doing so), but ASCII seems a simpler starting point, and by the look of it should be sufficient.

I"m relieved that we're limiting a single byte string to ASCII only :-). Extending it to the first 256 bytes (up to U+00FF) may be ok, but we'd better stop there.
Comment 7 Michael Saboff 2011-08-25 16:24:52 PDT
Annie,

I have started to work on this from a JSC side of things.  I will first focus on making the YARR regular expression engine 8 / 16 bit switchable.  After that I'll look at removing JSC use of characters().

Have you thought through the StringImpl changes?
Comment 8 Annie Sullivan 2011-08-26 14:37:54 PDT
(In reply to comment #7)
> Annie,
> 
> I have started to work on this from a JSC side of things.  I will first focus on making the YARR regular expression engine 8 / 16 bit switchable.  After that I'll look at removing JSC use of characters().
> 
> Have you thought through the StringImpl changes?

Thanks, Michael! I'm looking into fleshing out the StringImpl changes now. Is the best way to get feedback to create a strawman patch in a blocking bug and ask people to give feedback as review comments? (Although of course we wouldn't actually submit the patch until all the other work is done and we'd finished properly testing possible performance changes).
Comment 9 Michael Saboff 2011-08-29 10:59:23 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > Annie,
> > 
> > I have started to work on this from a JSC side of things.  I will first focus on making the YARR regular expression engine 8 / 16 bit switchable.  After that I'll look at removing JSC use of characters().
> > 
> > Have you thought through the StringImpl changes?
> 
> Thanks, Michael! I'm looking into fleshing out the StringImpl changes now. Is the best way to get feedback to create a strawman patch in a blocking bug and ask people to give feedback as review comments? (Although of course we wouldn't actually submit the patch until all the other work is done and we'd finished properly testing possible performance changes).

I think that would work.  You can email me with ideas if you want before posting a patch.
Comment 10 Annie Sullivan 2011-09-23 12:56:26 PDT
I am going on maternity leave in two weeks, so Xianzhu Wang is taking ownership of this.
Comment 11 Sam Weinig 2011-09-25 11:08:54 PDT
(In reply to comment #10)
> I am going on maternity leave in two weeks, so Xianzhu Wang is taking ownership of this.

Congratulations!
Comment 12 Michael Saboff 2011-10-09 22:04:46 PDT
Created attachment 110327 [details]
Proposed StringImpl patch that supports 8 bit strings.

This change allows the creation of both 8 bit or 16 bit strings. If the characters() method is called on an 8 bit StringImpl object, a 16 bit shadow copy is created and returned.

This patch does not include changes that actually use 8 bit strings, in fact there is an ASSERT_NOT_REACHED() in the characters8() method.

Future patches will make other changes to support and use 8 bit strings.  The next of these are the neccessary JIT changes.

This work is broken up in smaller patches to make it more digestible.
Comment 13 Early Warning System Bot 2011-10-09 22:37:16 PDT
Comment on attachment 110327 [details]
Proposed StringImpl patch that supports 8 bit strings.

Attachment 110327 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10000841
Comment 14 Xianzhu Wang 2011-10-09 23:15:41 PDT
(In reply to comment #12)
> Created an attachment (id=110327) [details]
> Proposed StringImpl patch that supports 8 bit strings.
> 
> This change allows the creation of both 8 bit or 16 bit strings. If the characters() method is called on an 8 bit StringImpl object, a 16 bit shadow copy is created and returned.
> 
> This patch does not include changes that actually use 8 bit strings, in fact there is an ASSERT_NOT_REACHED() in the characters8() method.
> 
> Future patches will make other changes to support and use 8 bit strings.  The next of these are the neccessary JIT changes.
> 
> This work is broken up in smaller patches to make it more digestible.

Thank you for the patch. Would it be better to create a blocking bug for the change of StringImpl?

What do you think about bug 66706? I think with StringCharacterIterator, we can keep most of the characters() usages unchanged without the 16bit shadow copy.
Comment 15 Michael Saboff 2011-10-10 08:50:25 PDT
(In reply to comment #14)
> (In reply to comment #12)
> > Created an attachment (id=110327) [details] [details]
> > Proposed StringImpl patch that supports 8 bit strings.
> > 
> > This change allows the creation of both 8 bit or 16 bit strings. If the characters() method is called on an 8 bit StringImpl object, a 16 bit shadow copy is created and returned.
> > 
> > This patch does not include changes that actually use 8 bit strings, in fact there is an ASSERT_NOT_REACHED() in the characters8() method.
> > 
> > Future patches will make other changes to support and use 8 bit strings.  The next of these are the neccessary JIT changes.
> > 
> > This work is broken up in smaller patches to make it more digestible.
> 
> Thank you for the patch. Would it be better to create a blocking bug for the change of StringImpl?
> 
> What do you think about bug 66706? I think with StringCharacterIterator, we can keep most of the characters() usages unchanged without the 16bit shadow copy.

I don't have a problem creating a blocking bug.  I figured that this bug was for the StringImpl changes, so I used it for the proposed changes.

What I found in JavaScriptCore with comprehensive changes for 8 bit strings is that the 16bit shadow copy is hardly every created.  Most strings end up being 8 or 16 bit based on source content and there is seldom a need to convert an 8 bit string to a 16 bit string.

Having characters8() and characters16() allows callers that want to manipulate actual characters the ability to manipulate the native format.  I don't think this precludes characters() returning a StringCharacterIterator and following the 4 step plan described in 66706.  We still typedef StringCharacterIterator to be a "const UChar*" and use the shadow copy for 8 bit strings.  We can remove the 16 bit shadow copy at a later time, but in the mean time we can make forward progress.

Given that per-character size checks will hurt performance, hot paths will likely need to manipulate the characters directly as the current code does.  That means that either code paths check once (e.g. at the top of a loop) for 8/16 bit-ness or we templatize classes based on character size.  For code that isn't as performance sensitive, an iterator implemented similar to the CharAccess class I added to YarrInterpreter would work.
Comment 16 Michael Saboff 2011-10-10 17:43:44 PDT
Created attachment 110451 [details]
Updated patch

This patch fixes two issues in the prior patch
 - The build failure reported - fixed StringImpl::lower()
 - Test failures - fixed StringImpl::replace(UChar, StringImpl*)
Comment 17 Michael Saboff 2011-10-11 09:43:47 PDT
Created attachment 110526 [details]
Updated patch

(Speculative fix) Add export to fix windows build.
Comment 18 Michael Saboff 2011-10-13 13:59:23 PDT
Created attachment 110903 [details]
Updated patch with tested export fix for Windows
Comment 19 Geoffrey Garen 2011-10-13 16:02:59 PDT
Comment on attachment 110903 [details]
Updated patch with tested export fix for Windows

View in context: https://bugs.webkit.org/attachment.cgi?id=110903&action=review

> Source/JavaScriptCore/wtf/text/StringImpl.h:471
> +        const char* m_data8;

I think you should create a new typedef, "unsigned char LChar" for "Latin1 character", as distinct from the existing "UChar" for "Unicode character". Regular char is not so good because on OS X and other platforms with signed chars, (a) handing back just a char gives the client a value that will blow up and sign extend silently and incorrectly when converted to UChar, and (b) char requires a bunch of casting.
Comment 20 Michael Saboff 2011-10-25 17:50:57 PDT
Created attachment 112435 [details]
Updated patch refactored to use LChar

This patch includes the typedef suggested by Geoff.  That change rippled through the code growing this patch somewhat.  Comment #12 still holds true.

Still working on getting the JavaScriptCore symbol export list right for Windows.
Comment 21 Geoffrey Garen 2011-10-25 22:24:12 PDT
Comment on attachment 112435 [details]
Updated patch refactored to use LChar

View in context: https://bugs.webkit.org/attachment.cgi?id=112435&action=review

You added LChar but didn't remove most of the static_casts. I pointed out some of them above, but ran out of time. You need to take a sweep through and remove them all.

> Source/JavaScriptCore/runtime/UString.h:108
> +            return static_cast<unsigned char>(m_impl->characters8()[index]);

No need for static_cast, now that you have LChar.

> Source/JavaScriptCore/runtime/UString.h:145
>      if (rep1 == rep2) // If they're the same rep, they're equal.

This function is way too big to inline now. You should pick the right part to inline and move the rest out of line.

> Source/JavaScriptCore/runtime/UString.h:185
> +            if (static_cast<unsigned char>(d1[i]) != d2[i])

No need for static_cast, now that you have LChar.

> Source/JavaScriptCore/runtime/UString.h:196
> +            if (d1[i] != static_cast<unsigned char>(d2[i]))

No need for static_cast, now that you have LChar.

> Source/JavaScriptCore/wtf/StringHasher.h:165
> +        return static_cast<UChar>(ch);

No need for static_cast, now that you have LChar.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:222
> +        m_copyData16[i] = static_cast<unsigned char>(m_data8[i]);

No need for static_cast, now that you have LChar.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:236
> +            UChar c = static_cast<unsigned char>(m_data8[i]);

No need for static_cast, now that you have LChar.
Comment 22 Michael Saboff 2011-10-26 14:50:14 PDT
Created attachment 112601 [details]
Updated patch addressing comments #21

This addresses Geoff's comments and doesn't include the flag bit change factored out in https://bugs.webkit.org/show_bug.cgi?id=70937.

Also cleanup of comments suggested by Darin.
Comment 23 WebKit Review Bot 2011-10-26 14:54:10 PDT
Attachment 112601 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/runtime/UString.cpp:277:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Source/JavaScriptCore/runtime/UString.cpp:315:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Source/JavaScriptCore/runtime/UString.cpp:320:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Source/JavaScriptCore/runtime/UString.h:180:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Source/JavaScriptCore/runtime/UString.h:198:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Source/JavaScriptCore/runtime/UString.h:203:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 6 in 28 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 24 Early Warning System Bot 2011-10-26 15:25:03 PDT
Comment on attachment 112601 [details]
Updated patch addressing comments #21

Attachment 112601 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10220438
Comment 25 WebKit Review Bot 2011-10-26 15:30:25 PDT
Comment on attachment 112601 [details]
Updated patch addressing comments #21

Attachment 112601 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10223123
Comment 26 Geoffrey Garen 2011-10-26 15:33:23 PDT
Comment on attachment 112601 [details]
Updated patch addressing comments #21

View in context: https://bugs.webkit.org/attachment.cgi?id=112601&action=review

The fastFree and ASCII vs Latin1 issues look like real bugs. The rest would just improve readability and performance.

I have a feeling that, in the end, we'll want all the inline fast cases to cover the 8bit case, moving the 16bit cases to out-of-line slow cases. (Right now, everything is in the same case, which probably defeats inlining. And we can't inline the 8bit case yet, since most strings are still 16bit.)

> Source/JavaScriptCore/runtime/UString.h:140
> +NEVER_INLINE bool equalSlowCase(const UString& s1, const UString& s2);
> +
>  ALWAYS_INLINE bool operator==(const UString& s1, const UString& s2)

Still worried this is too big to inline.

If it doesn't inline, perhaps you could: (1) inline just the size and buffer checks, and out-of-line everything else; (2) Do (1), and always just call memcmp with the right length for the slow case.

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:129
> +            unsigned char c = m_buffer[i];
> +            destination[i] = c;

This can just be destination[i] = m_buffer[i].

> Source/JavaScriptCore/wtf/text/StringConcatenate.h:205
> +            unsigned char c = m_buffer[i];
> +            destination[i] = c;

Ditto.

> Source/JavaScriptCore/wtf/text/StringHash.h:56
> +            if (a->is8Bit()) {

As above, I'm concerned that this can't inline anymore, and might need rejiggering to inline properly.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:59
> +    if (has16BitShadow() && ((ownership == BufferInternal) || (ownership == BufferOwned))) {

if has16BitShadow() is true, we should unconditionally fastFree.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:184
> +    m_copyData16 = static_cast<UChar*>(fastCalloc(len, sizeof(UChar)));

Can this be fastMalloc instead?

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:407
> +    if (is8Bit()) {
> +        LChar* data;
> +        RefPtr <StringImpl>newImpl = createUninitialized(m_length, data);
> +        for (int32_t i = 0; i < length; i++)
> +            data[i] = toASCIILower(m_data8[i]);
> +        return newImpl.release();
> +    }

What happens here for a Latin1 string, when you call toASCIILower on its characters? I think you need to check each character for being ASCII.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:748
> +    size_t matchStringLength = strlen(reinterpret_cast<const char *>(matchString));

No space before "*", please.

> Source/JavaScriptCore/wtf/text/StringImpl.cpp:1017
> +        memcpy(data, m_data8, position * sizeof(char));
> +        if (str)
> +            memcpy(data + position, str->m_data8, lengthToInsert * sizeof(char));
> +        memcpy(data + position + lengthToInsert, m_data8 + position + lengthToReplace,
> +               (length() - position - lengthToReplace) * sizeof(char));

Minor nit: should be sizeof(LChar) rather than sizeof(char) in these 3 places -- even though the math works out the same.

> Source/JavaScriptCore/wtf/text/StringImpl.h:252
> +    // FIXME: Add Vector<char> adopt?

This is probably more appropriate as a bug report, with an explanation of why you were considering that.

> Source/JavaScriptCore/wtf/text/StringImpl.h:272
> +    ALWAYS_INLINE const UChar* characters() const

Eventually, we'll want to rename characters() to "deprecatedCharacters()" to communicate this. (Not right now, though.)
Comment 27 Michael Saboff 2011-10-26 17:32:48 PDT
Created attachment 112623 [details]
Patch with updates suggested in comment #26, stylebot and build failures

> > Source/JavaScriptCore/runtime/UString.h:140
> > +NEVER_INLINE bool equalSlowCase(const UString& s1, const UString& s2);
> > +
> >  ALWAYS_INLINE bool operator==(const UString& s1, const UString& s2)
> 
> Still worried this is too big to inline.
> 
> If it doesn't inline, perhaps you could: (1) inline just the size and buffer checks, and out-of-line everything else; (2) Do (1), and always just call memcmp with the right length for the slow case.

It didn't inline, so I changed to (1) - size / buffer check, slow case the rest.

> > Source/JavaScriptCore/wtf/text/StringHash.h:56
> > +            if (a->is8Bit()) {
> 
> As above, I'm concerned that this can't inline anymore, and might need rejiggering to inline properly.

The existing code doesn't inline this method.  The StringHash class is used in StringImpl.h to create a template class therefore the complete method in the .h file.

> > Source/JavaScriptCore/wtf/text/StringImpl.cpp:59
> > +    if (has16BitShadow() && ((ownership == BufferInternal) || (ownership == BufferOwned))) {
> 
> if has16BitShadow() is true, we should unconditionally fastFree.

Done.

> > Source/JavaScriptCore/wtf/text/StringImpl.cpp:184
> > +    m_copyData16 = static_cast<UChar*>(fastCalloc(len, sizeof(UChar)));
> 
> Can this be fastMalloc instead?

Yes, done.

> > Source/JavaScriptCore/wtf/text/StringImpl.cpp:407
> > +    if (is8Bit()) {
> > +        LChar* data;
> > +        RefPtr <StringImpl>newImpl = createUninitialized(m_length, data);
> > +        for (int32_t i = 0; i < length; i++)
> > +            data[i] = toASCIILower(m_data8[i]);
> > +        return newImpl.release();
> > +    }
> 
> What happens here for a Latin1 string, when you call toASCIILower on its characters? I think you need to check each character for being ASCII.

Updated StringImpl::lower(), ::upper() and ::foldCase() to work with Latin-1 strings with an ASCII fast path.

> > Source/JavaScriptCore/wtf/text/StringImpl.h:252
> > +    // FIXME: Add Vector<char> adopt?
> 
> This is probably more appropriate as a bug report, with an explanation of why you were considering that.

Removed.

Also cleaned up the sizeof(char) that should be sizeof(LChar) and other nits.
Comment 28 Geoffrey Garen 2011-10-26 18:12:34 PDT
Comment on attachment 112623 [details]
Patch with updates suggested in comment #26, stylebot and build failures

r=me
Comment 29 Geoffrey Garen 2011-10-26 18:13:02 PDT
Please watch the perf bot result to verify this is not a measurable regression.
Comment 30 Michael Saboff 2011-10-27 13:16:35 PDT
Committed r98624: <http://trac.webkit.org/changeset/98624>