Most AtomicString's contain strings composed of 8 bit data even when created using a UChar* source. When creating an AtomicString with UChar data, the data should be checked for 8 bit characters and an 8 bit string should be created if possible.
Created attachment 171058 [details] Patch
Comment on attachment 171058 [details] Patch Clearing flags on attachment: 171058 Committed r132739: <http://trac.webkit.org/changeset/132739>
All reviewed patches have been landed. Closing bug.
Comment on attachment 171058 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171058&action=review > Source/WTF/wtf/text/StringImpl.cpp:276 > + LChar* data; > + RefPtr<StringImpl> string = createUninitialized(length, data); > + > + for (size_t i = 0; i < length; ++i) { > + if (characters[i] & 0xff00) > + return create(characters, length); > + data[i] = static_cast<LChar>(characters[i]); > + } What’s the performance hit like here? Did you do any measurement. We’re doing a second memory allocation every time we end up creating a 16-bit string, so that case is a bit slow. Also, I could imagine we might get better speed if we did the Latin-1 preflight using the techniques from ASCIIFastPath that process many characters at a time quickly, even though that would be a separate loop.
(In reply to comment #4) > (From update of attachment 171058 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171058&action=review > > > Source/WTF/wtf/text/StringImpl.cpp:276 > > + LChar* data; > > + RefPtr<StringImpl> string = createUninitialized(length, data); > > + > > + for (size_t i = 0; i < length; ++i) { > > + if (characters[i] & 0xff00) > > + return create(characters, length); > > + data[i] = static_cast<LChar>(characters[i]); > > + } > > What’s the performance hit like here? Did you do any measurement. I ran benchmarks on a set of changes including this one. I ran sunspider, V8v7 and kraken. No change in their performance. I then ran the Parser performance tests and overall there wasn't a change. > We’re doing a second memory allocation every time we end up creating a 16-bit string, so that case is a bit slow. Also, I could imagine we might get better speed if we did the Latin-1 preflight using the techniques from ASCIIFastPath that process many characters at a time quickly, even though that would be a separate loop. This function is intended for mostly 8 bit strings. For AtomicStrings, the 16 bit path appears to be rarely used (a little over 1% of the time in my tests). I instrumented the 8 and 16 bit paths. Here are the number of times we took the 16 bit path for a given website: apple.com 0 cnn.com 3 yahoo.com 1 facebook.com 1 daringfireball.net 42 ebay.com 1 google.com 1 reddit.com 0 wikipedia.com 55 amazon.com 0 lemonde.fr 5 japantimes.co.jp 7 weather.com 1 espn.com 0 webkit.org 0 Across these sites, we took the 8 bit path 10960 times and the 16 bit path 117 times. I thought about employing the techniques in ASCIIFastPath, but the performance tests didn't show that there was an issue. If you like, I can file a defect to investigate performance improvements.
Please measure on ARM, both branches and memory accesses are relatively more expensive. Since the 16bits case is only used in 1% of the case, consider branching in 0xFF00 only once at the end instead of for every iteration.