173407 – WTF::StringImpl::copyChars segfaults when built with GCC 7

Florian Bruhin

Reported 2017-06-15 02:04:00 PDT

I haven't tested this with WebKit trunk so far, only with https://github.com/annulen/webkit and WebKit2GTK 2.16.3 - but since the code in trunk looks the same, I'm guessing this applies too. After updating my packages on Archlinux, I noticed a lot of segfaults when navigating to pages, or e.g. when submitting a comment on Reddit. I was able to minimize it to this example which crashes when run in jsc: s = 'xxxxxxxxxxxxxxAxxxxxxxxxxxxxxxxxxxxA–'; s.replace(/A/g, 'b') (note the non-ASCII – at the end) Stack (with the original failure on reddit, with QtWebKit): #0 0x00007fffed413b65 in WTF::StringImpl::copyChars<char16_t>(char16_t*, char16_t const*, unsigned int) (destination=0x7ffefa9e0250 u"<p ", source=0x7fff1baf0256 u"<p class=\"parent\"><a name=\"dircqkb\"></a></p><div class=\"midcol likes\" ><div class=\"arrow upmod login-required access-required\" data-event-action=\"upvote\" role=\"button\""..., numCharacters=20) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/WTF/wtf/text/StringImpl.h:634 #1 0x00007fffed420926 in JSC::jsSpliceSubstringsWithSeparators (separatorCount=251, separators=<optimized out>, rangeCount=<optimized out>, substringRanges=<optimized out>, source=..., sourceVal=<optimized out>, exec=0x7fffffffc6f0, this=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:431 #2 0x00007fffed420926 in JSC::replaceUsingRegExpSearch (replaceValue=..., replacementString=..., callType=<optimized out>, callData=..., searchValue=..., string=<optimized out>, exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:666 #3 0x00007fffed420926 in JSC::replaceUsingRegExpSearch (replaceValue=..., searchValue=..., string=<optimized out>, exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:708 #4 0x00007fffed420926 in JSC::replace(JSC::ExecState*, JSC::JSString*, JSC::JSValue, JSC::JSValue) (replaceValue=..., searchValue=..., string=<optimized out>, exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:829 #5 0x00007fffed420926 in JSC::replace(JSC::ExecState*, JSC::JSValue, JSC::JSValue, JSC::JSValue) (replaceValue=..., searchValue=..., thisValue=..., exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:841 #6 0x00007fffed420926 in JSC::stringProtoFuncReplace(JSC::ExecState*) (exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:846 #7 0x00007fff1bfff0c8 in () #8 0x00007fffffffc7b0 in () #9 0x00007fffed5c930d in llint_entry () at /usr/lib/libQt5WebKit.so.5 Stack (with jsc and WebKit2GTK, release build): #0 0x00007ffff7ac2c95 in void WTF::StringImpl::copyChars<char16_t>(char16_t*, char16_t const*, unsigned int) () at /usr/lib/libjavascriptcoregtk-4.0.so.18 #1 0x00007ffff7ac1d5e in JSC::stringProtoFuncReplaceUsingRegExp(JSC::ExecState*) () at /usr/lib/libjavascriptcoregtk-4.0.so.18 #2 0x00007fffb21ff028 in () #3 0x00007fffffffcfc0 in () #4 0x00007ffff77a4e42 in () at /usr/lib/libjavascriptcoregtk-4.0.so.18 #5 0x0000000000000000 in () So I looked into the implementation of WTF::StringImpl::copyChars: template <typename T> static void copyChars(T* destination, const T* source, unsigned numCharacters) { if (numCharacters == 1) { *destination = *source; return; } if (numCharacters <= s_copyCharsInlineCutOff) { unsigned i = 0; #if (CPU(X86) || CPU(X86_64)) const unsigned charsPerInt = sizeof(uint32_t) / sizeof(T); if (numCharacters > charsPerInt) { unsigned stopCount = numCharacters & ~(charsPerInt - 1); const uint32_t* srcCharacters = reinterpret_cast<const uint32_t*>(source); // aliasing uint32_t* destCharacters = reinterpret_cast<uint32_t*>(destination); // aliasing for (unsigned j = 0; i < stopCount; i += charsPerInt, ++j) destCharacters[j] = srcCharacters[j]; // CRASH } #endif for (; i < numCharacters; ++i) destination[i] = source[i]; } else memcpy(destination, source, numCharacters * sizeof(T)); } Removing the optimized #if block made things run fine. With some help from people in the #gcc IRC channel, I figured out that WebKit violates strict aliasing rules with the casts marked above (as T is an uint16_t, incompatible with uint32_t): https://www.cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html http://dbp-consulting.com/tutorials/StrictAliasing.html WebKit should be built with -fno-strict-aliasing though, and at least with QtWebKit I verified this was the case. From what I understand, that should make it possible to do this kind of thing, yet with GCC 7.1.1 I get a crash there. I haven't looked at the generated assembler yet (and not sure I'll get to it anytime soon), but this might be a GCC 7 bug? But even if it's fixed there, I suppose WebKit should work around it in some way. I wonder whether the optimized inline copyChars is really needed nowadays (it was added in 2011, https://github.com/annulen/webkit/commit/0cb990be04a30870abb2a01e7f9fe5eea08b25a9 ) - I'd expect an inline memcpy() to generate equal if not better code nowadays, no? Downstream QtWebKit bug where I investigated this: https://github.com/annulen/webkit/issues/562

Konstantin Tokarev

Comment 1 2017-06-15 02:29:01 PDT

I've run JSC benchmarks on Linux with GCC 7.1, performance results were neutral after change of copyChars to do just memcpy

Florian Bruhin

Comment 2 2017-06-15 02:32:25 PDT

So then this essentially boils down to whether it should be replaced by a simple memcpy() only for GCC >= 7, or for other compilers as well? (Unless there's another solution to this I'm missing - right now the only option I see is to get rid of the "optimized" part)

Yusuke Suzuki

Comment 3 2017-06-15 02:46:26 PDT

(In reply to Konstantin Tokarev from comment #1) > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > neutral after change of copyChars to do just memcpy I like to use memcpy if it does not hurt performance. Just out of curiosity, I wonder if we can use bitwise_cast here instead of reinterpret_cast. Maybe, JF knows much about strict-aliasing, type punning and memcpy.

Konstantin Tokarev

Comment 4 2017-06-15 05:03:07 PDT

(In reply to Yusuke Suzuki from comment #3) > (In reply to Konstantin Tokarev from comment #1) > > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > > neutral after change of copyChars to do just memcpy > > I like to use memcpy if it does not hurt performance. I've checked just one platform/compiler combination so far. I guess that initially this code path might have been added because of deficiency in old version of memcpy in macOS' libc, or gcc 4.2. Could you recommend any specific benchmark that is signficantly affected by performance of copyChars? > > Just out of curiosity, I wonder if we can use bitwise_cast here instead of > reinterpret_cast. > > Maybe, JF knows much about strict-aliasing, type punning and memcpy.

JF Bastien

Comment 5 2017-06-15 09:33:12 PDT

(In reply to Yusuke Suzuki from comment #3) > (In reply to Konstantin Tokarev from comment #1) > > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > > neutral after change of copyChars to do just memcpy > > I like to use memcpy if it does not hurt performance. > > Just out of curiosity, I wonder if we can use bitwise_cast here instead of > reinterpret_cast. > > Maybe, JF knows much about strict-aliasing, type punning and memcpy. bitwise_cast wouldn't help here, the pointers would still alias. Making the pointer type char* would fix the problem, because char* is magical in C++ and ca alias everything. memcpy sounds good if per isn't hurt. Can source ever be the same as destination though? I want to make sure we don't need memmove instead.

Florian Bruhin

Comment 6 2017-06-15 09:58:12 PDT

If it's a char*, this would be equal to the "naive" bytewise copying below the #ifdef though, right? If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already used, so if anything used copyChars with the same source/destination, that'd have to be only with strings guaranteed to be shorter, which seems quite a stretch I think?

JF Bastien

Comment 7 2017-06-15 10:01:47 PDT

(In reply to Florian Bruhin from comment #6) > If it's a char*, this would be equal to the "naive" bytewise copying below > the #ifdef though, right? Correct. > If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already > used, so if anything used copyChars with the same source/destination, that'd > have to be only with strings guaranteed to be shorter, which seems quite a > stretch I think? I'm not familiar with this part of the code, so it doesn't seem like a stretch to me :)

Konstantin Tokarev

Comment 8 2017-06-22 04:22:50 PDT

For the record, optimization that avoids memcpy was initially added by Darin in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008)

Yusuke Suzuki

Comment 9 2017-06-22 04:32:45 PDT

(In reply to Konstantin Tokarev from comment #8) > For the record, optimization that avoids memcpy was initially added by Darin > in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008) OK, that note is very helpful because it clarifies that this optimization is driven by SunSpider.

Yusuke Suzuki

Comment 10 2017-06-22 04:45:53 PDT

(In reply to Yusuke Suzuki from comment #9) > (In reply to Konstantin Tokarev from comment #8) > > For the record, optimization that avoids memcpy was initially added by Darin > > in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008) > > OK, that note is very helpful because it clarifies that this optimization is > driven by SunSpider. I've just tested it on my Linux box. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy-tot/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy/Release/bin/jsc Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched string-base64 7.7544+-0.1761 7.6138+-0.2071 might be 1.0185x faster string-fasta 10.5429+-0.2746 ? 10.7500+-0.2669 ? might be 1.0196x slower string-tagcloud 14.8588+-0.2828 14.8039+-0.3039 string-unpack-code 36.1769+-0.4251 35.3397+-0.5398 might be 1.0237x faster string-validate-input 8.5182+-0.2206 8.3514+-0.2179 might be 1.0200x faster <arithmetic> 15.5702+-0.1239 15.3718+-0.1452 might be 1.0129x faster It seems there is no explicit difference right now. The compiler is GCC 5.4.0 I would like to see the difference in macOS's clang.

Darin Adler

Comment 11 2017-06-22 12:13:38 PDT

It’s great news if memcpy is now fast enough. I’d love to see my old awkward optimization removed if it no longer does any good. In fact, I’m not sure we need a copyChars function at all. We could use memcpy directly as we once did, or even better ICU has u_memcpy if we want something that prevents the need to do "* sizeof(UChar)" at each call site. As long as u_memcpy has great performance too; presumably it expands into memcpy as an inline.

Konstantin Tokarev

Comment 12 2017-06-30 08:28:56 PDT

Ping?

Florian Bruhin

Comment 13 2017-07-02 05:52:12 PDT

FWIW Archlinux now added a patch which just uses memcpy to its qt5-webkit package: https://git.archlinux.org/svntogit/packages.git/tree/trunk/qt5-webkit-gcc7.patch?h=packages/qt5-webkit Meanwhile, a Fedora packager told me they didn't notice any issues with GCC 7.1.1... no idea why. :-/

Yusuke Suzuki

Comment 14 2017-07-05 01:57:25 PDT

Created attachment 314599 [details] Patch

Yusuke Suzuki

Comment 15 2017-07-05 02:00:03 PDT

Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review > Source/WTF/wtf/text/StringImpl.h:631 > } We still keep StringImpl::copyChars() because it is very useful to copy characters based on the number of chars. If we use memcpy directly, we need to consider sizeof(UChar) and sizeof(LChar) carefully. And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also useful.

Konstantin Tokarev

Comment 16 2017-07-05 03:28:34 PDT

Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review > Source/WTF/wtf/text/StringImpl.h:-631 > - if (numCharacters <= s_copyCharsInlineCutOff) { I think we should also remove s_copyCharsInlineCutOff definition

Yusuke Suzuki

Comment 17 2017-07-05 03:31:28 PDT

Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review >> Source/WTF/wtf/text/StringImpl.h:-631 >> - if (numCharacters <= s_copyCharsInlineCutOff) { > > I think we should also remove s_copyCharsInlineCutOff definition Nice catch! I'll update the patch.

Yusuke Suzuki

Comment 18 2017-07-05 03:33:03 PDT

Created attachment 314600 [details] Patch

JF Bastien

Comment 19 2017-07-05 06:35:27 PDT

Did anyone run numbers on MacOS?

Yusuke Suzuki

Comment 20 2017-07-05 06:45:46 PDT

(In reply to JF Bastien from comment #19) > Did anyone run numbers on MacOS? I don't measure it yet. If nobody runs it until tomorrow's evening (JST), possibly I can run it ;) (my macOS box is placed at my office).

JF Bastien

Comment 21 2017-07-05 06:46:29 PDT

Ok I can give it a try later today.

JF Bastien

Comment 22 2017-07-05 09:55:53 PDT

On my iMac: before after 3d-cube 3.8573+-0.0502 3.7993+-0.0415 might be 1.0153x faster 3d-morph 4.2964+-0.0214 4.2933+-0.0245 3d-raytrace 3.9897+-0.0476 3.9794+-0.0589 access-binary-trees 1.8148+-0.0242 1.8113+-0.0155 access-fannkuch 4.5038+-0.0384 4.5000+-0.0476 access-nbody 2.2008+-0.0196 ? 2.2017+-0.0167 ? access-nsieve 1.9470+-0.0433 ? 1.9663+-0.0509 ? bitops-3bit-bits-in-byte 1.0725+-0.0117 1.0721+-0.0118 bitops-bits-in-byte 2.1698+-0.0342 2.1535+-0.0250 bitops-bitwise-and 1.8784+-0.0138 ? 1.8907+-0.0219 ? bitops-nsieve-bits 2.7890+-0.0119 ? 2.8228+-0.0349 ? might be 1.0121x slower controlflow-recursive 2.0770+-0.0207 2.0688+-0.0231 crypto-aes 3.5230+-0.0352 ? 3.5641+-0.0508 ? might be 1.0117x slower crypto-md5 2.1610+-0.0176 2.1448+-0.0139 crypto-sha1 2.2058+-0.0256 ? 2.2185+-0.0236 ? date-format-tofte 6.0776+-0.0427 ? 6.1917+-0.0917 ? might be 1.0188x slower date-format-xparb 4.8437+-0.0344 ^ 4.7681+-0.0408 ^ definitely 1.0159x faster math-cordic 2.4417+-0.0326 2.4224+-0.0170 math-partial-sums 3.3532+-0.0203 ? 3.3763+-0.0432 ? math-spectral-norm 1.6996+-0.0198 1.6885+-0.0164 regexp-dna 5.9056+-0.0450 5.8670+-0.0320 string-base64 3.6116+-0.0421 3.5795+-0.0302 string-fasta 5.0965+-0.0400 5.0413+-0.0540 might be 1.0110x faster string-tagcloud 7.0740+-0.0728 ? 7.0958+-0.0883 ? string-unpack-code 16.5066+-0.1147 ? 16.6307+-0.1750 ? string-validate-input 4.1960+-0.0615 4.1216+-0.0161 might be 1.0181x faster <arithmetic> 3.8959+-0.0079 3.8950+-0.0104 might be 1.0002x faster

Andreas Kling

Comment 23 2017-07-05 10:36:35 PDT

Comment on attachment 314600 [details] Patch r=me

Yusuke Suzuki

Comment 24 2017-07-05 19:03:32 PDT

Comment on attachment 314600 [details] Patch Thanks! OK, let's go.

WebKit Commit Bot

Comment 25 2017-07-05 19:31:39 PDT

Comment on attachment 314600 [details] Patch Clearing flags on attachment: 314600 Committed r219182: <http://trac.webkit.org/changeset/219182>

WebKit Commit Bot

Comment 26 2017-07-05 19:31:40 PDT

All reviewed patches have been landed. Closing bug.

Darin Adler

Comment 27 2017-07-11 22:10:09 PDT

(In reply to Yusuke Suzuki from comment #15) > We still keep StringImpl::copyChars() because it is very useful to copy > characters based on the number of chars. > If we use memcpy directly, we need to consider sizeof(UChar) and > sizeof(LChar) carefully. But why not use u_memcpy directly? > And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also > useful. That makes sense.

Yusuke Suzuki

Comment 28 2017-07-11 22:28:33 PDT

(In reply to Darin Adler from comment #27) > (In reply to Yusuke Suzuki from comment #15) > > We still keep StringImpl::copyChars() because it is very useful to copy > > characters based on the number of chars. > > If we use memcpy directly, we need to consider sizeof(UChar) and > > sizeof(LChar) carefully. > > But why not use u_memcpy directly? The problem is that memcpy is not designed for LChar. It is designed for arbitrary bytes. Thus it's signature is `void* memcpy(void*, const void*, size_t)`. We cannot catch the case using memcpy for UChar as a compile error. I think offering copyChars is safer, its overloading and templates can select appropriate memcpy based implementation automatically.

Attachments
Patch (4.24 KB, patch) 2017-07-05 01:57 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (4.72 KB, patch) 2017-07-05 03:33 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.