Bug 173407

Summary: WTF::StringImpl::copyChars segfaults when built with GCC 7
Product: WebKit Reporter: Florian Bruhin <webkit.org>
Component: Web Template FrameworkAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Major CC: annulen, benjamin, buildbot, cdumez, cmarcelo, commit-queue, darin, dbates, jfbastien, kling, msaboff, ysuzuki
Priority: P2    
Version: Other   
Hardware: Unspecified   
OS: Linux   
Attachments:
Description Flags
Patch
none
Patch none

Description Florian Bruhin 2017-06-15 02:04:00 PDT
I haven't tested this with WebKit trunk so far, only with https://github.com/annulen/webkit and WebKit2GTK 2.16.3 - but since the code in trunk looks the same, I'm guessing this applies too.

After updating my packages on Archlinux, I noticed a lot of segfaults when navigating to pages, or e.g. when submitting a comment on Reddit.

I was able to minimize it to this example which crashes when run in jsc:

    s = 'xxxxxxxxxxxxxxAxxxxxxxxxxxxxxxxxxxxA–'; s.replace(/A/g, 'b')

(note the non-ASCII – at the end)

Stack (with the original failure on reddit, with QtWebKit):

    #0  0x00007fffed413b65 in WTF::StringImpl::copyChars<char16_t>(char16_t*, char16_t const*, unsigned int) (destination=0x7ffefa9e0250 u"&lt;p ", source=0x7fff1baf0256 u"&lt;p class=\"parent\"&gt;&lt;a name=\"dircqkb\"&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class=\"midcol likes\" &gt;&lt;div class=\"arrow upmod login-required access-required\" data-event-action=\"upvote\" role=\"button\""..., numCharacters=20) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/WTF/wtf/text/StringImpl.h:634
    #1  0x00007fffed420926 in JSC::jsSpliceSubstringsWithSeparators (separatorCount=251, separators=<optimized out>, rangeCount=<optimized out>, substringRanges=<optimized out>, source=..., sourceVal=<optimized out>, exec=0x7fffffffc6f0, this=<optimized out>)
        at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:431
    #2  0x00007fffed420926 in JSC::replaceUsingRegExpSearch (replaceValue=..., replacementString=..., callType=<optimized out>, callData=..., searchValue=..., string=<optimized out>, exec=<optimized out>)
        at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:666
    #3  0x00007fffed420926 in JSC::replaceUsingRegExpSearch (replaceValue=..., searchValue=..., string=<optimized out>, exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:708
    #4  0x00007fffed420926 in JSC::replace(JSC::ExecState*, JSC::JSString*, JSC::JSValue, JSC::JSValue) (replaceValue=..., searchValue=..., string=<optimized out>, exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:829
    #5  0x00007fffed420926 in JSC::replace(JSC::ExecState*, JSC::JSValue, JSC::JSValue, JSC::JSValue) (replaceValue=..., searchValue=..., thisValue=..., exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:841
    #6  0x00007fffed420926 in JSC::stringProtoFuncReplace(JSC::ExecState*) (exec=<optimized out>) at /tmp/makepkg/qt5-webkit-ng-debug/src/webkit-qtwebkit-5.212.0-alpha/Source/JavaScriptCore/runtime/StringPrototype.cpp:846
    #7  0x00007fff1bfff0c8 in  ()
    #8  0x00007fffffffc7b0 in  ()
    #9  0x00007fffed5c930d in llint_entry () at /usr/lib/libQt5WebKit.so.5

Stack (with jsc and WebKit2GTK, release build):

    #0  0x00007ffff7ac2c95 in void WTF::StringImpl::copyChars<char16_t>(char16_t*, char16_t const*, unsigned int) () at /usr/lib/libjavascriptcoregtk-4.0.so.18
    #1  0x00007ffff7ac1d5e in JSC::stringProtoFuncReplaceUsingRegExp(JSC::ExecState*) ()
        at /usr/lib/libjavascriptcoregtk-4.0.so.18
    #2  0x00007fffb21ff028 in  ()
    #3  0x00007fffffffcfc0 in  ()
    #4  0x00007ffff77a4e42 in  () at /usr/lib/libjavascriptcoregtk-4.0.so.18
    #5  0x0000000000000000 in  ()

So I looked into the implementation of WTF::StringImpl::copyChars:

    template <typename T> static void copyChars(T* destination, const T* source, unsigned numCharacters)
    {
        if (numCharacters == 1) {
            *destination = *source;
            return;
        }

        if (numCharacters <= s_copyCharsInlineCutOff) {
            unsigned i = 0;
#if (CPU(X86) || CPU(X86_64))
            const unsigned charsPerInt = sizeof(uint32_t) / sizeof(T);

            if (numCharacters > charsPerInt) {
                unsigned stopCount = numCharacters & ~(charsPerInt - 1);

                const uint32_t* srcCharacters = reinterpret_cast<const uint32_t*>(source);  // aliasing
                uint32_t* destCharacters = reinterpret_cast<uint32_t*>(destination);  // aliasing
                for (unsigned j = 0; i < stopCount; i += charsPerInt, ++j)
                    destCharacters[j] = srcCharacters[j];   // CRASH
            }
#endif
            for (; i < numCharacters; ++i)
                destination[i] = source[i];
        } else
            memcpy(destination, source, numCharacters * sizeof(T));
    }

Removing the optimized #if block made things run fine. With some help from people in the #gcc IRC channel, I figured out that WebKit violates strict aliasing rules with the casts marked above (as T is an uint16_t, incompatible with uint32_t):

https://www.cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html
http://dbp-consulting.com/tutorials/StrictAliasing.html

WebKit should be built with -fno-strict-aliasing though, and at least with QtWebKit I verified this was the case. From what I understand, that should make it possible to do this kind of thing, yet with GCC 7.1.1 I get a crash there.

I haven't looked at the generated assembler yet (and not sure I'll get to it anytime soon), but this might be a GCC 7 bug?

But even if it's fixed there, I suppose WebKit should work around it in some way. I wonder whether the optimized inline copyChars is really needed nowadays (it was added in 2011, https://github.com/annulen/webkit/commit/0cb990be04a30870abb2a01e7f9fe5eea08b25a9 ) - I'd expect an inline memcpy() to generate equal if not better code nowadays, no?

Downstream QtWebKit bug where I investigated this: https://github.com/annulen/webkit/issues/562
Comment 1 Konstantin Tokarev 2017-06-15 02:29:01 PDT
I've run JSC benchmarks on Linux with GCC 7.1, performance results were neutral after change of copyChars to do just memcpy
Comment 2 Florian Bruhin 2017-06-15 02:32:25 PDT
So then this essentially boils down to whether it should be replaced by a simple memcpy() only for GCC >= 7, or for other compilers as well?

(Unless there's another solution to this I'm missing - right now the only option I see is to get rid of the "optimized" part)
Comment 3 Yusuke Suzuki 2017-06-15 02:46:26 PDT
(In reply to Konstantin Tokarev from comment #1)
> I've run JSC benchmarks on Linux with GCC 7.1, performance results were
> neutral after change of copyChars to do just memcpy

I like to use memcpy if it does not hurt performance.

Just out of curiosity, I wonder if we can use bitwise_cast here instead of reinterpret_cast.

Maybe, JF knows much about strict-aliasing, type punning and memcpy.
Comment 4 Konstantin Tokarev 2017-06-15 05:03:07 PDT
(In reply to Yusuke Suzuki from comment #3)
> (In reply to Konstantin Tokarev from comment #1)
> > I've run JSC benchmarks on Linux with GCC 7.1, performance results were
> > neutral after change of copyChars to do just memcpy
> 
> I like to use memcpy if it does not hurt performance.

I've checked just one platform/compiler combination so far.

I guess that initially this code path might have been added because of deficiency in old version of memcpy in macOS' libc, or gcc 4.2.

Could you recommend any specific benchmark that is signficantly affected by performance of copyChars?

> 
> Just out of curiosity, I wonder if we can use bitwise_cast here instead of
> reinterpret_cast.
> 
> Maybe, JF knows much about strict-aliasing, type punning and memcpy.
Comment 5 JF Bastien 2017-06-15 09:33:12 PDT
(In reply to Yusuke Suzuki from comment #3)
> (In reply to Konstantin Tokarev from comment #1)
> > I've run JSC benchmarks on Linux with GCC 7.1, performance results were
> > neutral after change of copyChars to do just memcpy
> 
> I like to use memcpy if it does not hurt performance.
> 
> Just out of curiosity, I wonder if we can use bitwise_cast here instead of
> reinterpret_cast.
> 
> Maybe, JF knows much about strict-aliasing, type punning and memcpy.

bitwise_cast wouldn't help here, the pointers would still alias. Making the pointer type char* would fix the problem, because char* is magical in C++ and ca alias everything.

memcpy sounds good if per isn't hurt.

Can source ever be the same as destination though? I want to make sure we don't need memmove instead.
Comment 6 Florian Bruhin 2017-06-15 09:58:12 PDT
If it's a char*, this would be equal to the "naive" bytewise copying below the #ifdef though, right?

If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already used, so if anything used copyChars with the same source/destination, that'd have to be only with strings guaranteed to be shorter, which seems quite a stretch I think?
Comment 7 JF Bastien 2017-06-15 10:01:47 PDT
(In reply to Florian Bruhin from comment #6)
> If it's a char*, this would be equal to the "naive" bytewise copying below
> the #ifdef though, right?

Correct.

> If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already
> used, so if anything used copyChars with the same source/destination, that'd
> have to be only with strings guaranteed to be shorter, which seems quite a
> stretch I think?

I'm not familiar with this part of the code, so it doesn't seem like a stretch to me :)
Comment 8 Konstantin Tokarev 2017-06-22 04:22:50 PDT
For the record, optimization that avoids memcpy was initially added by Darin in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008)
Comment 9 Yusuke Suzuki 2017-06-22 04:32:45 PDT
(In reply to Konstantin Tokarev from comment #8)
> For the record, optimization that avoids memcpy was initially added by Darin
> in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008)

OK, that note is very helpful because it clarifies that this optimization is driven by SunSpider.
Comment 10 Yusuke Suzuki 2017-06-22 04:45:53 PDT
(In reply to Yusuke Suzuki from comment #9)
> (In reply to Konstantin Tokarev from comment #8)
> > For the record, optimization that avoids memcpy was initially added by Darin
> > in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008)
> 
> OK, that note is very helpful because it clarifies that this optimization is
> driven by SunSpider.

I've just tested it on my Linux box.

VMs tested:
"baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy-tot/Release/bin/jsc
"patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy/Release/bin/jsc

Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc()
between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the
jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                                baseline                  patched                                      

string-base64                7.7544+-0.1761            7.6138+-0.2071          might be 1.0185x faster
string-fasta                10.5429+-0.2746     ?     10.7500+-0.2669        ? might be 1.0196x slower
string-tagcloud             14.8588+-0.2828           14.8039+-0.3039        
string-unpack-code          36.1769+-0.4251           35.3397+-0.5398          might be 1.0237x faster
string-validate-input        8.5182+-0.2206            8.3514+-0.2179          might be 1.0200x faster

<arithmetic>                15.5702+-0.1239           15.3718+-0.1452          might be 1.0129x faster


It seems there is no explicit difference right now. The compiler is GCC 5.4.0
I would like to see the difference in macOS's clang.
Comment 11 Darin Adler 2017-06-22 12:13:38 PDT
It’s great news if memcpy is now fast enough. I’d love to see my old awkward optimization removed if it no longer does any good.

In fact, I’m not sure we need a copyChars function at all. We could use memcpy directly as we once did, or even better ICU has u_memcpy if we want something that prevents the need to do "* sizeof(UChar)" at each call site. As long as u_memcpy has great performance too; presumably it expands into memcpy as an inline.
Comment 12 Konstantin Tokarev 2017-06-30 08:28:56 PDT
Ping?
Comment 13 Florian Bruhin 2017-07-02 05:52:12 PDT
FWIW Archlinux now added a patch which just uses memcpy to its qt5-webkit package:

https://git.archlinux.org/svntogit/packages.git/tree/trunk/qt5-webkit-gcc7.patch?h=packages/qt5-webkit

Meanwhile, a Fedora packager told me they didn't notice any issues with GCC 7.1.1... no idea why. :-/
Comment 14 Yusuke Suzuki 2017-07-05 01:57:25 PDT
Created attachment 314599 [details]
Patch
Comment 15 Yusuke Suzuki 2017-07-05 02:00:03 PDT
Comment on attachment 314599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review

> Source/WTF/wtf/text/StringImpl.h:631
>      }

We still keep StringImpl::copyChars() because it is very useful to copy characters based on the number of chars.
If we use memcpy directly, we need to consider sizeof(UChar) and sizeof(LChar) carefully.

And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also useful.
Comment 16 Konstantin Tokarev 2017-07-05 03:28:34 PDT
Comment on attachment 314599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review

> Source/WTF/wtf/text/StringImpl.h:-631
> -        if (numCharacters <= s_copyCharsInlineCutOff) {

I think we should also remove s_copyCharsInlineCutOff definition
Comment 17 Yusuke Suzuki 2017-07-05 03:31:28 PDT
Comment on attachment 314599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review

>> Source/WTF/wtf/text/StringImpl.h:-631
>> -        if (numCharacters <= s_copyCharsInlineCutOff) {
> 
> I think we should also remove s_copyCharsInlineCutOff definition

Nice catch! I'll update the patch.
Comment 18 Yusuke Suzuki 2017-07-05 03:33:03 PDT
Created attachment 314600 [details]
Patch
Comment 19 JF Bastien 2017-07-05 06:35:27 PDT
Did anyone run numbers on MacOS?
Comment 20 Yusuke Suzuki 2017-07-05 06:45:46 PDT
(In reply to JF Bastien from comment #19)
> Did anyone run numbers on MacOS?

I don't measure it yet.
If nobody runs it until tomorrow's evening (JST), possibly I can run it ;) (my macOS box is placed at my office).
Comment 21 JF Bastien 2017-07-05 06:46:29 PDT
Ok I can give it a try later today.
Comment 22 JF Bastien 2017-07-05 09:55:53 PDT
On my iMac:

                                    before                    after                                       

3d-cube                         3.8573+-0.0502            3.7993+-0.0415          might be 1.0153x faster
3d-morph                        4.2964+-0.0214            4.2933+-0.0245        
3d-raytrace                     3.9897+-0.0476            3.9794+-0.0589        
access-binary-trees             1.8148+-0.0242            1.8113+-0.0155        
access-fannkuch                 4.5038+-0.0384            4.5000+-0.0476        
access-nbody                    2.2008+-0.0196     ?      2.2017+-0.0167        ?
access-nsieve                   1.9470+-0.0433     ?      1.9663+-0.0509        ?
bitops-3bit-bits-in-byte        1.0725+-0.0117            1.0721+-0.0118        
bitops-bits-in-byte             2.1698+-0.0342            2.1535+-0.0250        
bitops-bitwise-and              1.8784+-0.0138     ?      1.8907+-0.0219        ?
bitops-nsieve-bits              2.7890+-0.0119     ?      2.8228+-0.0349        ? might be 1.0121x slower
controlflow-recursive           2.0770+-0.0207            2.0688+-0.0231        
crypto-aes                      3.5230+-0.0352     ?      3.5641+-0.0508        ? might be 1.0117x slower
crypto-md5                      2.1610+-0.0176            2.1448+-0.0139        
crypto-sha1                     2.2058+-0.0256     ?      2.2185+-0.0236        ?
date-format-tofte               6.0776+-0.0427     ?      6.1917+-0.0917        ? might be 1.0188x slower
date-format-xparb               4.8437+-0.0344     ^      4.7681+-0.0408        ^ definitely 1.0159x faster
math-cordic                     2.4417+-0.0326            2.4224+-0.0170        
math-partial-sums               3.3532+-0.0203     ?      3.3763+-0.0432        ?
math-spectral-norm              1.6996+-0.0198            1.6885+-0.0164        
regexp-dna                      5.9056+-0.0450            5.8670+-0.0320        
string-base64                   3.6116+-0.0421            3.5795+-0.0302        
string-fasta                    5.0965+-0.0400            5.0413+-0.0540          might be 1.0110x faster
string-tagcloud                 7.0740+-0.0728     ?      7.0958+-0.0883        ?
string-unpack-code             16.5066+-0.1147     ?     16.6307+-0.1750        ?
string-validate-input           4.1960+-0.0615            4.1216+-0.0161          might be 1.0181x faster

<arithmetic>                    3.8959+-0.0079            3.8950+-0.0104          might be 1.0002x faster
Comment 23 Andreas Kling 2017-07-05 10:36:35 PDT
Comment on attachment 314600 [details]
Patch

r=me
Comment 24 Yusuke Suzuki 2017-07-05 19:03:32 PDT
Comment on attachment 314600 [details]
Patch

Thanks! OK, let's go.
Comment 25 WebKit Commit Bot 2017-07-05 19:31:39 PDT
Comment on attachment 314600 [details]
Patch

Clearing flags on attachment: 314600

Committed r219182: <http://trac.webkit.org/changeset/219182>
Comment 26 WebKit Commit Bot 2017-07-05 19:31:40 PDT
All reviewed patches have been landed.  Closing bug.
Comment 27 Darin Adler 2017-07-11 22:10:09 PDT
(In reply to Yusuke Suzuki from comment #15)
> We still keep StringImpl::copyChars() because it is very useful to copy
> characters based on the number of chars.
> If we use memcpy directly, we need to consider sizeof(UChar) and
> sizeof(LChar) carefully.

But why not use u_memcpy directly?

> And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also
> useful.

That makes sense.
Comment 28 Yusuke Suzuki 2017-07-11 22:28:33 PDT
(In reply to Darin Adler from comment #27)
> (In reply to Yusuke Suzuki from comment #15)
> > We still keep StringImpl::copyChars() because it is very useful to copy
> > characters based on the number of chars.
> > If we use memcpy directly, we need to consider sizeof(UChar) and
> > sizeof(LChar) carefully.
> 
> But why not use u_memcpy directly?

The problem is that memcpy is not designed for LChar. It is designed for arbitrary bytes. Thus it's signature is `void* memcpy(void*, const void*, size_t)`.
We cannot catch the case using memcpy for UChar as a compile error.
I think offering copyChars is safer, its overloading and templates can select appropriate memcpy based implementation automatically.