Summary: | WTF::StringImpl::copyChars segfaults when built with GCC 7 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Florian Bruhin <webkit.org> | ||||||
Component: | Web Template Framework | Assignee: | Yusuke Suzuki <ysuzuki> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Major | CC: | annulen, benjamin, buildbot, cdumez, cmarcelo, commit-queue, darin, dbates, jfbastien, kling, msaboff, ysuzuki | ||||||
Priority: | P2 | ||||||||
Version: | Other | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Linux | ||||||||
Attachments: |
|
Description
Florian Bruhin
2017-06-15 02:04:00 PDT
I've run JSC benchmarks on Linux with GCC 7.1, performance results were neutral after change of copyChars to do just memcpy So then this essentially boils down to whether it should be replaced by a simple memcpy() only for GCC >= 7, or for other compilers as well? (Unless there's another solution to this I'm missing - right now the only option I see is to get rid of the "optimized" part) (In reply to Konstantin Tokarev from comment #1) > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > neutral after change of copyChars to do just memcpy I like to use memcpy if it does not hurt performance. Just out of curiosity, I wonder if we can use bitwise_cast here instead of reinterpret_cast. Maybe, JF knows much about strict-aliasing, type punning and memcpy. (In reply to Yusuke Suzuki from comment #3) > (In reply to Konstantin Tokarev from comment #1) > > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > > neutral after change of copyChars to do just memcpy > > I like to use memcpy if it does not hurt performance. I've checked just one platform/compiler combination so far. I guess that initially this code path might have been added because of deficiency in old version of memcpy in macOS' libc, or gcc 4.2. Could you recommend any specific benchmark that is signficantly affected by performance of copyChars? > > Just out of curiosity, I wonder if we can use bitwise_cast here instead of > reinterpret_cast. > > Maybe, JF knows much about strict-aliasing, type punning and memcpy. (In reply to Yusuke Suzuki from comment #3) > (In reply to Konstantin Tokarev from comment #1) > > I've run JSC benchmarks on Linux with GCC 7.1, performance results were > > neutral after change of copyChars to do just memcpy > > I like to use memcpy if it does not hurt performance. > > Just out of curiosity, I wonder if we can use bitwise_cast here instead of > reinterpret_cast. > > Maybe, JF knows much about strict-aliasing, type punning and memcpy. bitwise_cast wouldn't help here, the pointers would still alias. Making the pointer type char* would fix the problem, because char* is magical in C++ and ca alias everything. memcpy sounds good if per isn't hurt. Can source ever be the same as destination though? I want to make sure we don't need memmove instead. If it's a char*, this would be equal to the "naive" bytewise copying below the #ifdef though, right? If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already used, so if anything used copyChars with the same source/destination, that'd have to be only with strings guaranteed to be shorter, which seems quite a stretch I think? (In reply to Florian Bruhin from comment #6) > If it's a char*, this would be equal to the "naive" bytewise copying below > the #ifdef though, right? Correct. > If it's more than 20 chars (> s_copyCharsInlineCutOff), memcpy is already > used, so if anything used copyChars with the same source/destination, that'd > have to be only with strings guaranteed to be shorter, which seems quite a > stretch I think? I'm not familiar with this part of the code, so it doesn't seem like a stretch to me :) For the record, optimization that avoids memcpy was initially added by Darin in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008) (In reply to Konstantin Tokarev from comment #8) > For the record, optimization that avoids memcpy was initially added by Darin > in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008) OK, that note is very helpful because it clarifies that this optimization is driven by SunSpider. (In reply to Yusuke Suzuki from comment #9) > (In reply to Konstantin Tokarev from comment #8) > > For the record, optimization that avoids memcpy was initially added by Darin > > in https://bugs.webkit.org/show_bug.cgi?id=20671 (2008) > > OK, that note is very helpful because it clarifies that this optimization is > driven by SunSpider. I've just tested it on my Linux box. VMs tested: "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy-tot/Release/bin/jsc "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/string-copy/Release/bin/jsc Collected 100 samples per benchmark/VM, with 100 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched string-base64 7.7544+-0.1761 7.6138+-0.2071 might be 1.0185x faster string-fasta 10.5429+-0.2746 ? 10.7500+-0.2669 ? might be 1.0196x slower string-tagcloud 14.8588+-0.2828 14.8039+-0.3039 string-unpack-code 36.1769+-0.4251 35.3397+-0.5398 might be 1.0237x faster string-validate-input 8.5182+-0.2206 8.3514+-0.2179 might be 1.0200x faster <arithmetic> 15.5702+-0.1239 15.3718+-0.1452 might be 1.0129x faster It seems there is no explicit difference right now. The compiler is GCC 5.4.0 I would like to see the difference in macOS's clang. It’s great news if memcpy is now fast enough. I’d love to see my old awkward optimization removed if it no longer does any good. In fact, I’m not sure we need a copyChars function at all. We could use memcpy directly as we once did, or even better ICU has u_memcpy if we want something that prevents the need to do "* sizeof(UChar)" at each call site. As long as u_memcpy has great performance too; presumably it expands into memcpy as an inline. Ping? FWIW Archlinux now added a patch which just uses memcpy to its qt5-webkit package: https://git.archlinux.org/svntogit/packages.git/tree/trunk/qt5-webkit-gcc7.patch?h=packages/qt5-webkit Meanwhile, a Fedora packager told me they didn't notice any issues with GCC 7.1.1... no idea why. :-/ Created attachment 314599 [details]
Patch
Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review > Source/WTF/wtf/text/StringImpl.h:631 > } We still keep StringImpl::copyChars() because it is very useful to copy characters based on the number of chars. If we use memcpy directly, we need to consider sizeof(UChar) and sizeof(LChar) carefully. And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also useful. Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review > Source/WTF/wtf/text/StringImpl.h:-631 > - if (numCharacters <= s_copyCharsInlineCutOff) { I think we should also remove s_copyCharsInlineCutOff definition Comment on attachment 314599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314599&action=review >> Source/WTF/wtf/text/StringImpl.h:-631 >> - if (numCharacters <= s_copyCharsInlineCutOff) { > > I think we should also remove s_copyCharsInlineCutOff definition Nice catch! I'll update the patch. Created attachment 314600 [details]
Patch
Did anyone run numbers on MacOS? (In reply to JF Bastien from comment #19) > Did anyone run numbers on MacOS? I don't measure it yet. If nobody runs it until tomorrow's evening (JST), possibly I can run it ;) (my macOS box is placed at my office). Ok I can give it a try later today. On my iMac: before after 3d-cube 3.8573+-0.0502 3.7993+-0.0415 might be 1.0153x faster 3d-morph 4.2964+-0.0214 4.2933+-0.0245 3d-raytrace 3.9897+-0.0476 3.9794+-0.0589 access-binary-trees 1.8148+-0.0242 1.8113+-0.0155 access-fannkuch 4.5038+-0.0384 4.5000+-0.0476 access-nbody 2.2008+-0.0196 ? 2.2017+-0.0167 ? access-nsieve 1.9470+-0.0433 ? 1.9663+-0.0509 ? bitops-3bit-bits-in-byte 1.0725+-0.0117 1.0721+-0.0118 bitops-bits-in-byte 2.1698+-0.0342 2.1535+-0.0250 bitops-bitwise-and 1.8784+-0.0138 ? 1.8907+-0.0219 ? bitops-nsieve-bits 2.7890+-0.0119 ? 2.8228+-0.0349 ? might be 1.0121x slower controlflow-recursive 2.0770+-0.0207 2.0688+-0.0231 crypto-aes 3.5230+-0.0352 ? 3.5641+-0.0508 ? might be 1.0117x slower crypto-md5 2.1610+-0.0176 2.1448+-0.0139 crypto-sha1 2.2058+-0.0256 ? 2.2185+-0.0236 ? date-format-tofte 6.0776+-0.0427 ? 6.1917+-0.0917 ? might be 1.0188x slower date-format-xparb 4.8437+-0.0344 ^ 4.7681+-0.0408 ^ definitely 1.0159x faster math-cordic 2.4417+-0.0326 2.4224+-0.0170 math-partial-sums 3.3532+-0.0203 ? 3.3763+-0.0432 ? math-spectral-norm 1.6996+-0.0198 1.6885+-0.0164 regexp-dna 5.9056+-0.0450 5.8670+-0.0320 string-base64 3.6116+-0.0421 3.5795+-0.0302 string-fasta 5.0965+-0.0400 5.0413+-0.0540 might be 1.0110x faster string-tagcloud 7.0740+-0.0728 ? 7.0958+-0.0883 ? string-unpack-code 16.5066+-0.1147 ? 16.6307+-0.1750 ? string-validate-input 4.1960+-0.0615 4.1216+-0.0161 might be 1.0181x faster <arithmetic> 3.8959+-0.0079 3.8950+-0.0104 might be 1.0002x faster Comment on attachment 314600 [details]
Patch
r=me
Comment on attachment 314600 [details]
Patch
Thanks! OK, let's go.
Comment on attachment 314600 [details] Patch Clearing flags on attachment: 314600 Committed r219182: <http://trac.webkit.org/changeset/219182> All reviewed patches have been landed. Closing bug. (In reply to Yusuke Suzuki from comment #15) > We still keep StringImpl::copyChars() because it is very useful to copy > characters based on the number of chars. > If we use memcpy directly, we need to consider sizeof(UChar) and > sizeof(LChar) carefully. But why not use u_memcpy directly? > And StringImpl::copyChars(UChar*, const LChar*, unsigned) override is also > useful. That makes sense. (In reply to Darin Adler from comment #27) > (In reply to Yusuke Suzuki from comment #15) > > We still keep StringImpl::copyChars() because it is very useful to copy > > characters based on the number of chars. > > If we use memcpy directly, we need to consider sizeof(UChar) and > > sizeof(LChar) carefully. > > But why not use u_memcpy directly? The problem is that memcpy is not designed for LChar. It is designed for arbitrary bytes. Thus it's signature is `void* memcpy(void*, const void*, size_t)`. We cannot catch the case using memcpy for UChar as a compile error. I think offering copyChars is safer, its overloading and templates can select appropriate memcpy based implementation automatically. |