It would be useful in many situations to be able to use Strings across multiple threads, especially now that Workers, OffscreenCanvas and GPUProcess have become/are becoming things. Blink has already managed this (I think?), their plan is detailed here: https://docs.google.com/document/d/1hp7h6zfD5S6mnMI4cHe1PpacRnb2rNpRRkn1gBBJHuQ/edit#heading=h.erv1bv18t616 The founding part of making this work is to make ref-counting on WTF::StringImpl atomic. We could then have global string and symbol tables guarded by locks, and remove a bunch of now-unnecessary API. It looks like Google dealt with the performance degradation by optimising the worst-hit areas and adding special API for bulk creation of strings - certainly that latter part seems like a very reasonable idea, to me. Antti has suggested removing destruction from deref and having a separate 'GC' thread to perform periodic destruction to possibly optimise both performance and size - we should probably look at this after seeing the impact without it first. A rough, high-level plan; 1- Make ref/deref on StringImpl atomic and benchmark impact 2- Make string/symbol table static and guarded by a lock (already the case in some situations) 3- Remove now-unnecessary API (isolatedCopy, StringTableProvider...) X- Experiment with GC thread as a way of mitigating performance impact of 1/2 X- Identify and optimise badly-impacted areas I'm going to attempt to have a go at this, given its need in both OffscreenCanvas and GPUProcess. All suggestions/help/discussion appreciated.
Using a very simple ref/deref benchmark on StringImpl, changing m_refCount to std::atomic<unsigned> makes it run at about 35% of current speed on a release build on this Xeon machine. Making refs use relaxed memory ordering makes that 40%, making them both relaxed to gauge what a GC thread might give us in terms of ref/deref perf increase is about the same (which surprised me a bit...) Switching to an atomic ref count has a significant hit, the question will be how significant is it on real-world use-cases, once code has been refactored to take into account that Strings can be shared across threads now...
Created attachment 415120 [details] Patch
Created attachment 415130 [details] Patch
Created attachment 415133 [details] Patch
The attached patch is a first run at (1) and (2). speedtest goes from a 64.9 to a 60 on my machine (an 8% hit, which is above the error threshold) and Jetstream 1.1 is basically the same, I couldn't get Jetstream 2.0 to run (it hangs on bomb-workers). I haven't done any profiling yet to see where the hits are, but I think it's worth doing a first run at (3) before that.
Comment on attachment 415133 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415133&action=review > Source/WTF/wtf/text/StringImpl.h:1124 > + if (!(refCount - s_refCountIncrement)) { > + StringImpl::destroy(this); > + return; > + } I think this is not safe. Let's consider the following scenario. 1. There are two threads, A and B. 2. A thread deref-ed an atomic string "TEST", and about to calling StringImpl::destroy 3. But at (2), still the atomic string "TEST" is in AtomStringTable 4. B thread attempts to get "TEST" atomic string from the table, and B thread found (2)'s atomic string. Ref it and start using it. 5. A thread starts calling StringImpl::destroy and (2)'s string gets destroyed 6. Now, B thread is having destroyed StringImpl, UAF.
<rdar://problem/71896341>
I think blink's StringImpl is not thread-safe too. https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/platform/wtf/text/string_impl.h;l=267?q=StringImpl&ss=chromium