Bug 60950

Summary: Port the SmallStrings cache to Weak<T>
Product: WebKit Reporter: Xan Lopez <xan.lopez>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: NEW    
Severity: Normal CC: barraclough, ggaren, oliver
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
weakptrsmallstrings.diff none

Xan Lopez
Reported 2011-05-17 03:54:39 PDT
A FIXME in Heap.cpp suggests that the SmallStrings cache could/should be ported to use Weak<T> instead of its own ad-hoc marking logic. I've done a rough patch that does this, here are some doubts/comments: - This seems straightforward enough save for one bit: the cache exposes a singleCharacterStrings() method that gives the JIT direct access to the JSString array where the one character strings are stored. Arguably this is done for performance reasons. - As a test to see if I could make it actually work I ported everything to Weak<T> and made the JIT try to get the JSString from inside the Weak<T> now exposed in the API. This seems to work (in some cases), but I'm not sure if it makes any sense. In particular, I understand that the data behind a Weak<T> can change at any moment as a side effect of GC, so perhaps it just does not make any sense to generate code that does this. - If that's the case, I wonder what kind of API the cache should expose. None? If it's not, would it be acceptable to do what I have tried? Comments welcome!
Attachments
weakptrsmallstrings.diff (21.08 KB, patch)
2011-05-18 14:01 PDT, Xan Lopez
no flags
Oliver Hunt
Comment 1 2011-05-17 10:14:46 PDT
I'm not sure who put that there, the small strings cache is by definition small (eg. single character strings) and needs to have very fast access in order for charCodeAt, string indexing, and other optimisations to function. If it's really necessary to have weak behaviour, we have sufficient control of this data to implement weak semantics directly in the buffer.
Geoffrey Garen
Comment 2 2011-05-17 16:46:46 PDT
> - As a test to see if I could make it actually work I ported everything to Weak<T> and made the JIT try to get the JSString from inside the Weak<T> now exposed in the API. That sounds fine. It just adds one pointer load to the cost of charCodeAt. Did you measure the performance of your patch? > This seems to work (in some cases), but I'm not sure if it makes any sense. In particular, I understand that the data behind a Weak<T> can change at any moment as a side effect of GC, so perhaps it just does not make any sense to generate code that does this. If the JIT loads the JSString from the Weak<T>, it won't matter if the JSString in the Weak<T> has changed -- the JIT will correctly see the new value. Is something going wrong in your code? It would be easier to diagnose if you posted your patch. > I'm not sure who put that there, I did. > If it's really necessary to have weak behaviour, we have sufficient control of this data to implement weak semantics directly in the buffer. We have that now, but the implementation is a bit awkward. If the cost of an extra pointer load in these cases is not measurable, I think it would be an improvement to switch over to the more general Weak<T> implementation.
Oliver Hunt
Comment 3 2011-05-17 18:36:50 PDT
(In reply to comment #2) > > - As a test to see if I could make it actually work I ported everything to Weak<T> and made the JIT try to get the JSString from inside the Weak<T> now exposed in the API. > > That sounds fine. It just adds one pointer load to the cost of charCodeAt. It's also used for string indexing, see the stringGetByValStubGenerator() function. We would have to maintain a live handle for every slot for all time, each handle is in the order of 64bytes so spans a cache line so the indirection is more likely to add an additional cache miss, and the you blow locality. When I originally wrote this code i found that it was fairly sensitive to exact instruction ordering. I'm not saying that there is guaranteed to be cost, but "just one pointer load" does a lot. It may have even more impact on an in-order architecture. > > Did you measure the performance of your patch? My recommended tests would be manually unrolled calls to charAt(), charCodeAt(), and simple string indexing. something akin to function test(str) { for (var i = 0; i < 100000; i++) { str[0]; str[1]; str[2]; str[3]; } } test("foobar"); > > > This seems to work (in some cases), but I'm not sure if it makes any sense. In particular, I understand that the data behind a Weak<T> can change at any moment as a side effect of GC, so perhaps it just does not make any sense to generate code that does this. > > If the JIT loads the JSString from the Weak<T>, it won't matter if the JSString in the Weak<T> has changed -- the JIT will correctly see the new value. > > Is something going wrong in your code? It would be easier to diagnose if you posted your patch. > > > I'm not sure who put that there, > > I did. > > > If it's really necessary to have weak behaviour, we have sufficient control of this data to implement weak semantics directly in the buffer. > > We have that now, but the implementation is a bit awkward. If the cost of an extra pointer load in these cases is not measurable, I think it would be an improvement to switch over to the more general Weak<T> implementation. I suspect this would also be a memory hit as each handle is 6 pointers + a JSValue so 8k on 32bit and 14k on 64bit platforms (assuming there's no alignment related padding happening in HandleHeap::Node).
Xan Lopez
Comment 4 2011-05-18 14:01:06 PDT
Created attachment 93980 [details] weakptrsmallstrings.diff Here's the WIP patch. As I said this makes ~40 tests crash, so I'm sure I'm doing something wrong. I guess either the way I use the weak pointers, or the way I unpack the JSString from them, or both. Comments welcome. BTW, I had to change the string cache to be a pointer type in the global data, otherwise there was a circular dependency when trying to use Weak in SmallStrings.h. (JSGlobalData.h -> SmallStrings.h -> Weak.h -> JSGlobalData.h)
Note You need to log in before you can comment on or make changes to this bug.