I've found a few ways to reduce the memory usage of the ubiquitous WebCore::StringImpl: * Remove the unnecessary m_bufferIsInternal member -- saves 4 bytes. * Use a 16-bit flag member instead of 32-bit m_sharedBufferAndFlags if not using JSC -- saves 2 bytes, or 6 in 64-bit. * Make copy() and createWithTerminatingNullCharacter() create the string in a single malloc block (using the create() method) instead of 2 -- saves ~20 bytes and considerable CPU cycles, and increases locality of reference.
Created attachment 39786 [details] patch
Darin is probably your best bet.
I started looking this over and have a few questions and comments. I'll try to get to it as soon as possible.
Comment on attachment 39786 [details] patch StringImpl(const UChar*, unsigned length, unsigned hash); copies the buffer. StringImpl(const UChar*, unsigned length); doesn't. It is very surprising that the buffer is either adopted or copied depending on whether I pass in the hash value for the string or not. net: I recommend that you bring back the AdoptBuffer parameter for the constructor that does adopt the buffer. > Index: WebCore/ChangeLog > + - Remove unnecessary m_bufferIsInternal member (saves 4 bytes). Great stuff! > + - Use 16-bit flag member if not using JSC (saves 2 bytes, or 6 in 64-bit). This doesn't seem useful for several reasons: 1. StringImpl has a pointer in it and ends with a m_buffer[], so the m_flags member be have padding after it for alignment reasons, so there is no space savings here. 2. Although I've not yet had a chance to do so :(, this field will also be used for sharing the buffers across threads which will also be useful to platforms that don't use jsc. Not having this would put those platforms at a perf disadvantage when they have to allocate and copy strings for these scenarios. 3. This part of the change makes the code more complicated to understand and maintain (This alone wouldn't be a reason to avoid this but is the final straw after 1 and 2.) oth, I think "unsigned" defaults to 32 bits under 64-bit builds. (Unless the compiler uses an ILP64 model and I'm pretty sure xcode's gcc does LP64 and Win64 uses LLP64 which both treat int's as 32 bits.) If it is 32 bits, then moving the two unsigned field (m_length, m_hash) next two each other could save space for that 64bit platforms. > Index: WebCore/platform/text/StringImpl.cpp > - > + Please don't do misc trailing whitespace changes. > PassRefPtr<StringImpl> StringImpl::createUninitialized(unsigned length, UChar*& data) > { > if (!length) { > @@ -1006,11 +982,10 @@ PassRefPtr<StringImpl> StringImpl::creat It seems that the only thing that needs to be done is this: > - StringImpl* string = new (buffer) StringImpl(data, length, AdoptBuffer()); > - string->m_bufferIsInternal = true; > + string = new (buffer) StringImpl(data, length); The rest just changes 3 lines of code into 3 lines of code (and I find the previous 3 lines of code easier to read, so this is pretty subjective). Net: Please do the minimal change here. > PassRefPtr<StringImpl> StringImpl::createWithTerminatingNullCharacter(const StringImpl& string) > { > - return adoptRef(new StringImpl(string, WithTerminatingNullCharacter())); > + // Use createUninitialized instead of 'new StringImpl' so that the string and its buffer > + // get allocated in a single malloc block. > + UChar* data; > + int length = string.m_length; > + PassRefPtr<StringImpl> paddedString = createUninitialized(length + 1, data); Use RefPtr within functions not PassRefPtr (and then do release when returning it). Seems like terminatedString would be more appropriate than paddedString. > PassRefPtr<StringImpl> StringImpl::copy() > { > - // Using the constructor directly to make sure that per-thread empty string instance isn't returned. > - return adoptRef(new StringImpl(m_data, m_length)); > + // Special-case empty string to make sure that per-thread empty string instance isn't returned. > + if (!m_length) > + return adoptRef(new StringImpl()); > + else > + return create(m_data, m_length); Don't add an else when it isn't needed (if there is a return right before it). (very optional) Since I suspect that m_length > 0 more often than it is 0, I would reverse the conditions here (but this probably won't be noticeable at all given the allocations that will happen). if (m_length) return create(m_data, m_length); return adoptRef(new StringImpl()); > Index: WebCore/platform/text/StringImpl.h > - > + WebKit specifically does not care about trailing whitespace so please don't remove it. It only adds to the size of your patch and the number of diffs. > + // We usually allocate the StringImpl struct and its data usually? seems like it could change. Why not just state that this flag "Indicates when the StringImpl and m_data were created with a single allocation." > + // within a single heap buffer. In this case, the m_data pointer > + // is an "internal buffer", and does not need to be deallocated. > + bool bufferIsInternal() { return m_data == &m_buffer[0]; } > + inline bool isFlagSet(StringImplFlags flag) const; > + inline void setFlag(StringImplFlags flag); "flag" param name doesn't add any information so please remove it. (In fact I suspect this change will go away complete when the if USE(JSC) stuff is removed. > + const UChar m_buffer[0]; // m_data often points here (past the end of the object.) "often" seems subjective/relative and may easily change. > +inline bool StringImpl::isFlagSet(StringImplFlags flag) const > +{ > + return (m_flags & (1 << flag)) != 0; Don't do comparisons to 0. btw, check-webkit-style would have caught a few of the things I mentioning here. (This should be one of them.) Of course this code should go away...
>StringImpl(const UChar*, unsigned length, unsigned hash); copies the buffer. >StringImpl(const UChar*, unsigned length); doesn't. Good point. This patch is actually only half of my changes, the rest of which I'll provide later. I was urged to keep the patches as small as possible. In the next change the constructor with the hash is removed entirely, leaving the remaining one unambiguous. But I can move the removal of AdoptBuffer to the second patch. >1. StringImpl has a pointer in it and ends with a m_buffer[], so the m_flags >member be have padding after it for alignment reasons, so there is no space >savings here. No, actually. I even made sure of this in gdb after making the change. Each field is aligned based on its size, so 16-bit fields are aligned to 16-bit boundaries. Since both m_flags and m_buffer are 16-bit quantities (even though m_buffer is zero-size) there is no padding. >Although I've not yet had a chance to do so :(, this field will also be used >for sharing the buffers across threads which will also be useful to platforms >that don't use jsc. When you do this, it will be really easy to take out the #if USE(JSC) stuff. But this change I've made will save some memory right now. Since all future changes are speculative, it seems better to get a concrete improvement in now, just in case the future change doesn't happen (or takes a long time.) > This part of the change makes the code more complicated to understand and maintain Hm, I thought it simplified it! The point is that instead of mucking around with byte offsets and sizeof, there is now an explicit field m_buffer that points to where the internal string buffer goes. But if you're the maintainer of this code I won't argue with you about that. > If it is 32 bits, then moving the two unsigned field (m_length, m_hash) next > two each other could save space for that 64bit platforms. Good idea! I'll incorporate that. > Don't add an else when it isn't needed (if there is a return right before it). Wow, that really is in the WebKit style guide; huh. I've worked on other projects that did the opposite, discouraging early returns when an else is available. I think it's clearer as I wrote it, but I'll change it to match the guide.
>Use RefPtr within functions not PassRefPtr > (and then do release when returning it). But many of the existing methods in this file (upper, lower, foldCase, four variants of replace, two variants of create) already use PassRefPtr this way; I was copying what they did.
Yeah. :( They're wrong. They probably came into being before we standardized the behavior. You should feel free to fix them. I think this is documented in Darin's RefPtr article: http://webkit.org/coding/RefPtr.html
> This part of the change makes the code more complicated to understand and maintain I looked at this some more, and it is actually necessary to use the offset of m_buffer, as my change does, instead of sizeof(StringImpl) as the offset of the character data. This is because, while the position of m_buffer is only padded to 16 bits (providing that massive 2-byte savings), the size of the whole object is still padded to 32 bits. From an actual printf I dropped in, when run in Chrome: sizeof StringImpl = 28, offset of m_buffer = 26
Created attachment 40324 [details] patch #2 New patch incorporating feedback, and up-to-date with r48892.
Comment on attachment 40324 [details] patch #2 > + , m_buffer() What does this line of code do? Does it have any effect at all? Can we leave it out? > - // Using the constructor directly to make sure that per-thread empty string instance isn't returned. > - return adoptRef(new StringImpl(m_data, m_length)); > + // Special-case empty strings to make sure that per-thread empty string instance isn't returned. > + if (m_length > 0) > + return create(m_data, m_length); > + return adoptRef(new StringImpl()); We'd normally put the unusual case (zero-length string) first in the nested if. I'd normally omit the () after StringImpl since it's not needed. > - bool startsWith(StringImpl* m_data, bool caseSensitive = true) { return reverseFind(m_data, 0, caseSensitive) == 0; } > + bool startsWith(StringImpl* impl, bool caseSensitive = true) { return reverseFind(impl, 0, caseSensitive) == 0; } I think it's confusing to call this other string "impl". Calling it m_data was much worse, but still, it's not the impl, it's another string! > + inline bool isFlagSet(StringImplFlags flag) const; > + inline void setFlag(StringImplFlags flag); I don't believe the "inline" in these is needed. WebKit coding style requires omitting the argument name "flag" here. I also think it's confusing that the enum type StringImplFlags is a type used not for flags, but for the flag number of a single flag! What's the performance impact of this? I'm going to say review- even though my comments were minor. They should be easy to resolve.
Darin, > What does this line of code do? Does it have any effect at all? It has no effect, other than to suppress an uninitialized-member warning (I think that comes with the -Weffective-cplusplus package.) I sometimes turn those warnings on during development to find lurking errors. > We'd normally put the unusual case (zero-length string) first in the nested if. Heh, I had it that way at first but changed it at Eric's request ("Since I suspect that m_length > 0 more often than it is 0, I would reverse the conditions here...") > I don't believe the "inline" in these is needed. I like to put it in as a hint to the human reader, but if it's non-canon for WebKit I'll remove it. >it's confusing that the enum type StringImplFlags is a type used not for >flags, but for the flag number of a single flag! Agreed, but that enum already existed. It's a shame it can't be used as the type of m_flags, but enum sizes are compiler-dependent, and making this field 16-bit is the whole point of the change. Renaming the type "StringImplFlag" might help... >What's the performance impact of this? The only source of CPU overhead at all would be a few extra instructions in bufferIsInternal() compared to testing m_bufferIsInternal; but on the other hand you save having to initialize the variable. On the plus side, there are small memory savings by shrinking the object, and more significant CPU and memory savings from eliminating an extra malloc in the copy() and createWithTerminatingNull calls. (FYI, my next patch will be to make AtomicString always allocate its impl with a single malloc instead of two.) These days I'm a lot more concerned about memory usage and cache coherency than CPU cycles. I'll put up a new patch in a few minutes.
Correction: the initialization of m_buffer is actually required, otherwise build-webkit fails with an error: StringImpl.cpp:82: error: uninitialized member ‘WebCore::StringImpl::m_buffer’ with ‘const’ type ‘const UChar [0u]’ This is because it's marked 'const' (to prevent mutation of the string contents.)
Created attachment 40541 [details] patch #3
Comment on attachment 40541 [details] patch #3 The changelog is out of date in several aspects. 1. It doesn't mention anything about moving "unsigned m_length;" 2. > + (WebCore::StringImpl::createUninitialized): Use new m_buffer member to make the code > + a little bit clearer. Actually this isn't correct according to your comments in the bug. 3. It has several comments about functions no longer modified. Perhaps other things that I didn't catch. It would be good to go over it and bring it up to date.
Created attachment 40731 [details] patch #4 (no longer attempting to get rid of m_sharedBufferAndFlags) Synced with top-of-tree. Since m_sharedBufferAndFlags is now being used for purposes other than JSC integration, I can't #ifdef it out anymore, so I removed that part of the patch. :( Updated the changelog to correspond with the current changes.
Comment on attachment 40731 [details] patch #4 (no longer attempting to get rid of m_sharedBufferAndFlags) > + - Remove unnecessary m_bufferIsInternal member (saves 4 bytes). Instead, check whether > + m_data points to just pas the end of the object's members. "pas the end" > - bool hasTerminatingNullCharacter() const { return m_sharedBufferAndFlags.isFlagSet(HasTerminatingNullCharacter); } > + bool hasTerminatingNullCharacter() const { return isFlagSet(HasTerminatingNullCharacter); } This change is no longer needed. > > - bool inTable() const { return m_sharedBufferAndFlags.isFlagSet(InTable); } > - void setInTable() { return m_sharedBufferAndFlags.setFlag(InTable); } > + bool inTable() const { return isFlagSet(InTable); } > + void setInTable() { setFlag(InTable); } This change is no longer needed. > + bool isFlagSet(StringImplFlags) const; > + void setFlag(StringImplFlags); These are no longer needed. > + // m_buffer is declared with zero size; the actual size is determined when the instance > + // is created. It will be zero unless using an "internal buffer", in which case m_data > + // will point to m_buffer and the length of m_buffer will be equal to m_length. > + const UChar m_buffer[0]; Do all the compilers we need to compile with support zero-size? I ask because I'm pretty sure it's not allowed in standard C++. In the past I've had to use a length of 1 instead. Are there calls to new StringImpl left? If so, did you check they allocate the correct size. I'm going to say review- so you can remove the no-longer-needed indirection for flag getting and setting. Otherwise this is looking fine.
Created attachment 40744 [details] patch #5 (remove flag accessors) OK, I've removed the abstracted flag getter/setters. I looked into the empty-array thing -- it looks like specifying an array without size ("UChar m_buffer[];") is better. It's an official part of C99 and appear to have de facto support in real-world C++ compilers, including GCC and MSVC++. If there's a compiler that doesn't support it, it would be easy to add an #ifdef for that compiler and substitute an array size of 1. That will just end up wasting two bytes (plus any rounding) in the case of StringImpls with separate buffers.
Comment on attachment 40744 [details] patch #5 (remove flag accessors) > + // The StringImpl struct and its data may be allocated within a single heap block. > + // In this case, the m_data pointer is an "internal buffer", and does not need to be deallocated. > + bool bufferIsInternal() { return m_data == &m_buffer[0]; } I realized you can write this as m_data == m_buffer without the &[0] part. r=me
>I realized you can write this as m_data == m_buffer without the &[0] part. Probably; but there are some contexts where an array doesn't act just like the address of its first member, which I find confusing, so I like to be explicit about it. You (or whoever checks it in) can change it if you prefer, though.
Comment on attachment 40744 [details] patch #5 (remove flag accessors) Clearing flags on attachment: 40744 Committed r49272: <http://trac.webkit.org/changeset/49272>
All reviewed patches have been landed. Closing bug.