NEW 92475
Use 8 bit atomic strings where possible
https://bugs.webkit.org/show_bug.cgi?id=92475
Summary Use 8 bit atomic strings where possible
Dan Carney
Reported 2012-07-27 02:21:23 PDT
Use 8 bit atomic strings where possible
Attachments
Patch (6.46 KB, patch)
2012-07-27 02:31 PDT, Dan Carney
benjamin: review-
Dan Carney
Comment 1 2012-07-27 02:31:06 PDT
jochen
Comment 2 2012-07-27 02:45:01 PDT
Comment on attachment 154883 [details] Patch Can you post performance test results to see what impact the additional check on creating atomic strings has? View in context: https://bugs.webkit.org/attachment.cgi?id=154883&action=review > Source/WTF/wtf/text/AtomicString.cpp:126 > + } else else branch should also have { } > Source/WTF/wtf/text/AtomicString.cpp:217 > + if (buffer.utf16Length == buffer.length) if branch should have { } as well
Benjamin Poulain
Comment 3 2012-08-02 14:36:57 PDT
This is an interesting idea but I am missing how did we get here in the first place. I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string?
Benjamin Poulain
Comment 4 2012-08-02 14:47:18 PDT
Comment on attachment 154883 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=154883&action=review Before moving forward with this, please get some numbers on the performance gain: How much CPU time does the change cost in the positive and negative cases? How many times do we each each branch in real web pages? How many bytes saved (don't forget it is possible we do 16bits conversion later in which case your change causes extra memory, not less)? > Source/WTF/wtf/text/ASCIIFastPath.h:117 > +inline bool charactersAreAll8Bit(const UChar* characters, size_t length) > +{ > + MachineWord allCharBits = orAllCharacters(characters, length); > + MachineWord non8BitMask = Non8BitMask<sizeof(MachineWord)>::value(); > + return !(allCharBits & non8BitMask); > +} This is the wrong place for this function. The file is ASCIIFastPath, it is about the ASCII functions. > Source/WTF/wtf/text/AtomicString.cpp:117 > +static inline void constructMinimalStringImpl(StringImpl*& location, const UChar * buffer, unsigned length) const UChar * buffer -> coding style. This function should be in StringImpl, not in AtomicString. > Source/WTF/wtf/text/AtomicString.cpp:120 > + // optimize for 8 bit strings Bad formatting for your comment, see http://www.webkit.org/coding/coding-style.html. This comment is pretty useless anyway, you could remove it. > Source/WTF/wtf/text/AtomicString.cpp:122 > + location = StringImpl::createUninitialized(length, data8).leakRef(); This leakRef() makes the ownership difficult to follow. Please return a PassRefPtr from here instead. > Source/WTF/wtf/text/AtomicString.cpp:125 > + data8[i] = (LChar) data16[i]; Please use an explicit C++ cast, not a C cast. > Source/WTF/wtf/text/AtomicString.cpp:218 > + // optimize for ascii The style is wrong (see above). You can also remove this comment. > Source/WTF/wtf/text/AtomicString.cpp:219 > + location = StringImpl::create((const LChar*) buffer.characters, buffer.length).leakRef(); Cast (see above).
Benjamin Poulain
Comment 5 2012-08-02 15:02:11 PDT
> > Source/WTF/wtf/text/AtomicString.cpp:126 > > + } else > > else branch should also have { } This is incorrect. The style on that else was good. Dan, please ping me on IRC when you have an update. This patch looks promising.
Dan Carney
Comment 6 2012-08-03 00:08:22 PDT
(In reply to comment #3) > This is an interesting idea but I am missing how did we get here in the first place. > > I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string? In some ways, this is definitely the wrong place to execute the 8 bit check, but strings flow in from a bunch of places where no conversion to 8 bit strings has yet been done - most of the strings are css properties and dom element attributes. The reason that I want to inject 8 bit strings into the atomic table here is that these strings are often used in string composition downstream, and this means that all compositions are 16 bit as well, since there is no conversion from 16 bit to 8 bit anywhere, only the opposite. The memory savings I was hoping to achieve was more from composed strings. I can tell you now that on a typical page, almost 100% of the time, the 8 bit branch is taken, but I'll test the performance a bit and repost, particularly checking how often these strings get up converted to 16 bit and how often they are used in compositions. If it's often, pre-attaching the original buffer to the 8 bit string might be a good option that would actually lead to memory savings on compositions.
Note You need to log in before you can comment on or make changes to this bug.