WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
92475
Use 8 bit atomic strings where possible
https://bugs.webkit.org/show_bug.cgi?id=92475
Summary
Use 8 bit atomic strings where possible
Dan Carney
Reported
2012-07-27 02:21:23 PDT
Use 8 bit atomic strings where possible
Attachments
Patch
(6.46 KB, patch)
2012-07-27 02:31 PDT
,
Dan Carney
benjamin
: review-
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Dan Carney
Comment 1
2012-07-27 02:31:06 PDT
Created
attachment 154883
[details]
Patch
jochen
Comment 2
2012-07-27 02:45:01 PDT
Comment on
attachment 154883
[details]
Patch Can you post performance test results to see what impact the additional check on creating atomic strings has? View in context:
https://bugs.webkit.org/attachment.cgi?id=154883&action=review
> Source/WTF/wtf/text/AtomicString.cpp:126 > + } else
else branch should also have { }
> Source/WTF/wtf/text/AtomicString.cpp:217 > + if (buffer.utf16Length == buffer.length)
if branch should have { } as well
Benjamin Poulain
Comment 3
2012-08-02 14:36:57 PDT
This is an interesting idea but I am missing how did we get here in the first place. I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string?
Benjamin Poulain
Comment 4
2012-08-02 14:47:18 PDT
Comment on
attachment 154883
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=154883&action=review
Before moving forward with this, please get some numbers on the performance gain: How much CPU time does the change cost in the positive and negative cases? How many times do we each each branch in real web pages? How many bytes saved (don't forget it is possible we do 16bits conversion later in which case your change causes extra memory, not less)?
> Source/WTF/wtf/text/ASCIIFastPath.h:117 > +inline bool charactersAreAll8Bit(const UChar* characters, size_t length) > +{ > + MachineWord allCharBits = orAllCharacters(characters, length); > + MachineWord non8BitMask = Non8BitMask<sizeof(MachineWord)>::value(); > + return !(allCharBits & non8BitMask); > +}
This is the wrong place for this function. The file is ASCIIFastPath, it is about the ASCII functions.
> Source/WTF/wtf/text/AtomicString.cpp:117 > +static inline void constructMinimalStringImpl(StringImpl*& location, const UChar * buffer, unsigned length)
const UChar * buffer -> coding style. This function should be in StringImpl, not in AtomicString.
> Source/WTF/wtf/text/AtomicString.cpp:120 > + // optimize for 8 bit strings
Bad formatting for your comment, see
http://www.webkit.org/coding/coding-style.html
. This comment is pretty useless anyway, you could remove it.
> Source/WTF/wtf/text/AtomicString.cpp:122 > + location = StringImpl::createUninitialized(length, data8).leakRef();
This leakRef() makes the ownership difficult to follow. Please return a PassRefPtr from here instead.
> Source/WTF/wtf/text/AtomicString.cpp:125 > + data8[i] = (LChar) data16[i];
Please use an explicit C++ cast, not a C cast.
> Source/WTF/wtf/text/AtomicString.cpp:218 > + // optimize for ascii
The style is wrong (see above). You can also remove this comment.
> Source/WTF/wtf/text/AtomicString.cpp:219 > + location = StringImpl::create((const LChar*) buffer.characters, buffer.length).leakRef();
Cast (see above).
Benjamin Poulain
Comment 5
2012-08-02 15:02:11 PDT
> > Source/WTF/wtf/text/AtomicString.cpp:126 > > + } else > > else branch should also have { }
This is incorrect. The style on that else was good. Dan, please ping me on IRC when you have an update. This patch looks promising.
Dan Carney
Comment 6
2012-08-03 00:08:22 PDT
(In reply to
comment #3
)
> This is an interesting idea but I am missing how did we get here in the first place. > > I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string?
In some ways, this is definitely the wrong place to execute the 8 bit check, but strings flow in from a bunch of places where no conversion to 8 bit strings has yet been done - most of the strings are css properties and dom element attributes. The reason that I want to inject 8 bit strings into the atomic table here is that these strings are often used in string composition downstream, and this means that all compositions are 16 bit as well, since there is no conversion from 16 bit to 8 bit anywhere, only the opposite. The memory savings I was hoping to achieve was more from composed strings. I can tell you now that on a typical page, almost 100% of the time, the 8 bit branch is taken, but I'll test the performance a bit and repost, particularly checking how often these strings get up converted to 16 bit and how often they are used in compositions. If it's often, pre-attaching the original buffer to the 8 bit string might be a good option that would actually lead to memory savings on compositions.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug