Bug 92475 - Use 8 bit atomic strings where possible
Summary: Use 8 bit atomic strings where possible
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-27 02:21 PDT by Dan Carney
Modified: 2012-08-03 00:08 PDT (History)
5 users (show)

See Also:


Attachments
Patch (6.46 KB, patch)
2012-07-27 02:31 PDT, Dan Carney
benjamin: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Carney 2012-07-27 02:21:23 PDT
Use 8 bit atomic strings where possible
Comment 1 Dan Carney 2012-07-27 02:31:06 PDT
Created attachment 154883 [details]
Patch
Comment 2 jochen 2012-07-27 02:45:01 PDT
Comment on attachment 154883 [details]
Patch

Can you post performance test results to see what impact the additional check on creating atomic strings has?

View in context: https://bugs.webkit.org/attachment.cgi?id=154883&action=review

> Source/WTF/wtf/text/AtomicString.cpp:126
> +    } else

else branch should also have { }

> Source/WTF/wtf/text/AtomicString.cpp:217
> +        if (buffer.utf16Length == buffer.length)

if branch should have { } as well
Comment 3 Benjamin Poulain 2012-08-02 14:36:57 PDT
This is an interesting idea but I am missing how did we get here in the first place.

I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string?
Comment 4 Benjamin Poulain 2012-08-02 14:47:18 PDT
Comment on attachment 154883 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=154883&action=review

Before moving forward with this, please get some numbers on the performance gain:

How much CPU time does the change cost in the positive and negative cases?
How many times do we each each branch in real web pages?
How many bytes saved (don't forget it is possible we do 16bits conversion later in which case your change causes extra memory, not less)?

> Source/WTF/wtf/text/ASCIIFastPath.h:117
> +inline bool charactersAreAll8Bit(const UChar* characters, size_t length)
> +{
> +    MachineWord allCharBits = orAllCharacters(characters, length);
> +    MachineWord non8BitMask = Non8BitMask<sizeof(MachineWord)>::value();
> +    return !(allCharBits & non8BitMask);
> +}

This is the wrong place for this function. The file is ASCIIFastPath, it is about the ASCII functions.

> Source/WTF/wtf/text/AtomicString.cpp:117
> +static inline void constructMinimalStringImpl(StringImpl*& location, const UChar * buffer, unsigned length)

const UChar * buffer -> coding style.

This function should be in StringImpl, not in AtomicString.

> Source/WTF/wtf/text/AtomicString.cpp:120
> +        // optimize for 8 bit strings

Bad formatting for your comment, see http://www.webkit.org/coding/coding-style.html.
This comment is pretty useless anyway, you could remove it.

> Source/WTF/wtf/text/AtomicString.cpp:122
> +        location = StringImpl::createUninitialized(length, data8).leakRef();

This leakRef() makes the ownership difficult to follow.
Please return a PassRefPtr from here instead.

> Source/WTF/wtf/text/AtomicString.cpp:125
> +            data8[i] = (LChar) data16[i];

Please use an explicit C++ cast, not a C cast.

> Source/WTF/wtf/text/AtomicString.cpp:218
> +            // optimize for ascii

The style is wrong (see above). You can also remove this comment.

> Source/WTF/wtf/text/AtomicString.cpp:219
> +            location = StringImpl::create((const LChar*) buffer.characters, buffer.length).leakRef();

Cast (see above).
Comment 5 Benjamin Poulain 2012-08-02 15:02:11 PDT
> > Source/WTF/wtf/text/AtomicString.cpp:126
> > +    } else
> 
> else branch should also have { }

This is incorrect. The style on that else was good.


Dan, please ping me on IRC when you have an update. This patch looks promising.
Comment 6 Dan Carney 2012-08-03 00:08:22 PDT
(In reply to comment #3)
> This is an interesting idea but I am missing how did we get here in the first place.
> 
> I wonder if you are fixing this too late in the stack and we could improve previous layer to already use 8 bits characters. From your experiments, from where were those AtomicString initialized and why are they using 16bits string?

In some ways, this is definitely the wrong place to execute the 8 bit check, but strings flow in from a bunch of places where no conversion to 8 bit strings has yet been done - most of the strings are css properties and dom element attributes.  The reason that I want to inject 8 bit strings into the atomic table here is that these strings are often used in string composition downstream, and this means that all compositions are 16 bit as well, since there is no conversion from 16 bit to 8 bit anywhere, only the opposite.  The memory savings I was hoping to achieve was more from composed strings.

I can tell you now that on a typical page, almost 100% of the time, the 8 bit branch is taken, but I'll test the performance a bit and repost, particularly checking how often these strings get up converted to 16 bit and how often they are used in compositions.  If it's often, pre-attaching the original buffer to the 8 bit string might be a good option that would actually lead to memory savings on compositions.