Bug 33222 - Make text decoders and resource loaders read from segmented SharedBuffer
Summary: Make text decoders and resource loaders read from segmented SharedBuffer
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-05 11:11 PST by Yong Li
Modified: 2010-03-16 18:40 PDT (History)
2 users (show)

See Also:


Attachments
the patch (7.97 KB, patch)
2010-01-05 11:32 PST, Yong Li
eric: review-
Details | Formatted Diff | Diff
Make TextResourceDecoder read from segmented SharedBuffer (5.71 KB, patch)
2010-01-06 09:13 PST, Yong Li
no flags Details | Formatted Diff | Diff
Let ResourceLoader delivery segmented data (4.27 KB, patch)
2010-01-06 09:45 PST, Yong Li
eric: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yong Li 2010-01-05 11:11:32 PST
Make CachedScript, CSS Style Sheet, XSL Style Sheet and loaders work with segmented SharedBuffer

See bug 33178

patch is coming
Comment 1 Yong Li 2010-01-05 11:32:54 PST
Created attachment 45914 [details]
the patch
Comment 2 WebKit Review Bot 2010-01-05 11:36:09 PST
style-queue ran check-webkit-style on attachment 45914 [details] without any errors.
Comment 3 Eric Seidel (no email) 2010-01-05 12:16:23 PST
Comment on attachment 45914 [details]
the patch

I think it woudl be better to write two new functions.  One which does this loop and calls didRecievData on a passed in subresourceloader, and the other which does the loop and calls decode on a passed in decoder and returns the final string.  That saves us from any copy/paste code.
Comment 4 Darin Adler 2010-01-05 12:17:40 PST
Comment on attachment 45914 [details]
the patch

> -    String sheetText = m_decoder->decode(m_data->data(), m_data->size());
> +    String sheetText;
> +    const char* segment;
> +    unsigned pos = 0;

Could you avoid the abbreviation "pos" here? I like "offset" for this, myself.

> +    while (unsigned length = m_data->getSomeData(segment, pos)) {

It's unfortunate this operation is O(n log n) in the number of segments, having to find the data segment each time based on the passed-in offset, walking the list of segments. I wish the API of SharedBuffer didn't force this, although it's probably not a real concern.

Rather than repeating this decoding idiom over and over again, could you put a helper function somewhere that encapsulates the process of decoding a SharedBuffer into a String.

Appending to a String is a slow operation that requires reallocating the buffer each time. Instead we should append to a Vector<UChar> and convert to a String only at the end. Making a single function for this will help us make sure it works in a way that's optimal.

Another good optimization would be estimating the decoded size based on the size of the SharedBuffer to keep the number of reallocations to a minimum. This should be straightforward. We could start by assuming that each byte encodes an average of about one character.


> -            loader->didReceiveData(data->data(), data->size(), data->size(), true);
> +            const char* segment;
> +            unsigned pos = 0;
> +            int received = 0;
> +            while (unsigned length = data->getSomeData(segment, pos)) {
> +                pos += length;
> +                loader->didReceiveData(segment, length, pos, false);
> +            }

After this patch, is there any code anywhere passing true for the last argument to didReceiveData? If not, then we should remove that argument and the dead code.
Comment 5 Yong Li 2010-01-05 13:16:40 PST
Thanks, Eric, Darin, I'll try to improve it.
Comment 6 Yong Li 2010-01-06 09:13:34 PST
Created attachment 45965 [details]
Make TextResourceDecoder read from segmented SharedBuffer

How about this one?
Comment 7 WebKit Review Bot 2010-01-06 09:15:21 PST
Attachment 45965 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
Traceback (most recent call last):
  File "WebKitTools/Scripts/check-webkit-style", line 98, in <module>
    main()
  File "WebKitTools/Scripts/check-webkit-style", line 62, in main
    defaults = style.ArgumentDefaults(style.DEFAULT_OUTPUT_FORMAT,
AttributeError: 'module' object has no attribute 'ArgumentDefaults'
Comment 8 Yong Li 2010-01-06 09:31:12 PST
(In reply to comment #7)
> Attachment 45965 [details] did not pass style-queue:
> 
> Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
> Traceback (most recent call last):
>   File "WebKitTools/Scripts/check-webkit-style", line 98, in <module>
>     main()
>   File "WebKitTools/Scripts/check-webkit-style", line 62, in main
>     defaults = style.ArgumentDefaults(style.DEFAULT_OUTPUT_FORMAT,
> AttributeError: 'module' object has no attribute 'ArgumentDefaults'

??? confused
Comment 9 Yong Li 2010-01-06 09:45:18 PST
Created attachment 45966 [details]
Let ResourceLoader delivery segmented data

Let ResourceLoader delivery segmented data when loading data from an existing SharedBuffer object.
Comment 10 WebKit Review Bot 2010-01-06 09:46:44 PST
Attachment 45966 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
Traceback (most recent call last):
  File "WebKitTools/Scripts/check-webkit-style", line 98, in <module>
    main()
  File "WebKitTools/Scripts/check-webkit-style", line 62, in main
    defaults = style.ArgumentDefaults(style.DEFAULT_OUTPUT_FORMAT,
AttributeError: 'module' object has no attribute 'ArgumentDefaults'
Comment 11 Darin Adler 2010-01-06 09:47:08 PST
Comment on attachment 45965 [details]
Make TextResourceDecoder read from segmented SharedBuffer

> +String TextResourceDecoder::decode(const SharedBuffer& data)
> +{
> +    String result;
> +    const char* segment;
> +    unsigned offset = 0;
> +    while (unsigned length = data.getSomeData(segment, offset)) {
> +        result += decode(segment, length);
> +        offset += length;
> +    }
> +    return result;
> +}

It's great factoring to make use a function like this instead of spreading the code around.

I also note that all the clients who use this function also do the flushing, so in the future might want to make this helper function include the flush and have some name like "decodeAll" rather than just being cleanly parallel to the virtual decode function.

This is still a quite-inefficient algorithm for creating a string out of multiple decoded chunks; any loop that appends to a String is suboptimal because making a new String by appending two strings *always* involves allocating a fresh buffer each time and copying all the data; this inefficiency might make some real world cases pathologically worse if we find ourselves with multiple segments. It might actually be more efficient to do the dumb thing the old code was doing, and concatenate them all, just because the decoded data would be dealt with all at once even though the encoded data ends up being copied.

I think at some point we will want to change this. Even now I think we could do better with Vector<UChar>.

Maybe we want to get rid of the use of String in the interface to decoders eventually?

I am conflicted about the patch. It seems like a good idea in theory, but in practice it seems that this patch claiming to speed things up might always end up slowing things down instead. And might cause us to use more memory too.

> +    String decode(const SharedBuffer& data);

I don't think the argument name is needed for clarity here so normally we would leave it out.

I am going to say r=me but I do have serious reservations. I'd like to hear more concrete evidence about why this is a helpful change to make.
Comment 12 Darin Adler 2010-01-06 09:48:28 PST
Comment on attachment 45966 [details]
Let ResourceLoader delivery segmented data

> +        didReceiveData(segment, static_cast<int>(length), offset, false);

Why is this cast needed? I think it's not helpful with any compiler.

r=me
Comment 13 Yong Li 2010-01-06 10:04:16 PST
(In reply to comment #12)
> (From update of attachment 45966 [details])
> > +        didReceiveData(segment, static_cast<int>(length), offset, false);
> 
> Why is this cast needed? I think it's not helpful with any compiler.
> 
> r=me

MVSC can give warning for unsigned->int conversion. But seems they have turned this warning off by default. don't know if any other compiler will complain this.
Comment 14 Yong Li 2010-01-06 10:15:07 PST
(In reply to comment #11)
> 
> This is still a quite-inefficient algorithm for creating a string out of
> multiple decoded chunks; any loop that appends to a String is suboptimal
> because making a new String by appending two strings *always* involves
> allocating a fresh buffer each time and copying all the data; this inefficiency
> might make some real world cases pathologically worse if we find ourselves with
> multiple segments. It might actually be more efficient to do the dumb thing the
> old code was doing, and concatenate them all, just because the decoded data
> would be dealt with all at once even though the encoded data ends up being
> copied.
> 
> I think at some point we will want to change this. Even now I think we could do
> better with Vector<UChar>.
> 
> Maybe we want to get rid of the use of String in the interface to decoders
> eventually?
> 
> I am conflicted about the patch. It seems like a good idea in theory, but in
> practice it seems that this patch claiming to speed things up might always end
> up slowing things down instead. And might cause us to use more memory too.
> 
> > +    String decode(const SharedBuffer& data);
> 
> I don't think the argument name is needed for clarity here so normally we would
> leave it out.
> 
> I am going to say r=me but I do have serious reservations. I'd like to hear
> more concrete evidence about why this is a helpful change to make.

You're right, this patch is probably not helpful at all. I'll leave it here. I don't like the way that webkit uses String to store and parse scripts. SegmentedString is a good concept, but only html parser is using it.
Comment 15 Eric Seidel (no email) 2010-01-06 11:59:55 PST
bugzilla-tool had trouble reopening the bug, but this was rolled out in r52860.
Comment 16 Eric Seidel (no email) 2010-01-06 12:01:02 PST
Comment on attachment 45965 [details]
Make TextResourceDecoder read from segmented SharedBuffer

Clearing Darin's r+ since this has been landed.
Comment 17 Eric Seidel (no email) 2010-01-06 12:01:44 PST
Comment on attachment 45966 [details]
Let ResourceLoader delivery segmented data

Marking r- since this was rolled out in http://trac.webkit.org/changeset/52860.
Comment 18 Yong Li 2010-01-11 07:13:32 PST
My second patch (for ResourceLoader) is buggy for sure. The reason is didReceiveData() can cancel the request. We could fix it by checking if the job is cancelled in each loop. But I think it is probably better to let SharedBuffer merge all segments into one in this case. For the similar reason, the first patch is also not needed.
Comment 19 Yong Li 2010-03-16 18:40:37 PDT
never mind. I can load the webarchive with safari.

I guess there's a bug in SharedBuffer::createWithContentsOfFile()

Are you using http://trac.webkit.org/browser/trunk/WebCore/platform/win/SharedBufferWin.cpp?

If not, please note you have to set "m_size"