Bug 52085 - Add Document::setContent()
Summary: Add Document::setContent()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Patrick R. Gansterer
URL:
Keywords:
Depends on:
Blocks: 52036
  Show dependency treegraph
 
Reported: 2011-01-07 14:56 PST by Patrick R. Gansterer
Modified: 2011-01-11 17:26 PST (History)
11 users (show)

See Also:


Attachments
Patch (7.50 KB, patch)
2011-01-07 15:10 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.18 KB, patch)
2011-01-08 17:42 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.13 KB, patch)
2011-01-11 09:19 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.09 KB, patch)
2011-01-11 10:37 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.14 KB, patch)
2011-01-11 11:54 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.14 KB, patch)
2011-01-11 12:19 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff
Patch (7.06 KB, patch)
2011-01-11 13:55 PST, Patrick R. Gansterer
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick R. Gansterer 2011-01-07 14:56:26 PST
Add Document::setContent()
Comment 1 Patrick R. Gansterer 2011-01-07 15:10:15 PST
Created attachment 78280 [details]
Patch
Comment 2 Darin Adler 2011-01-08 17:01:57 PST
Comment on attachment 78280 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78280&action=review

> WebCore/dom/Document.cpp:1170
> +bool Document::setContent(const Vector<char>& content)

Is it really better to have something that takes a vector rather than a const char*, size_t pair?
Comment 3 Patrick R. Gansterer 2011-01-08 17:42:51 PST
Created attachment 78336 [details]
Patch

(In reply to comment #2)
> (From update of attachment 78280 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78280&action=review
> 
> > WebCore/dom/Document.cpp:1170
> > +bool Document::setContent(const Vector<char>& content)
> 
> Is it really better to have something that takes a vector rather than a const char*, size_t pair?

No, Vector as argument is a clear mistake.
I think you meant const char*, int pair? (see http://trac.webkit.org/browser/trunk/Source/WebCore/dom/DocumentParser.h?rev=75314#L49)
Comment 4 Eric Seidel (no email) 2011-01-10 11:44:05 PST
Comment on attachment 78336 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78336&action=review

I think this needs one more round.  I like where this is going though!

> Source/WebCore/dom/Document.cpp:1179
> +    return wellFormed();

Is wellFormed what we'd want for non-XML content?  Is this an XML-only method?

> Source/WebCore/dom/Document.cpp:1186
> +    // FIXME: We should pass content directly to the parser insted of decoding it here.
> +    // At the moment only SVGFonts use this method, so the xml mime type is ok for now.
> +    setDecoder(TextResourceDecoder::create("application/xml"));

Why should we pass it directly to the parser?

It seems it would be better to put this decoding code in the SVGImage code for now, unless you're about to post a follow-up patch to not decode?

> Source/WebCore/dom/Document.h:353
> +    bool setContent(const char*, int);

Don't you want size_t here?
Comment 5 Adam Barth 2011-01-10 15:51:20 PST
Comment on attachment 78336 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78336&action=review

>> Source/WebCore/dom/Document.cpp:1186
>> +    // FIXME: We should pass content directly to the parser insted of decoding it here.
>> +    // At the moment only SVGFonts use this method, so the xml mime type is ok for now.
>> +    setDecoder(TextResourceDecoder::create("application/xml"));
> 
> Why should we pass it directly to the parser?
> 
> It seems it would be better to put this decoding code in the SVGImage code for now, unless you're about to post a follow-up patch to not decode?

It's confusing that this is only for XML.  I'd expect this method to work for both HTML and XML.  Also, have you looked at how the output of JavaScript URLs gets put into a document?  I think it might benefit from using this API.  You can follow the trail from executeIfJavaScriptURL.
Comment 6 Adam Barth 2011-01-10 15:51:57 PST
I wish I could upvote this patch as being awesome.
Comment 7 Patrick R. Gansterer 2011-01-11 09:19:27 PST
Created attachment 78542 [details]
Patch
Comment 8 Patrick R. Gansterer 2011-01-11 09:31:15 PST
Comment on attachment 78336 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78336&action=review

>> Source/WebCore/dom/Document.cpp:1179
>> +    return wellFormed();
> 
> Is wellFormed what we'd want for non-XML content?  Is this an XML-only method?

IMHO wellFomed() is what we want. Do you know a better alternative? At the moment it's used for XML-only, but it feeds the parser, so it can be used for other content too.

>>> Source/WebCore/dom/Document.cpp:1186
>>> +    setDecoder(TextResourceDecoder::create("application/xml"));
>> 
>> Why should we pass it directly to the parser?
>> 
>> It seems it would be better to put this decoding code in the SVGImage code for now, unless you're about to post a follow-up patch to not decode?
> 
> It's confusing that this is only for XML.  I'd expect this method to work for both HTML and XML.  Also, have you looked at how the output of JavaScript URLs gets put into a document?  I think it might benefit from using this API.  You can follow the trail from executeIfJavaScriptURL.

see 52036 (Feed libxml2 with raw data, relying on it to do character set decoding): This will avoid UTF-8->UTF-16->UTF-8 convertion.
I only removed the code paths, which triggered the new assertion. I'll look for methos which might use this when I can set raw data on document. Step by step. :-)

>> Source/WebCore/dom/Document.h:353
>> +    bool setContent(const char*, int);
> 
> Don't you want size_t here?

see http://trac.webkit.org/browser/trunk/Source/WebCore/dom/DocumentParser.h?rev=75314#L49 (from https://bugs.webkit.org/show_bug.cgi?id=52085#c3 ;-))
Comment 9 Patrick R. Gansterer 2011-01-11 10:37:52 PST
Created attachment 78547 [details]
Patch
Comment 10 Eric Seidel (no email) 2011-01-11 11:45:48 PST
Comment on attachment 78547 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78547&action=review

> Source/WebCore/dom/Document.cpp:1179
> +    return wellFormed();

I feel like well-formedness has little to do with if the setContent succeeded or not, which is what the bool return from setContent feels like it should be.

It seems the callers should check wellFormed separately.

> Source/WebCore/loader/cache/CachedFont.cpp:144
> +
> +        if (!m_externalSVGDocument->setContent(decoder->decode(m_data->data(), m_data->size()) + decoder->flush()))
> +            m_externalSVGDocument = 0;

This is wrong.

You got rid of the decoder error detection.  If we don't have a test for that, we should.
Comment 11 Eric Seidel (no email) 2011-01-11 11:46:06 PST
I think the wellFormed() check should be split out and setContent should just return void.
Comment 12 Patrick R. Gansterer 2011-01-11 11:54:19 PST
Created attachment 78569 [details]
Patch
Comment 13 WebKit Review Bot 2011-01-11 12:14:29 PST
Attachment 78569 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7409120
Comment 14 Patrick R. Gansterer 2011-01-11 12:19:25 PST
Created attachment 78576 [details]
Patch
Comment 15 Eric Seidel (no email) 2011-01-11 12:19:48 PST
Comment on attachment 78569 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78569&action=review

In general this looks fine.  Looks like it breaks cr-linux though?

> Source/WebCore/loader/cache/CachedFont.cpp:-150
> -        m_externalSVGDocument->finishParsing();

It's unclear to me how this change from document->finishParsing to m_parser->finish() is going to behave.
Comment 16 Build Bot 2011-01-11 12:23:28 PST
Attachment 78569 [details] did not build on win:
Build output: http://queues.webkit.org/results/7455094
Comment 17 WebKit Review Bot 2011-01-11 12:25:04 PST
Attachment 78569 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/7374143
Comment 18 Patrick R. Gansterer 2011-01-11 12:25:11 PST
(In reply to comment #15)
> (From update of attachment 78569 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78569&action=review
> 
> In general this looks fine.  Looks like it breaks cr-linux though?

missed the bool -> void change in patch :-/

> > Source/WebCore/loader/cache/CachedFont.cpp:-150
> > -        m_externalSVGDocument->finishParsing();
> 
> It's unclear to me how this change from document->finishParsing to m_parser->finish() is going to behave.

document->finishParsing does "if (m_parser) m_parser->finish();". I think it's ok, because m_parser can't be null ;-)
Comment 19 WebKit Review Bot 2011-01-11 13:05:01 PST
Attachment 78576 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7434118
Comment 20 Early Warning System Bot 2011-01-11 13:06:49 PST
Attachment 78576 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7402134
Comment 21 WebKit Review Bot 2011-01-11 13:52:02 PST
Attachment 78576 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/7400126
Comment 22 Patrick R. Gansterer 2011-01-11 13:55:03 PST
Created attachment 78593 [details]
Patch
Comment 23 Eric Seidel (no email) 2011-01-11 13:58:32 PST
Comment on attachment 78593 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78593&action=review

LGTM.

> Source/WebCore/loader/cache/CachedFont.cpp:143
> +        m_externalSVGDocument->setContent(decoder->decode(m_data->data(), m_data->size()) + decoder->flush());

It's unfortunate that this is much less memory efficient now.  We end up doing two (possibly large) allocations for the main data decoding instead of one.
Comment 24 Patrick R. Gansterer 2011-01-11 14:02:11 PST
Comment on attachment 78593 [details]
Patch

(In reply to comment #23)
> (From update of attachment 78593 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78593&action=review
> 
> LGTM.

Sorry for so much spam.

> > Source/WebCore/loader/cache/CachedFont.cpp:143
> > +        m_externalSVGDocument->setContent(decoder->decode(m_data->data(), m_data->size()) + decoder->flush());
> 
> It's unfortunate that this is much less memory efficient now.  We end up doing two (possibly large) allocations for the main data decoding instead of one.

IMHO flush() won't return anything in 99.9%. Usually decode returns all of the decoded data, so we pass the string directly to setContent. flush() will only return a non-empty string when we have "invalid" input enconding.
Comment 25 Build Bot 2011-01-11 14:06:04 PST
Attachment 78576 [details] did not build on win:
Build output: http://queues.webkit.org/results/7465111
Comment 26 WebKit Commit Bot 2011-01-11 17:26:19 PST
Comment on attachment 78593 [details]
Patch

Clearing flags on attachment: 78593

Committed r75577: <http://trac.webkit.org/changeset/75577>
Comment 27 WebKit Commit Bot 2011-01-11 17:26:28 PST
All reviewed patches have been landed.  Closing bug.