Bug 44876 - QWebElement::toOuterXml() and QWebElement::toInnerXml() return invalid XML
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Qt
Version: 528+ (Nightly build)
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: Qt
Depends on:
Blocks:
 
Reported: 2010-08-30 09:55 PDT by Bernhard Rosenkraenzer
Modified: 2010-11-04 02:23 PDT

Attachments
Patch (2.18 KB, patch)
2010-10-31 11:51 PDT, Robert Hogan
Patch (1.88 KB, patch)
2010-11-03 13:53 PDT, Robert Hogan

Description Bernhard Rosenkraenzer 2010-08-30 09:55:04 PDT
QWebElement::toOuterXml() and QWebElement::toInnerXml() return HTML code that isn't valid XML when they hit certain tags, such as <meta>.

This snippet:

#include <QApplication>
#include <QWebView>
#include <QWebFrame>
#include <QWebElement>
#include <iostream>

int main(int argc, char **argv) {
        QApplication app(argc, argv);
        QWebView *v=new QWebView(0);
        v->setHtml("<html><head><meta name=\"author\" content=\"test1\"/></head><body><p>Test</p></body></html>");
        std::cerr << qPrintable(v->page()->mainFrame()->documentElement().toOuterXml()) << std::endl;
}


Prints:
<html><head><meta name="author" content="test1"></head><body><p>Test</p></body></html>


Note the absence of the closing / (or, alternatively, a </meta> end tag) in the <meta> tag.
This is valid HTML, but not valid X(HT)ML -- not what you'd expect from a function named to*Xml.
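
To make the difference concrete, here is a minimal sketch of a tag-balance check in plain C++. The function name tagsAreBalanced is hypothetical, and the check is far weaker than a real XML parser (it ignores comments, CDATA sections, doctype declarations, and ">" inside attribute values). It is only meant to show why the printed markup cannot be well-formed XML while the markup passed to setHtml() can.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every start tag in `markup` is closed, either by a matching
// end tag or by a self-closing "/>". Rough balance check only; see the
// limitations listed in the text above.
bool tagsAreBalanced(const std::string& markup) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos)
            return false;  // unterminated tag
        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false;
        if (tag[0] == '/') {
            // End tag: must match the most recently opened element.
            if (open.empty() || open.back() != tag.substr(1))
                return false;
            open.pop_back();
        } else if (tag.back() != '/') {
            // Start tag: remember the element name (text up to the first space).
            open.push_back(tag.substr(0, tag.find(' ')));
        }
        // Self-closing tags such as <meta ... /> open and close in one step.
    }
    return open.empty();
}
```

Under these assumptions, the string passed to setHtml() above is balanced, while the string that toOuterXml() printed is not: the unclosed <meta> is still on the stack when </head> is reached.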
Comment 1 Enrico Ros 2010-09-01 20:17:34 PDT
Chromium 7 has the same behavior using "document.documentElement.outerHTML" from the JS console. The output is:
<html><head><meta name="author" content="test1"></head><body><p>Test</p>
</body></html>

So probably Bernhard is right on the Qt function naming.

In fact QWebElement::toOuterXml() returns (HTMLElement*)->outerHTML().
Comment 2 Robert Hogan 2010-10-31 10:25:12 PDT
(In reply to comment #0)
> QWebElement::toOuterXml() and QWebElement::toInnerXml() return HTML code that isn't valid XML when they hit certain tags, such as <meta>.
> 
> This snippet:
> 
> #include <QApplication>
> #include <QWebView>
> #include <QWebFrame>
> #include <QWebElement>
> #include <iostream>
> 
> int main(int argc, char **argv) {
>         QApplication app(argc, argv);
>         QWebView *v=new QWebView(0);
>         v->setHtml("<html><head><meta name=\"author\" content=\"test1\"/></head><body><p>Test</p></body></html>");
>         std::cerr << qPrintable(v->page()->mainFrame()->documentElement().toOuterXml()) << std::endl;
> }
> 
> 
> Prints:
> <html><head><meta name="author" content="test1"></head><body><p>Test</p></body></html>
> 
> 

The documentation is unclear, but the behaviour of toOuterXml() is sane given the method you used to set the content. To get the result you want, you need to do:

v->setContent("<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\"><head><meta name=\"author\" content=\"test1\"/></head><body><p>Test</p></body></html>", "application/xhtml+xml");

toOuterXml() will then give you the result you expect.

Note the warning about setHtml() in the documentation:

\warning This function works only for HTML, for other mime types (i.e. XHTML, SVG) setContent() should be used instead.


This warning should probably be given in the documentation for toOuterXml() and toInnerXml() too.
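
As a rough illustration of why the way the content is loaded matters, the sketch below serializes a childless element in two modes. The MarkupMode enum and serializeEmptyElement() are hypothetical names made up for this example and are not WebKit code; the sketch only assumes that a serializer picks its rules from the type of the document it is working on, which matches the behaviour described above. Attribute values are not escaped here.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical serialization modes, named for this sketch only.
enum class MarkupMode { Html, Xml };

// Serializes an element that has no children. In Html mode the start tag is
// emitted on its own; in Xml mode the element is self-closed with "/>".
std::string serializeEmptyElement(
    const std::string& name,
    const std::vector<std::pair<std::string, std::string>>& attributes,
    MarkupMode mode) {
    std::string out = "<" + name;
    for (const auto& attribute : attributes)
        out += " " + attribute.first + "=\"" + attribute.second + "\"";
    out += (mode == MarkupMode::Xml) ? "/>" : ">";
    return out;
}
```

Called with the name "meta" and the two attributes from the test case, Html mode yields the tag shown in the original report, and Xml mode yields the self-closed form that a caller of a to*Xml function would expect.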
Comment 3 Robert Hogan 2010-10-31 11:51:36 PDT
Created attachment 72468
Patch
Comment 4 Andreas Kling 2010-10-31 12:03:37 PDT
Comment on attachment 72468
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=72468&action=review

r=me, just two things:

> WebKit/qt/Api/qwebelement.cpp:319
> +    text/xhtml+xml.

text/xhtml+xml should be in apostrophes like below.

> WebKit/qt/ChangeLog:23
> +        Reviewed by NOBODY (OOPS!).

Something's up with this ChangeLog entry :)
Comment 5 Robert Hogan 2010-11-03 13:53:50 PDT
Created attachment 72866
Patch
Comment 6 WebKit Commit Bot 2010-11-04 02:23:00 PDT
Comment on attachment 72866
Patch

Clearing flags on attachment: 72866

Committed r71315: <http://trac.webkit.org/changeset/71315>
Comment 7 WebKit Commit Bot 2010-11-04 02:23:05 PDT
All reviewed patches have been landed.  Closing bug.