Bug 74626

Summary: Support HTML documents in XHR.responseXML
Product: WebKit Reporter: Jarred Nicholls <jarred>
Component: WebCore Misc.Assignee: Jarred Nicholls <jarred>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, darin, dglazkov, oliver, sam, sng, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 54162    
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Jarred Nicholls 2011-12-15 10:48:11 PST
Bring XHR.responseXML behavior up to the latest W3C spec
Comment 2 Jarred Nicholls 2011-12-15 11:07:53 PST
Created attachment 119467 [details]
Patch
Comment 3 WebKit Review Bot 2011-12-15 12:10:39 PST
Comment on attachment 119467 [details]
Patch

Attachment 119467 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10910403

New failing tests:
fast/frames/iframe-reparenting.html
Comment 4 Alexey Proskuryakov 2011-12-15 12:21:32 PST
Comment on attachment 119467 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119467&action=review

Looks great. I think that tests can be made much clearer, so you may want to post an updated patch for another quick review round despite already having an r+.

Please add a test verifying that scripts don't run in HTML documents loaded with XHR. We have one for XHTML, and we should have one for HTML.

Please also add a test checking for strict/quirks modes - it's not immediately obvious whether setContent gets it right.

> Source/WebCore/ChangeLog:8
> +        Latest XHR spec from the W3C:
> +        http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#the-responsexml-attribute
> +        http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#document-response-entity-body

Please mention what changed. A link to the spec provides extremely little information, especially since editor's draft text will change in the future.

Added support for HTML is something that should be in bug title, in fact. This is the big change.

> Source/WebCore/xml/XMLHttpRequest.cpp:227
> +        bool isHTML = responseMIMEType() == "text/html";

This appears to match existing code for XML types, but I don't see where the code that changes MIME type to lower case is. MIME types are not case sensitive.

Would you be willing to investigate this (not in this bug)?

> Source/WebCore/xml/XMLHttpRequest.cpp:237
> +                m_responseXML = static_pointer_cast<Document, HTMLDocument>(HTMLDocument::create(0, m_url));

Why is this cast needed? PassRefPtr<U> can be assigned to RefPtr<T> when U inherits from T.

Can we remove m_responseXML to m_responseDocument now?

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML.html:60
> +        var tests = [{

These tests are not checking variations of the same behavior, they are completely unrelated.

I strongly encourage you to split these into separate tests with their own expectations. This test is largely incomprehensible due to lots of things combined together, and failures in it will be very difficult to investigate when they happen.
Comment 5 Jarred Nicholls 2011-12-15 13:29:52 PST
Thanks very much for the review!

(In reply to comment #4)
> (From update of attachment 119467 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=119467&action=review
> 
> Looks great. I think that tests can be made much clearer, so you may want to post an updated patch for another quick review round despite already having an r+.
> 
> Please add a test verifying that scripts don't run in HTML documents loaded with XHR. We have one for XHTML, and we should have one for HTML.

Good call.

> 
> Please also add a test checking for strict/quirks modes - it's not immediately obvious whether setContent gets it right.

Ok sure that's simple.  I'm not sure of how valuable of a test it is, but there's no harm in it I suppose.

> 
> > Source/WebCore/ChangeLog:8
> > +        Latest XHR spec from the W3C:
> > +        http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#the-responsexml-attribute
> > +        http://dvcs.w3.org/hg/xhr/raw-file/tip/Overview.html#document-response-entity-body
> 
> Please mention what changed. A link to the spec provides extremely little information, especially since editor's draft text will change in the future.
> 
> Added support for HTML is something that should be in bug title, in fact. This is the big change.

Totally, I'm spazzing today and meant to be verbose in the WebCore ChangeLog but not in the LayoutTests, and I ended up forgetting to fill in WebCore.

> 
> > Source/WebCore/xml/XMLHttpRequest.cpp:227
> > +        bool isHTML = responseMIMEType() == "text/html";
> 
> This appears to match existing code for XML types, but I don't see where the code that changes MIME type to lower case is. MIME types are not case sensitive.
> 
> Would you be willing to investigate this (not in this bug)?

I just did and in fact I should use equalIgnoringCase() here, particularly because someone can override the MIME type and may use mixed case.  Thanks!

> 
> > Source/WebCore/xml/XMLHttpRequest.cpp:237
> > +                m_responseXML = static_pointer_cast<Document, HTMLDocument>(HTMLDocument::create(0, m_url));
> 
> Why is this cast needed? PassRefPtr<U> can be assigned to RefPtr<T> when U inherits from T.

I had a ternary (? :) statement and needed the types to be the same, but didn't change it back when I turned this into an if statement.  Thanks.

> 
> Can we remove m_responseXML to m_responseDocument now?

Yep!

> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML.html:60
> > +        var tests = [{
> 
> These tests are not checking variations of the same behavior, they are completely unrelated.
> 
> I strongly encourage you to split these into separate tests with their own expectations. This test is largely incomprehensible due to lots of things combined together, and failures in it will be very difficult to investigate when they happen.

Yeah I will split this test into multiple tests - the idea was to treat it as a unit test "suite" for responseXML, and unit test suites usually cover a complete function.  Nevertheless I can split them out.

I'll throw out a new patch soon, as soon as I figure out why trunk builds are broken w/ CSS Filters enabled :)
Comment 6 Alexey Proskuryakov 2011-12-15 13:32:38 PST
> I just did and in fact I should use equalIgnoringCase() here, particularly because someone can override the MIME type and may use mixed case.

Thanks. Eventually, we'll need to take care of XML MIME types, too.
Comment 7 Alexey Proskuryakov 2011-12-15 13:34:44 PST
By the way, this is largely a duplicate of bug 40982. It's usually best to fix the oldest duplicate.
Comment 8 Jarred Nicholls 2011-12-15 13:43:35 PST
(In reply to comment #7)
> By the way, this is largely a duplicate of bug 40982. It's usually best to fix the oldest duplicate.

Ah, thanks for pointing that out.  This patch does resolve that bug as well as the other compliancy issues ("text" responseType and error flag handling).  I can mark that one as being blocked by this so it's reporter is notified, etc.
Comment 9 Jarred Nicholls 2011-12-15 20:23:48 PST
*** Bug 40982 has been marked as a duplicate of this bug. ***
Comment 10 Jarred Nicholls 2011-12-16 09:11:32 PST
Created attachment 119618 [details]
Patch
Comment 11 Alexey Proskuryakov 2011-12-16 11:52:48 PST
Comment on attachment 119618 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119618&action=review

Thank you! Separating the tests made them much easier to follow indeed.

Please do follow up with a bug on XML MIME type case sensitivity. Even for HTML, we still have a case sensitive comparison in XMLHttpRequest::didReceiveData!

I have s few comments, some of which need to be addressed. It doesn't appear necessary to have another review round, but you are of course welcome to post the patch for review again if you see a need for it.

> Source/WebCore/ChangeLog:3
> +        Bring XHR.responseXML behavior up to the latest W3C spec (error handling, HTML doc support, responseType compliance)

I still think that the correct title would be "Support HTML documents in XHR.responseXML". It does make sense to structure your effort by API, but for people following WebKit development, HTML support is an order of magnitude more interesting than the other changes. In fact, if the other changes were anything major, I would have suggested moving those into separate patches.

> Source/WebCore/xml/XMLHttpRequest.cpp:227
> +        bool isHTML = equalIgnoringCase(responseMIMEType(), "text/html");

It would seem more robust to perform lowercasing earlier (either store a normalized content type, or have responseMIMEType() perform it). Not something that would be appropriate to do in this patch.

> LayoutTests/fast/frames/iframe-reparenting.html:67
>      xhr.open("GET","data:text/html,<html xmlns='http://www.w3.org/1999/xhtml'><body id='body'>Hello</body></html>", false);
> +    xhr.responseType = 'document';

I would have changed data URL MIME type instead, so that the test would take the same code path as before. It used to perform XML parsing.

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-after-error.html:6
> +        description('This tests the XMLHttpRequest responseXML attribute accessibility after a network error.');

Maybe s/accessibility/availability/ ?

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-after-error.html:18
> +            shouldBeNull('xhr.responseXML');

How did this test fail before your changes? There is no data, so it seems that parsing would have failed anyway, producing a null document.

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-html-no-responsetype.html:41
> +    <!-- Leave these break tags malformed/open to test HTML parsing.
> +         They should be siblings to div#description, and not nested within one another.
> +         The XMLDocument parser would treat the second <br> as a child to the first. -->
> +    <br><br>
> +
> +    <div id="console"></div>
> +
> +    <script>
> +        // This code will manipulate the first BR node by adding an "id" to it.
> +        // This same BR will be inspected after XHR loads the document to see
> +        // if this script executed or not.  If it didn't execute, the first BR
> +        // will not have an "id" specified and the test passes.
> +        var br = document.querySelector('div#description + br');
> +        br.id = 'break-tag';
> +    </script>

Please remove this part, it's not used in this test.

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-xml-no-responsetype.html:6
> +        description('This tests the XMLHttpRequest responseXML loading an XML document with no specified responseType.');

Is this test needed? We have lots of tests for responseXML that were written before responseType existed.

> LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-xml-text-responsetype.html:13
> +            shouldThrow('xhr.responseXML', '"Error: INVALID_STATE_ERR: DOM Exception 11"');

Whenever possible, I prefer tests that can pass in other browsers. Hardcoding error string will unnecessarily limit PASS to WebKit, or maybe even to WebKit with JSC (or V8) bindings.
Comment 12 Jarred Nicholls 2011-12-16 12:13:15 PST
(In reply to comment #11)
> (From update of attachment 119618 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=119618&action=review
> 
> Thank you! Separating the tests made them much easier to follow indeed.
> 
> Please do follow up with a bug on XML MIME type case sensitivity. Even for HTML, we still have a case sensitive comparison in XMLHttpRequest::didReceiveData!

Yep, I'm going to do a separate bug for this :)  Already have down where I'm making changes.

> 
> I have s few comments, some of which need to be addressed. It doesn't appear necessary to have another review round, but you are of course welcome to post the patch for review again if you see a need for it.
> 
> > Source/WebCore/ChangeLog:3
> > +        Bring XHR.responseXML behavior up to the latest W3C spec (error handling, HTML doc support, responseType compliance)
> 
> I still think that the correct title would be "Support HTML documents in XHR.responseXML". It does make sense to structure your effort by API, but for people following WebKit development, HTML support is an order of magnitude more interesting than the other changes. In fact, if the other changes were anything major, I would have suggested moving those into separate patches.

Okay this is fine.  You don't mind that the other compliance corrections are a part of this patch?

> 
> > Source/WebCore/xml/XMLHttpRequest.cpp:227
> > +        bool isHTML = equalIgnoringCase(responseMIMEType(), "text/html");
> 
> It would seem more robust to perform lowercasing earlier (either store a normalized content type, or have responseMIMEType() perform it). Not something that would be appropriate to do in this patch.

I can do a separate patch to cache a lowercased MIME type string in responseMIMEType(), which is reset along with all the other buffers in clearBuffers()..or wherever it makes sense (we can have that conversation on that patch).

> 
> > LayoutTests/fast/frames/iframe-reparenting.html:67
> >      xhr.open("GET","data:text/html,<html xmlns='http://www.w3.org/1999/xhtml'><body id='body'>Hello</body></html>", false);
> > +    xhr.responseType = 'document';
> 
> I would have changed data URL MIME type instead, so that the test would take the same code path as before. It used to perform XML parsing.

Will change.

> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-after-error.html:6
> > +        description('This tests the XMLHttpRequest responseXML attribute accessibility after a network error.');
> 
> Maybe s/accessibility/availability/ ?
> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-after-error.html:18
> > +            shouldBeNull('xhr.responseXML');
> 
> How did this test fail before your changes? There is no data, so it seems that parsing would have failed anyway, producing a null document.

Yes this is true, the output is the same and I've already considered dropping this test since it's not capable of proving the path of the code in a black-box manner.  You're ok with me dropping it, or do you have any ideas on how to test the error flag check is occurring?  The current test assumes knowledge of the code: 1) an error has occurred because of a forced network error, and 2) the readyState is DONE - so with the knowledge of the code itself, you can deduct we are exiting the function early and not attempting to create a Document.  But this is not an acceptable black-box test as-is.

> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-html-no-responsetype.html:41
> > +    <!-- Leave these break tags malformed/open to test HTML parsing.
> > +         They should be siblings to div#description, and not nested within one another.
> > +         The XMLDocument parser would treat the second <br> as a child to the first. -->
> > +    <br><br>
> > +
> > +    <div id="console"></div>
> > +
> > +    <script>
> > +        // This code will manipulate the first BR node by adding an "id" to it.
> > +        // This same BR will be inspected after XHR loads the document to see
> > +        // if this script executed or not.  If it didn't execute, the first BR
> > +        // will not have an "id" specified and the test passes.
> > +        var br = document.querySelector('div#description + br');
> > +        br.id = 'break-tag';
> > +    </script>
> 
> Please remove this part, it's not used in this test.

Yep.

> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-xml-no-responsetype.html:6
> > +        description('This tests the XMLHttpRequest responseXML loading an XML document with no specified responseType.');
> 
> Is this test needed? We have lots of tests for responseXML that were written before responseType existed.

True, I can remove it.

> 
> > LayoutTests/fast/xmlhttprequest/xmlhttprequest-responseXML-xml-text-responsetype.html:13
> > +            shouldThrow('xhr.responseXML', '"Error: INVALID_STATE_ERR: DOM Exception 11"');
> 
> Whenever possible, I prefer tests that can pass in other browsers. Hardcoding error string will unnecessarily limit PASS to WebKit, or maybe even to WebKit with JSC (or V8) bindings.

That's true, this would only pass in WebKit browers (this does happen to work in V8 and JSC, but I got lucky).  I'll leave out the string and just call shouldThrow('xhr.responseXML');
Comment 13 Alexey Proskuryakov 2011-12-16 13:08:33 PST
> You're ok with me dropping it, or do you have any ideas on how to test the error flag check is occurring?

I'm a little confused here. For "resources/xmlhttprequest-get-data.xml/i/dont/exist", you're getting an onerror callback. Why? Can you use a PHP script to have the same HTTP status code, but with an XML response body?
Comment 14 Jarred Nicholls 2011-12-16 13:10:56 PST
(In reply to comment #13)
> > You're ok with me dropping it, or do you have any ideas on how to test the error flag check is occurring?
> 
> I'm a little confused here. For "resources/xmlhttprequest-get-data.xml/i/dont/exist", you're getting an onerror callback. Why? Can you use a PHP script to have the same HTTP status code, but with an XML response body?

Easy cheesy :)
Comment 15 Jarred Nicholls 2011-12-16 14:03:29 PST
(In reply to comment #13)
> > You're ok with me dropping it, or do you have any ideas on how to test the error flag check is occurring?
> 
> I'm a little confused here. For "resources/xmlhttprequest-get-data.xml/i/dont/exist", you're getting an onerror callback. Why? Can you use a PHP script to have the same HTTP status code, but with an XML response body?

Ok I misread this.  So the idea here was to trip the m_error flag to true, which only occurs on a network error or when abort() is called (we don't have timeout support it seems, which I will be adding no doubt!).  A network error is unrelated to an HTTP error (rightfully so), so unfortunately the idea of returning an HTTP error status code w/ XML data will not work: the error flag will not be triggered and the "load" event is fired.

I'm going to remove the aforementioned test, and will do response availability tests when adding in timeout support (because then this is easily testable w/ timeout support).
Comment 16 Jarred Nicholls 2011-12-16 14:11:33 PST
Created attachment 119667 [details]
Patch
Comment 17 Jarred Nicholls 2011-12-16 14:17:06 PST
Committed r103106: <http://trac.webkit.org/changeset/103106>