43683 – Implement iframe.srcdoc

RESOLVED DUPLICATE of bug 8299143683

Implement iframe.srcdoc

https://bugs.webkit.org/show_bug.cgi?id=43683

Summary Implement iframe.srcdoc

Justin Schuh

Reported 2010-08-07 16:02:40 PDT

Implement iframe.srcdoc

Attachments
Patch (11.69 KB, patch) 2010-08-07 16:04 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (11.87 KB, patch) 2010-08-08 20:08 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (9.33 KB, patch) 2010-08-15 22:51 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (24.77 KB, patch) 2010-08-23 08:13 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (30.57 KB, patch) 2010-08-29 22:23 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (43.04 KB, patch) 2010-09-11 17:46 PDT, Justin Schuh	no flags	Details Formatted Diff Diff
Patch (42.99 KB, patch) 2010-09-11 20:24 PDT, Justin Schuh	abarth: review-	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Justin Schuh

Comment 1 2010-08-07 16:04:35 PDT

Created attachment 63833 [details] Patch This patch is just for discussion right now. I still have to figure out the double onload events, XSS auditor hooks, history, write several more tests, etc.

Justin Schuh

Comment 2 2010-08-07 16:12:34 PDT

Adam, the test you asked about was: LayoutTests/fast/frames/iframe-srcdoc-frame-onload.html

Adam Barth

Comment 3 2010-08-07 16:19:01 PDT

*** Bug 42507 has been marked as a duplicate of this bug. ***

Justin Schuh

Comment 4 2010-08-08 20:08:46 PDT

Created attachment 63856 [details] Patch Cleaned up some obvious mistakes, but still trying to work out exactly when to load the content; the current spot is definitely not right. Once again, just posting this for discussion purposes right now.

Justin Schuh

Comment 5 2010-08-11 10:38:18 PDT

The only way I see to get events and navigation right is to inject this into the network stack, just like data: URL does. I've pinged the whatwg list on why about:srcdoc was chosen as the location address, because it seems better to use the data URL and should make implementation simpler.

Justin Schuh

Comment 6 2010-08-15 22:51:19 PDT

Created attachment 64467 [details] Patch Adam, can you let me know if this is a reasonable strategy? I hooked @srcdoc into FrameLoader via SubstituteData, so it loads asynch and all the events and navigation work out correctly.

Adam Barth

Comment 7 2010-08-15 23:29:30 PDT

Comment on attachment 64467 [details] Patch WebCore/html/HTMLFrameOwnerElement.cpp:102 + return SubstituteData(data.release(), document()->isXHTMLDocument() ? "text/xml" : "text/html", "UTF-8", KURL(), srcdocURL()); text/xml, really? I thought srcdoc was always text/html. WebCore/html/HTMLFrameOwnerElement.cpp:100 + const CString srcdocUtf8 = srcdoc.string().utf8(); Lame that we need to convert to UTF8. Can't we just use UTF16 the whole time? WebCore/html/HTMLIFrameElement.cpp:133 + setLocation(getAttribute(srcAttr), false); Isn't false implied? WebCore/html/HTMLIFrameElement.cpp:135 + setLocation(srcdocURL(), true); Should we use an enum instead of true/false? Maybe we just need a new method setLocationToSrcdoc?

Adam Barth

Comment 8 2010-08-15 23:29:44 PDT

Yeah, that's a nice approach.

Justin Schuh

Comment 9 2010-08-16 10:10:07 PDT

Thanks for the detailed feedback; I didn't think I'd get more than a "yes" or "no" on the approach at this stage. (In reply to comment #7) > (From update of attachment 64467 [details]) > WebCore/html/HTMLFrameOwnerElement.cpp:102 > + return SubstituteData(data.release(), document()->isXHTMLDocument() ? "text/xml" : "text/html", "UTF-8", KURL(), srcdocURL()); > text/xml, really? I thought srcdoc was always text/html. I think I misread the standard here. It looks like you are correct and it should always be text/html. Hooray for easier! > WebCore/html/HTMLFrameOwnerElement.cpp:100 > + const CString srcdocUtf8 = srcdoc.string().utf8(); > Lame that we need to convert to UTF8. Can't we just use UTF16 the whole time? We can use UTF16 (as long as I account for byte-ordering). However, I'm not sure it matters much because we'll be copying the string regardless. Maybe I should just make SubstituteData smart enough to avoid the copying and conversion for strings? > WebCore/html/HTMLIFrameElement.cpp:133 > + setLocation(getAttribute(srcAttr), false); > Isn't false implied? > > WebCore/html/HTMLIFrameElement.cpp:135 > + setLocation(srcdocURL(), true); > Should we use an enum instead of true/false? Maybe we just need a new method setLocationToSrcdoc? Yeah, setLocationToSrcdoc sounds like a much better idea than the boolean argument.

Anne van Kesteren

Comment 10 2010-08-16 13:56:00 PDT

srcdoc is either HTML or XML depending on whether the document the srcdoc <iframe> is in is HTML or XML. You were correct. The specification states quite clearly: "For iframe elements in HTML documents, the attribute, ..." "For iframe elements in XML documents, the attribute, ..." (I'm not sure whether this is ideal, but that is what it says. And is similar to e.g. the "semantics" of innerHTML.)

Justin Schuh

Comment 11 2010-08-23 08:13:49 PDT

Created attachment 65121 [details] Patch I think the implentation is done, but I'm still trying to figure out more tests. I've added implicit sandboxing in this version. Tab is raising the question with WHATWG, but we can't think of any use case for @srcdoc that doesn't include the sandbox.

Justin Schuh

Comment 12 2010-08-29 22:23:40 PDT

Created attachment 65877 [details] Patch

Darin Adler

Comment 13 2010-08-29 23:45:00 PDT

Comment on attachment 65877 [details] Patch > + if (attr->name() == srcAttr) { > + if (!hasTagName(iframeTag) || getAttribute(srcdocAttr).isNull()) > + setLocation(deprecatedParseURL(attr->value())); It's bad layering to have iframe-specific code in HTMLFrameElementBase and then check hasTagName(iframeTag). Instead, iframe-specific code should go in HTMLIFrameElement. If we need a hook for iframe, we can add a virtual function, but it should ask a question, not "is this an iframe". To check if an attribute exists, we should use hasAttribute, not getAttribute().isNull(). It's more efficient. But also, is null the only value that matters? I'd like to see test cases covering values like empty string or all whitespace for the srcdoc attribute. Code like this that works with one attribute looking at the value of the other usually is a problem because there's no guarantee the src attribute will be parsed before the srcdoc attribute. I need to see how this will work in either order, and we'd want to make sure we had test cases that covered both. > KURL HTMLFrameElementBase::location() const > { > + if (hasTagName(iframeTag) && !getAttribute(srcdocAttr).isEmpty()) > + return srcdocURL(); > return document()->completeURL(getAttribute(srcAttr)); > } Again, not good to have iframe-specific code in HTMLFrameElementBase. Should be factored differently. Same comment about getAttribute(srcdocAttr) too. > + if (str == srcdocURL().string()) > + return; Is it OK that this is a case-sensitive comparison? I would like to see test cases with "ABOUT:SRCDOC" too. > +void HTMLFrameElementBase::setLocationToSrcdoc() > +{ > + m_URL = srcdocURL().string(); > + > + if (inDocument()) > + openURL(false, false); > +} It's bad for long term maintenance to copy and paste the last part of setLocation into this separate function. Even though it's just a single function call, I'd like to factor things so the code is shared. > +SubstituteData HTMLFrameOwnerElement::srcdocSubstituteData(const KURL& url) const > +{ > + if (url == srcdocURL() && hasTagName(HTMLNames::iframeTag)) { > + const String srcdoc = getAttribute(HTMLNames::srcdocAttr); > + if (!srcdoc.isNull()) > + return SubstituteData(srcdoc, document()->isXHTMLDocument() ? "text/xml" : "text/html", KURL(), url); > + } > + return SubstituteData(); > +} It's not appropriate to have the frame owner element have this logic that's specific to iframe. Just as with HTMLFrameElementBase, this needs to be based on virtual functions, not "is iframe" checks in the base class. The result of getAttribute is a const AtomicString&, not a String. Putting it into a String causes a bit of unnecessary reference count churn. > + // Navigate using @src if @srcdoc is getting cleared. The use of "@" here is not normal for comments in Webit. > + if (attr->isNull()) > + setLocation(getAttribute(srcAttr)); > + else { > +#if ENABLE(SANDBOX) > + SandboxFlags flags = sandboxFlags(); > + if (flags == SandboxNone) > + setSandboxFlags(SandboxAll); > +#endif > + setLocationToSrcdoc(); > + } It's important that this logic be localized in a single class. We don't want this part in one class and the rest in another. While other classes can forward things here, the logic specific to the srcdoc attribute should be localized all in a single class, HTMLIFrameElement. The other classes should not even know the name of the attribute. Also note that you can call fastGetAttribute in this case. It's a factoring warning sigh that this needs its own extra sandboxing logic. Is there any way to share this with the normal code path? Also, this is good place for a "why" comment. I see what is happening to the sandbox flags, but not why, and I am not the only one. I don't see any point to putting the flags into a local variable here. > + attribute [Reflect] DOMString srcdoc; If the srcdoc attribute is a URL, then it needs to be "[Reflect, URL]" and needs to be covered by the URL reflection test, and you'll also need to override isURLAttribute to pass that test. > + Frame* realParentFrame = m_frame->tree()->parent(); > + if (realParentFrame) { > + // Walk a srcdoc hierarchy to a parent with a real encoding > + const HTMLFrameOwnerElement* owner = realParentFrame->ownerElement(); > + while (owner && owner->hasTagName(HTMLNames::iframeTag) && owner->hasAttribute(HTMLNames::srcdocAttr)) { > + realParentFrame = realParentFrame->tree()->parent(); > + owner = realParentFrame ? realParentFrame->ownerElement() : 0; > + } > + } This logic should go into a separate helper function. Adding it in to DocumentWriter::createDecoderIfNeeded is not good factoring. Further, why is determining the encoding the one place where we need this "real parent" logic that skips srcdoc frames? What other calls to parent() might want this? Maybe FrameTree itself should know about this. Again, the logic should not check iframeTag and srcdocAttr. It should use a virtual function -- only the HTMLIFrameElement class itself should know about srcdocAttr. > - RefPtr<DocumentLoader> loader = m_client->createDocumentLoader(request, SubstituteData()); > + SubstituteData substituteData = m_frame->ownerElement() ? m_frame->ownerElement()->srcdocSubstituteData(request.url()) : SubstituteData(); > + RefPtr<DocumentLoader> loader = m_client->createDocumentLoader(request, substituteData); The way it's used here makes it clear that srcdocSubstituteData is not the right name for this function. While the frame only uses this when a srcdoc is involved, the actual name should simply state that this is SubstituteData to be used when loading the frame. > - if (!SecurityOrigin::canLoad(url, referrer, 0)) { > + if (!SecurityOrigin::canLoad(url, referrer, ownerElement->document())) { Why this change? What test covers it? Can this go in by itself? > - SubstituteData(PassRefPtr<SharedBuffer> content, const String& mimeType, const String& textEncoding, const KURL& failingURL, const KURL& responseURL = KURL()) > - : m_content(content) > - , m_mimeType(mimeType) > + > + SubstituteData(PassRefPtr<SharedBuffer> dataBuffer, const String& mimeType, const String& textEncoding, const KURL& failingURL, const KURL& responseURL = KURL()) The term "data buffer" here doesn't seem very good. I think just "buffer" is probably OK given the context. > + : m_mimeType(mimeType) > , m_textEncoding(textEncoding) > , m_failingURL(failingURL) > , m_responseURL(responseURL) > { > + m_content = adoptRef(new SubstituteContentBuffer(dataBuffer)); > } Why is m_content being set up in an assignment instead of initialization? It should be initialization. It's OK to call a function in an initializer. In new code, you should be using a create function, not a direct call to adoptRef(new). > +#if defined(WTF_CPU_BIG_ENDIAN) || defined(WTF_CPU_MIDDLE_ENDIAN) > + , m_textEncoding("UTF-16BE") > +#else > + , m_textEncoding("UTF-16LE") > +#endif This is really weak way to do this. Can we find a cleaner way to accomplish it? I think we might want to offer a function that returns a suitable TextEncoding for UChar without looking up the encoding name as a string each time. Also, the WTF_XXX macros aren't intended to be used directly with defined. If you were going to do the #if it would be #if CPU(BIG_ENDIAN). But there has to be a better way to handle it. > + m_content = adoptRef(new SubstituteContentString(dataString)); Same comments as above about assignment, adoptRef, and create. > + virtual ~SubstituteContent() {}; No semicolon needed here. Also, we put a space between braces in cases like this. > + SubstituteContentBuffer(PassRefPtr<SharedBuffer> dataBuffer) : m_dataBuffer(dataBuffer) {} Put space between the braces in cases like this. > + const char* data() const { return m_dataBuffer->data(); } > + unsigned size() const { return m_dataBuffer->size(); } Please make these private and mark them virtual. We like to be explicit when we are overriding virtual functions and repeat the virtual, and it would be inefficient for someone to call these virtual functions if they knew the concrete type; making the functions private makes it more likely we'd catch that mistake. The word data in m_dataBuffer doesn't add much. Is unsigned OK for size? Elsewhere we often use size_t for such sizes. > + const char* data() const { return reinterpret_cast<const char *>(m_dataString.characters()); } There should not be a space before the "*". This needs a comment connecting this otherwise-unwise cast to the text encoding names that make it OK. > + private: > + String m_dataString; > + }; The word "data" in m_dataString doesn't add much. > + > }; > - > } > > #endif // SubstituteData_h Please don't remove that blank line. We want a blank line there. Do the test cases cover all the code changes here? review- because you'll want to fix at least some of the things I mentioned above

Justin Schuh

Comment 14 2010-09-07 10:19:01 PDT

I meant to respond sooner, but I was buried under other tasks. Please assume anything I'm not responding to inline is just me agreeing and it will be fixed in the next patch. (In reply to comment #13) >It's a factoring warning sigh that this needs its own extra sandboxing logic. Is there any way to share this with the normal code path? Also, this is good place for a "why" comment. I see what is happening to the sandbox flags, but not why, and I am not the only one. There's a a discussion on whatwg about whether the sandbox should be implicit for srcdoc: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-August/028274.html I'm strongly in favor of the implicit sandbox, and the additional logic here is the implementation. However, if it's a major sticking point I can take it out for now and submit it as a separate patch later (depending on the results of the whatwg discussion). > > + attribute [Reflect] DOMString srcdoc; > > If the srcdoc attribute is a URL, then it needs to be "[Reflect, URL]" and needs to be covered by the URL reflection test, and you'll also need to override isURLAttribute to pass that test. I might be mistaken on how the IDL URL extension works. However, the srcdoc attribute isn't a URL, it's used as the content of the frame. So, I would think it shouldn't be be tagged as a URL. > > + Frame* realParentFrame = m_frame->tree()->parent(); > > + if (realParentFrame) { > > + // Walk a srcdoc hierarchy to a parent with a real encoding > > + const HTMLFrameOwnerElement* owner = realParentFrame->ownerElement(); > > + while (owner && owner->hasTagName(HTMLNames::iframeTag) && owner->hasAttribute(HTMLNames::srcdocAttr)) { > > + realParentFrame = realParentFrame->tree()->parent(); > > + owner = realParentFrame ? realParentFrame->ownerElement() : 0; > > + } > > + } > > This logic should go into a separate helper function. Adding it in to DocumentWriter::createDecoderIfNeeded is not good factoring. Further, why is determining the encoding the one place where we need this "real parent" logic that skips srcdoc frames? What other calls to parent() might want this? Maybe FrameTree itself should know about this. Again, the logic should not check iframeTag and srcdocAttr. It should use a virtual function -- only the HTMLIFrameElement class itself should know about srcdocAttr. Sorry. After looking back I realized this is definitely the wrong place for this logic. What's actually going on is that I modified SubstituteData to avoid redundant conversions back and forth between UTF-8 and UTF-16. The problem occurs when the encoding sniffing sees the current frame as UTF-16 and defaults to the same encoding type for subframes. What I think I need to do is add a way to tell the document that it shouldn't use it's own encoding for hinting purposes, and instead use the encoding of it's parent (which would recurse all the way up to whatever parent is a real document and not a srcdoc). > > - if (!SecurityOrigin::canLoad(url, referrer, 0)) { > > + if (!SecurityOrigin::canLoad(url, referrer, ownerElement->document())) { > > Why this change? What test covers it? Can this go in by itself? I moved this and another unrelated change out to their own patches, which landed last week. Sorry for accidentally mixing them in. > > +#if defined(WTF_CPU_BIG_ENDIAN) || defined(WTF_CPU_MIDDLE_ENDIAN) > > + , m_textEncoding("UTF-16BE") > > +#else > > + , m_textEncoding("UTF-16LE") > > +#endif > > This is really weak way to do this. Can we find a cleaner way to accomplish it? I think we might want to offer a function that returns a suitable TextEncoding for UChar without looking up the encoding name as a string each time. > Also, the WTF_XXX macros aren't intended to be used directly with defined. If you were going to do the #if it would be #if CPU(BIG_ENDIAN). But there has to be a better way to handle it. Is it reasonable to just add a static function that provides a constant String with the native encoding for the platform (I didn't find one)? And if so, any suggestions on where that should live? It seems like it really shouldn't live in SubstituteData. > Do the test cases cover all the code changes here? One of your comments pointed out that I do need tests for explicit navigation to about:srcdoc. I think I have the rest covered, but I'll do another round and double check if there's anything I'm missing.

Justin Schuh

Comment 15 2010-09-11 17:46:36 PDT

Created attachment 67319 [details] Patch

Justin Schuh

Comment 16 2010-09-11 18:03:24 PDT

Comment on attachment 67319 [details] Patch I think I've addressed the layering issues in this version. Also, Adam and I discussed the issue of whether about:srcdoc should be navigable. We decided that there's good reason to make it navigable (and it would be extremely difficult to do otherwise). So I'm sending an email to whatwg to have the standard modified (since I anticipate other implementors will reach the same conclusion).

Early Warning System Bot

Comment 17 2010-09-11 18:15:15 PDT

Attachment 67319 [details] did not build on qt: Build output: http://queues.webkit.org/results/3918416

Justin Schuh

Comment 18 2010-09-11 20:24:46 PDT

Created attachment 67328 [details] Patch Just a minor tweak to fix QT breakage.

Adam Barth

Comment 19 2010-09-16 11:44:07 PDT

Comment on attachment 67328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67328&action=prettypatch You should also do the parts in the parser. > WebCore/html/HTMLIFrameElement.cpp:76 > - return SandboxNone; > + return hasAttribute(srcdocAttr) ? SandboxAll : SandboxNone; Is this what the WG decided? > WebCore/html/HTMLIFrameElement.cpp:140 > +#if ENABLE(SANDBOX) We should get rid of this ifdef, but that's for another patch. > WebCore/html/HTMLIFrameElement.cpp:190 > + const String srcdoc = fastGetAttribute(HTMLNames::srcdocAttr); We don't use const here. Also, we're probably using the HTMLNames namespace. No need to be explicit here. > WebCore/html/HTMLIFrameElement.cpp:200 > + if (!hasAttribute(srcdocAttr)) > + return srcdocURL(); This not seems wrong.

Adam Barth

Comment 20 2012-04-03 13:41:09 PDT

*** This bug has been marked as a duplicate of bug 82991 ***

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution DUPLICATE

of bug 82991

Priority P2

Severity Enhancement

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS All

Product WebKit

Component Frames

Assignee

Justin Schuh

Reported

2010-08-07 16:02 PDT

Modified

2012-04-03 13:41 PDT History

CC List

6 users Show

URL

Keywords

Duplicates (1)

42507 View as bug list

Depends on

Blocks