Bug 43683 - Implement iframe.srcdoc
Summary: Implement iframe.srcdoc
Status: RESOLVED DUPLICATE of bug 82991
Alias: None
Product: WebKit
Classification: Unclassified
Component: Frames (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other All
: P2 Enhancement
Assignee: Justin Schuh
URL:
Keywords:
: 42507 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-08-07 16:02 PDT by Justin Schuh
Modified: 2012-04-03 13:41 PDT (History)
6 users (show)

See Also:


Attachments
Patch (11.69 KB, patch)
2010-08-07 16:04 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (11.87 KB, patch)
2010-08-08 20:08 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (9.33 KB, patch)
2010-08-15 22:51 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (24.77 KB, patch)
2010-08-23 08:13 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (30.57 KB, patch)
2010-08-29 22:23 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (43.04 KB, patch)
2010-09-11 17:46 PDT, Justin Schuh
no flags Details | Formatted Diff | Diff
Patch (42.99 KB, patch)
2010-09-11 20:24 PDT, Justin Schuh
abarth: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Schuh 2010-08-07 16:02:40 PDT
Implement iframe.srcdoc
Comment 1 Justin Schuh 2010-08-07 16:04:35 PDT
Created attachment 63833 [details]
Patch

This patch is just for discussion right now. I still have to figure out the double onload events, XSS auditor hooks, history, write several more tests, etc.
Comment 2 Justin Schuh 2010-08-07 16:12:34 PDT
Adam, the test you asked about was: LayoutTests/fast/frames/iframe-srcdoc-frame-onload.html
Comment 3 Adam Barth 2010-08-07 16:19:01 PDT
*** Bug 42507 has been marked as a duplicate of this bug. ***
Comment 4 Justin Schuh 2010-08-08 20:08:46 PDT
Created attachment 63856 [details]
Patch

Cleaned up some obvious mistakes, but still trying to work out exactly when to load the content; the current spot is definitely not right. Once again, just posting this for discussion purposes right now.
Comment 5 Justin Schuh 2010-08-11 10:38:18 PDT
The only way I see to get events and navigation right is to inject this into the network stack, just like data: URL does. I've pinged the whatwg list on why about:srcdoc was chosen as the location address, because it seems better to use the data URL and should make implementation simpler.
Comment 6 Justin Schuh 2010-08-15 22:51:19 PDT
Created attachment 64467 [details]
Patch

Adam, can you let me know if this is a reasonable strategy? I hooked @srcdoc into FrameLoader via SubstituteData, so it loads asynch and all the events and navigation work out correctly.
Comment 7 Adam Barth 2010-08-15 23:29:30 PDT
Comment on attachment 64467 [details]
Patch

WebCore/html/HTMLFrameOwnerElement.cpp:102
 +              return SubstituteData(data.release(), document()->isXHTMLDocument() ? "text/xml" : "text/html", "UTF-8", KURL(), srcdocURL());
text/xml, really?  I thought srcdoc was always text/html.

WebCore/html/HTMLFrameOwnerElement.cpp:100
 +              const CString srcdocUtf8 = srcdoc.string().utf8();
Lame that we need to convert to UTF8.  Can't we just use UTF16 the whole time?

WebCore/html/HTMLIFrameElement.cpp:133
 +              setLocation(getAttribute(srcAttr), false);
Isn't false implied?

WebCore/html/HTMLIFrameElement.cpp:135
 +              setLocation(srcdocURL(), true);
Should we use an enum instead of true/false?  Maybe we just need a new method setLocationToSrcdoc?
Comment 8 Adam Barth 2010-08-15 23:29:44 PDT
Yeah, that's a nice approach.
Comment 9 Justin Schuh 2010-08-16 10:10:07 PDT
Thanks for the detailed feedback; I didn't think I'd get more than a "yes" or "no" on the approach at this stage.

(In reply to comment #7)
> (From update of attachment 64467 [details])
> WebCore/html/HTMLFrameOwnerElement.cpp:102
>  +              return SubstituteData(data.release(), document()->isXHTMLDocument() ? "text/xml" : "text/html", "UTF-8", KURL(), srcdocURL());
> text/xml, really?  I thought srcdoc was always text/html.

I think I misread the standard here. It looks like you are correct and it should always be text/html. Hooray for easier!

> WebCore/html/HTMLFrameOwnerElement.cpp:100
>  +              const CString srcdocUtf8 = srcdoc.string().utf8();
> Lame that we need to convert to UTF8.  Can't we just use UTF16 the whole time?

We can use UTF16 (as long as I account for byte-ordering). However, I'm not sure it matters much because we'll be copying the string regardless. Maybe I should just make SubstituteData smart enough to avoid the copying and conversion for strings?

> WebCore/html/HTMLIFrameElement.cpp:133
>  +              setLocation(getAttribute(srcAttr), false);
> Isn't false implied?
> 
> WebCore/html/HTMLIFrameElement.cpp:135
>  +              setLocation(srcdocURL(), true);
> Should we use an enum instead of true/false?  Maybe we just need a new method setLocationToSrcdoc?

Yeah, setLocationToSrcdoc sounds like a much better idea than the boolean argument.
Comment 10 Anne van Kesteren 2010-08-16 13:56:00 PDT
srcdoc is either HTML or XML depending on whether the document the srcdoc <iframe> is in is HTML or XML. You were correct.

The specification states quite clearly:

"For iframe elements in HTML documents, the attribute, ..."

"For iframe elements in XML documents, the attribute, ..."

(I'm not sure whether this is ideal, but that is what it says. And is similar to e.g. the "semantics" of innerHTML.)
Comment 11 Justin Schuh 2010-08-23 08:13:49 PDT
Created attachment 65121 [details]
Patch

I think the implentation is done, but I'm still trying to figure out more tests. I've added implicit sandboxing in this version. Tab is raising the question with WHATWG, but we can't think of any use case for @srcdoc that doesn't include the sandbox.
Comment 12 Justin Schuh 2010-08-29 22:23:40 PDT
Created attachment 65877 [details]
Patch
Comment 13 Darin Adler 2010-08-29 23:45:00 PDT
Comment on attachment 65877 [details]
Patch

> +    if (attr->name() == srcAttr) {
> +        if (!hasTagName(iframeTag) || getAttribute(srcdocAttr).isNull())
> +            setLocation(deprecatedParseURL(attr->value()));

It's bad layering to have iframe-specific code in HTMLFrameElementBase and then check hasTagName(iframeTag). Instead, iframe-specific code should go in HTMLIFrameElement. If we need a hook for iframe, we can add a virtual function, but it should ask a question, not "is this an iframe".

To check if an attribute exists, we should use hasAttribute, not getAttribute().isNull(). It's more efficient. But also, is null the only value that matters? I'd like to see test cases covering values like empty string or all whitespace for the srcdoc attribute.

Code like this that works with one attribute looking at the value of the other usually is a problem because there's no guarantee the src attribute will be parsed before the srcdoc attribute. I need to see how this will work in either order, and we'd want to make sure we had test cases that covered both.

>  KURL HTMLFrameElementBase::location() const
>  {
> +    if (hasTagName(iframeTag) && !getAttribute(srcdocAttr).isEmpty())
> +        return srcdocURL();
>      return document()->completeURL(getAttribute(srcAttr));
>  }

Again, not good to have iframe-specific code in HTMLFrameElementBase. Should be factored differently. Same comment about getAttribute(srcdocAttr) too.

> +    if (str == srcdocURL().string())
> +        return;

Is it OK that this is a case-sensitive comparison? I would like to see test cases with "ABOUT:SRCDOC" too.

> +void HTMLFrameElementBase::setLocationToSrcdoc()
> +{
> +    m_URL = srcdocURL().string();
> +
> +    if (inDocument())
> +        openURL(false, false);
> +}

It's bad for long term maintenance to copy and paste the last part of setLocation into this separate function. Even though it's just a single function call, I'd like to factor things so the code is shared.

> +SubstituteData HTMLFrameOwnerElement::srcdocSubstituteData(const KURL& url) const
> +{
> +    if (url == srcdocURL() && hasTagName(HTMLNames::iframeTag)) {
> +        const String srcdoc = getAttribute(HTMLNames::srcdocAttr);
> +        if (!srcdoc.isNull())
> +            return SubstituteData(srcdoc, document()->isXHTMLDocument() ? "text/xml" : "text/html", KURL(), url);
> +    }
> +    return SubstituteData();
> +}

It's not appropriate to have the frame owner element have this logic that's specific to iframe. Just as with HTMLFrameElementBase, this needs to be based on virtual functions, not "is iframe" checks in the base class.

The result of getAttribute is a const AtomicString&, not a String. Putting it into a String causes a bit of unnecessary reference count churn.

> +        // Navigate using @src if @srcdoc is getting cleared.

The use of "@" here is not normal for comments in Webit.

> +        if (attr->isNull())
> +            setLocation(getAttribute(srcAttr));
> +        else {
> +#if ENABLE(SANDBOX)
> +            SandboxFlags flags = sandboxFlags();
> +            if (flags == SandboxNone)
> +                setSandboxFlags(SandboxAll);
> +#endif
> +            setLocationToSrcdoc();
> +        }

It's important that this logic be localized in a single class. We don't want this part in one class and the rest in another. While other classes can forward things here, the logic specific to the srcdoc attribute should be localized all in a single class, HTMLIFrameElement. The other classes should not even know the name of the attribute.

Also note that you can call fastGetAttribute in this case.

It's a factoring warning sigh that this needs its own extra sandboxing logic. Is there any way to share this with the normal code path? Also, this is good place for a "why" comment. I see what is happening to the sandbox flags, but not why, and I am not the only one.

I don't see any point to putting the flags into a local variable here.

> +        attribute [Reflect] DOMString srcdoc;

If the srcdoc attribute is a URL, then it needs to be "[Reflect, URL]" and needs to be covered by the URL reflection test, and you'll also need to override isURLAttribute to pass that test.

> +        Frame* realParentFrame = m_frame->tree()->parent();
> +        if (realParentFrame) {
> +            // Walk a srcdoc hierarchy to a parent with a real encoding
> +            const HTMLFrameOwnerElement* owner = realParentFrame->ownerElement();
> +            while (owner && owner->hasTagName(HTMLNames::iframeTag) && owner->hasAttribute(HTMLNames::srcdocAttr)) {
> +                realParentFrame = realParentFrame->tree()->parent();
> +                owner = realParentFrame ? realParentFrame->ownerElement() : 0;
> +            }
> +        }

This logic should go into a separate helper function. Adding it in to DocumentWriter::createDecoderIfNeeded is not good factoring. Further, why is determining the encoding the one place where we need this "real parent" logic that skips srcdoc frames? What other calls to parent() might want this? Maybe FrameTree itself should know about this. Again, the logic should not check iframeTag and srcdocAttr. It should use a virtual function -- only the HTMLIFrameElement class itself should know about srcdocAttr.

> -    RefPtr<DocumentLoader> loader = m_client->createDocumentLoader(request, SubstituteData());
> +    SubstituteData substituteData = m_frame->ownerElement() ? m_frame->ownerElement()->srcdocSubstituteData(request.url()) : SubstituteData();
> +    RefPtr<DocumentLoader> loader = m_client->createDocumentLoader(request, substituteData);

The way it's used here makes it clear that srcdocSubstituteData is not the right name for this function. While the frame only uses this when a srcdoc is involved, the actual name should simply state that this is SubstituteData to be used when loading the frame.

> -    if (!SecurityOrigin::canLoad(url, referrer, 0)) {
> +    if (!SecurityOrigin::canLoad(url, referrer, ownerElement->document())) {

Why this change? What test covers it? Can this go in by itself?

> -        SubstituteData(PassRefPtr<SharedBuffer> content, const String& mimeType, const String& textEncoding, const KURL& failingURL, const KURL& responseURL = KURL())
> -            : m_content(content)
> -            , m_mimeType(mimeType)
> +        
> +        SubstituteData(PassRefPtr<SharedBuffer> dataBuffer, const String& mimeType, const String& textEncoding, const KURL& failingURL, const KURL& responseURL = KURL())

The term "data buffer" here doesn't seem very good. I think just "buffer" is probably OK given the context.

> +            : m_mimeType(mimeType)
>              , m_textEncoding(textEncoding)
>              , m_failingURL(failingURL)
>              , m_responseURL(responseURL)
>          {
> +            m_content = adoptRef(new SubstituteContentBuffer(dataBuffer));
>          }

Why is m_content being set up in an assignment instead of initialization? It should be initialization. It's OK to call a function in an initializer.

In new code, you should be using a create function, not a direct call to adoptRef(new).

> +#if defined(WTF_CPU_BIG_ENDIAN) || defined(WTF_CPU_MIDDLE_ENDIAN)
> +            , m_textEncoding("UTF-16BE")
> +#else
> +            , m_textEncoding("UTF-16LE")
> +#endif

This is really weak way to do this. Can we find a cleaner way to accomplish it? I think we might want to offer a function that returns a suitable TextEncoding for UChar without looking up the encoding name as a string each time.

Also, the WTF_XXX macros aren't intended to be used directly with defined. If you were going to do the #if it would be #if CPU(BIG_ENDIAN). But there has to be a better way to handle it.

> +            m_content = adoptRef(new SubstituteContentString(dataString));

Same comments as above about assignment, adoptRef, and create.

> +            virtual ~SubstituteContent() {};

No semicolon needed here. Also, we put a space between braces in cases like this.

> +            SubstituteContentBuffer(PassRefPtr<SharedBuffer> dataBuffer) : m_dataBuffer(dataBuffer) {}

Put space between the braces in cases like this.

> +            const char* data() const { return m_dataBuffer->data(); }
> +            unsigned size() const { return m_dataBuffer->size(); }

Please make these private and mark them virtual. We like to be explicit when we are overriding virtual functions and repeat the virtual, and it would be inefficient for someone to call these virtual functions if they knew the concrete type; making the functions private makes it more likely we'd catch that mistake.

The word data in m_dataBuffer doesn't add much.

Is unsigned OK for size? Elsewhere we often use size_t for such sizes.

> +            const char* data() const { return reinterpret_cast<const char *>(m_dataString.characters()); }

There should not be a space before the "*". This needs a comment connecting this otherwise-unwise cast to the text encoding names that make it OK.

> +        private:
> +            String m_dataString;
> +        };

The word "data" in m_dataString doesn't add much.

> +        
>      };
> -
>  }
>  
>  #endif // SubstituteData_h

Please don't remove that blank line. We want a blank line there.

Do the test cases cover all the code changes here?

review- because you'll want to fix at least some of the things I mentioned above
Comment 14 Justin Schuh 2010-09-07 10:19:01 PDT
I meant to respond sooner, but I was buried under other tasks. Please assume anything I'm not responding to inline is just me agreeing and it will be fixed in the next patch.


(In reply to comment #13)
>It's a factoring warning sigh that this needs its own extra sandboxing logic. Is there any way to share this with the normal code path? Also, this is good place for a "why" comment. I see what is happening to the sandbox flags, but not why, and I am not the only one.

There's a a discussion on whatwg about whether the sandbox should be implicit for srcdoc:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-August/028274.html

I'm strongly in favor of the implicit sandbox, and the additional logic here is the implementation. However, if it's a major sticking point I can take it out for now and submit it as a separate patch later (depending on the results of the whatwg discussion).


> > +        attribute [Reflect] DOMString srcdoc;
> 
> If the srcdoc attribute is a URL, then it needs to be "[Reflect, URL]" and needs to be covered by the URL reflection test, and you'll also need to override isURLAttribute to pass that test.

I might be mistaken on how the IDL URL extension works. However, the srcdoc attribute isn't a URL, it's used as the content of the frame. So, I would think it shouldn't be be tagged as a URL.


> > +        Frame* realParentFrame = m_frame->tree()->parent();
> > +        if (realParentFrame) {
> > +            // Walk a srcdoc hierarchy to a parent with a real encoding
> > +            const HTMLFrameOwnerElement* owner = realParentFrame->ownerElement();
> > +            while (owner && owner->hasTagName(HTMLNames::iframeTag) && owner->hasAttribute(HTMLNames::srcdocAttr)) {
> > +                realParentFrame = realParentFrame->tree()->parent();
> > +                owner = realParentFrame ? realParentFrame->ownerElement() : 0;
> > +            }
> > +        }
> 
> This logic should go into a separate helper function. Adding it in to DocumentWriter::createDecoderIfNeeded is not good factoring. Further, why is determining the encoding the one place where we need this "real parent" logic that skips srcdoc frames? What other calls to parent() might want this? Maybe FrameTree itself should know about this. Again, the logic should not check iframeTag and srcdocAttr. It should use a virtual function -- only the HTMLIFrameElement class itself should know about srcdocAttr.

Sorry. After looking back I realized this is definitely the wrong place for this logic. What's actually going on is that I modified SubstituteData to avoid redundant conversions back and forth between UTF-8 and UTF-16. The problem occurs when the encoding sniffing sees the current frame as UTF-16 and defaults to the same encoding type for subframes. What I think I need to do is add a way to tell the document that it shouldn't use it's own encoding for hinting purposes, and instead use the encoding of it's parent (which would recurse all the way up to whatever parent is a real document and not a srcdoc).


> > -    if (!SecurityOrigin::canLoad(url, referrer, 0)) {
> > +    if (!SecurityOrigin::canLoad(url, referrer, ownerElement->document())) {
> 
> Why this change? What test covers it? Can this go in by itself?

I moved this and another unrelated change out to their own patches, which landed last week. Sorry for accidentally mixing them in.


> > +#if defined(WTF_CPU_BIG_ENDIAN) || defined(WTF_CPU_MIDDLE_ENDIAN)
> > +            , m_textEncoding("UTF-16BE")
> > +#else
> > +            , m_textEncoding("UTF-16LE")
> > +#endif
> 
> This is really weak way to do this. Can we find a cleaner way to accomplish it? I think we might want to offer a function that returns a suitable TextEncoding for UChar without looking up the encoding name as a string each time.
> Also, the WTF_XXX macros aren't intended to be used directly with defined. If you were going to do the #if it would be #if CPU(BIG_ENDIAN). But there has to be a better way to handle it.

Is it reasonable to just add a static function that provides a constant String with the native encoding for the platform (I didn't find one)? And if so, any suggestions on where that should live? It seems like it really shouldn't live in SubstituteData.

> Do the test cases cover all the code changes here?

One of your comments pointed out that I do need tests for explicit navigation to about:srcdoc. I think I have the rest covered, but I'll do another round and double check if there's anything I'm missing.
Comment 15 Justin Schuh 2010-09-11 17:46:36 PDT
Created attachment 67319 [details]
Patch
Comment 16 Justin Schuh 2010-09-11 18:03:24 PDT
Comment on attachment 67319 [details]
Patch

I think I've addressed the layering issues in this version. Also, Adam and I discussed the issue of whether about:srcdoc should be navigable. We decided that there's good reason to make it navigable (and it would be extremely difficult to do otherwise). So I'm sending an email to whatwg to have the standard modified (since I anticipate other implementors will reach the same conclusion).
Comment 17 Early Warning System Bot 2010-09-11 18:15:15 PDT
Attachment 67319 [details] did not build on qt:
Build output: http://queues.webkit.org/results/3918416
Comment 18 Justin Schuh 2010-09-11 20:24:46 PDT
Created attachment 67328 [details]
Patch

Just a minor tweak to fix QT breakage.
Comment 19 Adam Barth 2010-09-16 11:44:07 PDT
Comment on attachment 67328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=67328&action=prettypatch

You should also do the parts in the parser.

> WebCore/html/HTMLIFrameElement.cpp:76
> -        return SandboxNone;
> +        return hasAttribute(srcdocAttr) ? SandboxAll : SandboxNone;

Is this what the WG decided?

> WebCore/html/HTMLIFrameElement.cpp:140
> +#if ENABLE(SANDBOX)

We should get rid of this ifdef, but that's for another patch.

> WebCore/html/HTMLIFrameElement.cpp:190
> +        const String srcdoc = fastGetAttribute(HTMLNames::srcdocAttr);

We don't use const here.  Also, we're probably using the HTMLNames namespace.  No need to be explicit here.

> WebCore/html/HTMLIFrameElement.cpp:200
> +    if (!hasAttribute(srcdocAttr))
> +        return srcdocURL();

This not seems wrong.
Comment 20 Adam Barth 2012-04-03 13:41:09 PDT

*** This bug has been marked as a duplicate of bug 82991 ***