Bug 78648 - FileReader is dysfunctional in documents with "null" origin string
Summary: FileReader is dysfunctional in documents with "null" origin string
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jian Li
URL:
Keywords:
Depends on: 89156 89157
Blocks:
  Show dependency treegraph
 
Reported: 2012-02-14 16:14 PST by Alexey Proskuryakov
Modified: 2012-06-15 02:57 PDT (History)
9 users (show)

See Also:


Attachments
Patch (4.00 KB, patch)
2012-06-08 17:39 PDT, Jian Li
abarth: review-
jianli: commit-queue-
Details | Formatted Diff | Diff
Patch (13.20 KB, patch)
2012-06-14 15:59 PDT, Jian Li
no flags Details | Formatted Diff | Diff
Fix styles (12.99 KB, patch)
2012-06-14 16:06 PDT, Jian Li
no flags Details | Formatted Diff | Diff
Patch for landing (13.87 KB, patch)
2012-06-15 01:02 PDT, Hajime Morrita
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2012-02-14 16:14:16 PST
FileReader doesn't work in documents with unique security origin, or in file documents when file path separation is enforced. The reason is that:
1. Origin string for such is null.
2. FileReaderLoader::start tries to create a blob URL.
3. BlobURL::createBlobURL bails out when it sees a "null" origin string.

As an example of configuration where this is broken, try opening any FileReader regression test from fast/files in Google Chrome.

Why does FileReader need a blob URL to read a file, in the first place?
Comment 1 Jian Li 2012-02-14 16:44:21 PST
(In reply to comment #0)
> FileReader doesn't work in documents with unique security origin, or in file documents when file path separation is enforced. The reason is that:
> 1. Origin string for such is null.
> 2. FileReaderLoader::start tries to create a blob URL.
> 3. BlobURL::createBlobURL bails out when it sees a "null" origin string.
> 
> As an example of configuration where this is broken, try opening any FileReader regression test from fast/files in Google Chrome.
> 
> Why does FileReader need a blob URL to read a file, in the first place?

This is because we want to share the same code for both file/blob reading and blob URL request fetching in multi-process environment. When a file/blob is created for the first time, it is identified by an internal used blob URL, that can allow it to be read by either FileReader or blob URL request.
Comment 2 Alexey Proskuryakov 2012-02-14 21:08:41 PST
Why do you want to share file and blob reading code, but not http and blob, for example? This is still not making any sense to me.
Comment 3 Jian Li 2012-02-16 10:52:23 PST
(In reply to comment #2)
> Why do you want to share file and blob reading code, but not http and blob, for example? This is still not making any sense to me.

Not sure I understand what you mean. Which http do you mean? We want to share the common code between blob reading and blob: request handling.
Comment 4 Alexey Proskuryakov 2012-02-16 11:02:28 PST
Let me put it in a different way. Do you have a suggestion for how to fix this bug?
Comment 5 Jian Li 2012-02-16 11:19:33 PST
(In reply to comment #4)
> Let me put it in a different way. Do you have a suggestion for how to fix this bug?

FileReader used to work in file: documents. But due to the tightening of security origin, it did not work any more. I would prefer a security expert, like Adam Barth, to confirm if we do want to make FileReader work in file: or unique security origin context.

If you launch Google Chrome with the command line option "--allow-file-access-from-files", you can run those blob/file regression tests from fast/files. I am not sure if Safari/WebKit has such option to set.

If we decide to support this case, I can work on a fix on this.
Comment 6 Adam Barth 2012-02-16 11:36:39 PST
> FileReader used to work in file: documents. But due to the tightening of security origin, it did not work any more. I would prefer a security expert, like Adam Barth, to confirm if we do want to make FileReader work in file: or unique security origin context.

FileReader should work regardless of the security origin.  As far as I can tell, our current behavior violates the specs.
Comment 7 Jian Li 2012-02-16 11:41:07 PST
(In reply to comment #6)
> > FileReader used to work in file: documents. But due to the tightening of security origin, it did not work any more. I would prefer a security expert, like Adam Barth, to confirm if we do want to make FileReader work in file: or unique security origin context.
> 
> FileReader should work regardless of the security origin.  As far as I can tell, our current behavior violates the specs.

Sounds good. I am going to work on a patch for this.
Comment 8 Alexey Proskuryakov 2012-06-06 17:16:08 PDT
Jian, did you have a chance to look into this?
Comment 9 Jian Li 2012-06-06 17:49:02 PDT
(In reply to comment #8)
> Jian, did you have a chance to look into this?

Sorry for late response. I will pick up this one to see if I can do sth about it.
Comment 10 Jian Li 2012-06-08 17:39:15 PDT
Created attachment 146668 [details]
Patch
Comment 11 Alexey Proskuryakov 2012-06-08 23:58:10 PDT
Comment on attachment 146668 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=146668&action=review

> Source/WebCore/ChangeLog:8
> +        We only fix the problem with using FileReader from a file document. The

What approach would you take to fix this in general case? File URLs are of course most important in this scenario, but this does not look like a step in the direction of a general fix.
Comment 12 Adam Barth 2012-06-09 10:42:10 PDT
Comment on attachment 146668 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=146668&action=review

This patch looks like the wrong approach.  Rather than trying to hack together a blob URL that works for file URLs, it seems like a better approach is to teach FileReader to work with Blobs directly without needing to use URLs at all.

> Source/WebCore/fileapi/BlobURL.cpp:52
> -    return createBlobURL(securityOrigin->toString());
> +    String originString;
> +    if (securityOrigin->protocol() == "file")
> +        originString = "file://" + securityOrigin->filePath();
> +    else
> +        originString = securityOrigin->toString();
> +    return createBlobURL(originString);

This isn't right.  We shouldn't be special-casing protocol() == "file".  It also won't work right depending on various bools in Settings.

> Source/WebCore/fileapi/BlobURL.cpp:67
> +KURL BlobURL::extractInnerURL(const KURL& url)
> +{
> +    String unescapedUrl = decodeURLEscapeSequences(url.path());
> +    size_t lastSlashPos = unescapedUrl.reverseFind('/');
> +    if (lastSlashPos != notFound)
> +        unescapedUrl = unescapedUrl.substring(0, lastSlashPos);
> +    return KURL(ParsedURLString, unescapedUrl);
> +}

This is wrong.  KURL has a function for doing this correctly these days.

> Source/WebCore/page/SecurityOrigin.cpp:92
> +#if ENABLE(BLOB)
> +    if (url.protocolIs("blob"))
> +        return BlobURL::extractInnerURL(url);
> +#endif

Why isn't KURL able to extract this inner URL properly?

> Source/WebCore/page/SecurityOrigin.h:68
> +    String filePath() const { return m_filePath; }

Please don't add an accessor for this information.  It's private data for SecurityOrigin that no code outside this class should need.
Comment 13 Jian Li 2012-06-12 17:48:00 PDT
The reason for using Blob URL to read the blob data is to share the code in both FileReader and blob URL fetching scenarios for all supported platforms, including both single-process and multi-process platforms.

For single-process platform, we could try to make FileReader interact with the blob directly. However, for multi-process platform, the blob data is supposed to live in the browser process and we do not want to invent another protocol to fetch the data from the browser process.

If you want the direct hook with the blob data for single-process platform, I could make the change now. But if we want to also support multi-process platform, I am afraid that making FileReader read via blob URL is the simplest solution. If we want to support more than file URL, we can create a special URL for FileReader and twist SecurityOrigin check to work for it. How do you think? If you know about better solution, please let me know.


(In reply to comment #12)
> (From update of attachment 146668 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=146668&action=review
> 
> This patch looks like the wrong approach.  Rather than trying to hack together a blob URL that works for file URLs, it seems like a better approach is to teach FileReader to work with Blobs directly without needing to use URLs at all.
> 
> > Source/WebCore/fileapi/BlobURL.cpp:52
> > -    return createBlobURL(securityOrigin->toString());
> > +    String originString;
> > +    if (securityOrigin->protocol() == "file")
> > +        originString = "file://" + securityOrigin->filePath();
> > +    else
> > +        originString = securityOrigin->toString();
> > +    return createBlobURL(originString);
> 
> This isn't right.  We shouldn't be special-casing protocol() == "file".  It also won't work right depending on various bools in Settings.
> 
> > Source/WebCore/fileapi/BlobURL.cpp:67
> > +KURL BlobURL::extractInnerURL(const KURL& url)
> > +{
> > +    String unescapedUrl = decodeURLEscapeSequences(url.path());
> > +    size_t lastSlashPos = unescapedUrl.reverseFind('/');
> > +    if (lastSlashPos != notFound)
> > +        unescapedUrl = unescapedUrl.substring(0, lastSlashPos);
> > +    return KURL(ParsedURLString, unescapedUrl);
> > +}
> 
> This is wrong.  KURL has a function for doing this correctly these days.
> 
> > Source/WebCore/page/SecurityOrigin.cpp:92
> > +#if ENABLE(BLOB)
> > +    if (url.protocolIs("blob"))
> > +        return BlobURL::extractInnerURL(url);
> > +#endif
> 
> Why isn't KURL able to extract this inner URL properly?
> 
> > Source/WebCore/page/SecurityOrigin.h:68
> > +    String filePath() const { return m_filePath; }
> 
> Please don't add an accessor for this information.  It's private data for SecurityOrigin that no code outside this class should need.
Comment 14 Adam Barth 2012-06-12 19:56:18 PDT
Running a simple experiment, it looks like we're using blob URLs of the following form:

blob:https%3A//bugs.webkit.org/2e92ba1a-89b9-45b0-90a5-1cba0d901540

Would it be possible to use blob URLs of one of the following forms:

blob:2e92ba1a-89b9-45b0-90a5-1cba0d901540
blob:null/2e92ba1a-89b9-45b0-90a5-1cba0d901540
blob:about%3Ablank/2e92ba1a-89b9-45b0-90a5-1cba0d901540

If we did that, we could avoid serializing the origin into the blob URL.

There's a larger problem of the syntax of blob URLs not interoperating between browsers, which is a problem in the long run but not something we need to solve in this bug.
Comment 15 Jian Li 2012-06-13 11:34:17 PDT
(In reply to comment #14)
> Running a simple experiment, it looks like we're using blob URLs of the following form:
> 
> blob:https%3A//bugs.webkit.org/2e92ba1a-89b9-45b0-90a5-1cba0d901540
> 
> Would it be possible to use blob URLs of one of the following forms:
> 
> blob:2e92ba1a-89b9-45b0-90a5-1cba0d901540
> blob:null/2e92ba1a-89b9-45b0-90a5-1cba0d901540
> blob:about%3Ablank/2e92ba1a-89b9-45b0-90a5-1cba0d901540
> 
> If we did that, we could avoid serializing the origin into the blob URL.
> 
> There's a larger problem of the syntax of blob URLs not interoperating between browsers, which is a problem in the long run but not something we need to solve in this bug.

I think the blob URL should only be used in the 2nd form. That is, SecurityOrigin::toString returns null when the document has unique security origin or the file url. This is the reason I tried to fix the most common case: file URL. I am not sure what kind of user case we want to support for the document with unique security origin.

We have to embed the origin in the blob URL because we need to perform security origin check when fetching an URL. If we do not embed the origin, we will have to get the origin of the page when the blob is spawned. For single-process platform, this should not be a problem. For multi-process platform, this means that an expensive and time-consuming IPC needs to be invoked to get the origin information from the browser process before the origin check could be performed.

This security origin check is only needed for using blob URL. Currently reading blob via FileReader is routed through blob URL. I think we can skip origin check for FileReader since we create a one-time used blob URL internally. How do you think about this? Another idea is to let FileReader read from blob data directly but this approach will only work single-process platform. If you think the latter is better, I can start to produce a patch for this.
Comment 16 Adam Barth 2012-06-13 17:41:09 PDT
> I am not sure what kind of user case we want to support for the document with unique security origin.

The main use case I'm interested in is reading an object out of a FileSystem and then postMessaging the Blob into an sandboxed iframe.  However, an easier case is probably something like the following:

var b = new Blob(["...."], { type="image/png" });
var u = webkitURL.createObjectURL(b);
var i = document.createElement("img");
i.src = u;
document.appendChild(i);

> We have to embed the origin in the blob URL because we need to perform security origin check when fetching an URL. If we do not embed the origin, we will have to get the origin of the page when the blob is spawned. For single-process platform, this should not be a problem. For multi-process platform, this means that an expensive and time-consuming IPC needs to be invoked to get the origin information from the browser process before the origin check could be performed.

In the case of a unique origin, the URL is only usable in one process anyway because it's not possible for a document in another process to have the same origin.  Perhaps we should just keep a map from blob URLs to SecurityOrigins in memory for the unique origin case?

> This security origin check is only needed for using blob URL. Currently reading blob via FileReader is routed through blob URL. I think we can skip origin check for FileReader since we create a one-time used blob URL internally. How do you think about this? Another idea is to let FileReader read from blob data directly but this approach will only work single-process platform. If you think the latter is better, I can start to produce a patch for this.

I'd rather solve the general problem if we can.  What do you think about keeping an in-memory map as described above?  If the blob URL isn't in the map and has a "null" origin, we can conclude that the access check has failed.
Comment 17 Jian Li 2012-06-13 17:50:40 PDT
(In reply to comment #16)
> > I am not sure what kind of user case we want to support for the document with unique security origin.
> 
> The main use case I'm interested in is reading an object out of a FileSystem and then postMessaging the Blob into an sandboxed iframe.  However, an easier case is probably something like the following:
> 
> var b = new Blob(["...."], { type="image/png" });
> var u = webkitURL.createObjectURL(b);
> var i = document.createElement("img");
> i.src = u;
> document.appendChild(i);
> 
> > We have to embed the origin in the blob URL because we need to perform security origin check when fetching an URL. If we do not embed the origin, we will have to get the origin of the page when the blob is spawned. For single-process platform, this should not be a problem. For multi-process platform, this means that an expensive and time-consuming IPC needs to be invoked to get the origin information from the browser process before the origin check could be performed.
> 
> In the case of a unique origin, the URL is only usable in one process anyway because it's not possible for a document in another process to have the same origin.  Perhaps we should just keep a map from blob URLs to SecurityOrigins in memory for the unique origin case?
> 
> > This security origin check is only needed for using blob URL. Currently reading blob via FileReader is routed through blob URL. I think we can skip origin check for FileReader since we create a one-time used blob URL internally. How do you think about this? Another idea is to let FileReader read from blob data directly but this approach will only work single-process platform. If you think the latter is better, I can start to produce a patch for this.
> 
> I'd rather solve the general problem if we can.  What do you think about keeping an in-memory map as described above?  If the blob URL isn't in the map and has a "null" origin, we can conclude that the access check has failed.

Using in-memory map for unique origin case sounds like a good idea. I will proceed with this approach.
Comment 18 Adam Barth 2012-06-13 18:04:47 PDT
Thanks.  I appreciate your talking though the design with me.
Comment 19 Jian Li 2012-06-14 15:59:16 PDT
Created attachment 147669 [details]
Patch
Comment 20 WebKit Review Bot 2012-06-14 16:00:58 PDT
Attachment 147669 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1
Source/WebCore/page/SecurityOrigin.cpp:99:  Use 0 instead of NULL.  [readability/null] [5]
Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:106:  Use 0 or null instead of NULL (even in *comments*).  [readability/null] [4]
Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:112:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 3 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Jian Li 2012-06-14 16:06:07 PDT
Created attachment 147671 [details]
Fix styles
Comment 22 Adam Barth 2012-06-14 16:08:21 PDT
Comment on attachment 147669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=147669&action=review

> LayoutTests/fast/files/file-reader-file-url.html:13
> +<iframe src="resources/file-reader-file-url-iframe.html"></iframe>

Can you add a test in an iframe with the sandbox attribute?  See the tests in http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/security/ with *sandbox* for examples.

>> Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:106
>> +    // If the blob URL contains NULL origin, as in the context with unique security origin or file URL, save the mapping between url and origin so that the origin can be retrived when doing security origin check.
> 
> Use 0 or null instead of NULL (even in *comments*).  [readability/null] [4]

NULL -> "null" should make the style checker happy.

>> Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:112
>> +    } else {
> 
> One line control clauses should not use braces.  [whitespace/braces] [4]

The style checker is right here.

> Source/WebCore/fileapi/ThreadableBlobRegistry.h:51
> +    static PassRefPtr<SecurityOrigin> getCachedOrigin(const KURL&);

getCachedOrigin -> cachedOrigin

We usually skip the "get" prefix because get is such a weak verb.

>> Source/WebCore/page/SecurityOrigin.cpp:99
>> +    return NULL;
> 
> Use 0 instead of NULL.  [readability/null] [5]

The style checker is right here.
Comment 23 Adam Barth 2012-06-14 16:09:37 PDT
Comment on attachment 147671 [details]
Fix styles

My only real substantial comment is to add a test for the sandboxed case.  If you're feeling up to it, you might also want to add a test for postMessaging a Blob into a sandboxed origin.  I suspect that will work fine too after this patch.
Comment 24 Adam Barth 2012-06-14 16:09:55 PDT
Thanks for fixing this bug!
Comment 25 Build Bot 2012-06-14 16:36:58 PDT
Comment on attachment 147671 [details]
Fix styles

Attachment 147671 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/12954535
Comment 26 Jian Li 2012-06-14 17:59:27 PDT
Committed as http://trac.webkit.org/changeset/120384.
Comment 27 WebKit Review Bot 2012-06-14 19:24:43 PDT
Re-opened since this is blocked by 89156
Comment 28 Hajime Morrita 2012-06-15 01:02:08 PDT
Created attachment 147768 [details]
Patch for landing
Comment 29 Hajime Morrita 2012-06-15 01:03:09 PDT
My rollout was wrong. Let me land it again.
Comment 30 WebKit Review Bot 2012-06-15 02:57:31 PDT
Comment on attachment 147768 [details]
Patch for landing

Clearing flags on attachment: 147768

Committed r120433: <http://trac.webkit.org/changeset/120433>
Comment 31 WebKit Review Bot 2012-06-15 02:57:37 PDT
All reviewed patches have been landed.  Closing bug.