Bug 143711 - Synchronous XMLHttpRequest should get access to AppCache resources stored as flat files
Summary: Synchronous XMLHttpRequest should get access to AppCache resources stored as ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords:
Depends on: 144272
Blocks:
  Show dependency treegraph
 
Reported: 2015-04-14 07:04 PDT by youenn fablet
Modified: 2015-04-29 01:19 PDT (History)
3 users (show)

See Also:


Attachments
Patch (171.80 KB, patch)
2015-04-14 07:14 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Direct file reading (172.67 KB, patch)
2015-04-15 10:09 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Trying to fix mac build (172.74 KB, patch)
2015-04-15 11:31 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
using mmap (18.74 KB, patch)
2015-04-17 06:26 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (13.78 KB, patch)
2015-04-25 13:00 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
SharedBuffer approach (20.89 KB, patch)
2015-04-25 13:16 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Patch for landing (20.82 KB, patch)
2015-04-27 03:11 PDT, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description youenn fablet 2015-04-14 07:04:50 PDT
This bug is a split from https://bugs.webkit.org/show_bug.cgi?id=138506
It handles the same issue for synchronous loads.
Comment 1 youenn fablet 2015-04-14 07:14:44 PDT
Created attachment 250704 [details]
Patch
Comment 2 Darin Adler 2015-04-14 09:34:56 PDT
Comment on attachment 250704 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250704&action=review

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:235
> +                ResourceResponse fileResponse;
> +                ResourceRequest fileRequest(createFileURL(resource->path()));
> +                platformStrategies()->loaderStrategy()->loadResourceSynchronously(networkingContext, 0, fileRequest, DoNotAllowStoredCredentials, DoNotAskClientForAnyCredentials, error, fileResponse, data);

I’m not sure it’s acceptable to do synchronous disk I/O. The other synchronous loading is just moving things around in memory, I believe. We should discuss this. Can we make this asynchronous instead?

> Source/WebCore/loader/appcache/ApplicationCacheHost.h:131
> +        bool maybeLoadSynchronously(NetworkingContext*, ResourceRequest&, ResourceError&, ResourceResponse&, Vector<char>&);

This should be NetworkingContext&.
Comment 3 Alexey Proskuryakov 2015-04-14 09:52:46 PDT
As this is for synchronous requests, and we need to wait for completion anyway, it's not clear to me what we would achieve by moving loading to a secondary thread. Could you please elaborate?
Comment 4 Darin Adler 2015-04-14 10:00:20 PDT
I was mixed up, that’s all. Not a useful comment on my part.
Comment 5 Alexey Proskuryakov 2015-04-14 10:05:21 PDT
Comment on attachment 250704 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250704&action=review

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:212
> +    // URL should have a function to create a url from a path, but it does not. This function
> +    // is not suitable because URL::setPath uses encodeWithURLEscapeSequences, which it notes
> +    // does not correctly escape '#' and '?'. This function works for our purposes because
> +    // app cache media files are always created with encodeForFileName(createCanonicalUUIDString()).

This is a moved comment, but I have difficulty understanding it. Which two "this function" functions is it talking about? What's wrong with fileURLWithFileSystemPath()?

>> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:235
>> +                platformStrategies()->loaderStrategy()->loadResourceSynchronously(networkingContext, 0, fileRequest, DoNotAllowStoredCredentials, DoNotAskClientForAnyCredentials, error, fileResponse, data);
> 
> I’m not sure it’s acceptable to do synchronous disk I/O. The other synchronous loading is just moving things around in memory, I believe. We should discuss this. Can we make this asynchronous instead?

It seems quite wasteful to use the Networking process for a simple file read. I don't see anything good about that, honestly, and this will likely complicate future security hardening work. File URL access is a very special case that we want to have a layered defense against, and using a normal subresource loading code path for something that is otherwise forbidden seems wrong.

I just worked myself up enough to override the r+ with an r-, sorry.

> LayoutTests/http/tests/appcache/simple-video-sync.html:1
> +<html manifest="resources/simple-video.manifest">

Please use a unique manifest for each test (which means that the name of manifest would be simple-video-sync.manifest). Even if not strictly necessary, it's good to know that we don't have manifest collisions as a source of flakiness when running tests.

> LayoutTests/http/tests/appcache/simple-video-sync.html:19
> +        finishTest((req.response == 0) ? "FAILURE: Did not get correct data from cached resource" : "SUCCESS");

Seems worth checking that we actually receive any data in the response, and that content type is correct.
Comment 6 youenn fablet 2015-04-15 04:02:51 PDT
(In reply to comment #5)
> Comment on attachment 250704 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=250704&action=review
> 
> > Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:212
> > +    // URL should have a function to create a url from a path, but it does not. This function
> > +    // is not suitable because URL::setPath uses encodeWithURLEscapeSequences, which it notes
> > +    // does not correctly escape '#' and '?'. This function works for our purposes because
> > +    // app cache media files are always created with encodeForFileName(createCanonicalUUIDString()).
> 
> This is a moved comment, but I have difficulty understanding it. Which two
> "this function" functions is it talking about? What's wrong with
> fileURLWithFileSystemPath()?
> 
> >> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:235
> >> +                platformStrategies()->loaderStrategy()->loadResourceSynchronously(networkingContext, 0, fileRequest, DoNotAllowStoredCredentials, DoNotAskClientForAnyCredentials, error, fileResponse, data);
> > 
> > I’m not sure it’s acceptable to do synchronous disk I/O. The other synchronous loading is just moving things around in memory, I believe. We should discuss this. Can we make this asynchronous instead?
> 
> It seems quite wasteful to use the Networking process for a simple file
> read. I don't see anything good about that, honestly, and this will likely
> complicate future security hardening work. File URL access is a very special
> case that we want to have a layered defense against, and using a normal
> subresource loading code path for something that is otherwise forbidden
> seems wrong.
> 
> I just worked myself up enough to override the r+ with an r-, sorry.

OK, then I guess we are back to sync file loading for sync XHR.

> 
> > LayoutTests/http/tests/appcache/simple-video-sync.html:1
> > +<html manifest="resources/simple-video.manifest">
> 
> Please use a unique manifest for each test (which means that the name of
> manifest would be simple-video-sync.manifest). Even if not strictly
> necessary, it's good to know that we don't have manifest collisions as a
> source of flakiness when running tests.

OK

> > LayoutTests/http/tests/appcache/simple-video-sync.html:19
> > +        finishTest((req.response == 0) ? "FAILURE: Did not get correct data from cached resource" : "SUCCESS");
> 
> Seems worth checking that we actually receive any data in the response, and
> that content type is correct.

OK
Comment 7 youenn fablet 2015-04-15 10:09:49 PDT
Created attachment 250815 [details]
Direct file reading
Comment 8 youenn fablet 2015-04-15 10:11:36 PDT
(In reply to comment #5)
> Comment on attachment 250704 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=250704&action=review
> 
> > Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:212
> > +    // URL should have a function to create a url from a path, but it does not. This function
> > +    // is not suitable because URL::setPath uses encodeWithURLEscapeSequences, which it notes
> > +    // does not correctly escape '#' and '?'. This function works for our purposes because
> > +    // app cache media files are always created with encodeForFileName(createCanonicalUUIDString()).
> 
> This is a moved comment, but I have difficulty understanding it. Which two
> "this function" functions is it talking about? What's wrong with
> fileURLWithFileSystemPath()?

The moved function adds a CF/WIN specific code path, which I am no expert there.
I can check whether this would cause test failures to remove this code path and file a later path if ok.
Comment 9 youenn fablet 2015-04-15 11:31:57 PDT
Created attachment 250827 [details]
Trying to fix mac build
Comment 10 Alexey Proskuryakov 2015-04-16 11:49:28 PDT
Comment on attachment 250827 [details]
Trying to fix mac build

View in context: https://bugs.webkit.org/attachment.cgi?id=250827&action=review

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:211
> +    // URL should have a function to create a url from a path, but it does not. This function
> +    // is not suitable because URL::setPath uses encodeWithURLEscapeSequences, which it notes
> +    // does not correctly escape '#' and '?'. This function works for our purposes because
> +    // app cache media files are always created with encodeForFileName(createCanonicalUUIDString()).

Please add a FIXME here:

// FIXME: Can this be replaced with URL::fileURLWithFileSystemPath()?

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:251
> +            else if (!loadDataFromFile(resource->path(), data))
> +                return false;

This is incorrect. The resource is in application cache, and even if we fail to read it, we should not fall back to the network.

> Source/WebCore/platform/FileSystem.h:178
> +// Reads a file into the provided data buffer. It returns true if all data up to length is read, false otherwise.
> +bool readWholeDataFromFile(const String&, char*, int);

This looks like a somewhat strange API. Why not bool readFile(const String&, Vector<uint8_t>& data)?

But also, for the application cache case, I don't see why we would want to read, an mmap seems a lot more appropriate.

> LayoutTests/http/tests/appcache/simple-video-sync.html:20
> +        if (req.getResponseHeader("Content-Type") != "video/mp4")
> +            finishTest("FAILURE: Did not get correct content type from cached resource");

This still doesn't test response data. Can it be tested?
Comment 11 youenn fablet 2015-04-17 06:26:56 PDT
Created attachment 251017 [details]
using mmap
Comment 12 Darin Adler 2015-04-17 09:41:40 PDT
Comment on attachment 251017 [details]
using mmap

View in context: https://bugs.webkit.org/attachment.cgi?id=251017&action=review

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:210
> +    // URL should have a function to create a url from a path, but it does not. This function

This comment is false. fileURLWithFileSystemPath is such a function. I don’t think it’s good for us to keep this comment exactly as is when that’s inaccurate. Maybe just remove the first sentence?

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:227
> +static inline bool loadDataFromFile(const String& path, Vector<char>& data)

Why map the file if we are just going to copy all the data into a vector? The whole point of a mapped file is that we use the mapped-in data directly, avoiding one copy. I think we need to refactor so we don’t have to copy into the Vector<char>.

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:246
> +                error = m_documentLoader.frameLoader()->client().cannotShowURLError(request);
> +                return true;

Should restructure this so we don’t have to repeat these two lines of code twice. The exact same thing is done just below.

> Source/WebCore/platform/FileSystem.h:182
> +typedef struct S_MappedFile MappedFile;

The name S_MappedFile doesn’t match WebKit coding style and it’s unnecessary. The right thing here is probably just to forward declare a struct named MappedFile; no value in having a separate typedef and struct tag. That’s a C construct that’s not needed in C++.

> Source/WebCore/platform/FileSystem.h:183
> +MappedFile* mapFile(const String&);

It’s not good coding style to have this return a raw pointer that a client can easily forget to deallocate. We could use a std::unique_ptr for this, or we could return a MappedFile rather than a MappedFile* and use C++ constructors and destructors to handle object lifetime. This should probably just be a simple class rather than being a pointer.

> Source/WebCore/platform/gtk/FileSystemGtk.cpp:382
> +    GUniquePtr<gchar> filename = unescapedFilename(path);
> +    int fd = open(filename.get(), O_RDONLY);
> +    if (fd < 0)
> +        return nullptr;

Is there some simple way to encapsulate this so we don’t need to repeat all the code for GTK when it’s almost identical to the POSIX version?
Comment 13 Alexey Proskuryakov 2015-04-17 09:46:02 PDT
Comment on attachment 251017 [details]
using mmap

View in context: https://bugs.webkit.org/attachment.cgi?id=251017&action=review

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:232
> +    bool result = copyMappedFileData(file, data);

Are you saying that we need to copy the data, because XMLHttpRequest cannot take a memory mapped pointer as data? In this case, the right thing to do would be one of:

- Make XMLHttpRequest (or even Vector) work with arbitrary memory provided to it. This should be helpful in the case of loading from the network too, because we get shared memory pointers from NetworkProcess, and not copying these could be nice. But probably fairly hard to implement.

- Or disagree with me when I asked you to use mmap. It doesn't make any sense to mmap and then copy out, direct reading is obviously more straightforward, and quite possibly faster.
Comment 14 youenn fablet 2015-04-17 14:33:09 PDT
(In reply to comment #13)
> Comment on attachment 251017 [details]
> using mmap
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=251017&action=review
> 
> > Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:232
> > +    bool result = copyMappedFileData(file, data);
> 
> Are you saying that we need to copy the data, because XMLHttpRequest cannot
> take a memory mapped pointer as data? In this case, the right thing to do
> would be one of:
> 
> - Make XMLHttpRequest (or even Vector) work with arbitrary memory provided
> to it. This should be helpful in the case of loading from the network too,
> because we get shared memory pointers from NetworkProcess, and not copying
> these could be nice. But probably fairly hard to implement.
> 
> - Or disagree with me when I asked you to use mmap. It doesn't make any
> sense to mmap and then copy out, direct reading is obviously more
> straightforward, and quite possibly faster.

Thanks to both of you for your thorough review, that is helpful.

Ap, I may have misunderstood your comments.
I thought you suggested to use mmap to get a single sys call vs several read call which may make sense for potentially large files I guess.

I am not sure Vector is the best option for the kind of refactoring you are suggesting. SharedBuffer may be more amenable to that. At least I am more familiar with it. If time permits, I will give it some thoughts.

But, to be honest, this sounds to me like a very good refactoring project for something else, which is improving blob implementation.

The scope of this particular patch is appcache. I would just either piggyback on blob implementation or implement something as simple as possible. I understand from your comments that the latter is the best option.

I am away next week but I will try preparing a patch on the week after based on open/read/close. Please let me know if I got something wrong.
Comment 15 Darin Adler 2015-04-17 14:37:00 PDT
As far as what could manage a mmap’d pointer, SharedBuffer, that’s right. Not Vector.
Comment 16 youenn fablet 2015-04-25 10:37:33 PDT
(In reply to comment #15)
> As far as what could manage a mmap’d pointer, SharedBuffer, that’s right.
> Not Vector.

I just took a look and will upload a patch for that.
Related bug is https://bugs.webkit.org/show_bug.cgi?id=144192
Comment 17 youenn fablet 2015-04-25 13:00:15 PDT
Created attachment 251643 [details]
Patch
Comment 18 youenn fablet 2015-04-25 13:16:15 PDT
Created attachment 251646 [details]
SharedBuffer approach
Comment 19 Darin Adler 2015-04-26 14:43:08 PDT
Comment on attachment 251646 [details]
SharedBuffer approach

View in context: https://bugs.webkit.org/attachment.cgi?id=251646&action=review

Looks good.

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:231
> +            // FIXME: We should not copy the SharedBuffer here since upper users of SharedBuffer do not modify the SharedBuffer.

Makes sense. So why copy it instead of just taking a reference to it?

I don’t think the term “upper users” is clear here. I think what you want to say is: “Clients probably don’t really need a copy of the SharedBuffer. Remove the call to copy() if we study the code enough to determine that’s safe.” Or something like that.

> Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:257
> +            // FIXME: We should not copy the SharedBuffer here since upper users of SharedBuffer do not modify the SharedBuffer.

Same comment.

> Source/WebCore/loader/appcache/ApplicationCacheHost.h:34
> +#include "SharedBuffer.h"

We should just be able to forward-declare SharedBuffer and not include the header here. Generally you can compile RefPtr<X>& without having the definition of X, as long as you don’t actually use the RefPtr.

> Source/WebCore/xml/XSLTProcessorLibxslt.cpp:135
> +            else if (data)
> +                data->clear();

We just want data = nullptr here, not data->clear().

> Source/WebCore/xml/XSLTProcessorLibxslt.cpp:139
> +            if (data)
> +                data->clear();

We just want data = nullptr here, not data->clear();
Comment 20 youenn fablet 2015-04-27 03:10:49 PDT
> Looks good.

Thanks for the review.

> > Source/WebCore/loader/appcache/ApplicationCacheHost.cpp:231
> > +            // FIXME: We should not copy the SharedBuffer here since upper users of SharedBuffer do not modify the SharedBuffer.
> 
> Makes sense. So why copy it instead of just taking a reference to it?

From what I saw, removing the copy is safe currently.
But nothing prevents a new user to break this assumption.
It might be useful to introduce some kind of immutable SharedBuffer.

> I don’t think the term “upper users” is clear here. I think what you want to
> say is: “Clients probably don’t really need a copy of the SharedBuffer.
> Remove the call to copy() if we study the code enough to determine that’s
> safe.” Or something like that.

OK. I will add:
// FIXME: Clients proably do not need a copy of the SharedBuffer.
// Remove the call to copy() once we ensure that SharedBuffer will not be modified.


> > Source/WebCore/loader/appcache/ApplicationCacheHost.h:34
> > +#include "SharedBuffer.h"
> 
> We should just be able to forward-declare SharedBuffer and not include the
> header here. Generally you can compile RefPtr<X>& without having the
> definition of X, as long as you don’t actually use the RefPtr.

Thanks, I'll keep that in mind for the future.

> > Source/WebCore/xml/XSLTProcessorLibxslt.cpp:135
> > +            else if (data)
> > +                data->clear();
> 
> We just want data = nullptr here, not data->clear().

OK
Comment 21 youenn fablet 2015-04-27 03:11:33 PDT
Created attachment 251732 [details]
Patch for landing
Comment 22 WebKit Commit Bot 2015-04-27 04:04:37 PDT
Comment on attachment 251732 [details]
Patch for landing

Clearing flags on attachment: 251732

Committed r183393: <http://trac.webkit.org/changeset/183393>
Comment 23 WebKit Commit Bot 2015-04-27 04:04:44 PDT
All reviewed patches have been landed.  Closing bug.
Comment 24 Alexey Proskuryakov 2015-04-27 12:30:08 PDT
This caused a reproducible crash on http/tests/appcache/access-via-redirect.php with GuardMalloc. Will roll out.

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	0x00007fff89c6b9b3 _platform_memmove$VARIANT$Nehalem + 243
1   com.apple.WebCore             	0x000000010ce8e2c5 WebCore::TextResourceDecoder::checkForHeadCharset(char const*, unsigned long, bool&) + 117
2   com.apple.WebCore             	0x000000010ce8de19 WebCore::TextResourceDecoder::decode(char const*, unsigned long) + 137
3   com.apple.WebCore             	0x000000010cf33db4 WebCore::XMLHttpRequest::didReceiveData(char const*, int) + 1156
4   com.apple.WebCore             	0x000000010cf11426 WebCore::DocumentThreadableLoader::loadRequest(WebCore::ResourceRequest const&, WebCore::SecurityCheckPolicy) + 774
5   com.apple.WebCore             	0x000000010d2812dd WebCore::DocumentThreadableLoader::loadResourceSynchronously(WebCore::Document&, WebCore::ResourceRequest const&, WebCore::ThreadableLoaderClient&, WebCore::ThreadableLoaderOptions const&) + 61
6   com.apple.WebCore             	0x000000010cf87c69 WebCore::ThreadableLoader::loadResourceSynchronously(WebCore::ScriptExecutionContext*, WebCore::ResourceRequest const&, WebCore::ThreadableLoaderClient&, WebCore::ThreadableLoaderOptions const&) + 73
7   com.apple.WebCore             	0x000000010cf10de4 WebCore::XMLHttpRequest::createRequest(int&) + 1556
8   com.apple.WebCore             	0x000000010cf10790 WebCore::XMLHttpRequest::send(WTF::String const&, int&) + 1072
9   com.apple.WebCore             	0x000000010cf0feac WebCore::JSXMLHttpRequest::send(JSC::ExecState*) + 252
Comment 25 WebKit Commit Bot 2015-04-27 12:32:30 PDT
Re-opened since this is blocked by bug 144272
Comment 26 youenn fablet 2015-04-28 05:10:03 PDT
(In reply to comment #24)
> This caused a reproducible crash on
> http/tests/appcache/access-via-redirect.php with GuardMalloc. Will roll out.

Thanks for checking this and the roll out.
Is there a way to discover these type of issues through build.webkit.org?

> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
> 0   libsystem_platform.dylib      	0x00007fff89c6b9b3
> _platform_memmove$VARIANT$Nehalem + 243
> 1   com.apple.WebCore             	0x000000010ce8e2c5
> WebCore::TextResourceDecoder::checkForHeadCharset(char const*, unsigned
> long, bool&) + 117

This should be cleared by bug 144321 patch.
If correct, the current patch could be cq+ once bug 144321 is fixed. 

Bug seems to be located in SharedBuffer::copy.
I do not know why this bug did not show up before.
Comment 27 Alexey Proskuryakov 2015-04-28 10:06:39 PDT
> Is there a way to discover these type of issues through build.webkit.org?

We don't have any GuardMalloc bots on webkit.org currently. One can test with GuardMalloc locally, using -g option, although I wouldn't recommend that for every patch - it's usually more efficient to roll out and to fix issues after the fact than to perform super thorough testing locally.
Comment 28 WebKit Commit Bot 2015-04-29 01:19:25 PDT
Comment on attachment 251732 [details]
Patch for landing

Clearing flags on attachment: 251732

Committed r183537: <http://trac.webkit.org/changeset/183537>
Comment 29 WebKit Commit Bot 2015-04-29 01:19:30 PDT
All reviewed patches have been landed.  Closing bug.