Bug 78879 - Allow FileSystem API implementation to pass snapshot metadata at File creation time
Summary: Allow FileSystem API implementation to pass snapshot metadata at File creatio...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kinuko Yasuda
URL:
Keywords:
Depends on: 86745
Blocks:
  Show dependency treegraph
 
Reported: 2012-02-16 23:24 PST by Kinuko Yasuda
Modified: 2012-05-17 09:53 PDT (History)
12 users (show)

See Also:


Attachments
Patch (26.50 KB, patch)
2012-02-17 00:09 PST, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (27.11 KB, patch)
2012-02-17 01:56 PST, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (27.31 KB, patch)
2012-02-20 03:48 PST, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (27.18 KB, patch)
2012-02-20 16:22 PST, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (31.40 KB, patch)
2012-02-23 11:49 PST, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (19.81 KB, patch)
2012-05-11 03:44 PDT, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (17.92 KB, patch)
2012-05-15 05:38 PDT, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (18.02 KB, patch)
2012-05-15 05:59 PDT, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (19.30 KB, patch)
2012-05-15 22:50 PDT, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (19.38 KB, patch)
2012-05-16 22:47 PDT, Kinuko Yasuda
jianli: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kinuko Yasuda 2012-02-16 23:24:18 PST
Capture File snapshot at its creation time for files from FileSystem API.

We roughly consider File object as a 'snapshot' of the underlying file, though we do not strictly force the model in normal use cases (e.g. for non-sliced, non-concatenated single files).

We should probably tighten this concept for files from FileSystem API, since we want to get rid of the possibility of making synchronous file operations on unknown sandboxed filesystems (e.g. the filesystem could be a remote filesystem), and also have another non-snapshot type of file object in the API (i.e. FileEntry).  Also it'll be much safer to change (fix) the behavior for FileSystem files than regular files since the API is still new.
Comment 1 Kinuko Yasuda 2012-02-17 00:09:25 PST
Created attachment 127541 [details]
Patch
Comment 2 Philippe Normand 2012-02-17 00:24:50 PST
Comment on attachment 127541 [details]
Patch

Attachment 127541 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/11540386
Comment 3 Early Warning System Bot 2012-02-17 01:29:10 PST
Comment on attachment 127541 [details]
Patch

Attachment 127541 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/11536503
Comment 4 Kinuko Yasuda 2012-02-17 01:56:00 PST
Created attachment 127551 [details]
Patch
Comment 5 Jian Li 2012-02-17 16:17:07 PST
Comment on attachment 127551 [details]
Patch

Do you have the test for FileSystem snapshot?

View in context: https://bugs.webkit.org/attachment.cgi?id=127551&action=review

> Source/WebCore/fileapi/Blob.cpp:97
> +    if (asFile())

Can you cache asFile since it is accessed more than once?

> Source/WebCore/fileapi/File.h:108
> +    mutable double m_snapshotModificationTime;

m_snapshotTimestamp? I think we do not care if the timestamp is created from modification time or not.

> Source/WebCore/fileapi/WebKitBlobBuilder.cpp:104
> +    if (blob->asFile()) {

ditto.

> Source/WebCore/fileapi/WebKitBlobBuilder.cpp:107
> +        blob->asFile()->maybeCaptureSnapshot();

This seems to change the semantics. We used to capture and use the latest snapshot when the blob is appended to the BlobBuilder. But now we use the same snapshot the first time that the blob is added to the builder.
Comment 6 Kinuko Yasuda 2012-02-20 03:48:44 PST
Created attachment 127801 [details]
Patch
Comment 7 Kinuko Yasuda 2012-02-20 03:59:04 PST
Comment on attachment 127551 [details]
Patch

Thanks for reviewing!

View in context: https://bugs.webkit.org/attachment.cgi?id=127551&action=review

>> Source/WebCore/fileapi/Blob.cpp:97
>> +    if (asFile())
> 
> Can you cache asFile since it is accessed more than once?

Done.  (Though compiler will likely produce the same code for both)

>> Source/WebCore/fileapi/File.h:108
>> +    mutable double m_snapshotModificationTime;
> 
> m_snapshotTimestamp? I think we do not care if the timestamp is created from modification time or not.

Sounds good, but in the current code we specifically set this value from metadata.modificationTime and return the value for lastModifiedDate so probably it'd be clearer to keep this as is.

>> Source/WebCore/fileapi/WebKitBlobBuilder.cpp:104
>> +    if (blob->asFile()) {
> 
> ditto.

Done.

>> Source/WebCore/fileapi/WebKitBlobBuilder.cpp:107
>> +        blob->asFile()->maybeCaptureSnapshot();
> 
> This seems to change the semantics. We used to capture and use the latest snapshot when the blob is appended to the BlobBuilder. But now we use the same snapshot the first time that the blob is added to the builder.

Right.  I made the change since the File API seems to be going to include more specific error for modified-after-snapshot cases and slice/BlobBuilder usage looked relatively safe to change (since we already take snapshot approach for them).
...but I don't want to introduce unwanted regressions either, so I just removed the line 138 in File.cpp.  (TL;DR: Fixed.)
Comment 8 Kinuko Yasuda 2012-02-20 16:22:39 PST
Created attachment 127867 [details]
Patch
Comment 9 Kinuko Yasuda 2012-02-20 16:24:22 PST
(In reply to comment #8)
> Created an attachment (id=127867) [details]
> Patch

Removed some duplicated code and simplified the patch.
Comment 10 Kinuko Yasuda 2012-02-22 14:18:07 PST
(In reply to comment #9)
> (In reply to comment #8)
> > Created an attachment (id=127867) [details] [details]
> > Patch
> 
> Removed some duplicated code and simplified the patch.

Friendly ping.  Jian could you take another look?  Thanks very much.
Comment 11 Kinuko Yasuda 2012-02-23 11:49:03 PST
Created attachment 128519 [details]
Patch

Added a test.
Comment 12 Jian Li 2012-02-23 14:32:54 PST
Comment on attachment 128519 [details]
Patch

It seems that we now have 2 different semantics when slicing a file into blob. For file-system file, we always use the snapshot that is obtained at the File object creation time. For real file, we always obtain and use the latest snapshot. Is this slicing behavior for file-system file somewhere described in the FileSystem spec? If not, can we also try to use the same behavior for file-system file?

View in context: https://bugs.webkit.org/attachment.cgi?id=128519&action=review

> Source/WebCore/fileapi/Blob.cpp:70
> +    // FIXME: Calling size() and lastModifiedTimeInMicroseconds() may involve synchronous file operation. We need to figure out how to make it asynchronous or make File/Blob act as snapshot all the time as the new draft specifies.

Now you split the call of captureSnapshot into 2 calls: size() and lastModifiedTimeInMicroseconds() and this might not guarantee that size and lastModifiedTime are for the same snapshot when slicing a file. Though current implementation of captureSnapshot also has this problem but we has a FIXME for this, your current change exaggerates this problem.
Comment 13 Kinuko Yasuda 2012-02-24 14:08:25 PST
Thanks.  After some discussion this patch is no longer in a critical path for M19, so I'd like to revisit this sometime later.

(In reply to comment #12)
> (From update of attachment 128519 [details])
> It seems that we now have 2 different semantics when slicing a file into blob. For file-system file, we always use the snapshot that is obtained at the File object creation time. For real file, we always obtain and use the latest snapshot. Is this slicing behavior for file-system file somewhere described in the FileSystem spec? If not, can we also try to use the same behavior for file-system file?

It's not really the spec'ed behavior, though I hoped changing only for files from FileSystem, possibly with some noisy FIXME comment, would be probably acceptable (at least it'd be harmless).

> View in context: https://bugs.webkit.org/attachment.cgi?id=128519&action=review
> 
> > Source/WebCore/fileapi/Blob.cpp:70
> > +    // FIXME: Calling size() and lastModifiedTimeInMicroseconds() may involve synchronous file operation. We need to figure out how to make it asynchronous or make File/Blob act as snapshot all the time as the new draft specifies.
> 
> Now you split the call of captureSnapshot into 2 calls: size() and lastModifiedTimeInMicroseconds() and this might not guarantee that size and lastModifiedTime are for the same snapshot when slicing a file. Though current implementation of captureSnapshot also has this problem but we has a FIXME for this, your current change exaggerates this problem.

Ok, I'll revert the change when I resume working on this patch.
Comment 14 Kinuko Yasuda 2012-05-11 02:01:34 PDT
Changed summary text.  In general we should query File metadata (e.g. size and modifiedTime) on the fly, i.e. when it is queried, to keep the File semantics, but for some platform-specific filesystems it may not be feasible since synchronous metadata query could take very long time.

We should allow each FileSystem API implementation to pass snapshot metadata so that File object could cache the given metadata not to make synchronous query.

This should NOT be allowed for regular FileSystem types (i.e. Temporary or Persistent) not to break File API semantics.
Comment 15 Kinuko Yasuda 2012-05-11 03:44:41 PDT
Created attachment 141372 [details]
Patch
Comment 16 Jian Li 2012-05-14 15:46:06 PDT
Comment on attachment 141372 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=141372&action=review

> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:126
> +class GetSnapshotFileCallback : public FileSystemCallbacksBase {

I think it'd better to call it as GetMetadataCallback since we're only creating snapshot for certain scenario.

> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:139
> +        // For regular filesystem types (temporary or persistent) we should not cache file metadata as it could change File semantics.

Add comma before we.

> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:140
> +        // For other filesystem types (which could be platform-specific ones) there's a chance that the files are on remote filesystem. If the port has returned metadata just pass it to File constructor (so we may cache the metadata).

ditto.

> Source/WebCore/Modules/filesystem/DOMFileSystemSync.cpp:118
> +        // For regular filesystem types (temporary or persistent) we should not cache file metadata as it could change File semantics.

ditto.

> Source/WebCore/Modules/filesystem/DOMFileSystemSync.cpp:119
> +        // For other filesystem types (which could be platform-specific ones) there's a chance that the files are on remote filesystem. If the port has returned metadata just pass it to File constructor (so we may cache the metadata).

ditto.

> Source/WebCore/Modules/mediastream/PeerConnection00.cpp:99
> +    dictionary.get("has_video", audio);

??

> Source/WebCore/fileapi/Blob.cpp:78
> +    long long size = m_size;

It seems the original code is a bit more efficient. What do you need to change that?

> Source/WebCore/fileapi/Blob.h:44
> +class File;

Is this forward declaration needed?

> Source/WebCore/fileapi/File.cpp:97
> +    : Blob(createBlobDataForFileWithName(metadata.platformPath, name), -1)

Should we also store both size and modification time, if provided by the remote system, in the blob data? Should we not pass -1 when caching the metadata?

> Source/WebCore/fileapi/File.cpp:157
> +bool File::hasValidSnapshotMetadata() const

It seems that hasValidSnapshotMetadata might'd better be exposed in FileMetadata.

> Source/WebCore/fileapi/File.h:57
> +    // For filesystem files. It takes metadata which could optionally give valid snapshot metadata, which will suppress future metadata query on this File object. The metadata argument must be used carefully if and only if it is infeasible to query file metadata on the fly (e.g. file is on remote filesystem).

nit: grammar error.

> Source/WebCore/fileapi/File.h:105
> +    const FileMetadata m_snapshotMetadata;

This might be duplicate. We could probable get the metadata directly out of the blob data. It seems that we can use a boolean flag to denote a snapshot file case.
Comment 17 Kinuko Yasuda 2012-05-15 05:38:00 PDT
Created attachment 141933 [details]
Patch
Comment 18 Kinuko Yasuda 2012-05-15 05:44:10 PDT
Thanks, I updated the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=141372&action=review

>> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:126
>> +class GetSnapshotFileCallback : public FileSystemCallbacksBase {
> 
> I think it'd better to call it as GetMetadataCallback since we're only creating snapshot for certain scenario.

Fixed.

>> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:139
>> +        // For regular filesystem types (temporary or persistent) we should not cache file metadata as it could change File semantics.
> 
> Add comma before we.

Done.

>> Source/WebCore/Modules/filesystem/DOMFileSystem.cpp:140
>> +        // For other filesystem types (which could be platform-specific ones) there's a chance that the files are on remote filesystem. If the port has returned metadata just pass it to File constructor (so we may cache the metadata).
> 
> ditto.

Done.

>> Source/WebCore/Modules/filesystem/DOMFileSystemSync.cpp:118
>> +        // For regular filesystem types (temporary or persistent) we should not cache file metadata as it could change File semantics.
> 
> ditto.

Done.

>> Source/WebCore/Modules/filesystem/DOMFileSystemSync.cpp:119
>> +        // For other filesystem types (which could be platform-specific ones) there's a chance that the files are on remote filesystem. If the port has returned metadata just pass it to File constructor (so we may cache the metadata).
> 
> ditto.

Done.

>> Source/WebCore/Modules/mediastream/PeerConnection00.cpp:99
>> +    dictionary.get("has_video", audio);
> 
> ??

Bad merge... dropped.

>> Source/WebCore/fileapi/Blob.cpp:78
>> +    long long size = m_size;
> 
> It seems the original code is a bit more efficient. What do you need to change that?

It doesn't make difference in binaries with most compilers. Anyway this change isn't really necessary so I reverted it.

>> Source/WebCore/fileapi/Blob.h:44
>> +class File;
> 
> Is this forward declaration needed?

Nope, removed.

>> Source/WebCore/fileapi/File.cpp:97
>> +    : Blob(createBlobDataForFileWithName(metadata.platformPath, name), -1)
> 
> Should we also store both size and modification time, if provided by the remote system, in the blob data? Should we not pass -1 when caching the metadata?

Good point. I updated the code along with the line.

>> Source/WebCore/fileapi/File.cpp:157
>> +bool File::hasValidSnapshotMetadata() const
> 
> It seems that hasValidSnapshotMetadata might'd better be exposed in FileMetadata.

I removed this method.  (Now the validation check is a bit relaxed than the previous patch but I think it'd be ok)

>> Source/WebCore/fileapi/File.h:57
>> +    // For filesystem files. It takes metadata which could optionally give valid snapshot metadata, which will suppress future metadata query on this File object. The metadata argument must be used carefully if and only if it is infeasible to query file metadata on the fly (e.g. file is on remote filesystem).
> 
> nit: grammar error.

I revised the comment trying to make the sentence simpler.  (Sorry I wasn't sure which part was particularly wrong)

>> Source/WebCore/fileapi/File.h:105
>> +    const FileMetadata m_snapshotMetadata;
> 
> This might be duplicate. We could probable get the metadata directly out of the blob data. It seems that we can use a boolean flag to denote a snapshot file case.

I changed the code  to use Blob::m_size and only cache snapshot modificationTime.
Comment 19 Build Bot 2012-05-15 05:54:16 PDT
Comment on attachment 141933 [details]
Patch

Attachment 141933 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/12685746
Comment 20 Kinuko Yasuda 2012-05-15 05:59:54 PDT
Created attachment 141940 [details]
Patch
Comment 21 Jian Li 2012-05-15 16:40:27 PDT
Comment on attachment 141940 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=141940&action=review

> Source/WebCore/fileapi/File.cpp:68
> +        contentType = MIMETypeRegistry::getWellKnownMIMETypeForExtension(fileSystemName.substring(index + 1));

Can this content type extraction code be shared among all these 3 static methods?

> Source/WebCore/fileapi/File.cpp:146
> +    if (m_size >= 0)

Though this logic works, I think it is inclined to mistakes in the future since m_size is indeed defined and initialized in Blob. Suppose someone changes the default value for m_size or assign some value for m_size in Blob implementation, we will return m_size without doing query for normal filesystem files. I think it is safer to explicitly define a boolean flag in File to mean that a snapshot is provided. Or, can we solely use m_snapshotModificationTime to find out if the snapshot is valid?

> Source/WebCore/fileapi/File.h:30
> +#include "FileMetadata.h"

It seems that we can use forward declaration instead.

> Source/WebCore/fileapi/File.h:57
> +    // For filesystem files. It takes metadata argument, which may suppress future metadata query on this File object if it has valid information (i.e. non-zero modificationTime and/or non-negative file length). The metadata argument must be given carefully only when it is infeasible to query file metadata on the fly (e.g. file is on remote filesystem).

Maybe simpler and easier to understand if this is rephrased to something like:
  If filesystem files lives in the remote filesystem, the port might pass the valid metadata (non-zero modificationTime and/or non-negative file size) and cache it in the File object.

> LayoutTests/ChangeLog:9
> +        * fast/filesystem/file-read-after-write.html: Added.

This tests seems to purely verify that the metadata is not cached, right?

> LayoutTests/fast/filesystem/file-metadata-after-write.html:59
> +function test(expect, actual)

Can we rely on those test utils provided by js-test-pre.js?
Comment 22 Kinuko Yasuda 2012-05-15 22:50:13 PDT
Created attachment 142155 [details]
Patch
Comment 23 Kinuko Yasuda 2012-05-15 22:56:16 PDT
Comment on attachment 141940 [details]
Patch

Thanks, I updated the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=141940&action=review

>> Source/WebCore/fileapi/File.cpp:68
>> +        contentType = MIMETypeRegistry::getWellKnownMIMETypeForExtension(fileSystemName.substring(index + 1));
> 
> Can this content type extraction code be shared among all these 3 static methods?

Done.

>> Source/WebCore/fileapi/File.cpp:146
>> +    if (m_size >= 0)
> 
> Though this logic works, I think it is inclined to mistakes in the future since m_size is indeed defined and initialized in Blob. Suppose someone changes the default value for m_size or assign some value for m_size in Blob implementation, we will return m_size without doing query for normal filesystem files. I think it is safer to explicitly define a boolean flag in File to mean that a snapshot is provided. Or, can we solely use m_snapshotModificationTime to find out if the snapshot is valid?

Good point.  Adding a flag sounds good, but it still may not stop Blob to assign some value in the future...
To make it explicit and parallel to snapshotModificationTime I just added snapshotSize field, assuming 64-bit increase per File object will be ok.

>> Source/WebCore/fileapi/File.h:30
>> +#include "FileMetadata.h"
> 
> It seems that we can use forward declaration instead.

Done.

>> Source/WebCore/fileapi/File.h:57
>> +    // For filesystem files. It takes metadata argument, which may suppress future metadata query on this File object if it has valid information (i.e. non-zero modificationTime and/or non-negative file length). The metadata argument must be given carefully only when it is infeasible to query file metadata on the fly (e.g. file is on remote filesystem).
> 
> Maybe simpler and easier to understand if this is rephrased to something like:
>   If filesystem files lives in the remote filesystem, the port might pass the valid metadata (non-zero modificationTime and/or non-negative file size) and cache it in the File object.

Sounds good!  Updated.

>> LayoutTests/ChangeLog:9
>> +        * fast/filesystem/file-read-after-write.html: Added.
> 
> This tests seems to purely verify that the metadata is not cached, right?

Yes.  I added the explanation in the ChangeLog.

>> LayoutTests/fast/filesystem/file-metadata-after-write.html:59
>> +function test(expect, actual)
> 
> Can we rely on those test utils provided by js-test-pre.js?

This is a tiny piece of code and I prefer making this test standalone, but if using it is still a recommended way I'll follow.
Comment 24 Kinuko Yasuda 2012-05-16 22:47:19 PDT
Created attachment 142414 [details]
Patch
Comment 25 Kinuko Yasuda 2012-05-16 22:50:12 PDT
Comment on attachment 141940 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=141940&action=review

>>> LayoutTests/fast/filesystem/file-metadata-after-write.html:59
>>> +function test(expect, actual)
>> 
>> Can we rely on those test utils provided by js-test-pre.js?
> 
> This is a tiny piece of code and I prefer making this test standalone, but if using it is still a recommended way I'll follow.

On the second thought using existing function would be always nice, I updated the patch to use js-test-pre.js.
Comment 26 Jian Li 2012-05-17 00:22:02 PDT
Comment on attachment 142414 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=142414&action=review

> Source/WebCore/fileapi/File.cpp:163
> +    if (m_snapshotSize >= 0 && m_snapshotModificationTime) {

This seems a bit inconsistent w/ if checks in lastModifiedDate() and size(). Your comment below says valid metadata is meant by non-zero modificationTime and/or non-negative file size. Can we have "or" scenario? If yes, please change && to ||. Otherwise, please always use "m_snapshotSize >= 0 && m_snapshotModificationTime" for all 3 checks.

> Source/WebCore/fileapi/File.h:57
> +    // If filesystem files live in the remote filesystem, the port might pass the valid metdata (non-zero modificationTime and/or non-negative file size) and cache in the File object.

typo: metdata

> Source/WebCore/fileapi/File.h:102
> +#if ENABLE(FILE_SYSTEM)

Please comment that what scenario these 2 members should be used. Also please illustrate the default values.

> LayoutTests/fast/filesystem/file-metadata-after-write-expected.txt:1
> +Writing 1234567890 to the file...

Better to output a line of description to explain what we want to test here:
  Test that metadata is not cached in the regular temporary filesystem.

> LayoutTests/fast/filesystem/file-metadata-after-write.html:84
> +      layoutTestController.notifyDone();

nit: wrong indenting.

> LayoutTests/fast/filesystem/file-metadata-after-write.html:90
> +

nit: extra line
Comment 27 Kinuko Yasuda 2012-05-17 00:40:11 PDT
Comment on attachment 142414 [details]
Patch

Thanks!

View in context: https://bugs.webkit.org/attachment.cgi?id=142414&action=review

>> Source/WebCore/fileapi/File.cpp:163
>> +    if (m_snapshotSize >= 0 && m_snapshotModificationTime) {
> 
> This seems a bit inconsistent w/ if checks in lastModifiedDate() and size(). Your comment below says valid metadata is meant by non-zero modificationTime and/or non-negative file size. Can we have "or" scenario? If yes, please change && to ||. Otherwise, please always use "m_snapshotSize >= 0 && m_snapshotModificationTime" for all 3 checks.

By the comment I meant we check two values with '&&' for snapshot but only check each value in size()/lastModifiedDate().

For better consistency I'll change the code to always check with '&&' in all cases.
Comment 28 Kinuko Yasuda 2012-05-17 05:07:15 PDT
Committed r117432: <http://trac.webkit.org/changeset/117432>
Comment 29 WebKit Review Bot 2012-05-17 08:51:19 PDT
Re-opened since this is blocked by 86745
Comment 30 Kinuko Yasuda 2012-05-17 09:53:34 PDT
Looks ok now.