Bug 119477

Summary: [GStreamer] Store preloaded media in webkit's cache
Product: WebKit Reporter: Philippe Normand <pnormand>
Component: WebKitGTKAssignee: Enrique Ocaña <eocanha>
Status: RESOLVED FIXED    
Severity: Normal CC: aperez, bugs-noreply, calvaris, cgarcia, clopez, commit-queue, cristian.ciupitu, csaavedra, eocanha, ht990332, magomez, mcatanzaro, slomo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugzilla.gnome.org/show_bug.cgi?id=755280
https://bugzilla.gnome.org/show_bug.cgi?id=761035
https://bugs.webkit.org/show_bug.cgi?id=156369
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Philippe Normand 2013-08-04 02:10:13 PDT
Right now preloaded videos go to ~/.cache but they should be stored in the webkit's cache folder instead.

The cache location can be configured using queue2::temp-location property but I'm not sure yet how to access the right queue2 element at runtime. Some investigation is needed :)
Comment 1 Sebastian Dröge (slomo) 2013-08-14 02:07:23 PDT
It currently is impossible to do that properly, best solution would probably be to expose that queue2 property on uridecodebin and playbin.
Comment 2 Carlos Garcia Campos 2015-09-21 06:16:07 PDT
Is that cache permanent? I mean, can it be reused among sessions? Otherwise I don't think it should be stored in the home dir at all, but in /tmp
Comment 3 Claudio Saavedra 2015-09-21 06:17:49 PDT
Carlos nailed it. If it cannot be reused between sessions then it doesn't belong in .cache.
Comment 4 Sebastian Dröge (slomo) 2015-09-21 06:38:58 PDT
It can be used between sessions if webkit gives an appropriate name to the files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main problem with that is that some people put /tmp on a tmpfs... so if you want to watch a long enough movie, your memory runs full.
Comment 5 Michael Catanzaro 2015-09-21 06:46:26 PDT
(In reply to comment #4)
> It can be used between sessions if webkit gives an appropriate name to the
> files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> problem with that is that some people put /tmp on a tmpfs... so if you want
> to watch a long enough movie, your memory runs full.

/tmp on tmpfs has been default in Fedora for several years now... I was amazed quite recently to learn that other distros are still not doing this. Anyway, the videos would have to go in /var/tmp since they are big. But /var/tmp is not cleaned automatically, so there needs to be some measure to clean up leaked videos in the event of a web process crash.
Comment 6 Philippe Normand 2015-09-21 08:58:22 PDT
The UIProcess is notified when the WebProcess crashes, so perhaps we could do the clean-up at that moment.
Comment 7 Sebastian Dröge (slomo) 2015-09-22 02:13:58 PDT
Or you could make the videos actually cacheable so they can be reused in future sessions until cleared from the cache ;)


Cleaning up from the webprocess crashes would mean that you need to have a way to identify which files belong to the pid of the crashed process
Comment 8 Michael Catanzaro 2016-01-24 06:59:41 PST
Note that we've gotten multiple reports of ~/.cache rising to excessive size; apparently there is no way to clean up cached video files if there is a web process crash.
Comment 9 Michael Catanzaro 2016-12-16 13:23:47 PST
CCing some media folks... how do we want to go about solving this?

Anything cached should go under WebKitWebsiteDataManager's base-cache-dir if it's persistent, or /var/tmp otherwise.

Note that we have to be robust to UI process crashes as well. That happens. We are already careful to never leak in the normal disk cache.
Comment 10 Michael Catanzaro 2016-12-22 12:52:54 PST
Probably bug #156369 needs to be fixed at the same time as this.
Comment 11 Michael Catanzaro 2016-12-22 12:53:29 PST
(Changing component just so that we can find this bug again.)
Comment 12 Carlos Alberto Lopez Perez 2016-12-23 09:42:47 PST
(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
> 
> /tmp on tmpfs has been default in Fedora for several years now... I was
> amazed quite recently to learn that other distros are still not doing this.
> Anyway, the videos would have to go in /var/tmp since they are big. But
> /var/tmp is not cleaned automatically, so there needs to be some measure to
> clean up leaked videos in the event of a web process crash.

I suggest removing the file just after opening it for write.

That way you can continue to use the file for reading or writing to it, by passing the file descriptor id.

The kernel will automatically remove the file once the descriptor id is closed or the process holding the file descriptor id dies (crashes or exits)

So you not longer have to take care of cleaning the files. The file will be automatically cleaned when you not longer use it.
Comment 13 Carlos Alberto Lopez Perez 2016-12-23 09:59:15 PST
(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
> 
> /tmp on tmpfs has been default in Fedora for several years now... 

Personally, I don't think /tmp on tmpfs is a good idea.

In the worst case (you have configured your system without swap) you end wasting precious memory space for tmp files. Which is nuts.

In the best case, you end wasting swap space and risking incurring in more swappiness related freezes than otherwise you would have.

/tmp on tmpfs supporters main argument was they were worried about incrementing their SSD wearing level more than needed. This has been exaggerated a lot. I still have to find anyone that has nuked their SSD because of not having /tmp on tmpfs.

Related:
https://lists.debian.org/debian-devel/2012/05/msg01092.html
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=666096
Comment 14 Enrique Ocaña 2016-12-26 04:08:13 PST
(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
> 
> /tmp on tmpfs has been default in Fedora for several years now... I was
> amazed quite recently to learn that other distros are still not doing this.
> Anyway, the videos would have to go in /var/tmp since they are big. But
> /var/tmp is not cleaned automatically, so there needs to be some measure to
> clean up leaked videos in the event of a web process crash.

A naive suggestion would be to treat /var/tmp as system storing /tmp on disk treat it. Those systems clean /tmp on boot. My suggestion is to use /var/tmp/${USER}/WebKit-media (or some better name) as "tmp", and not cleaning it at webprocess' death (which can be uncontrolled in the very worst case), but at WebKit start. Each time WebKit starts, it cleans the previous "tmp".

More efficient approaches can be implemented on top of that, but I think this one could be fine as a baseline.
Comment 15 Adrian Perez 2017-01-02 04:49:36 PST
(In reply to comment #12)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > It can be used between sessions if webkit gives an appropriate name to the
> > > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > > problem with that is that some people put /tmp on a tmpfs... so if you want
> > > to watch a long enough movie, your memory runs full.
> > 
> > /tmp on tmpfs has been default in Fedora for several years now... I was
> > amazed quite recently to learn that other distros are still not doing this.
> > Anyway, the videos would have to go in /var/tmp since they are big. But
> > /var/tmp is not cleaned automatically, so there needs to be some measure to
> > clean up leaked videos in the event of a web process crash.

This is not necessarily true, many (most?) GNU/Linux distributions clean
“/var/tmp”:

  % grep '/var/tmp ' /usr/lib/tmpfiles.d/tmp.conf
  q /var/tmp 1777 root root 30d
  %

Still, using “/var/tmp” is not a good option because many systems are known
*not to* clean it by default ({Free,Open,Net}BSD), and applications are
expected to clean up after themselves when creating files under ”/var/tmp”.

> I suggest removing the file just after opening it for write.

+1

If we can make GStreamer happy without having named files, passing the FD
around is the absolute best option.

> That way you can continue to use the file for reading or writing to it, by
> passing the file descriptor id.
> 
> The kernel will automatically remove the file once the descriptor id is
> closed or the process holding the file descriptor id dies (crashes or exits)
> 
> So you not longer have to take care of cleaning the files. The file will be
> automatically cleaned when you not longer use it.

As an additional small side bonus unlinking the file after opening causes less name space pollution in the directory used for temporary files. Which reduces
contention for creating new temporary files — anyway this is most likely 
irrelevant for web browsing, but still nice :)
Comment 16 Michael Catanzaro 2017-01-02 08:12:52 PST
(In reply to comment #15)
> This is not necessarily true, many (most?) GNU/Linux distributions clean
> “/var/tmp”:

You're right, systemd does this, even when the GNOME option to clean temporary files is disabled....
Comment 17 Enrique Ocaña 2017-01-31 11:59:38 PST
Today I tried to solve this by setting the temp-remove[1] property of GstDownloadBuffer to true, only to find that it's already true by default and that the "delete on READY" statement in the documentation doesnt refer to the NULL -> READY -> PAUSED -> PLAYING transition (ie: delete after creation), but to the PLAYING -> PAUSED -> READY -> NULL one (ie: delete on destruction, the current behaviour).

I guess I'll have to try a different approach and delete the file by myself.


[1] https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer-plugins/html/gstreamer-plugins-downloadbuffer.html#GstDownloadBuffer--temp-remove
Comment 18 Enrique Ocaña 2017-02-01 09:41:20 PST
Created attachment 300331 [details]
Patch
Comment 19 Michael Catanzaro 2017-02-01 10:04:23 PST
LGTM. Would be good for a GStreamer reviewer to review it. Please add it to https://trac.webkit.org/wiki/WebKitGTK/2.14.x after committing.
Comment 20 Carlos Garcia Campos 2017-02-01 10:29:48 PST
Comment on attachment 300331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

> Source/WebCore/ChangeLog:13
> +        Files cached on disk by MediaPlayerPrivateGStreamer are deleted only when the player is closed. If the
> +        WebProcess crashed, they're just left there in the cache directory. This patch changes the location
> +        of those temporary files to a proper temporary directory (/var/tmp, as those files aren't actually
> +        reusable, so they don't belong to a cache directory, and /tmp is a bad place because it's RAM-based on
> +        some distros), unlinks (deletes) them right after creation and also deletes any other stalled temporary
> +        file on the old legacy cache directory.

Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1348
> +    if (!g_strcmp0(className, "GstDownloadBuffer")) {

Early return here instead.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
> +        g_signal_handlers_disconnect_by_func(bin, gpointer(uriDecodeBinElementAddedCallback), player);

Use C++ style casts

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
> +            processName = "GStreamer";

shouldn't it be WebKitWebProcess by default?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
> +        GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));

Use g_build_filename

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
> +        player->purgeOldDownloadFiles(oldDownloadTemplate.get());

So, if we purge the old files, then we don't really need to use /var/tmp, no?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1372
> +    g_signal_handlers_disconnect_by_func(player->m_downloadBuffer.get(), gpointer(downloadBufferFileCreatedCallback), player);

Use C++ style cast

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1379
> +    GUniqueOutPtr<gchar> downloadFile;
> +    g_object_get(player->m_downloadBuffer.get(), "temp-location", &downloadFile.outPtr(), nullptr);
> +    if (g_unlink(downloadFile.get()) == -1)
> +        GST_WARNING("Couldn't unlink media temporary file %s after creation", downloadFile.get());
> +    else
> +        GST_TRACE("Unlinked media temporary file %s after creation", downloadFile.get());

Do we need a signal for this? Doesn't gst changes it when g_object_set(element, "temp-template", synchronously?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1411
> +    GDir* directory = g_dir_open(templatePath.get(), 0, nullptr);
> +
> +    if (!directory)
> +        return;
> +
> +    for (const gchar* fileName = g_dir_read_name(directory); fileName; fileName = g_dir_read_name(directory)) {
> +        if (g_str_has_prefix(fileName, templateFile.get())) {
> +            GUniquePtr<gchar> filePath(g_build_filename(templatePath.get(), fileName, nullptr));
> +            if (g_unlink(filePath.get()) == -1)
> +                GST_WARNING("Couldn't unlink legacy media temporary file: %s", filePath.get());
> +            else
> +                GST_TRACE("Unlinked legacy media temporary file: %s", filePath.get());
> +        }
> +    }

You can simplify this by using FileSystem::listDirectory(), you can pass a regexp to mathc the files directly

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1418
>      m_source.clear();

You should disconnect previous signals here and in the destructor I guess, since you are disconnecting only if the signals was emitted.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.h:249
> +    GRefPtr<GstElement> m_downloadBuffer;

Do we really want to keepo this buffer object alive for the whole life of the media platyer? Or can we release it at some point?
Comment 21 Michael Catanzaro 2017-02-01 10:48:34 PST
(In reply to comment #20)
> Wouldn't it be a problem if /var is in a different partition?

Why would that be a problem?
Comment 22 Enrique Ocaña 2017-02-01 12:25:37 PST
Comment on attachment 300331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

Thanks for the rest of the comments I'm not replying, they're really useful.

>> Source/WebCore/ChangeLog:13
>> +        file on the old legacy cache directory.
> 
> Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.

I don't think GstDownloadBuffer is ready to reuse the media files between sessions with its current design, so as per https://bugs.webkit.org/show_bug.cgi?id=119477#c2 and https://bugs.webkit.org/show_bug.cgi?id=119477#c3 the files shouldn't be considered "cache" but temporary files. They can be huge (eg: 2GB or similar movie sizes), so they aren't appropriate for /tmp (because of some distros using tmpfs) but they're appropriate for /var/tmp as per https://bugs.webkit.org/show_bug.cgi?id=119477#c5

I considered /var/tmp as the consensus solution for storing this kind of files. If it's not the consensus solution, then discuss it first among yourselves before requesting an implementation.

Having /var in a different partition is as problematic as having /home or /tmp in a different partition too. If the former hasn't been a problem until now, using /var shouldn't be a problem too.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
>> +            processName = "GStreamer";
> 
> shouldn't it be WebKitWebProcess by default?

That's what g_get_prgname() should return, yes. In the very unlikely case that it returns NULL, I'm just defaulting to the same value that GstUriDecodeBin would:

https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/gsturidecodebin.c#L1951

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>> +        player->purgeOldDownloadFiles(oldDownloadTemplate.get());
> 
> So, if we purge the old files, then we don't really need to use /var/tmp, no?

The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.

This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.

Ideally, when all the users have migrated to a WebKit version having this patch, the purge shouldn't be needed anymore and might be removed.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1379
>> +        GST_TRACE("Unlinked media temporary file %s after creation", downloadFile.get());
> 
> Do we need a signal for this? Doesn't gst changes it when g_object_set(element, "temp-template", synchronously?

I don't understand this comment. GstDownloadBuffer was already removing the file when the element was destroyed (when destroying the player). That's not enough for us. We want to delete the file immediately after it's created.

The only way I found to get notified of the file creation is listening to changes on the temp-location property, because it's updated with the final (random) filename right after the file is created. "temp-template" is just a template. The file isn't created when you set it, but when GstDownloadBuffer decides that it's time to create the file.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.h:249
>> +    GRefPtr<GstElement> m_downloadBuffer;
> 
> Do we really want to keepo this buffer object alive for the whole life of the media platyer? Or can we release it at some point?

Right, I should have set it to null in downloadBufferFileCreatedCallback(), when it's not needed anymore.
Comment 23 Carlos Garcia Campos 2017-02-01 22:32:12 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Wouldn't it be a problem if /var is in a different partition?
> 
> Why would that be a problem?

I don't know that's why I'm asking :-)
Comment 24 Carlos Garcia Campos 2017-02-01 22:46:55 PST
(In reply to comment #22)
> Comment on attachment 300331 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=300331&action=review
> 
> Thanks for the rest of the comments I'm not replying, they're really useful.
> 
> >> Source/WebCore/ChangeLog:13
> >> +        file on the old legacy cache directory.
> > 
> > Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.
> 
> I don't think GstDownloadBuffer is ready to reuse the media files between
> sessions with its current design, so as per
> https://bugs.webkit.org/show_bug.cgi?id=119477#c2 and
> https://bugs.webkit.org/show_bug.cgi?id=119477#c3 the files shouldn't be
> considered "cache" but temporary files. They can be huge (eg: 2GB or similar
> movie sizes), so they aren't appropriate for /tmp (because of some distros
> using tmpfs) but they're appropriate for /var/tmp as per
> https://bugs.webkit.org/show_bug.cgi?id=119477#c5
> 
> I considered /var/tmp as the consensus solution for storing this kind of
> files. If it's not the consensus solution, then discuss it first among
> yourselves before requesting an implementation.

Ok, my fault, I tried to help reviewing this, but I didn't read the discussion first, there's even a comment by me suggesting not to use home cache dir :-P

> Having /var in a different partition is as problematic as having /home or
> /tmp in a different partition too. If the former hasn't been a problem until
> now, using /var shouldn't be a problem too.

Yes, sure, but it's more unlikely. Anyway, I only asked. I know, for example, that we write the partial download files in the same download directory instead of /tmp to avoid copies between partitions.

> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
> >> +            processName = "GStreamer";
> > 
> > shouldn't it be WebKitWebProcess by default?
> 
> That's what g_get_prgname() should return, yes. In the very unlikely case
> that it returns NULL, I'm just defaulting to the same value that
> GstUriDecodeBin would:
> 
> https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/
> gsturidecodebin.c#L1951

Because gst can't know in which process is running, but we know it.

> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
> >> +        player->purgeOldDownloadFiles(oldDownloadTemplate.get());
> > 
> > So, if we purge the old files, then we don't really need to use /var/tmp, no?
> 
> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were
> considered as "cache" or as "temporary". It's the second, so they belong to
> a temporary directory.

Right, ok.

> This purging is here just for legacy files generated by WebKit prior to this
> patch. Michael told me that leaking those legacy files wasn't unacceptable.
> That's why I added the purge method.

Ah, I see, I wonder if we should purge also leaked files from previous sessions too. That would solve the problem for systems where /var/tmp is not cleaned, or people who hardly ever reboot. Btw, I have old files in my /var/tmp so I'm not sure systemd is cleaning it up, and I reboot every day. All of them are actually empty dirs though.

> Ideally, when all the users have migrated to a WebKit version having this
> patch, the purge shouldn't be needed anymore and might be removed.
> 
> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1379
> >> +        GST_TRACE("Unlinked media temporary file %s after creation", downloadFile.get());
> > 
> > Do we need a signal for this? Doesn't gst changes it when g_object_set(element, "temp-template", synchronously?
> 
> I don't understand this comment. GstDownloadBuffer was already removing the
> file when the element was destroyed (when destroying the player). That's not
> enough for us. We want to delete the file immediately after it's created.
> 
> The only way I found to get notified of the file creation is listening to
> changes on the temp-location property, because it's updated with the final
> (random) filename right after the file is created. "temp-template" is just a
> template. The file isn't created when you set it, but when GstDownloadBuffer
> decides that it's time to create the file.

Ok, understood. I was wondering if setting the template would set the location too, but it doesn't.

> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.h:249
> >> +    GRefPtr<GstElement> m_downloadBuffer;
> > 
> > Do we really want to keepo this buffer object alive for the whole life of the media platyer? Or can we release it at some point?
> 
> Right, I should have set it to null in downloadBufferFileCreatedCallback(),
> when it's not needed anymore.
Comment 25 Xabier Rodríguez Calvar 2017-02-02 03:44:44 PST
Comment on attachment 300331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

Lots of comments and suggestions, let's see this patch again.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1348
>> +    const gchar* className = G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element)));
>> +    if (!g_strcmp0(className, "GstDownloadBuffer")) {
> 
> Early return here instead.

Remove the variable because it is only used at the if. Besides, type should have been const char*.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1353
> +        GUniqueOutPtr<gchar> oldDownloadTemplate;

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1356
> +        const gchar* processName = g_get_prgname();

char

>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
>>>> +            processName = "GStreamer";
>>> 
>>> shouldn't it be WebKitWebProcess by default?
>> 
>> That's what g_get_prgname() should return, yes. In the very unlikely case that it returns NULL, I'm just defaulting to the same value that GstUriDecodeBin would:
>> 
>> https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/gsturidecodebin.c#L1951
> 
> Because gst can't know in which process is running, but we know it.

I am with Carlos here. We don't need the process name actually to name the temp file. I'd go for something like WebKit-Media-xxxxxx. This would be positive for not having to rename in case in the future we can do these downloads thru the network process.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
>> +        GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));
> 
> Use g_build_filename

char.

Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
> +        GST_TRACE("Reconfiguring file download template from '%s' to '%s'\n", oldDownloadTemplate.get(), newDownloadTemplate.get());

The \n is not advisable.

>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>>>> +        player->purgeOldDownloadFiles(oldDownloadTemplate.get());
>>> 
>>> So, if we purge the old files, then we don't really need to use /var/tmp, no?
>> 
>> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.
>> 
>> This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.
>> 
>> Ideally, when all the users have migrated to a WebKit version having this patch, the purge shouldn't be needed anymore and might be removed.
> 
> Right, ok.

Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1374
> +    GUniqueOutPtr<gchar> downloadFile;

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1382
> +void MediaPlayerPrivateGStreamer::purgeOldDownloadFiles(const gchar* downloadFileTemplate)

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1388
> +    GUniquePtr<gchar> templatePath(g_path_get_dirname(downloadFileTemplate));
> +    GUniquePtr<gchar> templateFile(g_path_get_basename(downloadFileTemplate));

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1391
> +    gchar* found = g_strrstr(templateFile.get(), "-XX");

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1393
> +    if (!found)

This might be UNLIKELY too.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1405
> +    for (const gchar* fileName = g_dir_read_name(directory); fileName; fileName = g_dir_read_name(directory)) {
> +        if (g_str_has_prefix(fileName, templateFile.get())) {
> +            GUniquePtr<gchar> filePath(g_build_filename(templatePath.get(), fileName, nullptr));

If you use FileSystem::listDirectory you won't probably use this, but char.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.h:171
> +    void purgeOldDownloadFiles(const gchar* downloadFileTemplate);

char, remove parameter name
Comment 26 Enrique Ocaña 2017-02-02 04:37:59 PST
Comment on attachment 300331 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
>>> +        GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));
>> 
>> Use g_build_filename
> 
> char.
> 
> Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.

This is tricky and, IMHO, not needed.

Having and extra directory under /var/tmp means that we would have to maintain (delete) that directory by ourselves. This is problematic because we can't use the elegant trick of deleting the file right after its creation to delete the directory. If several videos are being played at once by different WebProcesses a race condition can happen:

- Process A creates /var/tmp/$USER
- Process A creates WebKit-Media-123456 there
- Process B tries to create /var/tmp/$USER (but it's already created, no problem)
- Process A deletes WebKit-Media-123456 there and /var/tmp/$USER
- Process B creates WebKit-Media-ABCDEF there but... the directory doesn't exist anymore. ERROR!

Anyway, the directory itself isn't needed because the new temporary files are now deleted right after their creation. This means that no other processes can open them anymore, so there shouldn't be any concern regarding privacy. I've checked it by myself with a "while true; do ls /var/tmp; done" script.

>>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>>>>> +        player->purgeOldDownloadFiles(oldDownloadTemplate.get());
>>>> 
>>>> So, if we purge the old files, then we don't really need to use /var/tmp, no?
>>> 
>>> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.
>>> 
>>> This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.
>>> 
>>> Ideally, when all the users have migrated to a WebKit version having this patch, the purge shouldn't be needed anymore and might be removed.
>> 
>> Right, ok.
> 
> Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().

Michael would say that WebKit should try hard to clean its own files. I don't think distros are going to mess with the contents of the user home dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.
Comment 27 Enrique Ocaña 2017-02-02 04:49:20 PST
Created attachment 300398 [details]
Patch
Comment 28 Carlos Garcia Campos 2017-02-02 06:50:13 PST
Comment on attachment 300398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:40
> +#include <WebCore/FileSystem.h>

Don't use <> since you are in WebCore.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:203
> +    if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

G_TYPE_CHECK_INSTANCE_TYPE already null checks the pointer.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:204
> +        g_signal_handlers_disconnect_by_func(GST_ELEMENT_PARENT(m_source.get()), reinterpret_cast<gpointer>(uriDecodeBinElementAddedCallback), this);

What if it has been disconnected before? I think it would be better to save the handler id and use g_signal_handler_disconnect() and invalidate the id.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
> +    if (g_strcmp0(G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element))), "GstDownloadBuffer"))

GST_IS_DOWNLOAD_BUFFER?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
> +    // Transform "-XXXXXX" into "-??????" to get a file wildcard expression from the template.
> +    for (char* p = &templateFile.get()[strlen(templateFile.get()) - 1]; p >= templateFile.get() && *p == 'X'; --p)
> +        *p = '?';

I'm not sure I understand this, maybe you can simply do g_strdelimit (templateFile.get(), "X", '?');

Can't you pass something like "*-XXXXXX*" as pattern to listDirectory?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1397
> +    for (auto filePath : listDirectory(templatePath.get(), templateFile.get())) {

auto&

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1409
> +    if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

G_TYPE_CHECK_INSTANCE_TYPE already null checks the pointer.
Comment 29 Michael Catanzaro 2017-02-02 07:04:32 PST
(In reply to comment #25)
> Another thing is that I think we should ensure the files are created in a
> different directory for each user and ensure that directory has the proper
> permissions so that it cannot be access by anyone else other than the user.
> I think media people watch online can vary a lot from cats to other things.
> I'd write a function to return the template and ensure it is placed in the
> proper directory.

It should be sufficient to just create the files with 600 permission, right?
Comment 30 Xabier Rodríguez Calvar 2017-02-02 07:05:50 PST
Comment on attachment 300398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:203
> +    if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

You don't need the m_source && as WEBKIT_IS_WEB_SRC checks for nulls as expected.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
> +    GUniqueOutPtr<char> oldDownloadTemplate;
> +    g_object_get(element, "temp-template", &oldDownloadTemplate.outPtr(), nullptr);
> +
> +    GUniquePtr<char> newDownloadTemplate(g_build_filename(G_DIR_SEPARATOR_S, "var", "tmp", "WebKit-Media-XXXXXX", nullptr));
> +    g_object_set(element, "temp-template", newDownloadTemplate.get(), nullptr);
> +    GST_TRACE("Reconfiguring file download template from '%s' to '%s'", oldDownloadTemplate.get(), newDownloadTemplate.get());

Nit: you can move getting oldDownloadTemplate after setting newDownloadTemplate (it is going to be actually used before the GST_TRACE and specially relevant for the purge).

Another nit: "Reconfigured" is better because the relevant action already happened.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
> +    // Transform "-XXXXXX" into "-??????" to get a file wildcard expression from the template.
> +    for (char* p = &templateFile.get()[strlen(templateFile.get()) - 1]; p >= templateFile.get() && *p == 'X'; --p)
> +        *p = '?';

Maybe you can use String::replace instead of doing this (it can take position and length.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1409
> +    if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

You don't need the m_source && as WEBKIT_IS_WEB_SRC checks for nulls as expected.
Comment 31 Xabier Rodríguez Calvar 2017-02-02 10:35:15 PST
(In reply to comment #26)
> > Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.
> 
> This is tricky and, IMHO, not needed.
> 
> Having and extra directory under /var/tmp means that we would have to
> maintain (delete) that directory by ourselves. This is problematic because
> we can't use the elegant trick of deleting the file right after its creation
> to delete the directory. If several videos are being played at once by
> different WebProcesses a race condition can happen:
> 
> - Process A creates /var/tmp/$USER
> - Process A creates WebKit-Media-123456 there
> - Process B tries to create /var/tmp/$USER (but it's already created, no
> problem)
> - Process A deletes WebKit-Media-123456 there and /var/tmp/$USER
> - Process B creates WebKit-Media-ABCDEF there but... the directory doesn't
> exist anymore. ERROR!
> 
> Anyway, the directory itself isn't needed because the new temporary files
> are now deleted right after their creation. This means that no other
> processes can open them anymore, so there shouldn't be any concern regarding
> privacy. I've checked it by myself with a "while true; do ls /var/tmp; done"
> script.

Well, I don't think deleting the directory is needed, it is just 4k on disk.

> > Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().
> 
> Michael would say that WebKit should try hard to clean its own files. I
> don't think distros are going to mess with the contents of the user home
> dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.

I think I wasn't clear here. Line 1377, if the regular unlink fails (not the purge one, the regular) we are leaking them (under /var/tmp/).
Comment 32 Xabier Rodríguez Calvar 2017-02-02 10:40:23 PST
Comment on attachment 300398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
>> +    GST_TRACE("Reconfiguring file download template from '%s' to '%s'", oldDownloadTemplate.get(), newDownloadTemplate.get());
> 
> Nit: you can move getting oldDownloadTemplate after setting newDownloadTemplate (it is going to be actually used before the GST_TRACE and specially relevant for the purge).
> 
> Another nit: "Reconfigured" is better because the relevant action already happened.

Forget this, it is wrong.
Comment 33 Enrique Ocaña 2017-02-02 11:04:20 PST
Comment on attachment 300398 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:204
>> +        g_signal_handlers_disconnect_by_func(GST_ELEMENT_PARENT(m_source.get()), reinterpret_cast<gpointer>(uriDecodeBinElementAddedCallback), this);
> 
> What if it has been disconnected before? I think it would be better to save the handler id and use g_signal_handler_disconnect() and invalidate the id.

If the callback has been disconnected before, then g_signal_handlers_disconnect_matched() (the actual function called by the macro) just returns "0" matches and nothing happens.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
>> +    if (g_strcmp0(G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element))), "GstDownloadBuffer"))
> 
> GST_IS_DOWNLOAD_BUFFER?

Sorry, that's not possible. GstDownloadBuffer is an element, and as such, comes from a dynamically loaded plugin. It seems that the gstdownloadbuffer.h header isn't distributed for usage from third party applications. If I can't find it in WebKit/WebKitBuild/DependenciesGTK/Root/include/gstreamer-1.0/gst when using jhbuild, I doubt it's going to be available on regular distros.

>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
>>> +        *p = '?';
>> 
>> I'm not sure I understand this, maybe you can simply do g_strdelimit (templateFile.get(), "X", '?');
>> 
>> Can't you pass something like "*-XXXXXX*" as pattern to listDirectory?
> 
> Maybe you can use String::replace instead of doing this (it can take position and length.

Carlos: It doesn't work like that. "X" isn't an actual letter X, but the character g_mkstemp()[1] uses as wildcard to construct actual random filenames. That character is equivalent to "?" in shell glob syntax.

Now that I think about it, if we're comfident about what the legacy base name was ("WebKitWebProcess-XXXXXX") and are sure that it doesn't have "X" in any other place than in the pattern, we can do a simple character replacement in the base name without fear to screw legitimate "X" letters up.

[1] https://developer.gnome.org/glib/stable/glib-File-Utilities.html#g-mkstemp
Comment 34 Xabier Rodríguez Calvar 2017-02-02 11:10:54 PST
After discussing in private I think the best solution because of permissions is sticking to ~/.cache/ but ensuring deleting and purge of old files. I guess a new patch would be needed.
Comment 35 Carlos Alberto Lopez Perez 2017-02-02 11:14:04 PST
(In reply to comment #34)
> After discussing in private I think the best solution because of permissions
> is sticking to ~/.cache/ but ensuring deleting and purge of old files. I
> guess a new patch would be needed.

Why you can't you apply there the same approach of unlinking the file just after its created? Its simpler to implement and more robust.
Comment 36 Enrique Ocaña 2017-02-02 11:39:53 PST
(In reply to comment #34)
> After discussing in private I think the best solution because of permissions
> is sticking to ~/.cache/ but ensuring deleting and purge of old files. I
> guess a new patch would be needed.

I've checked it (after adding a hack to avoid removing the file, normally external processes can't see it), and the file permissions are "600":

1605111 81156 -rw-------  1 enrique enrique 83103373 Feb  2 19:33 WebKit-Media-VPHGVY

Having this into account, there's no reason why plain /var/tmp is a bad idea, and there are several reasons why it's a good idea (as per the "cache" vs. "temporary" file discussion in previous comments).
Comment 37 Xabier Rodríguez Calvar 2017-02-02 23:59:13 PST
(In reply to comment #35)
> Why you can't you apply there the same approach of unlinking the file just
> after its created? Its simpler to implement and more robust.

Probably I wasn't clear enough but my proposal implied this inside WebKit code the problem was a possible permissions race condition between creation and deletion.(In reply to comment #36)

> (In reply to comment #34)
> I've checked it (after adding a hack to avoid removing the file, normally
> external processes can't see it), and the file permissions are "600":
> 
> 1605111 81156 -rw-------  1 enrique enrique 83103373 Feb  2 19:33
> WebKit-Media-VPHGVY
> 
> Having this into account, there's no reason why plain /var/tmp is a bad
> idea, and there are several reasons why it's a good idea (as per the "cache"
> vs. "temporary" file discussion in previous comments).

Then I think we would be good to go once you fix the less serious comments such as extra null checks, chars replacing, etc.
Comment 38 Enrique Ocaña 2017-02-03 03:04:39 PST
(In reply to comment #31)
> (In reply to comment #26)
> > > Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().
> > 
> > Michael would say that WebKit should try hard to clean its own files. I
> > don't think distros are going to mess with the contents of the user home
> > dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.
> 
> I think I wasn't clear here. Line 1377, if the regular unlink fails (not the
> purge one, the regular) we are leaking them (under /var/tmp/).

Trying to purge leaked files in the new /var/tmp location doesn't make sense. If the files couldn't be deleted right after creation in the first place (because of a race condition with permissions, where some malicious (root?) process changes the permissions in the milisecond that the file is visible), then ther's no point on trying to delete the file later: the deletion will also fail for the same reasons it failed initially.

Purging files in the old location still makes sense, because with the old implementation more time passed from file creation to file deletion (eg: 2h until the movie finishes being played) and the probability of a crash or kill (the only reason why the file would have been leaked) was higher.
Comment 39 Enrique Ocaña 2017-02-03 03:19:04 PST
Created attachment 300517 [details]
Patch
Comment 40 WebKit Commit Bot 2017-02-03 04:04:37 PST
Comment on attachment 300517 [details]
Patch

Clearing flags on attachment: 300517

Committed r211627: <http://trac.webkit.org/changeset/211627>
Comment 41 WebKit Commit Bot 2017-02-03 04:04:44 PST
All reviewed patches have been landed.  Closing bug.
Comment 42 Michael Catanzaro 2017-02-03 04:26:54 PST
\o/

Kinda a shame that this went in with a misleading commit name, since the media is most definitely not being stored in WebKit's cache, but oh well.
Comment 43 Carlos Alberto Lopez Perez 2017-02-03 12:37:27 PST
Storing this big media files on the homedir is a potentially performance problem on systems where the home-dir is mounted over a shared network file-system (like happens on many shared computers at places like universities)

I think /var/tmp is the right place for this. This directory will be served from local storage, and never from a network file-system.

Also, the possibility of an attacker opening the file in the millisecond that happens between the file is created and the file is unlinked are pretty low.

Adding that to the fact that videos you watch on your browser is not something that I would consider critical information (like a password).

If you are that kind of person, is better you ensure that you are not sharing your computer with anyone when you watch the videos, rating than expecting WebKitGTK+ to do magic tricks to protect your privacy from the other users of your own computer.

So... I'm happy that in the end the simplest approach of storing the file on /var/tmp and unlink it was committed, and we didn't end creating an over-engineered solution to something I don't think is a real problem

Just my 2 cents.
Comment 44 Philippe Normand 2018-01-09 07:06:39 PST
Comment on attachment 300517 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300517&action=review

> Source/WebCore/ChangeLog:13
> +        some distros), unlinks (deletes) them right after creation and also deletes any other stalled temporary
> +        file on the old legacy cache directory.

The problem with this approach is that if a page contains 2 video element for instance, one player will indirectly hijack the other and we could end-up with some hard-to-debug fallout issues.