Bug 129626 - WebGL slow video to texture
Summary: WebGL slow video to texture
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebGL (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac (Intel) OS X 10.7
: P2 Normal
Assignee: Jer Noble
URL: http://codeflow.org/issues/slow_video...
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2014-03-03 15:03 PST by Florian Bösch
Modified: 2016-09-09 03:37 PDT (History)
12 users (show)

See Also:


Attachments
Patch (16.10 KB, patch)
2015-05-11 10:29 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Patch (17.34 KB, patch)
2015-05-11 10:47 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-mavericks (593.19 KB, application/zip)
2015-05-11 11:41 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews107 for mac-mavericks-wk2 (606.06 KB, application/zip)
2015-05-11 11:48 PDT, Build Bot
no flags Details
Patch (20.40 KB, patch)
2015-05-11 13:11 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-mavericks (526.10 KB, application/zip)
2015-05-11 14:04 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews106 for mac-mavericks-wk2 (565.61 KB, application/zip)
2015-05-11 14:10 PDT, Build Bot
no flags Details
Patch (20.40 KB, patch)
2015-05-11 14:53 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-mavericks (525.99 KB, application/zip)
2015-05-11 15:46 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews104 for mac-mavericks-wk2 (561.33 KB, application/zip)
2015-05-11 15:52 PDT, Build Bot
no flags Details
Patch (30.52 KB, patch)
2015-08-20 16:58 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Patch (30.47 KB, patch)
2015-08-21 09:53 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-mavericks (632.56 KB, application/zip)
2015-08-21 10:30 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews104 for mac-mavericks-wk2 (620.06 KB, application/zip)
2015-08-21 10:31 PDT, Build Bot
no flags Details
Patch (28.64 KB, patch)
2015-09-25 15:34 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-mavericks (733.16 KB, application/zip)
2015-09-25 15:56 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews104 for mac-mavericks-wk2 (664.34 KB, application/zip)
2015-09-25 16:16 PDT, Build Bot
no flags Details
Patch (28.69 KB, patch)
2015-09-25 23:23 PDT, Jer Noble
no flags Details | Formatted Diff | Diff
Patch (28.64 KB, patch)
2015-09-28 13:11 PDT, Jer Noble
dino: review+
Details | Formatted Diff | Diff
Patch for landing (28.51 KB, patch)
2015-11-18 16:40 PST, Jer Noble
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Bösch 2014-03-03 15:03:20 PST
The performance of texImage2D and texSubImage2D calls passing a video is very slow (+4ms) for most video sizes (+240p).

The linked in test can record the timings these uploads and the framerate, and it contains historical measurements across a variety of browsers and platforms.
Comment 1 Florian Bösch 2014-08-09 03:54:58 PDT
Retested with Chrome 36.0.1985.125, Firefox Aurora 33.0a2, Safari 6.1.4 and Internet Explorer 11

TL;DR Firefox now outperforms Chrome by around 600% on OSX and Linux. Safari has the worst performance and worst video format support.

Summary of result tables, for full results please see: http://codeflow.org/issues/slow_video_to_texture/

Ubuntu:
			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	46.42	8.08	46.22	7.97	46.72	7.80
Chrome 35	1080p	45.28	8.02	45.10	8.03	45.39	7.98
Firefox Aurora	1080p	60.04	1.00	60.04	1.14	60.02	1.18

OSX:
			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	59.56	9.91	59.72	9.96	59.78	9.91
Chrome 35	1080p	59.77	13.47	58.88	14.51	59.85	13.69
Firefox Aurora	1080p	N/A	N/A	59.50	1.54	59.07	1.50
Safari 6.1.4	1080p	20.07	34.94	N/A	N/A	N/A	N/A

Windows 8.1:

			FPS	UPms	FPS	UPms	FPS	UPms
Chrome 36	1080p	51.22	11.53	59.53	7.16	59.23	7.23
Firefox Aurora	1080p			57.14	9.55	56.89	9.41
IE11		video not supported for upload

Conclusion: Firefox has made good efforts in its Aurora build (scheduled to be beta early September, and release early October) on OSX and Linux, these now outperform Chrome by a substantial margin (by ~600%) and pushed upload times into the range of 1-2ms. Chrome 36 has also made some gains on Chrome 35 on OSX (by ~45%). Firefox has yet to implement their improvements on Windows and has also got to deal with bugs in their h.264 support on Windows.

Safari Issues:

- Only H.264 is supported
- The H.264 support is very slow and choppy
Comment 2 Jer Noble 2015-05-11 10:29:35 PDT
Created attachment 252863 [details]
Patch
Comment 3 Jer Noble 2015-05-11 10:47:22 PDT
Created attachment 252865 [details]
Patch
Comment 4 Build Bot 2015-05-11 11:41:19 PDT
Comment on attachment 252865 [details]
Patch

Attachment 252865 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/4992284621799424

New failing tests:
webgl/1.0.2/conformance/textures/tex-image-and-uniform-binding-bugs.html
webgl/1.0.2/conformance/textures/texture-npot-video.html
fast/canvas/webgl/tex-image-and-uniform-binding-bugs.html
Comment 5 Build Bot 2015-05-11 11:41:21 PDT
Created attachment 252875 [details]
Archive of layout-test-results from ews100 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 6 Build Bot 2015-05-11 11:48:33 PDT
Comment on attachment 252865 [details]
Patch

Attachment 252865 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/6589910987309056

New failing tests:
webgl/1.0.2/conformance/textures/tex-image-and-uniform-binding-bugs.html
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 7 Build Bot 2015-05-11 11:48:35 PDT
Created attachment 252877 [details]
Archive of layout-test-results from ews107 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 8 Jer Noble 2015-05-11 13:11:51 PDT
Created attachment 252886 [details]
Patch

Added iOS support; fixed a crash in the conformance tests.
Comment 9 Build Bot 2015-05-11 14:04:02 PDT
Comment on attachment 252886 [details]
Patch

Attachment 252886 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/6631852953567232

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 10 Build Bot 2015-05-11 14:04:05 PDT
Created attachment 252891 [details]
Archive of layout-test-results from ews101 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 11 Build Bot 2015-05-11 14:10:13 PDT
Comment on attachment 252886 [details]
Patch

Attachment 252886 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5513033837182976

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 12 Build Bot 2015-05-11 14:10:16 PDT
Created attachment 252892 [details]
Archive of layout-test-results from ews106 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 13 Jer Noble 2015-05-11 14:53:44 PDT
Created attachment 252893 [details]
Patch
Comment 14 Build Bot 2015-05-11 15:46:10 PDT
Comment on attachment 252893 [details]
Patch

Attachment 252893 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.appspot.com/results/5106708993015808

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 15 Build Bot 2015-05-11 15:46:12 PDT
Created attachment 252898 [details]
Archive of layout-test-results from ews100 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 16 Jon Lee 2015-05-11 15:48:30 PDT
Comment on attachment 252893 [details]
Patch

Tests?
Comment 17 Build Bot 2015-05-11 15:52:40 PDT
Comment on attachment 252893 [details]
Patch

Attachment 252893 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/5505874126700544

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 18 Build Bot 2015-05-11 15:52:43 PDT
Created attachment 252900 [details]
Archive of layout-test-results from ews104 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 19 Jer Noble 2015-08-12 22:53:54 PDT
Florian, your server does not support HTTP Range Requests; as such, WebKit falls back to 32-bit QuickTime to load, decode, and display these movies. This may go a long way towards explaining why WebKit seems to take so much longer to upload images.
Comment 20 Florian Bösch 2015-08-13 00:09:46 PDT
(In reply to comment #19)
> Florian, your server does not support HTTP Range Requests; as such, WebKit
> falls back to 32-bit QuickTime to load, decode, and display these movies.
> This may go a long way towards explaining why WebKit seems to take so much
> longer to upload images.

It's true that my server does not support range requests. However, two points about that.

1) So you're saying that to performantly put video into WebGL on the client you have to serve video with range requests. Or do you rather mean to express that this is a bug that you haven't filed yet? Do you want me to file it? I'd be happy to file "WebKit does not performantly handle video if the server does not support http range requests". Unfortunately that will then attract half a dozen other bug tickets, see point #2.

2) Browser behavior regarding to requests for video in regards to ranges is highly inconsistent, non http conformant and frequently leads to videos breaking. Actually supporting range requests for videos at this time is inadvisable, and the official recommendation I've gotten by Google and Apple is to use MSE to handle streaming video and not to rely on http range requests.
Comment 21 Jer Noble 2015-08-13 08:20:44 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > Florian, your server does not support HTTP Range Requests; as such, WebKit
> > falls back to 32-bit QuickTime to load, decode, and display these movies.
> > This may go a long way towards explaining why WebKit seems to take so much
> > longer to upload images.
> 
> It's true that my server does not support range requests. However, two
> points about that.
> 
> 1) So you're saying that to performantly put video into WebGL on the client
> you have to serve video with range requests. Or do you rather mean to
> express that this is a bug that you haven't filed yet? Do you want me to
> file it? I'd be happy to file "WebKit does not performantly handle video if
> the server does not support http range requests". Unfortunately that will
> then attract half a dozen other bug tickets, see point #2.

No, I'm saying that expecting fast texture upload from an out-of-process, 32-bit media stack as ancient as QuickTime is unreasonable.

> 2) Browser behavior regarding to requests for video in regards to ranges is
> highly inconsistent, non http conformant and frequently leads to videos
> breaking. Actually supporting range requests for videos at this time is
> inadvisable, and the official recommendation I've gotten by Google and Apple
> is to use MSE to handle streaming video and not to rely on http range
> requests.

That is entirely misinformed, and I HIGHLY doubt anyone at Apple told you to use MSE to do streaming video.

HTTP Range Request support is a necessity. Without it, media-over-HTTP will not play at all in iOS, and soon, it will not be supported on desktop Safari as well.
Comment 22 Florian Bösch 2015-08-13 08:46:41 PDT
(In reply to comment #21)
> No, I'm saying that expecting fast texture upload from an out-of-process,
> 32-bit media stack as ancient as QuickTime is unreasonable.

Then fix it. I see no reason you should do that no matter where the bytes you consume come from. That constitutes a bug if I ever saw one. Plus, do you even have a testcase of your own to check video upload speed? Cause I kinda doubt it, seeing as this issue goes into its 4th year aniversary and nobody done jack squat about it. So if you disagree with my testcase, or sever, or whatever, go make your own, I promise it's quite an eye opening experience to see how utterly horrid the implementation across most every UA, including yours, is.

> That is entirely misinformed, and I HIGHLY doubt anyone at Apple told you to
> use MSE to do streaming video.
> 
> HTTP Range Request support is a necessity. Without it, media-over-HTTP will
> not play at all in iOS, and soon, it will not be supported on desktop Safari
> as well.

Well, good luck with that, don't say I didn't warn you that it's a shambles.
Comment 23 Jer Noble 2015-08-13 09:06:17 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > No, I'm saying that expecting fast texture upload from an out-of-process,
> > 32-bit media stack as ancient as QuickTime is unreasonable.
> 
> Then fix it.

No. QuickTime and QTKit are deprecated technologies. Support for QTKit has been disabled in WebKit Nightlies, and will be disabled in Safari 9.

> I see no reason you should do that no matter where the bytes
> you consume come from. That constitutes a bug if I ever saw one. Plus, do
> you even have a testcase of your own to check video upload speed? Cause I
> kinda doubt it, seeing as this issue goes into its 4th year anniversary

...2nd...

> and
> nobody done jack squat about it.

You do realize you're directing this at the person who is currently in the middle of fixing the very issue you're complaining about (video upload speed), right? 

> So if you disagree with my testcase, or
> sever, or whatever, go make your own, I promise it's quite an eye opening
> experience to see how utterly horrid the implementation across most every
> UA, including yours, is.

The number of loads which fall back to QTKit due to lack of server Range support is incredibly, incredibly small.  So, the balance of probabilities leads me to believe that the problem is not with HTTP Range requests, but how well your server implement them.
Comment 24 Eric Carlson 2015-08-13 09:09:47 PDT
Comment on attachment 252893 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252893&action=review

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.h:242
> +    virtual bool copyVideoTextureToPlatformTexture(GraphicsContext3D*, Platform3DObject, GC3Denum target, GC3Dint level, GC3Denum internalFormat, GC3Denum format, GC3Denum type, bool premultiplyAlpha, bool flipY) override;

Nit: don't need both "virtual" and "override" (the later implies the former).

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2370
> +void MediaPlayerPrivateAVFoundationObjC::destroyOpenGLVideoOutput()

This is never called.

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2382
> +bool MediaPlayerPrivateAVFoundationObjC::openGLVideoOutputHasAvailableFrame()

Nit: this and updateLastOpenGLImage should take the item time instead of both calling -[AVPlayerItem currentTime]

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2406
> +void MediaPlayerPrivateAVFoundationObjC::updateLastOpenGLImage()
> +{
> +    if (!m_openGLVideoOutput)
> +        return;
> +
> +    CMTime currentTime = [m_avPlayerItem currentTime];
> +    if (![m_openGLVideoOutput hasNewPixelBufferForItemTime:currentTime])
> +        return;
> +
> +    m_lastOpenGLImage = adoptCF([m_openGLVideoOutput copyPixelBufferForItemTime:currentTime itemTimeForDisplay:nil]);
> +}

Nit: this seems unnecessary - it is never called if m_openGLVideoOutput is null or there isn't a pixel buffer for the current time. Why not just use copyPixelBufferForItemTime:itemTimeForDisplay: directly?
Comment 25 Florian Bösch 2015-08-13 09:19:16 PDT
I won't entertain more discussions of http range or not, that's got nothing whatsoever todo with tex[Sub]Image2D speed.

I have clocked an optimal path for putting the texels of a video into a texture using a C++ test program quite a while ago (just consult the chrome ticket that's been open for the last 4 years).

If you are uploading YCbCr (4:2:2) to the GPU from ram and then convert to RGB on GPU this should not take more than 2ms (even on fairly slow machines). And if you already have a YCbCr texture in vram, this will take approximately 10 microseconds.
Comment 26 Florian Bösch 2015-08-13 09:52:19 PDT
Btw. my server does support ranges, I just checked, so I don't know why you would say it wouldn't.
Comment 27 Jer Noble 2015-08-13 10:00:49 PDT
(In reply to comment #26)
> Btw. my server does support ranges, I just checked, so I don't know why you
> would say it wouldn't.

> curl -I -r 0-100 'http://codeflow.org/issues/slow_video_to_texture/mp4/720p.mp4'
HTTP/1.1 200 OK
Content-Length: 1446854
Expires: Thu, 13 Aug 2015 16:59:46 GMT
Server: ftlpy 0.1
Connection: Keep-Alive
ETag: 28d3d94bd36539de2218bd5beafe6d2c
Cache-Control: private, max-age=0, no-cache, no-transform
Date: Thu, 13 Aug 2015 16:59:46 GMT
Content-Type: video/mp4

If your server supported range requests, it would have returned a '206 Partial Content' response, not a '200 OK'.
Comment 28 Jer Noble 2015-08-13 10:04:39 PDT
(In reply to comment #24)
> Comment on attachment 252893 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=252893&action=review
> 
> > Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.h:242
> > +    virtual bool copyVideoTextureToPlatformTexture(GraphicsContext3D*, Platform3DObject, GC3Denum target, GC3Dint level, GC3Denum internalFormat, GC3Denum format, GC3Denum type, bool premultiplyAlpha, bool flipY) override;
> 
> Nit: don't need both "virtual" and "override" (the later implies the former).

Ok.

> > Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2370
> > +void MediaPlayerPrivateAVFoundationObjC::destroyOpenGLVideoOutput()
> 
> This is never called.

Well, that's a mistake! :)

> > Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2382
> > +bool MediaPlayerPrivateAVFoundationObjC::openGLVideoOutputHasAvailableFrame()
> 
> Nit: this and updateLastOpenGLImage should take the item time instead of
> both calling -[AVPlayerItem currentTime]

Actually, see below.

> > Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2406
> > +void MediaPlayerPrivateAVFoundationObjC::updateLastOpenGLImage()
> > +{
> > +    if (!m_openGLVideoOutput)
> > +        return;
> > +
> > +    CMTime currentTime = [m_avPlayerItem currentTime];
> > +    if (![m_openGLVideoOutput hasNewPixelBufferForItemTime:currentTime])
> > +        return;
> > +
> > +    m_lastOpenGLImage = adoptCF([m_openGLVideoOutput copyPixelBufferForItemTime:currentTime itemTimeForDisplay:nil]);
> > +}
> 
> Nit: this seems unnecessary - it is never called if m_openGLVideoOutput is
> null or there isn't a pixel buffer for the current time. Why not just use
> copyPixelBufferForItemTime:itemTimeForDisplay: directly?

Maybe we shouldn't have a openGLVideoOutputHasAvailableFrame(), and should instead just call updateLastOpenGLImage() and check to see if m_lastOpenGLImage is non-NULL.
Comment 29 Florian Bösch 2015-08-13 10:06:38 PDT
(In reply to comment #27)
> (In reply to comment #26)
> > Btw. my server does support ranges, I just checked, so I don't know why you
> > would say it wouldn't.
> 
> > curl -I -r 0-100 'http://codeflow.org/issues/slow_video_to_texture/mp4/720p.mp4'
> HTTP/1.1 200 OK
> Content-Length: 1446854
> Expires: Thu, 13 Aug 2015 16:59:46 GMT
> Server: ftlpy 0.1
> Connection: Keep-Alive
> ETag: 28d3d94bd36539de2218bd5beafe6d2c
> Cache-Control: private, max-age=0, no-cache, no-transform
> Date: Thu, 13 Aug 2015 16:59:46 GMT
> Content-Type: video/mp4
> 
> If your server supported range requests, it would have returned a '206
> Partial Content' response, not a '200 OK'.

GET /issues/slow_video_to_texture/mp4/720p.mp4 HTTP/1.1
Host: codeflow.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept-Encoding: identity;q=1, *;q=0
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36
Accept: */*
Referer: http://codeflow.org/issues/slow_video_to_texture/mp4/720p.mp4
Accept-Language: en-US,en;q=0.8,de;q=0.6
Cookie: __utma=240947345.1694194387.1439291172.1439291172.1439291172.1; __utmc=240947345; __utmz=240947345.1439291172.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Range: bytes=0-

HTTP/1.1 206 Partial Content
Content-Length: 1446854
Accept-Ranges: bytes
Expires: Thu, 13 Aug 2015 17:04:42 GMT
Content-Range: bytes 0-1446853/1446854
Server: ftlpy 0.1
Connection: Keep-Alive
ETag: 28d3d94bd36539de2218bd5beafe6d2c
Cache-Control: private, max-age=0, no-cache, no-transform
Date: Thu, 13 Aug 2015 17:04:42 GMT
Content-Type: video/mp4

It just doesn't do it on HEAD because I've seen no UA do that for video.
Comment 30 Jer Noble 2015-08-13 10:07:32 PDT
(In reply to comment #25)
> I won't entertain more discussions of http range or not, that's got nothing
> whatsoever todo with tex[Sub]Image2D speed.

Well, except that once this bug is fixed, you'll still need Byte Range support enabled on your server to see it work.

> I have clocked an optimal path for putting the texels of a video into a
> texture using a C++ test program quite a while ago (just consult the chrome
> ticket that's been open for the last 4 years).
> 
> If you are uploading YCbCr (4:2:2) to the GPU from ram and then convert to
> RGB on GPU this should not take more than 2ms (even on fairly slow
> machines). And if you already have a YCbCr texture in vram, this will take
> approximately 10 microseconds.

That assumes that the image starts off on the CPU.  Modern hardware decoders will write directly to the GPU, so your "optimal" path would necessitate a GPU readback, even when the decoded image was already in RGB.
Comment 31 Florian Bösch 2015-08-13 10:09:39 PDT
(In reply to comment #30) 
> That assumes that the image starts off on the CPU.  Modern hardware decoders
> will write directly to the GPU, so your "optimal" path would necessitate a
> GPU readback, even when the decoded image was already in RGB.

It assumes that you either have it on CPU already, or recognizes that you don't, in which case it's even faster because it's already on the GPU and you can do a GPU -> GPU transfer. In either case, whatever it assumes, implementations of this are bad, and have been, and still are.
Comment 32 Jer Noble 2015-08-13 10:10:20 PDT
(In reply to comment #29)
> GET /issues/slow_video_to_texture/mp4/720p.mp4 HTTP/1.1
> Host: codeflow.org
> Connection: keep-alive
> Pragma: no-cache
> Cache-Control: no-cache
> Accept-Encoding: identity;q=1, *;q=0
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like
> Gecko) Chrome/44.0.2403.155 Safari/537.36
> Accept: */*
> Referer: http://codeflow.org/issues/slow_video_to_texture/mp4/720p.mp4
> Accept-Language: en-US,en;q=0.8,de;q=0.6
> Cookie: __utma=240947345.1694194387.1439291172.1439291172.1439291172.1;
> __utmc=240947345;
> __utmz=240947345.1439291172.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
> Range: bytes=0-
> 
> HTTP/1.1 206 Partial Content
> Content-Length: 1446854
> Accept-Ranges: bytes
> Expires: Thu, 13 Aug 2015 17:04:42 GMT
> Content-Range: bytes 0-1446853/1446854

Try with a actual sub-range.

> curl -O -r 0-100 'http://codeflow.org/issues/slow_video_to_texture/mp4/720p.mp4'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1412k  100 1412k    0     0   403k      0  0:00:03  0:00:03 --:--:--  403k

A request for range 0-100 returns the entire file. This is not compliant behavior.
Comment 33 Jer Noble 2015-08-13 10:19:16 PDT
(In reply to comment #31)
> (In reply to comment #30) 
> > That assumes that the image starts off on the CPU.  Modern hardware decoders
> > will write directly to the GPU, so your "optimal" path would necessitate a
> > GPU readback, even when the decoded image was already in RGB.
> 
> It assumes that you either have it on CPU already, or recognizes that you
> don't, in which case it's even faster because it's already on the GPU and
> you can do a GPU -> GPU transfer.

You'll note, in the patch attached to this bug, that's exactly what we'll do.
Comment 34 Jer Noble 2015-08-13 11:07:03 PDT
Here are the results from Florian's test case with the current WIP patch:

upload         resolution  frame/s  upload time (ms)
texImage2D     240p        57.64    0.89
texImage2D     480p        55.63    1.28
texImage2D     720p        45.82    2.18
texImage2D     1080p       53.94    1.76
texSubImage2D  240p        56.73    1.09
texSubImage2D  480p        36.82    1.06
texSubImage2D  720p        55.74    1.23
texSubImage2D  1080p       54.55    2.06
texImage2D     240p        55.81    1.39
texImage2D     480p        56.29    0.94
texImage2D     720p        55.61    1.14
texImage2D     1080p       54.81    1.83
texSubImage2D  240p        56.26    1.25
texSubImage2D  480p        56.28    0.95
texSubImage2D  720p        55.29    1.39
texSubImage2D  1080p       54.03    2.10

There may be a few more ms of efficiency to squeeze out, but it's generally very much improved.
Comment 35 Klaus Reinfeld 2015-08-13 12:12:01 PDT
That GPU to GPU patch sounds good! 

Is there a chance that it might be included in the final iOS 9 release?

(I know - no comments on future releases, but maybe a 'maybe'? :-))
Comment 36 Florian Bösch 2015-08-13 12:37:42 PDT
(In reply to comment #34)
> Here are the results from Florian's test case with the current WIP patch:
> 
> upload         resolution  frame/s  upload time (ms)
> texImage2D     240p        57.64    0.89
> texImage2D     480p        55.63    1.28
> texImage2D     720p        45.82    2.18
> texImage2D     1080p       53.94    1.76
> texSubImage2D  240p        56.73    1.09
> texSubImage2D  480p        36.82    1.06
> texSubImage2D  720p        55.74    1.23
> texSubImage2D  1080p       54.55    2.06
> texImage2D     240p        55.81    1.39
> texImage2D     480p        56.29    0.94
> texImage2D     720p        55.61    1.14
> texImage2D     1080p       54.81    1.83
> texSubImage2D  240p        56.26    1.25
> texSubImage2D  480p        56.28    0.95
> texSubImage2D  720p        55.29    1.39
> texSubImage2D  1080p       54.03    2.10
> 
> There may be a few more ms of efficiency to squeeze out, but it's generally
> very much improved.

Why is the FPS not 60 if the upload time is so small? You're probably still taking upwards of 16ms for a video frame upload. I daresay, there's more than a "few" improvements you can squeeze out there, like approximately 1200%.
Comment 37 Jer Noble 2015-08-13 14:02:49 PDT
(In reply to comment #36)
> Why is the FPS not 60 if the upload time is so small? You're probably still
> taking upwards of 16ms for a video frame upload. I daresay, there's more
> than a "few" improvements you can squeeze out there, like approximately
> 1200%.

Commenting out the call to texImage2D and re-running the test least to frame rates which hover around 57fps; so there is some inefficiency outside of video texture upload, and out of scope of this bug, to figure out.
Comment 38 Florian Bösch 2015-08-13 14:05:56 PDT
(In reply to comment #37)
> (In reply to comment #36)
> > Why is the FPS not 60 if the upload time is so small? You're probably still
> > taking upwards of 16ms for a video frame upload. I daresay, there's more
> > than a "few" improvements you can squeeze out there, like approximately
> > 1200%.
> 
> Commenting out the call to texImage2D and re-running the test least to frame
> rates which hover around 57fps; so there is some inefficiency outside of
> video texture upload, and out of scope of this bug, to figure out.

There's a stripped down simple version of the test here: http://codeflow.org/issues/slow_video_to_texture/simple.html that you can use to compare against the more complex one.
Comment 39 Jer Noble 2015-08-13 14:35:01 PDT
If you modify your Statistics object to return a weighted moving average (that gives more weight to recent entries), the FPS averages for each test rapidly converge on ~59.7 FPS. This implies that the slowness is almost entirely in startup of each test. And indeed, we have some code which, if no GPU frames are available, blocks until a CPU frame can be generated. This was originally added for (canvas and WebGL) correctness reasons; and we are indeed hitting that path at the beginning of each of your tests.
Comment 40 Florian Bösch 2015-08-13 22:01:40 PDT
(In reply to comment #39)
> If you modify your Statistics object to return a weighted moving average
> (that gives more weight to recent entries), the FPS averages for each test
> rapidly converge on ~59.7 FPS. This implies that the slowness is almost
> entirely in startup of each test. And indeed, we have some code which, if no
> GPU frames are available, blocks until a CPU frame can be generated. This
> was originally added for (canvas and WebGL) correctness reasons; and we are
> indeed hitting that path at the beginning of each of your tests.

You can modify the simple test to use a rolling average or start measuring only N frames in or something. Anyway, if you want to know how much better you've gotten beyond satisfying 60fps and a low upload time (which is somewhat unreliable because of asynchronicity) you should repeat the upload multiple times until you drop FPS.
Comment 41 Jer Noble 2015-08-20 16:58:05 PDT
Created attachment 259527 [details]
Patch
Comment 42 Jer Noble 2015-08-21 09:53:19 PDT
Created attachment 259622 [details]
Patch
Comment 43 Build Bot 2015-08-21 10:30:31 PDT
Comment on attachment 259622 [details]
Patch

Attachment 259622 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/86322

New failing tests:
webgl/1.0.2/conformance/textures/gl-teximage.html
webgl/1.0.2/conformance/textures/texture-npot-video.html
fast/canvas/webgl/gl-teximage.html
webgl/1.0.2/conformance/context/premultiplyalpha-test.html
fast/canvas/webgl/premultiplyalpha-test.html
Comment 44 Build Bot 2015-08-21 10:30:33 PDT
Created attachment 259627 [details]
Archive of layout-test-results from ews101 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 45 Build Bot 2015-08-21 10:31:15 PDT
Comment on attachment 259622 [details]
Patch

Attachment 259622 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/86326

New failing tests:
webgl/1.0.2/conformance/textures/gl-teximage.html
webgl/1.0.2/conformance/textures/texture-npot-video.html
fast/canvas/webgl/gl-teximage.html
webgl/1.0.2/conformance/context/premultiplyalpha-test.html
fast/canvas/webgl/premultiplyalpha-test.html
Comment 46 Build Bot 2015-08-21 10:31:18 PDT
Created attachment 259629 [details]
Archive of layout-test-results from ews104 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 47 Jer Noble 2015-09-25 15:34:42 PDT
Created attachment 261948 [details]
Patch
Comment 48 Build Bot 2015-09-25 15:56:45 PDT
Comment on attachment 261948 [details]
Patch

Attachment 261948 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/212076

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 49 Build Bot 2015-09-25 15:56:48 PDT
Created attachment 261950 [details]
Archive of layout-test-results from ews100 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 50 Build Bot 2015-09-25 16:16:03 PDT
Comment on attachment 261948 [details]
Patch

Attachment 261948 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/212104

New failing tests:
webgl/1.0.2/conformance/textures/texture-npot-video.html
Comment 51 Build Bot 2015-09-25 16:16:10 PDT
Created attachment 261952 [details]
Archive of layout-test-results from ews104 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 52 Jer Noble 2015-09-25 23:23:14 PDT
Created attachment 261963 [details]
Patch

Finally figured out the failing test; I wasn't handling the flipY = true case. This should finally make the EWS bots happy.
Comment 53 Jer Noble 2015-09-28 13:11:13 PDT
Created attachment 262015 [details]
Patch
Comment 54 Dean Jackson 2015-11-13 09:52:24 PST
Comment on attachment 262015 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=262015&action=review

> Source/WebCore/ChangeLog:9
> +        when lazily-created, will emit CVPixelBuffers which are guaranteed to be compatable with

Typo: compatible

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2430
> +}
> +
> +
> +#if !LOG_DISABLED

Extra blank line.

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2566
> +    size_t height = CVPixelBufferGetHeight(m_lastOpenGLImage.get());
> +
> +
> +#if PLATFORM(IOS)

Another extra line.

> Source/WebCore/platform/graphics/avfoundation/objc/MediaPlayerPrivateAVFoundationObjC.mm:2597
> +    GC3Denum readFramebufferTarget = GraphicsContext3D::READ_FRAMEBUFFER;
> +    GC3Denum readFramebufferBinding = GraphicsContext3D::READ_FRAMEBUFFER_BINDING;

I'm not sure it makes it any clearer to have these variables around that point to constants. You never change their values. It might be fine to just explicitly include them as parameters when you use them.
Comment 55 Dean Jackson 2015-11-13 09:59:29 PST
> texSubImage2D  480p        36.82    1.06
> texSubImage2D  720p        55.74    1.23

Why is 480p slower than 720p? Is it because the test doesn't converge in time?

It would be nice if this patch could be automatically tested. However, given the state of our testing, I doubt we could get reliable results.
Comment 56 Simon Fraser (smfr) 2015-11-13 11:10:27 PST
We should be able to make an animometer benchmark for this.
Comment 57 Jer Noble 2015-11-13 14:41:22 PST
(In reply to comment #55)
> > texSubImage2D  480p        36.82    1.06
> > texSubImage2D  720p        55.74    1.23
> 
> Why is 480p slower than 720p? Is it because the test doesn't converge in
> time?
> 
> It would be nice if this patch could be automatically tested. However, given
> the state of our testing, I doubt we could get reliable results.

See the comments following. The frame rate averages are thrown off greatly by the first few frames, which are, at that point, still using the software painting path. We need time to spin up the AVPlayerItemOutput which is capable of generating OpenGL-compatable buffers.  If you use a weighted moving-average, you see that we hover right at 60 fps during steady-state.
Comment 58 Jer Noble 2015-11-18 16:40:40 PST
Created attachment 265807 [details]
Patch for landing
Comment 59 WebKit Commit Bot 2015-11-18 19:20:16 PST
Comment on attachment 265807 [details]
Patch for landing

Clearing flags on attachment: 265807

Committed r192607: <http://trac.webkit.org/changeset/192607>
Comment 60 Radar WebKit Bug Importer 2015-11-19 13:18:56 PST
<rdar://problem/23617699>
Comment 61 Csaba Osztrogonác 2015-11-23 03:38:13 PST
It is already landed. Can we close the bug?
Comment 62 Jer Noble 2015-12-02 08:18:45 PST
Yep!
Comment 63 Klaus Reinfeld 2016-01-18 05:45:06 PST
Landed/fixed but not included in iOS 9.3?

The performance in iOS 9.3 beta is still very bad...
Comment 64 Freddy Snijder 2016-06-24 07:42:39 PDT
Hello, 

Does anyone know if the patch landed in Safari iOS 9.3.2? (This there a way for me to find out, other then observation?)

Thank you!

  -- Freddy Snijder
Comment 65 Klaus Reinfeld 2016-09-09 03:37:01 PDT
Has anyone tested this patch with iOS 10?

That new 'Support a direct GPU-to-GPU copy of video textures' commit from here:
https://github.com/WebKit/webkit/commit/9abf87df6ed4f5b57e5770c993609962f6815625#diff-a8c799bd9a6162420acfd9bbb5b35d44

has huge problems on iOS 10!

When using RGB (not RGBA) textures:
- an iPad Pro 12.9 is directly crashing
- iPad Air 1 and iPhone 6+ are getting 'INVALID FRAMEBUFFER OPERATION' errors

Here a test case:
http://krpano.com/ios/bugs/ios10-webgl-video-texture-crash/

See the page source code (very simple) for details.
The example also includes several workaround solutions.