101187 – Reduce the memory usage of ConvolverNode

UNCONFIRMED 101187

Reduce the memory usage of ConvolverNode

https://bugs.webkit.org/show_bug.cgi?id=101187

Summary Reduce the memory usage of ConvolverNode

Xingnan Wang

Reported 2012-11-04 23:18:53 PST

As title.

Attachments
Patch (21.73 KB, patch) 2012-11-04 23:33 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (13.56 KB, patch) 2012-11-08 21:15 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (13.58 KB, patch) 2012-11-12 01:27 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Xingnan Wang

Comment 1 2012-11-04 23:33:36 PST

Created attachment 172275 [details] Patch

Xingnan Wang

Comment 2 2012-11-04 23:36:11 PST

(In reply to comment #1) > Created an attachment (id=172275) [details] > Patch From my test, it reduced around half of memory usage. The patch is only for ffmpeg back-end currently.

Build Bot

Comment 3 2012-11-04 23:41:27 PST

Comment on attachment 172275 [details] Patch Attachment 172275 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/14719701

Chris Rogers

Comment 4 2012-11-05 10:08:47 PST

The memory saving looks good, but I'm a bit concerned about the CPU hit, any comments about that?

Xingnan Wang

Comment 5 2012-11-05 22:22:38 PST

(In reply to comment #4) > The memory saving looks good, but I'm a bit concerned about the CPU hit, any comments about that? There are not obvious difference of CPU usage before and after applying my patch from my test. For the dram machine demo(http://chromium.googlecode.com/svn/trunk/samples/audio/shiny-drum-machine.html), the CPU hit nearly doesn't change. And the memory usage reduce ~40M, which it should come from ConvolverNode. The patch should not increase the CPU usage for the convolver, because I just replace multiply of the non-interleaved data with multiply of interleaved data and avoid the interleaving/de-interleaving before and after the "multiply()". But for getting the non-interleaving data(from the two functions real() and imag()), it needs to do the in-place shuffle which may consume some more CPU, but it's not in the hot path.

Chris Rogers

Comment 6 2012-11-07 12:13:05 PST

Comment on attachment 172275 [details] Patch Xingnan, thanks for looking into reducing memory usage for ConvolverNode. I think there's a potentially much better way to achieve even much higher memory savings and not impact performance. Even if your tests show little performance hit with the interleaving/deinterleaving, there are a range of processor architectures and chip models that this runs on. We'd have to test them all very carefully. Plus we don't want to take *any* performance hit at all. Here's my idea: FFTConvolver has a "m_frame" member variable which could be shared among all similar FFTConvolver objects of the same size (and which are running on the same thread). The process() method could be changed to take an additional argument which is the FFTFrame (shared). It could even be made more fancy where this additional argument is optional, and the FFTConvolver would lazily create its own "m_frame" if one wasn't provided. As an example, you can see where FFTConvolver is used in HRTFPanner which has four of them. Not only could a single FFTFrame be shared for all four, but it could also be shared among other HRTFPanner objects running in that same thread. Similarly, for ConvolverNode, all the ReverbConvolverStage's of the same size (and running on the same thread) can be given a shared FFTFrame. The memory savings this way could be very significant and would help ALL the different FFT back-ends, instead of just the FFmpeg one.

Xingnan Wang

Comment 7 2012-11-08 21:15:33 PST

Created attachment 173194 [details] Patch

Xingnan Wang

Comment 8 2012-11-08 21:26:08 PST

(In reply to comment #7) > Created an attachment (id=173194) [details] > Patch (In reply to comment #7) > Created an attachment (id=173194) [details] > Patch That`s a good idea to share the FFTFrame, and it really simplifies the problem. Update the patch which covers your idea and removed the non-interleaved multiply.

WebKit Review Bot

Comment 9 2012-11-09 00:42:35 PST

Comment on attachment 173194 [details] Patch Attachment 173194 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14786062 New failing tests: inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html

Xingnan Wang

Comment 10 2012-11-12 01:27:31 PST

Created attachment 173586 [details] Patch

Xingnan Wang

Comment 11 2012-11-12 18:01:03 PST

(In reply to comment #10) > Created an attachment (id=173586) [details] > Patch Resubmitted the patch, and made sure my patch did not result in last failing test.

Chris Rogers

Comment 12 2012-11-13 17:54:03 PST

Hi Xingnan, sorry this is taking me a little time to look at. Thanks for trying the new approach, I'll try to review as soon as I can...

Xingnan Wang

Comment 13 2012-11-13 18:10:51 PST

(In reply to comment #12) > Hi Xingnan, sorry this is taking me a little time to look at. Thanks for trying the new approach, I'll try to review as soon as I can... Chris, that doesn't matter, I am not in a hurry.

Anders Carlsson

Comment 14 2014-02-05 11:07:27 PST

Comment on attachment 173586 [details] Patch Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.

Note You need to log in before you can comment on or make changes to this bug.

Status UNCONFIRMED

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Audio

Assignee

Nobody

Reported

2012-11-04 23:18 PST

Modified

2014-02-05 11:07 PST History

CC List

6 users Show

URL

Keywords

Depends on

Blocks