Bug 101187 - Reduce the memory usage of ConvolverNode
Summary: Reduce the memory usage of ConvolverNode
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Audio (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-04 23:18 PST by Xingnan Wang
Modified: 2014-02-05 11:07 PST (History)
6 users (show)

See Also:


Attachments
Patch (21.73 KB, patch)
2012-11-04 23:33 PST, Xingnan Wang
no flags Details | Formatted Diff | Diff
Patch (13.56 KB, patch)
2012-11-08 21:15 PST, Xingnan Wang
no flags Details | Formatted Diff | Diff
Patch (13.58 KB, patch)
2012-11-12 01:27 PST, Xingnan Wang
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xingnan Wang 2012-11-04 23:18:53 PST
As title.
Comment 1 Xingnan Wang 2012-11-04 23:33:36 PST
Created attachment 172275 [details]
Patch
Comment 2 Xingnan Wang 2012-11-04 23:36:11 PST
(In reply to comment #1)
> Created an attachment (id=172275) [details]
> Patch

From my test, it reduced around half of memory usage. The patch is only for ffmpeg back-end currently.
Comment 3 Build Bot 2012-11-04 23:41:27 PST
Comment on attachment 172275 [details]
Patch

Attachment 172275 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14719701
Comment 4 Chris Rogers 2012-11-05 10:08:47 PST
The memory saving looks good, but I'm a bit concerned about the CPU hit, any comments about that?
Comment 5 Xingnan Wang 2012-11-05 22:22:38 PST
(In reply to comment #4)
> The memory saving looks good, but I'm a bit concerned about the CPU hit, any comments about that?

There are not obvious difference of CPU usage before and after applying my patch from my test. For the dram machine demo(http://chromium.googlecode.com/svn/trunk/samples/audio/shiny-drum-machine.html), the CPU hit nearly doesn't change. And the memory usage reduce ~40M, which it should come from ConvolverNode.

The patch should not increase the CPU usage for the convolver, because I just replace multiply of the non-interleaved data with multiply of interleaved data and avoid the interleaving/de-interleaving before and after the "multiply()". But for getting the non-interleaving data(from the two functions real() and imag()), it needs to do the in-place shuffle which may consume some more CPU, but it's not in the hot path.
Comment 6 Chris Rogers 2012-11-07 12:13:05 PST
Comment on attachment 172275 [details]
Patch

Xingnan, thanks for looking into reducing memory usage for ConvolverNode.  I think there's a potentially much better way to achieve even much higher memory savings and not impact performance.  Even if your tests show little performance hit with the interleaving/deinterleaving, there are a range of processor architectures and chip models that this runs on.  We'd have to test them all very carefully.  Plus we don't want to take *any* performance hit at all.

Here's my idea:
FFTConvolver has a "m_frame" member variable which could be shared among all similar FFTConvolver objects of the same size (and which are running on the same thread).  The process() method could be changed to take an additional argument which is the FFTFrame (shared).  It could even be made more fancy where this additional argument is optional, and the FFTConvolver would lazily create its own "m_frame" if one wasn't provided.

As an example, you can see where FFTConvolver is used in HRTFPanner which has four of them.  Not only could a single FFTFrame be shared for all four, but it could also be shared among other HRTFPanner objects running in that same thread.

Similarly, for ConvolverNode, all the ReverbConvolverStage's of the same size (and running on the same thread) can be given a shared FFTFrame.

The memory savings this way could be very significant and would help ALL the different FFT back-ends, instead of just the FFmpeg one.
Comment 7 Xingnan Wang 2012-11-08 21:15:33 PST
Created attachment 173194 [details]
Patch
Comment 8 Xingnan Wang 2012-11-08 21:26:08 PST
(In reply to comment #7)
> Created an attachment (id=173194) [details]
> Patch

(In reply to comment #7)
> Created an attachment (id=173194) [details]
> Patch

That`s a good idea to share the FFTFrame, and it really simplifies the problem.


Update the patch which covers your idea and removed the non-interleaved multiply.
Comment 9 WebKit Review Bot 2012-11-09 00:42:35 PST
Comment on attachment 173194 [details]
Patch

Attachment 173194 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14786062

New failing tests:
inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
Comment 10 Xingnan Wang 2012-11-12 01:27:31 PST
Created attachment 173586 [details]
Patch
Comment 11 Xingnan Wang 2012-11-12 18:01:03 PST
(In reply to comment #10)
> Created an attachment (id=173586) [details]
> Patch

Resubmitted the patch, and made sure my patch did not result in last failing test.
Comment 12 Chris Rogers 2012-11-13 17:54:03 PST
Hi Xingnan, sorry this is taking me a little time to look at.  Thanks for trying the new approach, I'll try to review as soon as I can...
Comment 13 Xingnan Wang 2012-11-13 18:10:51 PST
(In reply to comment #12)
> Hi Xingnan, sorry this is taking me a little time to look at.  Thanks for trying the new approach, I'll try to review as soon as I can...

Chris, that doesn't matter, I am not in a hurry.
Comment 14 Anders Carlsson 2014-02-05 11:07:27 PST
Comment on attachment 173586 [details]
Patch

Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.