Bug 36849

Summary: Add zero() method to Vector class
Product: WebKit Reporter: Chris Rogers <crogers>
Component: Web Template FrameworkAssignee: Nobody <webkit-unassigned>
Status: RESOLVED WONTFIX    
Severity: Normal CC: abarth, ap, cmarrin, darin, jorlow, mjs, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch mjs: review-

Description Chris Rogers 2010-03-30 13:23:50 PDT
Add zero() method to Vector class
Comment 1 Chris Rogers 2010-03-30 15:35:53 PDT
Created attachment 52089 [details]
Patch
Comment 2 Chris Rogers 2010-03-30 15:40:15 PDT
This patch adds a zero() method to Vector (only for the POD types) which memsets the bytes to all zeroes.  For integer and floating-point values this corresponds to all values of 0 or 0.0.  The reason it's interesting to have a zero() method when there's already a fill() method is because zero() can be much more optimized (using memset) rather than using std::fill().
Comment 3 Alexey Proskuryakov 2010-03-30 16:12:50 PDT
GCC optimizes std::fill() using a memset when possible.

  /**
   *  @brief Fills the range [first,last) with copies of value.
   *  @param  first  A forward iterator.
   *  @param  last   A forward iterator.
   *  @param  value  A reference-to-const of arbitrary type.
   *  @return   Nothing.
   *
   *  This function fills a range with copies of the same value.  For one-byte
   *  types filling contiguous areas of memory, this becomes an inline call to
   *  @c memset.
  */

I think it's even done at two levels - both as a specialization in STL, and a compiler optimization for those engineers who use explicit loops.
Comment 4 Chris Rogers 2010-03-30 16:43:14 PDT
Right, but I'm interested in using this for Vector<float> and Vector<double> which won't have this optimization.
Comment 5 Jeremy Orlow 2010-03-31 03:30:41 PDT
I don't really like the idea of having a method that silently does nothing when it's not safe.  I actually think having it zero out even non-POD is better than silently doing nothing.

Is it possible to add a specialization of fill for doubles and floats when 0.0 is passed in?
Comment 6 Chris Rogers 2010-03-31 11:45:05 PDT
Actually, it doesn't silently do nothing.  I made the method private so it won't even compile in the cases where it doesn't make sense and could be dangerous.  For many non-POD type objects, simply initializing to zero would not be a good thing.  For example, OwnPtr would leak.  I suppose it's possible to specialize the fill() method for 0.0 with float and double (and long double, short, long long, int, etc.), but it seems more concise and precise to be able to call object.zero() rather than object.fill(0.0).
Comment 7 Darin Adler 2010-03-31 12:49:16 PDT
(In reply to comment #6)
> it seems more concise and precise to be able
> to call object.zero() rather than object.fill(0.0).

More concise, yes.

More precise? No. I believe it's less precise.

Is there a way to optimize object.fill(0.0) or not?

I suggest we not add this. But I could be convinced otherwise.
Comment 8 Chris Rogers 2010-03-31 13:15:42 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > it seems more concise and precise to be able
> > to call object.zero() rather than object.fill(0.0).
> 
> More concise, yes.
> 
> More precise? No. I believe it's less precise.
> 
> Is there a way to optimize object.fill(0.0) or not?
> 
> I suggest we not add this. But I could be convinced otherwise.

OK, maybe not more precise :)

But one point I was trying to make is that the current patch addresses all of the potential interesting cases where I could see this being useful (char, unsigned char, short, float, double, long double, long long, etc.).    I'm not sure how to get the same behavior other than creating a specialization of fill() for each of  these types separately which seems somewhat cumbersome.  Maybe there's a shortcut to avoid this - any suggestions?
Comment 9 Chris Marrin 2010-03-31 13:37:06 PDT
Do we really need this method in the Vector class? It may be a bit more concise, but it seems like overkill. I don't think it will improve performance in any significant way, and I don't thing 'zero()' is any more clear than 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of this function.
Comment 10 Sam Weinig 2010-03-31 14:47:15 PDT
What about something like either of these.



namespace WTF {

template <bool canZero, typename T>
struct ZeroFiller;

template<typename T>
struct ZeroFiller<true, T> {
    static void zero(T* begin, T* end)
    {
        memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin));
    }
};

template<typename T>
void zeroFill(T* begin, T* end)
{
    ZeroFiller::zero<IsPod<T>::value, T>(first, last)
}

} // namespace WTF


or


namespace WTF {

template<typename T>
void zeroFill(T* begin, T* end)
{
    COMPILE_ASSERT(IsPod<T>);
    memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin));
}

} // namespace WTF
Comment 11 Chris Rogers 2010-03-31 14:59:31 PDT
(In reply to comment #9)
> Do we really need this method in the Vector class? It may be a bit more
> concise, but it seems like overkill. I don't think it will improve performance
> in any significant way, and I don't thing 'zero()' is any more clear than
> 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of
> this function.

For many applications the performance benefit may be fairly small, but for audio it makes sense.  When I worked on the CoreAudio team we were pressured very hard to make these types of optimizations, unroll loops, write SIMD code (altivec, SSE), etc.

The "footprint" of having a zero() method doesn't seem that heavy to me.  But if people really don't like to have the method in the Vector class, then I could certainly write helper functions which use memset() - something like:

void zero(Vector<float>);
void zero(Vector<double>);

These could be defined in Vector.h, somewhere else in wtf, or directly in my audio directory.

The reason I made this patch is because people didn't like the idea of me sub-classing Vector in order to implement a zero() method (in addition to some other things) in this bug:

https://bugs.webkit.org/show_bug.cgi?id=34452

Adam Barth suggested I could implement this in Vector (using traits).  The other approaches (subclassing or helper functions) are also fine with me.
Comment 12 Chris Rogers 2010-03-31 15:02:08 PDT
(In reply to comment #10)
> What about something like either of these.
> 
> 
> 
> namespace WTF {
> 
> template <bool canZero, typename T>
> struct ZeroFiller;
> 
> template<typename T>
> struct ZeroFiller<true, T> {
>     static void zero(T* begin, T* end)
>     {
>         memset(begin, 0, reinterpret_cast<char*>(end) -
> reinterpret_cast<char*>(begin));
>     }
> };
> 
> template<typename T>
> void zeroFill(T* begin, T* end)
> {
>     ZeroFiller::zero<IsPod<T>::value, T>(first, last)
> }
> 
> } // namespace WTF
> 
> 
> or
> 
> 
> namespace WTF {
> 
> template<typename T>
> void zeroFill(T* begin, T* end)
> {
>     COMPILE_ASSERT(IsPod<T>);
>     memset(begin, 0, reinterpret_cast<char*>(end) -
> reinterpret_cast<char*>(begin));
> }
> 
> } // namespace WTF

Sam, thanks that looks pretty good.  Which file do you think this code belongs in?
Comment 13 Sam Weinig 2010-03-31 15:04:47 PDT
> Sam, thanks that looks pretty good.  Which file do you think this code belongs
> in?

I think StdLibExtras.h would be good.
Comment 14 Chris Rogers 2010-03-31 15:11:35 PDT
(In reply to comment #13)
> > Sam, thanks that looks pretty good.  Which file do you think this code belongs
> > in?
> 
> I think StdLibExtras.h would be good.

Sounds good - if nobody objects I'd like to take Sam's suggestion and just to round things out also add the following function:

template<typename T>
void zeroFill(Vector<T> v)
{
    zeroFill(v.begin(). v.end());
}
Comment 15 Chris Rogers 2010-03-31 15:12:39 PDT
Sorry I meant to pass by reference...

template<typename T>
void zeroFill(Vector<T>& v)
{
    zeroFill(v.begin(). v.end());
}
Comment 16 Maciej Stachowiak 2010-03-31 23:45:57 PDT
(In reply to comment #10)
> What about something like either of these.
> 
> 
> 
> namespace WTF {
> 
> template <bool canZero, typename T>
> struct ZeroFiller;
> 
> template<typename T>
> struct ZeroFiller<true, T> {
>     static void zero(T* begin, T* end)
>     {
>         memset(begin, 0, reinterpret_cast<char*>(end) -
> reinterpret_cast<char*>(begin));
>     }
> };
> 
> template<typename T>
> void zeroFill(T* begin, T* end)
> {
>     ZeroFiller::zero<IsPod<T>::value, T>(first, last)
> }
> 
> } // namespace WTF

1) How about implementing the non-POD version with a loop that just assigns 0? That will still fail at compile time if the type pointed to is one where you can't assign 0.

2) Is there any evidence that zeroFill() would measurably improve performance over fill(0)?

Regards,
Maciej
Comment 17 Maciej Stachowiak 2010-03-31 23:48:25 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > > Sam, thanks that looks pretty good.  Which file do you think this code belongs
> > > in?
> > 
> > I think StdLibExtras.h would be good.
> 
> Sounds good - if nobody objects I'd like to take Sam's suggestion and just to
> round things out also add the following function:
> 
> template<typename T>
> void zeroFill(Vector<T> v)
> {
>     zeroFill(v.begin(). v.end());
> }

Not sure that small bit of abstraction is much of a win. It also depends on the fact that the Vector iterator is a raw pointer, which isn't guaranteed. v.data() and v.data() + v.size() would be more robust ways to get the endpoints as pointers.
Comment 18 Maciej Stachowiak 2010-03-31 23:49:04 PDT
Comment on attachment 52089 [details]
Patch

r- because we seem to have rough consensus that the zero() method is not the right approach (though we are still discussing what ways of doing it would be better).
Comment 19 Chris Marrin 2010-04-01 11:17:05 PDT
It looks like std::fill() only uses a memset() optimization for 1 byte types. So I understand the need for an optimization. But can't this be done by optimizing Vector::fill() to use memset for any pod type being set to 0?
Comment 20 Chris Rogers 2010-04-01 12:51:12 PDT
(In reply to comment #19)
> It looks like std::fill() only uses a memset() optimization for 1 byte types.
> So I understand the need for an optimization. But can't this be done by
> optimizing Vector::fill() to use memset for any pod type being set to 0?

It clearly can be done in a brute-force way (template specializations for each one), but I'm not sure of an elegant solution there but will look closer at it since I suspect there is one.  Otherwise, I'm happy with Sam's solution.
Comment 21 Chris Rogers 2010-04-01 12:55:45 PDT
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > > Sam, thanks that looks pretty good.  Which file do you think this code belongs
> > > > in?
> > > 
> > > I think StdLibExtras.h would be good.
> > 
> > Sounds good - if nobody objects I'd like to take Sam's suggestion and just to
> > round things out also add the following function:
> > 
> > template<typename T>
> > void zeroFill(Vector<T> v)
> > {
> >     zeroFill(v.begin(). v.end());
> > }
> 
> Not sure that small bit of abstraction is much of a win. It also depends on the
> fact that the Vector iterator is a raw pointer, which isn't guaranteed.
> v.data() and v.data() + v.size() would be more robust ways to get the endpoints
> as pointers.

OK, v.data() sounds good.  At least in my use cases, and I suspect the majority of cases, zeroFill() would be called to clear the whole vector which is what prompted me to create the shortcut method which seems more concise.
Comment 22 Chris Rogers 2010-04-01 14:33:33 PDT
> 
> 2) Is there any evidence that zeroFill() would measurably improve performance
> over fill(0)?
> 
> Regards,
> Maciej

memset() is definitely quite a bit faster than a straight zero-filling loop.  It's tremendously optimized and its source code is quite interesting to read.

I created a quick benchmark to test my zero() method vs. fill():

    // Make sure memset() code is "hot" (so we're not measuring paging activity)
    float temp[2048];
    memset(temp, 0, sizeof(float) * 2048);

    printf("            n       fill()     memset()   speed improvement\n");
    
    for (unsigned i = 0; i < 24; ++i) {
        const int k = 1UL << i;
        const int n = k * 128;
        Vector<float> v1(n);
        Vector<float> v2(n);
        v1.fill(1.0);
        v2.fill(1.0);
        PerformanceTimer timer1("fill  "); // PerformanceTimer internally uses UpTime(), but could use mach_absolute_time()
        PerformanceTimer timer2("memset");
        timer1.Start();
        v1.fill(0.0);
        timer1.Stop(); // print elapsed time

        timer2.Start();
        v2.zero();
        timer2.Stop();
        
        // times in usec
        double time1 = timer1.GetElapsedTime();
        double time2 = timer2.GetElapsedTime();
        double speedUp = time1 / time2;
        printf("%14d %10.3f %10.3f %10.3f\n", n, time1, time2, speedUp);
    }


Here are the results run on the latest 8-core Xeon desktop Mac (laptops and ARM chips would be somewhat slower):

            n       fill()     memset()   speed improvement
           128      0.291      0.223      1.305
           256      0.411      0.191      2.152
           512      0.712      0.218      3.266
          1024      0.902      0.194      4.649
          2048      1.760      0.329      5.350
          4096      3.299      0.691      4.774
          8192      6.531      1.397      4.675
         16384     13.029      2.825      4.612
         32768     26.016      6.163      4.221
         65536     56.320     16.795      3.353
        131072    107.894     33.843      3.188
        262144    207.663     65.043      3.193
        524288    415.301    131.098      3.168
       1048576    841.864    549.443      1.532
       2097152   1721.820   1280.595      1.345
       4194304   3377.447   2427.105      1.392
       8388608   6726.940   4902.623      1.372
      16777216  13458.201   9882.302      1.362
      33554432  26849.611  19685.213      1.364
      67108864  53677.435  39297.878      1.366
     134217728 107947.327  79122.879      1.364
Comment 23 Chris Rogers 2011-06-14 12:37:57 PDT
Changing to won't fix there exist alternatives for WebGL and Web Audio such as:
ArrayBufferView::zeroRangeImpl()