Full Text Bug Listing

Chris Rogers

Reported 2010-03-30 13:23:50 PDT

Add zero() method to Vector class

Chris Rogers

Comment 1 2010-03-30 15:35:53 PDT

Created attachment 52089 [details] Patch

Chris Rogers

Comment 2 2010-03-30 15:40:15 PDT

This patch adds a zero() method to Vector (only for the POD types) which memsets the bytes to all zeroes. For integer and floating-point values this corresponds to all values of 0 or 0.0. The reason it's interesting to have a zero() method when there's already a fill() method is because zero() can be much more optimized (using memset) rather than using std::fill().

Alexey Proskuryakov

Comment 3 2010-03-30 16:12:50 PDT

GCC optimizes std::fill() using a memset when possible. /** * @brief Fills the range [first,last) with copies of value. * @param first A forward iterator. * @param last A forward iterator. * @param value A reference-to-const of arbitrary type. * @return Nothing. * * This function fills a range with copies of the same value. For one-byte * types filling contiguous areas of memory, this becomes an inline call to * @c memset. */ I think it's even done at two levels - both as a specialization in STL, and a compiler optimization for those engineers who use explicit loops.

Chris Rogers

Comment 4 2010-03-30 16:43:14 PDT

Right, but I'm interested in using this for Vector<float> and Vector<double> which won't have this optimization.

Jeremy Orlow

Comment 5 2010-03-31 03:30:41 PDT

I don't really like the idea of having a method that silently does nothing when it's not safe. I actually think having it zero out even non-POD is better than silently doing nothing. Is it possible to add a specialization of fill for doubles and floats when 0.0 is passed in?

Chris Rogers

Comment 6 2010-03-31 11:45:05 PDT

Actually, it doesn't silently do nothing. I made the method private so it won't even compile in the cases where it doesn't make sense and could be dangerous. For many non-POD type objects, simply initializing to zero would not be a good thing. For example, OwnPtr would leak. I suppose it's possible to specialize the fill() method for 0.0 with float and double (and long double, short, long long, int, etc.), but it seems more concise and precise to be able to call object.zero() rather than object.fill(0.0).

Darin Adler

Comment 7 2010-03-31 12:49:16 PDT

(In reply to comment #6) > it seems more concise and precise to be able > to call object.zero() rather than object.fill(0.0). More concise, yes. More precise? No. I believe it's less precise. Is there a way to optimize object.fill(0.0) or not? I suggest we not add this. But I could be convinced otherwise.

Chris Rogers

Comment 8 2010-03-31 13:15:42 PDT

(In reply to comment #7) > (In reply to comment #6) > > it seems more concise and precise to be able > > to call object.zero() rather than object.fill(0.0). > > More concise, yes. > > More precise? No. I believe it's less precise. > > Is there a way to optimize object.fill(0.0) or not? > > I suggest we not add this. But I could be convinced otherwise. OK, maybe not more precise :) But one point I was trying to make is that the current patch addresses all of the potential interesting cases where I could see this being useful (char, unsigned char, short, float, double, long double, long long, etc.). I'm not sure how to get the same behavior other than creating a specialization of fill() for each of these types separately which seems somewhat cumbersome. Maybe there's a shortcut to avoid this - any suggestions?

Chris Marrin

Comment 9 2010-03-31 13:37:06 PDT

Do we really need this method in the Vector class? It may be a bit more concise, but it seems like overkill. I don't think it will improve performance in any significant way, and I don't thing 'zero()' is any more clear than 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of this function.

Sam Weinig

Comment 10 2010-03-31 14:47:15 PDT

What about something like either of these. namespace WTF { template <bool canZero, typename T> struct ZeroFiller; template<typename T> struct ZeroFiller<true, T> { static void zero(T* begin, T* end) { memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin)); } }; template<typename T> void zeroFill(T* begin, T* end) { ZeroFiller::zero<IsPod<T>::value, T>(first, last) } } // namespace WTF or namespace WTF { template<typename T> void zeroFill(T* begin, T* end) { COMPILE_ASSERT(IsPod<T>); memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin)); } } // namespace WTF

Chris Rogers

Comment 11 2010-03-31 14:59:31 PDT

(In reply to comment #9) > Do we really need this method in the Vector class? It may be a bit more > concise, but it seems like overkill. I don't think it will improve performance > in any significant way, and I don't thing 'zero()' is any more clear than > 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of > this function. For many applications the performance benefit may be fairly small, but for audio it makes sense. When I worked on the CoreAudio team we were pressured very hard to make these types of optimizations, unroll loops, write SIMD code (altivec, SSE), etc. The "footprint" of having a zero() method doesn't seem that heavy to me. But if people really don't like to have the method in the Vector class, then I could certainly write helper functions which use memset() - something like: void zero(Vector<float>); void zero(Vector<double>); These could be defined in Vector.h, somewhere else in wtf, or directly in my audio directory. The reason I made this patch is because people didn't like the idea of me sub-classing Vector in order to implement a zero() method (in addition to some other things) in this bug: https://bugs.webkit.org/show_bug.cgi?id=34452 Adam Barth suggested I could implement this in Vector (using traits). The other approaches (subclassing or helper functions) are also fine with me.

Chris Rogers

Comment 12 2010-03-31 15:02:08 PDT

(In reply to comment #10) > What about something like either of these. > > > > namespace WTF { > > template <bool canZero, typename T> > struct ZeroFiller; > > template<typename T> > struct ZeroFiller<true, T> { > static void zero(T* begin, T* end) > { > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > }; > > template<typename T> > void zeroFill(T* begin, T* end) > { > ZeroFiller::zero<IsPod<T>::value, T>(first, last) > } > > } // namespace WTF > > > or > > > namespace WTF { > > template<typename T> > void zeroFill(T* begin, T* end) > { > COMPILE_ASSERT(IsPod<T>); > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > > } // namespace WTF Sam, thanks that looks pretty good. Which file do you think this code belongs in?

Sam Weinig

Comment 13 2010-03-31 15:04:47 PDT

> Sam, thanks that looks pretty good. Which file do you think this code belongs > in? I think StdLibExtras.h would be good.

Chris Rogers

Comment 14 2010-03-31 15:11:35 PDT

(In reply to comment #13) > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > in? > > I think StdLibExtras.h would be good. Sounds good - if nobody objects I'd like to take Sam's suggestion and just to round things out also add the following function: template<typename T> void zeroFill(Vector<T> v) { zeroFill(v.begin(). v.end()); }

Chris Rogers

Comment 15 2010-03-31 15:12:39 PDT

Sorry I meant to pass by reference... template<typename T> void zeroFill(Vector<T>& v) { zeroFill(v.begin(). v.end()); }

Maciej Stachowiak

Comment 16 2010-03-31 23:45:57 PDT

(In reply to comment #10) > What about something like either of these. > > > > namespace WTF { > > template <bool canZero, typename T> > struct ZeroFiller; > > template<typename T> > struct ZeroFiller<true, T> { > static void zero(T* begin, T* end) > { > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > }; > > template<typename T> > void zeroFill(T* begin, T* end) > { > ZeroFiller::zero<IsPod<T>::value, T>(first, last) > } > > } // namespace WTF 1) How about implementing the non-POD version with a loop that just assigns 0? That will still fail at compile time if the type pointed to is one where you can't assign 0. 2) Is there any evidence that zeroFill() would measurably improve performance over fill(0)? Regards, Maciej

Maciej Stachowiak

Comment 17 2010-03-31 23:48:25 PDT

(In reply to comment #14) > (In reply to comment #13) > > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > > in? > > > > I think StdLibExtras.h would be good. > > Sounds good - if nobody objects I'd like to take Sam's suggestion and just to > round things out also add the following function: > > template<typename T> > void zeroFill(Vector<T> v) > { > zeroFill(v.begin(). v.end()); > } Not sure that small bit of abstraction is much of a win. It also depends on the fact that the Vector iterator is a raw pointer, which isn't guaranteed. v.data() and v.data() + v.size() would be more robust ways to get the endpoints as pointers.

Maciej Stachowiak

Comment 18 2010-03-31 23:49:04 PDT

Comment on attachment 52089 [details] Patch r- because we seem to have rough consensus that the zero() method is not the right approach (though we are still discussing what ways of doing it would be better).

Chris Marrin

Comment 19 2010-04-01 11:17:05 PDT

It looks like std::fill() only uses a memset() optimization for 1 byte types. So I understand the need for an optimization. But can't this be done by optimizing Vector::fill() to use memset for any pod type being set to 0?

Chris Rogers

Comment 20 2010-04-01 12:51:12 PDT

(In reply to comment #19) > It looks like std::fill() only uses a memset() optimization for 1 byte types. > So I understand the need for an optimization. But can't this be done by > optimizing Vector::fill() to use memset for any pod type being set to 0? It clearly can be done in a brute-force way (template specializations for each one), but I'm not sure of an elegant solution there but will look closer at it since I suspect there is one. Otherwise, I'm happy with Sam's solution.

Chris Rogers

Comment 21 2010-04-01 12:55:45 PDT

(In reply to comment #17) > (In reply to comment #14) > > (In reply to comment #13) > > > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > > > in? > > > > > > I think StdLibExtras.h would be good. > > > > Sounds good - if nobody objects I'd like to take Sam's suggestion and just to > > round things out also add the following function: > > > > template<typename T> > > void zeroFill(Vector<T> v) > > { > > zeroFill(v.begin(). v.end()); > > } > > Not sure that small bit of abstraction is much of a win. It also depends on the > fact that the Vector iterator is a raw pointer, which isn't guaranteed. > v.data() and v.data() + v.size() would be more robust ways to get the endpoints > as pointers. OK, v.data() sounds good. At least in my use cases, and I suspect the majority of cases, zeroFill() would be called to clear the whole vector which is what prompted me to create the shortcut method which seems more concise.

Chris Rogers

Comment 22 2010-04-01 14:33:33 PDT

> > 2) Is there any evidence that zeroFill() would measurably improve performance > over fill(0)? > > Regards, > Maciej memset() is definitely quite a bit faster than a straight zero-filling loop. It's tremendously optimized and its source code is quite interesting to read. I created a quick benchmark to test my zero() method vs. fill(): // Make sure memset() code is "hot" (so we're not measuring paging activity) float temp[2048]; memset(temp, 0, sizeof(float) * 2048); printf(" n fill() memset() speed improvement\n"); for (unsigned i = 0; i < 24; ++i) { const int k = 1UL << i; const int n = k * 128; Vector<float> v1(n); Vector<float> v2(n); v1.fill(1.0); v2.fill(1.0); PerformanceTimer timer1("fill "); // PerformanceTimer internally uses UpTime(), but could use mach_absolute_time() PerformanceTimer timer2("memset"); timer1.Start(); v1.fill(0.0); timer1.Stop(); // print elapsed time timer2.Start(); v2.zero(); timer2.Stop(); // times in usec double time1 = timer1.GetElapsedTime(); double time2 = timer2.GetElapsedTime(); double speedUp = time1 / time2; printf("%14d %10.3f %10.3f %10.3f\n", n, time1, time2, speedUp); } Here are the results run on the latest 8-core Xeon desktop Mac (laptops and ARM chips would be somewhat slower): n fill() memset() speed improvement 128 0.291 0.223 1.305 256 0.411 0.191 2.152 512 0.712 0.218 3.266 1024 0.902 0.194 4.649 2048 1.760 0.329 5.350 4096 3.299 0.691 4.774 8192 6.531 1.397 4.675 16384 13.029 2.825 4.612 32768 26.016 6.163 4.221 65536 56.320 16.795 3.353 131072 107.894 33.843 3.188 262144 207.663 65.043 3.193 524288 415.301 131.098 3.168 1048576 841.864 549.443 1.532 2097152 1721.820 1280.595 1.345 4194304 3377.447 2427.105 1.392 8388608 6726.940 4902.623 1.372 16777216 13458.201 9882.302 1.362 33554432 26849.611 19685.213 1.364 67108864 53677.435 39297.878 1.366 134217728 107947.327 79122.879 1.364

Chris Rogers

Comment 23 2011-06-14 12:37:57 PDT

Changing to won't fix there exist alternatives for WebGL and Web Audio such as: ArrayBufferView::zeroRangeImpl()

Attachments
Patch (4.04 KB, patch) 2010-03-30 15:35 PDT, Chris Rogers	mjs: review-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Bug 36849