Summary: | Add zero() method to Vector class | ||||||
---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Chris Rogers <crogers> | ||||
Component: | Web Template Framework | Assignee: | Nobody <webkit-unassigned> | ||||
Status: | RESOLVED WONTFIX | ||||||
Severity: | Normal | CC: | abarth, ap, cmarrin, darin, jorlow, mjs, sam | ||||
Priority: | P2 | ||||||
Version: | 528+ (Nightly build) | ||||||
Hardware: | Other | ||||||
OS: | OS X 10.5 | ||||||
Attachments: |
|
Description
Chris Rogers
2010-03-30 13:23:50 PDT
Created attachment 52089 [details]
Patch
This patch adds a zero() method to Vector (only for the POD types) which memsets the bytes to all zeroes. For integer and floating-point values this corresponds to all values of 0 or 0.0. The reason it's interesting to have a zero() method when there's already a fill() method is because zero() can be much more optimized (using memset) rather than using std::fill(). GCC optimizes std::fill() using a memset when possible. /** * @brief Fills the range [first,last) with copies of value. * @param first A forward iterator. * @param last A forward iterator. * @param value A reference-to-const of arbitrary type. * @return Nothing. * * This function fills a range with copies of the same value. For one-byte * types filling contiguous areas of memory, this becomes an inline call to * @c memset. */ I think it's even done at two levels - both as a specialization in STL, and a compiler optimization for those engineers who use explicit loops. Right, but I'm interested in using this for Vector<float> and Vector<double> which won't have this optimization. I don't really like the idea of having a method that silently does nothing when it's not safe. I actually think having it zero out even non-POD is better than silently doing nothing. Is it possible to add a specialization of fill for doubles and floats when 0.0 is passed in? Actually, it doesn't silently do nothing. I made the method private so it won't even compile in the cases where it doesn't make sense and could be dangerous. For many non-POD type objects, simply initializing to zero would not be a good thing. For example, OwnPtr would leak. I suppose it's possible to specialize the fill() method for 0.0 with float and double (and long double, short, long long, int, etc.), but it seems more concise and precise to be able to call object.zero() rather than object.fill(0.0). (In reply to comment #6) > it seems more concise and precise to be able > to call object.zero() rather than object.fill(0.0). More concise, yes. More precise? No. I believe it's less precise. Is there a way to optimize object.fill(0.0) or not? I suggest we not add this. But I could be convinced otherwise. (In reply to comment #7) > (In reply to comment #6) > > it seems more concise and precise to be able > > to call object.zero() rather than object.fill(0.0). > > More concise, yes. > > More precise? No. I believe it's less precise. > > Is there a way to optimize object.fill(0.0) or not? > > I suggest we not add this. But I could be convinced otherwise. OK, maybe not more precise :) But one point I was trying to make is that the current patch addresses all of the potential interesting cases where I could see this being useful (char, unsigned char, short, float, double, long double, long long, etc.). I'm not sure how to get the same behavior other than creating a specialization of fill() for each of these types separately which seems somewhat cumbersome. Maybe there's a shortcut to avoid this - any suggestions? Do we really need this method in the Vector class? It may be a bit more concise, but it seems like overkill. I don't think it will improve performance in any significant way, and I don't thing 'zero()' is any more clear than 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of this function. What about something like either of these. namespace WTF { template <bool canZero, typename T> struct ZeroFiller; template<typename T> struct ZeroFiller<true, T> { static void zero(T* begin, T* end) { memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin)); } }; template<typename T> void zeroFill(T* begin, T* end) { ZeroFiller::zero<IsPod<T>::value, T>(first, last) } } // namespace WTF or namespace WTF { template<typename T> void zeroFill(T* begin, T* end) { COMPILE_ASSERT(IsPod<T>); memset(begin, 0, reinterpret_cast<char*>(end) - reinterpret_cast<char*>(begin)); } } // namespace WTF (In reply to comment #9) > Do we really need this method in the Vector class? It may be a bit more > concise, but it seems like overkill. I don't think it will improve performance > in any significant way, and I don't thing 'zero()' is any more clear than > 'fill(0)'. It seems like it just unnecessarily adds to the size of the API of > this function. For many applications the performance benefit may be fairly small, but for audio it makes sense. When I worked on the CoreAudio team we were pressured very hard to make these types of optimizations, unroll loops, write SIMD code (altivec, SSE), etc. The "footprint" of having a zero() method doesn't seem that heavy to me. But if people really don't like to have the method in the Vector class, then I could certainly write helper functions which use memset() - something like: void zero(Vector<float>); void zero(Vector<double>); These could be defined in Vector.h, somewhere else in wtf, or directly in my audio directory. The reason I made this patch is because people didn't like the idea of me sub-classing Vector in order to implement a zero() method (in addition to some other things) in this bug: https://bugs.webkit.org/show_bug.cgi?id=34452 Adam Barth suggested I could implement this in Vector (using traits). The other approaches (subclassing or helper functions) are also fine with me. (In reply to comment #10) > What about something like either of these. > > > > namespace WTF { > > template <bool canZero, typename T> > struct ZeroFiller; > > template<typename T> > struct ZeroFiller<true, T> { > static void zero(T* begin, T* end) > { > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > }; > > template<typename T> > void zeroFill(T* begin, T* end) > { > ZeroFiller::zero<IsPod<T>::value, T>(first, last) > } > > } // namespace WTF > > > or > > > namespace WTF { > > template<typename T> > void zeroFill(T* begin, T* end) > { > COMPILE_ASSERT(IsPod<T>); > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > > } // namespace WTF Sam, thanks that looks pretty good. Which file do you think this code belongs in? > Sam, thanks that looks pretty good. Which file do you think this code belongs
> in?
I think StdLibExtras.h would be good.
(In reply to comment #13) > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > in? > > I think StdLibExtras.h would be good. Sounds good - if nobody objects I'd like to take Sam's suggestion and just to round things out also add the following function: template<typename T> void zeroFill(Vector<T> v) { zeroFill(v.begin(). v.end()); } Sorry I meant to pass by reference... template<typename T> void zeroFill(Vector<T>& v) { zeroFill(v.begin(). v.end()); } (In reply to comment #10) > What about something like either of these. > > > > namespace WTF { > > template <bool canZero, typename T> > struct ZeroFiller; > > template<typename T> > struct ZeroFiller<true, T> { > static void zero(T* begin, T* end) > { > memset(begin, 0, reinterpret_cast<char*>(end) - > reinterpret_cast<char*>(begin)); > } > }; > > template<typename T> > void zeroFill(T* begin, T* end) > { > ZeroFiller::zero<IsPod<T>::value, T>(first, last) > } > > } // namespace WTF 1) How about implementing the non-POD version with a loop that just assigns 0? That will still fail at compile time if the type pointed to is one where you can't assign 0. 2) Is there any evidence that zeroFill() would measurably improve performance over fill(0)? Regards, Maciej (In reply to comment #14) > (In reply to comment #13) > > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > > in? > > > > I think StdLibExtras.h would be good. > > Sounds good - if nobody objects I'd like to take Sam's suggestion and just to > round things out also add the following function: > > template<typename T> > void zeroFill(Vector<T> v) > { > zeroFill(v.begin(). v.end()); > } Not sure that small bit of abstraction is much of a win. It also depends on the fact that the Vector iterator is a raw pointer, which isn't guaranteed. v.data() and v.data() + v.size() would be more robust ways to get the endpoints as pointers. Comment on attachment 52089 [details]
Patch
r- because we seem to have rough consensus that the zero() method is not the right approach (though we are still discussing what ways of doing it would be better).
It looks like std::fill() only uses a memset() optimization for 1 byte types. So I understand the need for an optimization. But can't this be done by optimizing Vector::fill() to use memset for any pod type being set to 0? (In reply to comment #19) > It looks like std::fill() only uses a memset() optimization for 1 byte types. > So I understand the need for an optimization. But can't this be done by > optimizing Vector::fill() to use memset for any pod type being set to 0? It clearly can be done in a brute-force way (template specializations for each one), but I'm not sure of an elegant solution there but will look closer at it since I suspect there is one. Otherwise, I'm happy with Sam's solution. (In reply to comment #17) > (In reply to comment #14) > > (In reply to comment #13) > > > > Sam, thanks that looks pretty good. Which file do you think this code belongs > > > > in? > > > > > > I think StdLibExtras.h would be good. > > > > Sounds good - if nobody objects I'd like to take Sam's suggestion and just to > > round things out also add the following function: > > > > template<typename T> > > void zeroFill(Vector<T> v) > > { > > zeroFill(v.begin(). v.end()); > > } > > Not sure that small bit of abstraction is much of a win. It also depends on the > fact that the Vector iterator is a raw pointer, which isn't guaranteed. > v.data() and v.data() + v.size() would be more robust ways to get the endpoints > as pointers. OK, v.data() sounds good. At least in my use cases, and I suspect the majority of cases, zeroFill() would be called to clear the whole vector which is what prompted me to create the shortcut method which seems more concise. >
> 2) Is there any evidence that zeroFill() would measurably improve performance
> over fill(0)?
>
> Regards,
> Maciej
memset() is definitely quite a bit faster than a straight zero-filling loop. It's tremendously optimized and its source code is quite interesting to read.
I created a quick benchmark to test my zero() method vs. fill():
// Make sure memset() code is "hot" (so we're not measuring paging activity)
float temp[2048];
memset(temp, 0, sizeof(float) * 2048);
printf(" n fill() memset() speed improvement\n");
for (unsigned i = 0; i < 24; ++i) {
const int k = 1UL << i;
const int n = k * 128;
Vector<float> v1(n);
Vector<float> v2(n);
v1.fill(1.0);
v2.fill(1.0);
PerformanceTimer timer1("fill "); // PerformanceTimer internally uses UpTime(), but could use mach_absolute_time()
PerformanceTimer timer2("memset");
timer1.Start();
v1.fill(0.0);
timer1.Stop(); // print elapsed time
timer2.Start();
v2.zero();
timer2.Stop();
// times in usec
double time1 = timer1.GetElapsedTime();
double time2 = timer2.GetElapsedTime();
double speedUp = time1 / time2;
printf("%14d %10.3f %10.3f %10.3f\n", n, time1, time2, speedUp);
}
Here are the results run on the latest 8-core Xeon desktop Mac (laptops and ARM chips would be somewhat slower):
n fill() memset() speed improvement
128 0.291 0.223 1.305
256 0.411 0.191 2.152
512 0.712 0.218 3.266
1024 0.902 0.194 4.649
2048 1.760 0.329 5.350
4096 3.299 0.691 4.774
8192 6.531 1.397 4.675
16384 13.029 2.825 4.612
32768 26.016 6.163 4.221
65536 56.320 16.795 3.353
131072 107.894 33.843 3.188
262144 207.663 65.043 3.193
524288 415.301 131.098 3.168
1048576 841.864 549.443 1.532
2097152 1721.820 1280.595 1.345
4194304 3377.447 2427.105 1.392
8388608 6726.940 4902.623 1.372
16777216 13458.201 9882.302 1.362
33554432 26849.611 19685.213 1.364
67108864 53677.435 39297.878 1.366
134217728 107947.327 79122.879 1.364
Changing to won't fix there exist alternatives for WebGL and Web Audio such as: ArrayBufferView::zeroRangeImpl() |