RESOLVED FIXED 120211
[CSS Shapes] Improve the performance of image valued shapes
https://bugs.webkit.org/show_bug.cgi?id=120211
Summary [CSS Shapes] Improve the performance of image valued shapes
Hans Muller
Reported 2013-08-23 09:49:16 PDT
Created attachment 209468 [details] benchmark Shapes defined by large complex images can have a significant impact on layout performance. The time spent creating an internal RasterShape object dominates the overall cost of layout and rendering when the value of a shape-inside is a complex image. A simple benchmark that demonstrates the problem is attached.
Attachments
benchmark (11.59 KB, text/html)
2013-08-23 09:49 PDT, Hans Muller
no flags
Patch (13.89 KB, patch)
2013-09-04 18:00 PDT, Hans Muller
no flags
Patch (13.91 KB, patch)
2013-09-05 09:05 PDT, Hans Muller
no flags
Patch (16.00 KB, patch)
2013-09-05 10:39 PDT, Hans Muller
no flags
Patch (13.97 KB, patch)
2013-09-05 13:05 PDT, Hans Muller
achicu: review-
Patch (16.03 KB, patch)
2013-09-10 12:31 PDT, Hans Muller
achicu: review+
Patch (16.14 KB, patch)
2013-09-11 16:49 PDT, Hans Muller
commit-queue: commit-queue-
Patch (16.14 KB, patch)
2013-09-11 17:47 PDT, Hans Muller
no flags
Hans Muller
Comment 1 2013-08-23 09:50:22 PDT
I did some shape-inside benchmarking with a 2480x2480 snowflake image (http://imagesci.com/img/2013/04/snowflake-background-8906-hd-wallpapers.png). The run-length encoded version of the mask produced by thresholding this image's alpha channel is reasonably complex: 2170 vertical spans and 24322 horizontal segments. Profiling showed that 85% of the execution time was spent building the Region based representation of the shape. The culprit was Region::unite() which is used by RasterShapeIntervals.addInterval() to add each horizontal interval or "run" to the Region. Regions are actually just containers for Region::Shape objects and all of the logical Shape operations are constructive. So Region::unite(otherRegion) computes a new Region::Shape which is the union of this Region and otherRegion and replaces this Region's internal Region::Shape object with the new one. In our case otherRegion is just a 1 pixel high rectangle that represents a run. A Region and its Region::Shape must be created to represent the rectangle, which forces the allocation of two Vectors: Vector<int, 32> m_segments; Vector<Span, 16> m_spans; Not surprisingly, most of the benchmark's time spent in Region::unite() is spent inside the Vector memory management methods. By making some ugly hacks in Region.h,cpp I was able to demonstrate that if the Region::unite(IntRect) did its work in-place, without constructing any temporary Regions or Region::Shapes, I could improve the performance of laying out and rendering the snowflake benchmark by 10X. From an overall time of about 1.2 seconds to 120ms. This appears to be a worthy quest. I'd planned to add the following methods to Region: void unite(const IntRect&); void intersect(const IntRect&); bool contains(const IntRect&) const; bool intersects(const IntRect&) const; The goal was to do the work without allocating any temporary objects. This proved to be painful and has led me to reconsider using Regions as the internal representation for Image shapes. The Region class was based on "Scanline Coherent Shape Algebra", Jonathan E. Steinhart, Graphics Gems II, Chapter 1.9 (http://tinyurl.com/krbmpq5). Its design was motivated by the needs of window systems, where fast logical operations on sets of overlapping rectangles are needed. The WebKit implementation stores shapes that are predominantly rectangular, very compactly. Aside from the temporary memory allocation costs, logical operations are quite efficient. The representation lacks some of the explicit structure one would expect. For example all of the horizontal segments are stored in a single vector<int>, where each subsequent pair of elements represents the segment's start and end X coordinates. Vertical "spans" just contain their starting y coordinate and the index of their first segment. A m_span[i]'s height is m_spans[i+1].y - m_spans[i].y. Its segments are m_segments[ m_spans[i].segment_index ] to m_segments[ m_spans[i+1].segment_index ]. The implementation of logical operations uses begin/end pairs of iterators to navigate the packed data structure. I took a shot at writing an implementation of Region::unite(const IntRect&) that modified the Region in-place. Several shots actually. The results were messier than I'd be willing to share. Reconsidering the dependency of RasterShape (RasterShapeIntervals really) on Region: - The boundaries of RasterShapes rarely look like the logical combination of a small set of rectangles. The shapes tend to be ragged and complex, which defeats the Region's coalescing strategy. - The first fit, inside intervals, and outside intervals, RasterShape algorithms are primarily defined in terms of horizontal intervals, not shapes. Most of the Regions employed by the RasterShape implementation, notably the Regions used to construct the RasterShape in the first place, are really just glorified horizontal intervals. - An unpacked run-length encoding of the shape, where each image row was represented by a run-list, would trivialize a common RasterShape operation: constructing a read-only view of a horizontal band within the shape. Doing as much with Region requires constructing a temporary Region that contains a copy of the horizontal band. Which is considerably more expensive. Creating a simple run-length encoded type for RasterShape's internal use looks like it would be less difficult and more efficient than attempting stretch the Region type.
Eric Seidel (no email)
Comment 2 2013-08-23 12:14:08 PDT
Region has long been slow. The last time I ran into them was bug 98800.
Hans Muller
Comment 3 2013-08-27 15:58:28 PDT
Summary: I'd like to eliminate RasterShape's dependence on Region as well as improve its efficiency a bit, in three steps: - Convert ShapeInterval to a template class to enable ShapeInterval<float> for PolygonShape, and ShapeInterval<int> for RasterShape. - Revise the ShapeIntervals implementation to improve efficiency and clarity. - Redefine RasterShape in terms of ShapeInterval<int>. Here's a more detailed rationale, starting with RasterShape. A version of RasterShape that doesn't depend on Region could be as simple as the following. I've assumed that the existing ShapeInterval class has been turned into a template type. The cost of adding a run to the shape is just a vector append, and the cost of access to all of the runs for one image row, getIntervals(), is just a vector lookup. RasterShape's internal m_intervalLists Vector could be define its own allocater to reduce the overhead of the per-row ShapeInterval vectors, however the additional complexity might not be worth the trouble. class RasterShape { public: RLEShape(unsigned size) : m_bounds(-1, -1, -1, -1) { m_intervalLists.resize(size); } typedef ShapeInterval<int> RasterShapeInterval; unsigned size() const { return m_intervalLists.size(); } const IntRect& bounds() { m_bounds; } const Vector<RasterShapeInterval>& getIntervals(int y) const { ASSERT(y >= 0 && y < m_intervalLists.size()); return m_intervalLists[y]; } void appendInterval(int y, int x1, int x2) { ASSERT(y >= 0 && y < m_intervalLists.size()); m_intervalLists[y].append(RasterShapeInterval(x1, x2)); m_bounds.unite(IntRect(x1, y, x2 - x1, 0)); } private: IntRect m_bounds; Vector<Vector<RasterShapeInterval > > m_intervalLists; }; Templatizing the existing ShapeIntervals class, which current defines an interval as a pair floats, is straightforward. template <typename T> class ShapeInterval { public: Interval(T x1 = 0, T x2 = 0) : m_x1(x1) , m_x2(x2) { ASSERT(x2 >= x1); } T x1() const { return m_x1; } T x2() const { return m_x2; } void setX1(T value) { m_x1 = value; } void setX2(T value) { m_x2 = value; } void set(T x1, T x2) { ASSERT(x2 >= x1); m_x1 = x1; m_x2 = x2; } bool overlaps(const Interval<T>& interval) const { return x2() >= interval.x1() && x1() <= interval.x2(); } private: T m_x1; T m_x2; }; // ... overloads for the logical operators In light of this I've taken the opportunity to review the ShapeIntervals class. The logical operations could be written more simply and more efficiently. For example, the logical OR operation: template <typename T> void uniteShapeIntervals(const Vector<ShapeInterval<T> >& a, const Vector<ShapeInterval<T > >& b, Vector<ShapeInterval<T> >& result) { // ... assertions and the empty a/b special cases left out for clarity typedef typename Vector<ShapeInterval<T> >::const_iterator IntervalIterator; IntervalIterator aNext = a.begin(); IntervalIterator bNext = b.begin(); while(aNext != a.end() || bNext != b.end()) { IntervalIterator next = (bNext == b.end() || (aNext != a.end() && *aNext < *bNext)) ? aNext++ : bNext++; if (result.isEmpty() || !result.last().overlaps(*next)) result.append(*next); else result.last().setX2(std::max<T>(result.last().x2(), next->x2())); } } Aside from the typedef, this isn't particularly tricky either.
Hans Muller
Comment 4 2013-09-04 18:00:37 PDT
Created attachment 210533 [details] Patch Replaced the implementation of RasterShapeIntervals with one based on the new ShapeInterval<int> class. This eliminates the dependency the Region class and delivers a 10X layout speedup for the attached benchmark. This a just an implementation refactoring, no new tests were needed.
Hans Muller
Comment 5 2013-09-05 09:05:12 PDT
Created attachment 210623 [details] Patch Corrected a pre-existing inconsistency in the way intervals where converted to line segments.
Hans Muller
Comment 6 2013-09-05 10:39:34 PDT
Created attachment 210635 [details] Patch Corrected ShapeInterval<T>subtractShapeIntervals(), prevent attempts to dereference Vector iterators that had reached the end.
Hans Muller
Comment 7 2013-09-05 13:05:04 PDT
Created attachment 210651 [details] Patch Some implementation changes based on review feedback.
Hans Muller
Comment 8 2013-09-05 13:05:52 PDT
(In reply to comment #6) > Created an attachment (id=210635) [details] > Patch > > Corrected ShapeInterval<T>subtractShapeIntervals(), prevent attempts to dereference Vector iterators that had reached the end. I've removed this from the patch. It will be included in a separate bug.
Hans Muller
Comment 9 2013-09-06 08:21:04 PDT
(In reply to comment #8) > (In reply to comment #6) > > Created an attachment (id=210635) [details] [details] > > Patch > > > > Corrected ShapeInterval<T>subtractShapeIntervals(), prevent attempts to dereference Vector iterators that had reached the end. > > I've removed this from the patch. It will be included in a separate bug. FTR, it's https://bugs.webkit.org/show_bug.cgi?id=120802
Alexandru Chiculita
Comment 10 2013-09-10 10:22:03 PDT
Comment on attachment 210651 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=210651&action=review > Source/WebCore/rendering/shapes/RasterShape.cpp:95 > bool RasterShapeIntervals::firstIncludedIntervalY(int minY, const IntSize& minSize, LayoutUnit& result) const There's a bunch of code that lives in RasterShape::firstIncludedIntervalLogicalTop that checks that minY fits in the bounds of the intervals, but I would rather have that here instead. It seems like RasterShape knows too much about RasterShapeIntervals. > Source/WebCore/rendering/shapes/RasterShape.cpp:104 > + if (!getIntervalX1Values(lineY, lineY + minSize.height(), intervalX1Values)) Is there any reason to consider the intervals that are smaller than the minSize.width() ? > Source/WebCore/rendering/shapes/RasterShape.cpp:113 > + if (contains(IntRect(lineX, lineY, minSize.width(), minSize.height()))) { Keep a reference to the IntRect on the stack and just update the x() / y() members when needed. The size is never changing. > Source/WebCore/rendering/shapes/RasterShape.cpp:123 > +void RasterShapeIntervals::getIncludedIntervals(int y1, int y2, IntShapeIntervals& result) const ditto, I would move the checks from RasterShape::getIncludedIntervals in here instead. > Source/WebCore/rendering/shapes/RasterShape.cpp:135 > + IntShapeInterval::intersectShapeIntervals(result, getIntervals(y), intervals); I think you can check if you have any remaining intervals and just break early. > Source/WebCore/rendering/shapes/RasterShape.cpp:136 > + result = intervals; You can do a vector swap instead as you don't need to preserve previous vector. (There's a function in Vector.h for that) > Source/WebCore/rendering/shapes/RasterShape.cpp:153 > + result = intervals; ditto. Swap vectors instead. > Source/WebCore/rendering/shapes/RasterShape.cpp:193 > + for (unsigned i = 0; i < excludedIntervals.size(); ++i) > + result.append(LineSegment(excludedIntervals[i].x1(), excludedIntervals[i].x2() + 1)); It seems like this could be a function as you do it twice. > Source/WebCore/rendering/shapes/RasterShape.h:53 > + bool getIntervalX1Values(int minY, int maxY, Vector<int>& result) const; All the methods that are not supposed to be used by other classes should be private.
Hans Muller
Comment 11 2013-09-10 12:31:59 PDT
Created attachment 211223 [details] Patch Made the suggested changes.
Alexandru Chiculita
Comment 12 2013-09-11 16:11:43 PDT
Comment on attachment 211223 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=211223&action=review > Source/WebCore/rendering/shapes/RasterShape.cpp:220 > + float minY = std::max<float>(intervals.bounds().y(), minLogicalIntervalTop); There's no reason to have this outside the intervals.firstIncludedIntervalY method. It actually knows too much about how the method is supposed to work. > Source/WebCore/rendering/shapes/RasterShape.h:54 > + const IntShapeIntervals& getIntervals(int y) const This method is not needed outside RasterShapeIntervals. Make it private.
Hans Muller
Comment 13 2013-09-11 16:49:37 PDT
WebKit Commit Bot
Comment 14 2013-09-11 17:43:42 PDT
Comment on attachment 211365 [details] Patch Rejecting attachment 211365 [details] from commit-queue. Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.appspot.com', '--bot-id=webkit-cq-01', 'validate-changelog', '--check-oops', '--non-interactive', 211365, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit ChangeLog entry in Source/WebCore/ChangeLog contains OOPS!. Full output: http://webkit-queues.appspot.com/results/1799017
Hans Muller
Comment 15 2013-09-11 17:47:15 PDT
WebKit Commit Bot
Comment 16 2013-09-11 18:20:46 PDT
Comment on attachment 211374 [details] Patch Clearing flags on attachment: 211374 Committed r155583: <http://trac.webkit.org/changeset/155583>
WebKit Commit Bot
Comment 17 2013-09-11 18:20:49 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.