Created attachment 60610 [details] Simplified version of aol.co.uk landing page When going to certain heavyweight websites such as aol.co.uk, we noticed extremely poor performance on particular hardware which has a very high penalty for software fallbacks in the graphics pipeline. While investigating this, we locally added several features to Qt's X11 backend, allowing us to run transformed pixmaps, patterns, and linear gradients on the GPU. However, we discovered that Webkit does not take advantage of several of these features (including specifically linear gradients), and persists in using client-side software rendering for some page elements. The specific page element causing the most trouble on aol.co.uk is an "atlas map" implemented using CSS. The source images for this are extremely tall, 10k or 20k pixels in height, with large amounts of blank space between icons. Every icon drawn using this atlas map is apparently rendered in software within WebKit, despite no special features technically being required to accelerate them. We also noticed many instances when XPutImage is called to populate a pixmap, and (sometimes immediately) XGetImage is used to retrieve the exact same image data, which has not been changed in the interim. This data shuffling across the X11 socket is inherently inefficient. We have prepared a number of simplified test cases which expose the problem even on desktop hardware, through examination of X11 protocol traces. The main symptom is the excessive use of XGetImage. WebKit was built using sources from Subversion trunk on 5th of July 2010 and tested on desktop using QtTestBrowser with tiled backing store enabled.
Created attachment 61369 [details] Simple atlas-map test case. Here is a simpler reproduction of one of the problems. It consists of a small 256x128 map containing eight 64x64 icons, each of which is rendered in a separate DIV on the page. When run with native X11 backend and tiled backing store, this causes software fallbacks which show as XGetImage calls on the tiles in use. With a WVGA display and a large tilesize, the active tile area is 800x602. The 602 is calculated by 8*64 + 9*10, being the heights of the DIVs and their margins. XGetImage is extremely slow on our hardware, and we cannot avoid that situation. Avoiding XGetImage is our goal here.
I'd like to announce that due to the urgency of this issue to our customer, a bounty is now available for the above bug. This is split into three tiers: - For a thorough explanation of the reasons for the software fallbacks, which can be used to develop a solution, we are offering $500 (US dollars). - For a working fix for the problem on the Qt/X11/Linux platform, we are offering a further $400. - For a working fix which is platform independent, we are offering $100 on top of the above, in the interests of cleanliness. This is a total of $1000 for a complete, platform-independent fix. If further information is required to reliably reproduce the bug, please let me know ASAP.
Created attachment 61503 [details] Trace diagram from KCachegrind, showing some major code paths.
Interessting trace diagram. The diagram shows that Image::drawTiled() (called from RenderBoxModelObject::paintFillLayerExtended) calls Image::drawPattern(). When debugging on Safari/Mac, that code path is never taken. Snippet from WebCore/platform/graphics/Image.cpp: void Image::drawTiled(GraphicsContext* ctxt, const FloatRect& destRect, const FloatPoint& srcPoint, const FloatSize& scaledTileSize, ColorSpace styleColorSpace, CompositeOperator op) { ... // Check and see if a single draw of the image can cover the entire area we are supposed to tile. if (oneTileRect.contains(destRect)) { // <- LINE 130 .... draw(ctxt, destRect, visibleSrcRect, styleColorSpace, op); return; } ... drawPattern(ctxt, tileRect, patternTransform, oneTileRect.location(), styleColorSpace, op, destRect); .. } On Mac, it always returns true in the check on line 130, and never calls drawPattern. I suspect it may be a problem with ImageQt, you might want to set a breakpoint there, and verify whether it returns false in the oneTileRect.contains(destRect) check. When following the trace diagram it's obvious that you end up creating a QBrush with a QPixmap, through the drawPattern() code path. That's a QBrush, with QStyle::TexturePattern - which can be verified, as QTexturedBrushData is present at the bottom of the diagram. Later on QSpanData::setup(): void QSpanData::setup(const QBrush &brush, int alpha, QPainter::CompositionMode compositionMode) { ... case Qt::TexturePattern: type = Texture; if (!tempImage) tempImage = new QImage(); if (qHasPixmapTexture(brush) && brush.texture().isQBitmap()) *tempImage = rasterBuffer->colorizeBitmap(brush.textureImage(), brush.color()); else *tempImage = brush.textureImage(); initTexture(tempImage, alpha, QTextureData::Tiled, tempImage->rect()); ... convertes the brush pixmap to an image, see: QImage QBrush::textureImage() const { return d->style == Qt::TexturePattern ? (static_cast<QTexturedBrushData *>(d.data()))->image() : QImage(); } struct QTexturedBrushData : public QBrushData { .. QImage &image() { if (m_image.isNull() && m_pixmap) m_image = m_pixmap->toImage(); return m_image; } } This is the reason you're seeing the XGetImage calls, see qt-source/gui/image/qpixmap_x11.cpp: QImage QX11PixmapData::toImage() const { .. XImage *xi = XGetImage(X11->display, hd, 0, 0, w, h, AllPlanes, (d == 1) ? XYPixmap : ZPixmap); ... } Just wanted to share the trace, as I looked at it for fun :-) (Snippets from Qt 4.6.3 btw)
> On Mac, it always returns true in the check on line 130, and never calls drawPattern. Okay, that seems to be a difference between the posted testcase and the one we got that diagram from. > convertes the brush pixmap to an image, see: > > QImage QBrush::textureImage() const > { > return d->style == Qt::TexturePattern > ? (static_cast<QTexturedBrushData *>(d.data()))->image() > : QImage(); > } > > struct QTexturedBrushData : public QBrushData > { > .. > QImage &image() { > if (m_image.isNull() && m_pixmap) > m_image = m_pixmap->toImage(); > return m_image; > } > } > > This is the reason you're seeing the XGetImage calls... I think this is the key thing - that QImages are being converted to QPixmaps and later back to QImages. So what would be needed to preserve QImages through the render chain?
Created attachment 61620 [details] Trace diagram from KCachegrind, for the AOL test case attached.
Created attachment 61929 [details] reduced test case This demonstrates the problem of hitting XGetImage unnecessarily.
Created attachment 61930 [details] potential workaround Here is my analysis (the pixel metrics refer to my reduced test case). We try to fill an area, e.g. 400x400 pixels with a repeated pixmap (as the background). Usually this is not a problem, but since the pixmap is very big, e.g. 48x800 , Qt X11 hits the inefficient code path to handle that as pattern fill. In my proposed fix/workaround, this is solved by painting the pixmap repeatedly ourselves. We detect if the rectangle can be filled by the pixmap just a few times in a horizontal direction. If yes, there is no need to use the pattern fill approach. Have a look at the patch and see if this works for you. If performance is still problem due to the use clipping, I can unroll the clipping and set up the source and target bounding boxes ourselves (but I'd save that part until it's absolutely necessary, readability issue). Technically the fix should work everywhere, not only X11. If some graphics system (possibly Open GL) shows slight speed degradation, we can put #ifdef around the special case. However, I can't imagine where a small number of manual pixmap repeats is much slower vs pattern fill with a huge pixmap texture. Note: the patch is not for review, I just want to gather early feedback first.
> Have a look at the patch and see if this works for you. If performance is still problem due to the use clipping, I can unroll the clipping and set up the source and target bounding boxes ourselves (but I'd save that part until it's absolutely necessary, readability issue). I think we managed to eliminate the XGetImage calls with some tweaks to the Qt X11 backend. However, we are still seeing (and this is clearer now) multiple XPutImage calls for the entire atlas map. We think this probably does have something to do with the source-clipping. If an image containing only the clipped area was uploaded, this would already be a big win on our hardware, avoiding a lot of work in QX11Pixmap::fromImage() and traffic on the slow GPU bus, and VRAM pollution as well. We've looked into the possibility of recognising the clipped area at the driver level, but this is proving difficult to do reliably due to EXA's design. Do you think that such a thing would be easy to implement in Qt or Webkit?
First of all, can you at least kindly verify that my patch solves your excessive XGetImage problem? Note that Qt X11 is a complex and delicate paint engine (and very few people know the code well), my recommendation is always to put any workarounds outside Qt X11 because otherwise you may risk breaking too much. As for XPutImage, you need to give more information. Is my reduced test case triggering that problem, too? Or do I need to create another reduced test from the atlas map? Do you know which functions in Qt painting is the source of the XPutImage problem? Do you have a code path? Note that it was not too difficult for me to analyze and solve the XGetImage problem, because all I did were sprinkling breakpoints around XGetImage in Qt X11, running the complex test case, checking the stack trace, reducing the test case, and then looking for the workaround. Would it be too much to ask if you could do the same for the said XPutImage problem?
Closing, this seems to go nowhere. Please comment if you want it reopened.