Chromium side issue is here. http://code.google.com/p/chromium/issues/detail?id=123228 WebSocket send operation requires super liner time to send data against data size. This is because SocketStreamHandleBase requires O(n^2) time against buffed data size. My simple investigation results are here. | Chrome | Chrome with this patch =====+==========+======================= 1M | 300ms | 333ms 2M | 700ms | 660ms 10M | 4700ms | 3300ms 20M | 13000ms | 6600ms (x2 faster) 100M | 202000ms | 33000ms (x6 faster) After applying this patch, sending time becomes linear to data size. I also notice that receiving and binary handling operations have other bottle necks to be fixed.
Created attachment 143808 [details] Patch
Comment on attachment 143808 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=143808&action=review > Source/WebCore/ChangeLog:7 > + You should explain why the current implementation requires super-linear time and how you resolve that. > Source/WebCore/platform/network/SocketStreamHandleBase.cpp:39 > +const unsigned int vectorCoalescingSize = 4096; Looks too small to me, but I might be wrong. I feel like 10KiB or even 100KiB is safe. Ultimately profiling is required to figure out the best value. If my memory is correct, allocating 4096 bytes tended to be inefficient, because that requires allocating two pages instead of one as malloc needs 4096 bytes plus managing area. But the system has evolved and might have changed since then. > Source/WebCore/platform/network/SocketStreamHandleBase.h:60 > + class Block : public Vector<char> { I'm not particularly a fan of deriving a class that doesn't have a virtual destructor. This sounds like a sign of poor design to me. > Source/WebCore/platform/network/SocketStreamHandleBase.h:65 > + Block* prev() { return m_prev; }; > + Block* next() { return m_next; }; nit: The last colon isn't necessary > Source/WebCore/platform/network/SocketStreamHandleBase.h:83 > + DoublyLinkedList<Block> m_buffer; Why do you need DoublyLinkedList and need to manage memory manually? Is there any specific reason you can't use "Deque<Vector<char> >"? In this way you don't have to manage Block's memory by hand.
This is a whole lot of generic looking code for SocketStreamHandleBase. This looks something that may need to be abstracted out into WTF if we really need it. > I'm not particularly a fan of deriving a class that doesn't have a virtual destructor. This sounds like a sign of poor design to me. Private inheritance is sometimes OK, but public is dangerous.
Created attachment 144041 [details] Patch
Created attachment 144050 [details] Patch
4KiB is clearly small for this. Third patch applies 128KiB for block size. Deque size will be 128MiB/128KiB < 1000 and it looks enough small to be kept per WebSocket object.
Created attachment 144284 [details] Patch
After some performance tests, I notice that 128kB is still small. | Chrome | w/1st CL | 128kB blk | 1MB blk =====+==========+==========+===========+-------- 1M | 300ms | 333ms | 333ms | 330ms 2M | 700ms | 660ms | 666ms | 660ms 10M | 4700ms | 3300ms | 3500ms | 3300ms 20M | 13000ms | 6600ms | 7500ms | 6600ms 100M | 202000ms | 33000ms | 60100ms | 36000ms Kent-san, could you look my 4th patch?
Comment on attachment 144284 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=144284&action=review > Source/WebCore/platform/network/SocketStreamHandleBase.h:67 > + StreamBuffer<char, 1024 * 1024> m_buffer; Does this mean we allocate 1MB even if we'd like to send a few octets?
Comment on attachment 144284 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=144284&action=review >> Source/WebCore/platform/network/SocketStreamHandleBase.h:67 >> + StreamBuffer<char, 1024 * 1024> m_buffer; > > Does this mean we allocate 1MB even if we'd like to send a few octets? Yes. And following sending data might be coalesced to the previous buffer. When the pending data is sent, the buffer is released. I think Vector<char> could be alternative to FixedArray<char, 1024 * 1024>. But, it might cause performance regression on small but many payloads.
(In reply to comment #8) > After some performance tests, I notice that 128kB is still small. > > | Chrome | w/1st CL | 128kB blk | 1MB blk > =====+==========+==========+===========+-------- > 1M | 300ms | 333ms | 333ms | 330ms > 2M | 700ms | 660ms | 666ms | 660ms > 10M | 4700ms | 3300ms | 3500ms | 3300ms > 20M | 13000ms | 6600ms | 7500ms | 6600ms > 100M | 202000ms | 33000ms | 60100ms | 36000ms > > Kent-san, could you look my 4th patch? I feel 128KiB is enough. Sending 100MB frame sounds not realistic.
Comment on attachment 144284 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=144284&action=review > Source/WTF/wtf/StreamBuffer.h:50 > + } > + ~StreamBuffer() nit: Please add a blank line between function definitions. > Source/WTF/wtf/StreamBuffer.h:54 > + } > + bool isEmpty() const { return !size(); } > + void append(const T* data, size_t size) ditto. > Source/WTF/wtf/StreamBuffer.h:63 > + OwnPtr<Block> block = adoptPtr(new Block); > + m_buffer.append(block.release()); nit: You can omit the variable 'block'. > Source/WTF/wtf/StreamBuffer.h:75 > + } > + void consume(size_t size) Please add a blank line. We should add a comment that we can't specify size which is >= firstBlockSize(). > Source/WTF/wtf/StreamBuffer.h:81 > + if (m_readOffset == BlockSize) { We had better use >= for safety. > Source/WTF/wtf/StreamBuffer.h:87 > + } > + size_t size() const { return m_size; } > + const T* firstBlockData() const Please add blank lines. > Source/WTF/wtf/StreamBuffer.h:93 > + } > + size_t firstBlockSize() const Please add a blank line. > Source/WebCore/platform/network/SocketStreamHandleBase.cpp:2 > - * Copyright (C) 2009, 2011 Google Inc. All rights reserved. > + * Copyright (C) 2012 Google Inc. All rights reserved. We don't replace existing years in WebKit. > Source/WebCore/platform/network/SocketStreamHandleBase.h:3 > - * Copyright (C) 2009 Google Inc. All rights reserved. > + * Copyright (C) 2012 Google Inc. All rights reserved. ditto.
Created attachment 144465 [details] Patch
After an offline chatting with Kent, I decide to use Vector<char> instead of FixedArray. Actually, we have no idea how does 1MB allocation affect mobile platforms like iPhone. But, 100MB sending is also important for binary communication. In small size communication, using Vector<char> is equivalent to current implementation and introduced StreamBuffer achieve x6 times performance improvement. I believe this approach keep or improve performance for all types of communication usages. It never causes regression. Alexey, I think you are familiar with mobile platform use cases. Any thoughts? Kent, I fixed points you make comments.
Kent-san, There is no objection on this change for a week. Could you review my last patch again?
Comment on attachment 144465 [details] Patch ok
Comment on attachment 144465 [details] Patch Clearing flags on attachment: 144465 Committed r119446: <http://trac.webkit.org/changeset/119446>
All reviewed patches have been landed. Closing bug.