When we're parsing path data, we're building a Path object, and the bloated SVGPathSeg datastructure. Instead we should come up with a more memory-efficient way to store path segment data, that can be easily converted to a SVGPathSeg, lazily if someone accesses the SVGPathSeg datastructure from the bindings (eg. JavaScript). Uploading my prototype concept soon.
Created attachment 59655 [details] Prototyping This is NOT a patch meant for inclusion into WebKit. Just moved my prototype files into WebCore/test/ used svn-create-patch, to share my thoughts here.
If you want to apply the patch to test it. Grab patch, svn-apply FunPatch.diff cd WebCore/test; qmake-mac; make; Builds/dataStream. That will give you: <snippet> Dumping size information of our primitive types: BoolByte: 1 FloatByte: 4 UnsignedShortByte: 2 Serializing path data into bytes... -> moveTo(10.1, -99.9) -> lineTo(1.17549e-38, 3.40282e+38) -> arcTo(1, 2, 3, 4, 5, 0, 1) Generating bytestream... Dumping byte stream with 46 entries: 0x00 0x00 0x9a 0x99 0x21 0x41 0xcd 0xcc 0xc7 0xc2 0x01 0x00 0x00 0x00 0x80 0x00 0xff 0xff 0x7f 0x7f 0x02 0x00 0x00 0x00 0x80 0x3f 0x00 0x00 0x00 0x40 0x00 0x00 0x40 0x40 0x00 0x00 0x80 0x40 0x00 0x00 0xa0 0x40 0x00 0x01 0x03 0x00 Parsing bytestream... -> moveTo(10.1, -99.9) -> lineTo(1.17549e-38, 3.40282e+38) -> arcTo(1, 2, 3, 4, 5, 0, 1) -> end() </snippet> That's my plan for optimizing the memory-use. To convert the parsed "d" attribute of a SVGPathElement, into a bytestream, and not building any SVGPathSeg datastructure. SVGPathSeg can easily be build from this bytestream, if and only if needed (of course the other direction is easily possible as well). What do you think?
Having such a datastructure would also help ports like OpenVG, which doesn't have any way to access the path seg information.
Note this uses some STL code, when we'd have this in WebCore, we'd of course use Vector instead of std::vector, with a sane predefined capacity, to keep reallocations as small as possible. This is really just my first stab at this problem. Anoter idea to make the bytestream more robust would be to introduce PathDataTokenStart, and save the number of segments in the bytestream, to be sure we never read past the end. The way I've prototyped it for now, seems to be the most memory efficient way though (is that true? needs confirmation :-). Before continuing to work on that I need input from our experts, like Darin Adler & co - whether this approach is a good idea at all, or whether it's too dangerous etc.
You can implement a byte stream without all the type punning, just by shifting and masking integers.
Although for floats, I guess the type punning is required.
I'm fine with the ByteStreamWriter. But I think the ByteStreamReader goes into the wrong direction. ByteStreamWriter can be extended by new commands, dependent of the different needs and use cases. But ByteStreamReader should be as general as possible. That means, a caller should know himself, how the bytestream is builded. And ByteStreamReader should just have general methods to ask for bool, int and float and stores the current position in the stream. A caller can ask for the values one after the other without taking care of the current StreamPosition. If parsing fails, ByteStreamReader returns false.
(In reply to comment #7) > I'm fine with the ByteStreamWriter. But I think the ByteStreamReader goes into the wrong direction. ByteStreamWriter can be extended by new commands, dependent of the different needs and use cases. But ByteStreamReader should be as general as possible. That means, a caller should know himself, how the bytestream is builded. And ByteStreamReader should just have general methods to ask for bool, int and float and stores the current position in the stream. A caller can ask for the values one after the other without taking care of the current StreamPosition. If parsing fails, ByteStreamReader returns false. I don't think it makes sense to generalize them. Think of ByteStreamReader/Writer as PathByteStreamReader/Writer classes. See SVGPathParser abstract interface in SVGParserUtilities.cpp - something like that is meant to be passed to PathByteStreamReader, so it will just forward the call to it. The user which derives from SVGPathParser (needs a rename) can decide what do to with the data (eg. build d attribute string, build SVGPathSeg datastructure etc..) It would be very dangerous, to have just a generic reader classs, and a specific writer class - we should assure that they are always in sync. I have no idea why you think that the encapsulation doesn't make much sense? After all the whole byte stream should be an implementation detail. SVGPathEllement should just store the "ByteStream" object, and use convenience methods to build a d attribute string and/or a SVGPathSeg data structure from it. I can go more into details later, need to leave now, for the next 2 days.
(In reply to comment #8) > (In reply to comment #7) > > I'm fine with the ByteStreamWriter. But I think the ByteStreamReader goes into the wrong direction. ByteStreamWriter can be extended by new commands, dependent of the different needs and use cases. But ByteStreamReader should be as general as possible. That means, a caller should know himself, how the bytestream is builded. And ByteStreamReader should just have general methods to ask for bool, int and float and stores the current position in the stream. A caller can ask for the values one after the other without taking care of the current StreamPosition. If parsing fails, ByteStreamReader returns false. > > I don't think it makes sense to generalize them. Think of ByteStreamReader/Writer as PathByteStreamReader/Writer classes. See SVGPathParser abstract interface in SVGParserUtilities.cpp - something like that is meant to be passed to PathByteStreamReader, so it will just forward the call to it. The user which derives from SVGPathParser (needs a rename) can decide what do to with the data (eg. build d attribute string, build SVGPathSeg datastructure etc..) > > It would be very dangerous, to have just a generic reader classs, and a specific writer class - we should assure that they are always in sync. > > I have no idea why you think that the encapsulation doesn't make much sense? After all the whole byte stream should be an implementation detail. SVGPathEllement should just store the "ByteStream" object, and use convenience methods to build a d attribute string and/or a SVGPathSeg data structure from it. > > I can go more into details later, need to leave now, for the next 2 days. We have the same problem for SVGPolyElement, if we specialise your ByteStream, we end up in creating more than one instant of the Reader and Writer. Possible, but is it realy neccessary? Also, we need the values on Parsing the ByteStream one after the other to create a Path, compare different Bytestreams (normalized, unnormalized), create SVGPathSegLists. So either the ByteStreamParser can do all this, or we let SVGParseUtills and the elements do the work. Moving the code to SVGParseUtills makes it maybe possible to share more code.
Created attachment 62600 [details] Patch
Comment on attachment 62600 [details] Patch WebCore/svg/SVGGlyphElement.cpp:106 + SVGPathByteStream byteStream = byteStreamBuilder.buildFromPathData(value); Seems you meant an OwnPtr here. This is not safe at all. WebCore/svg/SVGPathByteStream.h:46 + class SVGPathByteStream { This needs to be Noncopyable. WebCore/svg/SVGPathByteStream.h:64 + size_t m_size; Why not a vector? WebCore/svg/SVGPathByteStreamBuilder.cpp:57 + unsigned char* bytes = new unsigned char[size]; Ick. Vector, please. Why do we want this? How do we test this? What's the observable benefit? Please write a test which shows the benefit. If its memory usage, write a manual test which we can evaluate memory usage from.
(In reply to comment #11) > (From update of attachment 62600 [details]) > Why do we want this? How do we test this? What's the observable benefit? Please write a test which shows the benefit. If its memory usage, write a manual test which we can evaluate memory usage from. If you look at the current code, we always create a SVGPathSegList and create the platform path out of it. If something changes, we recreacte the list and recreate the path. This is pretty slow and pretty unnecessary, since we don't need the SVGPathSegList most of the time (just for scripting together with SVGDOM). Otherwise we can't parse the dAttr directly to a platform path, since it wouldn't be possible to reproduce all path segments out of it for the synchonization with the segLists. The byteStream is a intermediate format that can be parsed more easily and faster than a SVGPathSegList. The memory efficiency is not the main target, but a side effect on the general case (drawing path's).
(In reply to comment #12) > If you look at the current code, we always create a SVGPathSegList and create the platform path out of it. If something changes, we recreacte the list and recreate the path. This is pretty slow and pretty unnecessary, since we don't need the SVGPathSegList most of the time (just for scripting together with SVGDOM). Otherwise we can't parse the dAttr directly to a platform path, since it wouldn't be possible to reproduce all path segments out of it for the synchonization with the segLists. > The byteStream is a intermediate format that can be parsed more easily and faster than a SVGPathSegList. The memory efficiency is not the main target, but a side effect on the general case (drawing path's). That's fine. :) We just need a test to demonstrate that. :)
It seems it should be easy to create both a parsing performance test as well as a memory test. Or maybe they could be one and the same. Either way, the tests are what are important here. They clarify what we're trying to do and allow us to evaluate what we've done or have yet to do. Without metrics/tests a patch like this is nearly impossible to review. The previously posted patch had some memory errors, but I wouldn't feel comfortable r+ing a fixed one until it has some tests to show what we're trying to do here. :)
Created attachment 63425 [details] Patch v2 Fixed issues raised by Eric. No performance test at the moment. It's also not realy possible to perf test it, since SVGPathElement still don't use the new logic. Just SVGGlyphElement makes use of it to demosntrate the behavior.
Comment on attachment 63425 [details] Patch v2 WebCore/ChangeLog:9 + is neither memory effiecient, nor very fast for the fact that we don't realy need SVGPathSegList most efficient. WebCore/ChangeLog:9 + is neither memory effiecient, nor very fast for the fact that we don't realy need SVGPathSegList most really. WebCore/ChangeLog:10 + of the time. It is just needed for manipulating SVGPathElements by JavaScript via SVGDOM. s/by JavaScript// (all other SVG DOM users are affected as well) WebCore/ChangeLog:12 + it's compact form by linking up short integer values for the segement types and float values for its. WebCore/ChangeLog:14 + SVGPathSegList together with the SVGPathData for the d attribute. s/SVGPathData/svg path data/ WebCore/ChangeLog:16 + platform path and SVGPathByteStreamBuilder creates a SVGPathByteStream out of SVGPathData. s/platform path/Path/. WebCore/ChangeLog:16 + platform path and SVGPathByteStreamBuilder creates a SVGPathByteStream out of SVGPathData. s/SVGPathData/svg path data (d attribute string)/ WebCore/ChangeLog:18 + The normalization code will get removed from SVGPathParser, wants we completely moved to SVGByteStream s/wants/once/ :-) WebCore/ChangeLog:18 + The normalization code will get removed from SVGPathParser, wants we completely moved to SVGByteStream s/SVGByteStream/SVGPathByteStream/ WebCore/ChangeLog:20 + SVGPathDataBuilder is a new SVGPathConsumer next to SVGPathBuilder and SVGPathSegListBuilder s/next to/just like/ WebCore/ChangeLog:21 + and was added to create a String out of the path segements from a SVGPathByteStream. and creates a String from the path segments of a SVGPathByteStream. WebCore/ChangeLog:41 + (WebCore::SVGPathByteStream::begin): Gives a constant iterator. a const iterator is not a constant iterator :-) WebCore/ChangeLog:85 + * svg/SVGPathConsumer.h: Make Consumers Noncopyable Inherit from Noncopyable. WebCore/ChangeLog:86 + * svg/SVGPathDataBuilder.cpp: Added. Creates a String out of a SVGPathByteStream. Serializes a SVGPathByteStream into a String. WebCore/ChangeLog:115 + * svg/SVGPathSegList.cpp: Use the new introduce enumeration of SVGPathSeg Use the new enum in the WebCore namespace, instead of the SVGPathSeg one. WebCore/svg/SVGPathByteStream.h:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. WebCore/svg/SVGPathByteStream.h:25 + #include "Noncopyable.h" It's wtf/Noncopyable.h WebCore/svg/SVGPathByteStream.h:26 + #include "SVGPathSegList.h" Not needed here. WebCore/svg/SVGPathByteStreamBuilder.cpp:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. WebCore/svg/SVGPathByteStreamBuilder.cpp:24 + #include "SVGPathByteStreamBuilder.h" Move this include just under the config.h include. WebCore/svg/SVGPathByteStreamBuilder.cpp:47 + bool SVGPathByteStreamBuilder::buildFromPathData(const String& d) I don't like this approach, it's too dangerous, as SVGPathByteStream is not refcounted. We should really have: PassOwnPtr<SVGPathByteStream> SVGPathByteStreamBuilder::buildFromPathData(const String& d, bool& ok) { m_byteStream = SVGPathByteStream::create(); SVGPathParser parser(this); ok = parser.parsePathDataString(d, UnalteredParsing); return m_byteStream.release(); } and store an OwnPtr<SVGPathByteStream> m_byteStream. This way it's guaranteed that no one deletes the SVGPathByteStream and tries to reuse SVGPathByteStreamBuilder afterwards. WebCore/svg/SVGPathByteStreamBuilder.cpp:55 + Superfluous empty lines. WebCore/svg/SVGPathByteStreamBuilder.h:28 + #include <wtf/Vector.h> Where is Vector.h needed here? WebCore/svg/SVGPathByteStreamBuilder.h:39 + // Used in UnalteredParisng/NormalizedParsing modes. typo: Parsing. WebCore/svg/SVGPathByteStreamParser.cpp:28 + #include "SVGPathByteStreamParser.h" Please move this right under the config.h include. WebCore/svg/SVGPathByteStreamBuilder.h:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. WebCore/svg/SVGPathByteStreamParser.cpp:2 + * Copyright (C) 2002, 2003 The Karbon Developers Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. (I see the 2006 \s\s Alexander stuff here, the wrong Copyright spelling, the missing Copyright (C) lines etc..., let's try to keep it consistent in new files now :-) WebCore/svg/SVGPathByteStreamParser.cpp:252 + if (!stream->data().size()) I'd expose an isEmpty function the SVGPathByteStream for that purpose. WebCore/svg/SVGPathByteStreamParser.cpp:324 + if (m_lastCommand != PathSegCurveToCubicAbs && m_lastCommand != PathSegCurveToCubicRel One condition per line please. WebCore/svg/SVGPathByteStreamParser.cpp:330 + ASSERT(m_bytesRead == m_size); Where's m_size? That can't compile :-) Rather ASSERT(m_currentStream == m_streamEnd) ? WebCore/svg/SVGPathByteStreamParser.h:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. WebCore/svg/SVGPathByteStreamParser.h:35 + bool parseFromStream(SVGPathByteStream*, bool nomrmalized); typo: normalized. WebCore/svg/SVGPathByteStreamParser.h:31 + class SVGPathByteStreamParser { Noncopyable inheritance missing. WebCore/svg/SVGPathByteStreamParser.h:53 + bool readFlag() { return readType<bool, BoolByte>(); } Just expand them, as you did with the write... methods. Looks clumsy. WebCore/svg/SVGPathConsumer.h:30 + #include "Noncopyable.h" wtf/Noncopyable.h WebCore/svg/SVGPathDataBuilder.cpp:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. WebCore/svg/SVGPathDataBuilder.cpp:39 + parser.parsePathDataString(d, parsingMode == UnalteredParsing); Just noticed: why don't you pass PathParsingMode to SVGPathParser? WebCore/svg/SVGPathDataBuilder.cpp:40 + return m_stringBuilder.toString().simplifyWhiteSpace(); The simplifyWhiteSpace() is a hack, you should rather ask the StringBuilder for its last string, and chomp the leading space on your own, way cheaper than using simplifyWhiteSpace. WebCore/svg/SVGPathDataBuilder.h:2 + * Copyright (C) Research In Motion Limited 2010. All rights reserved. Please assure the license template is _exactly_ the same as the other files in svg/ contain now. In doubt, just remove and readd it from one of the LGPL files. Happy fixing :-)
(In reply to comment #16) > WebCore/svg/SVGPathDataBuilder.cpp:40 > + return m_stringBuilder.toString().simplifyWhiteSpace(); > The simplifyWhiteSpace() is a hack, you should rather ask the StringBuilder for its last string, and chomp the leading space on your own, way cheaper than using simplifyWhiteSpace. Okay, I've just noticed that StringBuilder doesn't expose it's internal Vector. *grml* using String::remove() or String::truncate() is also slower than necessary, it creates a new StringImpl copy with length-1. (when using truncate(length() - 1)). All in all the approach is not good. (When I recommended StringBuilder, I did not have in mind that you might want to alter the resulting string...) So here's a new proposal: Store Vector<String, 16> m_strings; on your own. Duplicate the code in StringBuilder::toString(), to build your final string result with the main difference: If the last stored string ends with a trailing space, don't copy that last trailing space, to the final resulting string. That's more efficient than using simplifyWhiteSpace or sth. else
(In reply to comment #17) > (In reply to comment #16) > > WebCore/svg/SVGPathDataBuilder.cpp:40 > > + return m_stringBuilder.toString().simplifyWhiteSpace(); > > The simplifyWhiteSpace() is a hack, you should rather ask the StringBuilder for its last string, and chomp the leading space on your own, way cheaper than using simplifyWhiteSpace. > > > Okay, I've just noticed that StringBuilder doesn't expose it's internal Vector. > *grml* using String::remove() or String::truncate() is also slower than necessary, it creates a new StringImpl copy with length-1. (when using truncate(length() - 1)). > > All in all the approach is not good. (When I recommended StringBuilder, I did not have in mind that you might want to alter the resulting string...) > > So here's a new proposal: > Store Vector<String, 16> m_strings; on your own. > Duplicate the code in StringBuilder::toString(), to build your final string result with the main difference: > If the last stored string ends with a trailing space, don't copy that last trailing space, to the final resulting string. > > That's more efficient than using simplifyWhiteSpace or sth. else there is maybe another solution. Every path has to begin with either an m or an M. We could add a leading space to every seg-content and check if StringBuilder is empty for moveTo. If it is empty, we can skip the leading space.
Created attachment 63433 [details] Patch v3 Fixed more issues.
Created attachment 63494 [details] Patch v4 Fixes more isses and did some more clean-up.
Comment on attachment 63494 [details] Patch v4 WebCore/svg/SVGPathDataBuilder.h:31 + class SVGPathDataBuilder : private SVGPathConsumer { Make it public, like in the other classes as well. WebCore/svg/SVGPathDataBuilder.h:34 + String build(SVGPathByteStream*, PathParsingMode); Maybe rename it buildFromByteStream, for consistency? Just two copyright headers need updates, as discussed on IRC.
Created attachment 63816 [details] Patch v5 New patch.
Comment on attachment 63816 [details] Patch v5 WebCore/ChangeLog:8 + Introduce SVGPathByteStream to have a fast and effiecient way to organize the synchronization typo: efficient. suggestion: s/to have a/as/ WebCore/ChangeLog:9 + of Path, SVG path data string and the different SVGPathSegLists. s/the different/SVGPathSegList in normalized and unaltered modes/ (some readers of the ChangeLog may not know what you refer to). WebCore/ChangeLog:10 + SVGPathParserFactory is now able to parse a SVGPathByteStream as a SVGPathSource or Extended SVGPathParserFactory to accept SVGPathByteStreams as input source and to create a SVGPathByteStream from a SVG path data string. WebCore/WebCore.xcodeproj/project.pbxproj: + 81BE209811F4AB8D00915DFA /* IDBCursor.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 81BE209311F4AB8D00915DFA /* IDBCursor.cpp */; }; Eek, again the IDBCursor problems - you need to revert them :( WebCore/svg/SVGPathByteStream.h:61 + bool isEmpty() { return !m_data.size(); } isEmpty should be const. Do you need the "const Data& data" accessor somewhere? If not please remove. WebCore/svg/SVGPathByteStreamSource.h:46 + private: Add a newline before private please. WebCore/svg/SVGPathParserFactory.cpp:110 + if (stream->isEmpty()) ASSERT(stream); before accessing stream. WebCore/svg/SVGPathParserFactory.cpp:153 + SVGPathSegListBuilder* builder = globalSVGPathSegListBuilder(); ASSERT(stream); should be added. and ASSERT(result); WebCore/svg/SVGPathParserFactory.h:43 + PassOwnPtr<SVGPathByteStream> buildSVGPathByteStreamFromString(const String&, PathParsingMode, bool& ok); Maybe name this function "createSVGPathByteStreamFromString" to differentiate from the other methods, which build upon an existing object, where this method creates an object... r=me, with these suggestions.
Committed r64909: <http://trac.webkit.org/changeset/64909>
http://trac.webkit.org/changeset/64909 might have broken GTK Linux 64-bit Debug
(In reply to comment #25) > http://trac.webkit.org/changeset/64909 might have broken GTK Linux 64-bit Debug I think it broke everything :)
(In reply to comment #26) > (In reply to comment #25) > > http://trac.webkit.org/changeset/64909 might have broken GTK Linux 64-bit Debug > > I think it broke everything :) Yes :-( Landed the wrong version. Fix was landed in http://trac.webkit.org/changeset/64911.