This bug is tracking the eventual elimination of the gperf-based Id representation of tag names KHTML and its replacement with a new class called QualifiedName, a smart hashed class that hold a prefix, a namespace URI and a localName all as atomic strings. This conversion will reduce the memory footprint of the DOM and simplify the DOM significantly, since numerous classes like XMLElementImpl and HTMLGenericElementImpl can simply go away.
Created attachment 2199 [details] Work in progress (does not compile) Just a snapshot of work in progress. This does not compile.
Created attachment 2341 [details] Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags
Comment on attachment 2341 [details] Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags I reviewed this with Dave. We discovered that the <pre> problem was not as widespread as we had feared, so it's not an update candidate-worthy bug.
Created attachment 2343 [details] Add initial implementations of QualifiedName and the HTMLNames classes. This patch adds these two classes to the build. They are not used by anyone yet.
Comment on attachment 2343 [details] Add initial implementations of QualifiedName and the HTMLNames classes. Looks like this won't compile due to m_inner vs. m_impl and QualifiedNameImpl vs. QualifiedNameInner. The implementation file should be wrapped in namespace DOM rather than having a using namespace DOM at the top of the file. I suggest that m_qNameCache just be a private global inside the .cpp file (using static) and not a member of the QualifiedName class; no need to expose it, and then you don't have to include <qptrdict.h> in the header. I prefer that you put spaces after the < when you have to put a space before the > in expressions like: QPtrDict< QPtrDict< QPtrDict<QualifiedNameInner> > > Why all the (void*) casts? Those shouldn't be necessary. Are they for const casting? Can we change HTMLNames from a class to a namespace? The advantage of that is that you can do a using on individual tags, and there's really nothing here that calls for a class. Other than these nitpicks, it looks great. I think I'll mark it review- just to be pedantic.
Created attachment 2369 [details] Second pass at QualifiedName and HTMLNames classes Address Darin and Maciej's comments. I do not want to make HTMLNames a namespace just yet, since I envision it growing to include more factory-based functionality possibly at a later date, i.e., I'm not convinced it's going to just stay a "names" class.
Comment on attachment 2369 [details] Second pass at QualifiedName and HTMLNames classes Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just this sort of thing. I still think you can use a namespace, regardless of the "factory" issue. Namespaces can have functions in them too. The only reason to have a class is if you want to create instances. No benefit to using a class to group a set of global functions. Anyway, r=me.
ditto on the review
Created attachment 2407 [details] More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed. This is an update (still does not compile). The next big structural issue is the elimination of dtd.h and dtd.cpp completely. Validation code is going to move into the elements themselves and check against hashes in HTMLElement.
Created attachment 2412 [details] This patch converts all of the parser's error handling logic. This patch nearly completes the parser's conversion. I still need to handle isInline (hopefully in terms of tag priorities, which are closely related) and element creation. There is also a DTD check in the table error handling code that I will need to fix up.
Created attachment 2510 [details] Next round. Converts much more of the code. This patch gets the tokenizer and parser to compile, but there's still some work to do on them. Rendering and editing subdirectories are fully converted in this patch and complete.
Created attachment 2558 [details] Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues). This patch finishes the HTML parser (only 2 open issues remain). It also re-implements CSS namespace support using QualifiedName (allowing another class, XMLNamespaceTable, to be removed from the build). Note that namespaced attributes in CSS are now broken, but this feature is minor enough that it can be fixed later when I do attribute names' conversion.
Created attachment 2561 [details] Factor out element creation code and get dom_docimpl.cpp compiling This patch factors out the element creation code into a single code path that goes through a new object, HTMLElementFactory.
Created attachment 2608 [details] Complete the tagpriority/tagstatus movement into the element code. This patch implements the isInline method in the parser and stops using renderers to make the determination (thus eliminating any potential variance in rendering caused by deferred renderer construction). The tag priority and tag status rules are fully implemented using functions on the elements now. The two tables that were used in dtd.cpp are gone.
Created attachment 2613 [details] Patch that compiles and works (mostly). The following patch compiles and can display some Web pages. The following open issues remain: (1) Redo tables for dtd (2) Fix the checkchild cases in parser/tokenizer/dom_element (3) Re-implement createElementNS (4) Re-implement createElement (5) checkSetPrefix is now busted (6) XMLElementImpl cloneNode issue and construction issue, move to ElementImpl (7) getElementsByTagName needs to work again (TagNodeListImpl) (8) Print namespace prefixes in selectortext now that CSS selectors store them
Created attachment 2823 [details] Patch that passes all layout tests This patch passes all layout tests (with some test results being adjusted). Performance analysis coming next.
Performance analysis shows: 0.145 avg. cached time without patch 0.155 avg. cached time with patch Slowdown is about 7%. That's the bad news. The good news is none of the slowdown is hiding and is all clearly visible when sampling. The HTMLNames accessors account for about 5-6%, and the operator == comparisons between QNames and AtomicStrings account for the remaining 1%. I think there are two problems that could be fixed pretty easily: (1) The accessors on HTMLNames need to be inlined, and that means having them return real global variables. (2) Introduce more hashtables. The element creation code and the parser are the two primary problems. I think starting with the element creation code will be helpful.
Created attachment 2842 [details] Ready for review. Now faster than trunk. This patch adds a lot of hashing, inlines and provides globals for HTMLNames access, inlines the QName comparison operator, optimizes lowercasing in the tokenizer, and is now faster than the trunk. 0.141 compared with 0.145. Ready for an initial review.
Never mind. I had been testing with harrison's regression when I established the baseline. The real # to beat is 0.139. So the patch is still slightly slower. Fixing handleError and getNode should get the speed back.
Just a little nitpick. I noticed in a few places that 'else' and 'else if' statements use braces on new lines such as + else if (localName == starAtom && cs->attr == ATTR_CLASS && cs->match == CSSSelector::Class) { as per style guidelines, should it not be like: } else if (condition) { and } else { And also some lines like + if ( prof == DefaultGUI ) . Should this not be like if (prof == DefaultGUI), without the spaces. Sorry for being a pain.
Created attachment 2856 [details] Convert getNode to use a hash. Convert some QPtrDicts to HashSets. This patch scores 0.140-0.141, compared with the trunk's 0.138-0.139. The reason why is pretty clear on the profile. AtomicString::add is about 1%, and 0.6% of that is the equality check that has to be done because it is not a perfect hash like the old gperf was. The equal method unfortunately is almost always having to scan the entire string, because most of the time, what you find in the hash table *is* equal, and so you hit the worst-case scenario of that function most of the time. I am unsure how to proceed now, other than to find another speedup somewhere to counteract the slowdown from not having a perfect hash.
Actually, the perfect hash has to do an equality check too. While there are no collisions for things in the hash, you can still miss entirely. However, I can find ways to micro-optimize the equality check. Patch is missing htmlfactory.{cpp,h} though so I can't build it.
Created attachment 2860 [details] Implement unrelated micro-optimizations to get back the remaining couple of milliseconds Maciej has ideas for how to further improve equality testing (and landed a patch that helps). I implemented three additional optimizations to gain 2ms of PLT time. (1) updatePseudoChild in RenderContainer is better about bailing early now when no :before/:after content is used. getPseudoStyle in RenderStyle has been inlined as well (and is also smarter about short-circuiting). (2) isBreakable no longer checks direction() for ASCII characters. This check was always returning false anyway. (3) QFont now caches whether or not it is fixed pitch to avoid a dictionary lookup and a call across the bridge.
Comment on attachment 2860 [details] Implement unrelated micro-optimizations to get back the remaining couple of milliseconds I'm missing some files. Need to add them.
Created attachment 2861 [details] Make sure to include the HTMLElementFactory. Same as previous patch, but includes htmlfactory.h/.cpp.
Created attachment 2873 [details] Patch that addresses some of maciej's comments and includes his hash fixes This patch includes some micro-optimizations Maciej made to the hashing done by AtomicString, and it also addresses his review comments (eliminating XHTML_NAMESPACE and renaming GetSetInfo to Accessors in kjs_html.h/.cpp).
Comment on attachment 2873 [details] Patch that addresses some of maciej's comments and includes his hash fixes r=me with the following caveats: 1) I think the implicit conversions on QualifiedName are dangerous and could lead to people getting themselves in trouble. 2) Please add a layout test for the behavior change on tag name validation (maybe something that uses a tag name wiht _ or * in it).
Fixed
Mass moving XML DOM bugs to the "DOM" Component.