This bug is tracking the eventual elimination of the gperf-based Id representation of tag names KHTML
and its replacement with a new class called QualifiedName, a smart hashed class that hold a prefix, a
namespace URI and a localName all as atomic strings.
This conversion will reduce the memory footprint of the DOM and simplify the DOM significantly, since
numerous classes like XMLElementImpl and HTMLGenericElementImpl can simply go away.
Comment on attachment 2341[details]
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags
I reviewed this with Dave. We discovered that the <pre> problem was not as
widespread as we had feared, so it's not an update candidate-worthy bug.
Created attachment 2343[details]
Add initial implementations of QualifiedName and the HTMLNames classes.
This patch adds these two classes to the build. They are not used by anyone
yet.
Comment on attachment 2343[details]
Add initial implementations of QualifiedName and the HTMLNames classes.
Looks like this won't compile due to m_inner vs. m_impl and QualifiedNameImpl
vs. QualifiedNameInner.
The implementation file should be wrapped in namespace DOM rather than having a
using namespace DOM at the top of the file.
I suggest that m_qNameCache just be a private global inside the .cpp file
(using static) and not a member of the QualifiedName class; no need to expose
it, and then you don't have to include <qptrdict.h> in the header.
I prefer that you put spaces after the < when you have to put a space before
the > in expressions like:
QPtrDict< QPtrDict< QPtrDict<QualifiedNameInner> > >
Why all the (void*) casts? Those shouldn't be necessary. Are they for const
casting?
Can we change HTMLNames from a class to a namespace? The advantage of that is
that you can do a using on individual tags, and there's really nothing here
that calls for a class.
Other than these nitpicks, it looks great. I think I'll mark it review- just to
be pedantic.
Created attachment 2369[details]
Second pass at QualifiedName and HTMLNames classes
Address Darin and Maciej's comments. I do not want to make HTMLNames a
namespace just yet, since I envision it growing to include more factory-based
functionality possibly at a later date, i.e., I'm not convinced it's going to
just stay a "names" class.
Comment on attachment 2369[details]
Second pass at QualifiedName and HTMLNames classes
Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just
this sort of thing.
I still think you can use a namespace, regardless of the "factory" issue.
Namespaces can have functions in them too. The only reason to have a class is
if you want to create instances. No benefit to using a class to group a set of
global functions.
Anyway, r=me.
Comment on attachment 2369[details]
Second pass at QualifiedName and HTMLNames classes
Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just
this sort of thing.
I still think you can use a namespace, regardless of the "factory" issue.
Namespaces can have functions in them too. The only reason to have a class is
if you want to create instances. No benefit to using a class to group a set of
global functions.
Anyway, r=me.
Created attachment 2407[details]
More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed.
This is an update (still does not compile). The next big structural issue is
the elimination of dtd.h and dtd.cpp completely. Validation code is going to
move into the elements themselves and check against hashes in HTMLElement.
Created attachment 2412[details]
This patch converts all of the parser's error handling logic.
This patch nearly completes the parser's conversion. I still need to handle
isInline (hopefully in terms of tag priorities, which are closely related) and
element creation. There is also a DTD check in the table error handling code
that I will need to fix up.
Created attachment 2510[details]
Next round. Converts much more of the code.
This patch gets the tokenizer and parser to compile, but there's still some
work to do on them. Rendering and editing subdirectories are fully converted
in this patch and complete.
Created attachment 2558[details]
Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues).
This patch finishes the HTML parser (only 2 open issues remain). It also
re-implements CSS namespace support using QualifiedName (allowing another
class, XMLNamespaceTable, to be removed from the build). Note that namespaced
attributes in CSS are now broken, but this feature is minor enough that it can
be fixed later when I do attribute names' conversion.
Created attachment 2561[details]
Factor out element creation code and get dom_docimpl.cpp compiling
This patch factors out the element creation code into a single code path that
goes through a new object, HTMLElementFactory.
Created attachment 2608[details]
Complete the tagpriority/tagstatus movement into the element code.
This patch implements the isInline method in the parser and stops using
renderers to make the determination (thus eliminating any potential variance in
rendering caused by deferred renderer construction).
The tag priority and tag status rules are fully implemented using functions on
the elements now. The two tables that were used in dtd.cpp are gone.
Created attachment 2613[details]
Patch that compiles and works (mostly).
The following patch compiles and can display some Web pages. The following
open issues remain:
(1) Redo tables for dtd
(2) Fix the checkchild cases in parser/tokenizer/dom_element
(3) Re-implement createElementNS
(4) Re-implement createElement
(5) checkSetPrefix is now busted
(6) XMLElementImpl cloneNode issue and construction issue, move to ElementImpl
(7) getElementsByTagName needs to work again (TagNodeListImpl)
(8) Print namespace prefixes in selectortext now that CSS selectors store them
Created attachment 2823[details]
Patch that passes all layout tests
This patch passes all layout tests (with some test results being adjusted).
Performance analysis coming next.
Performance analysis shows:
0.145 avg. cached time without patch
0.155 avg. cached time with patch
Slowdown is about 7%. That's the bad news.
The good news is none of the slowdown is hiding and is all clearly visible when sampling. The
HTMLNames accessors account for about 5-6%, and the operator == comparisons between QNames
and AtomicStrings account for the remaining 1%.
I think there are two problems that could be fixed pretty easily:
(1) The accessors on HTMLNames need to be inlined, and that means having them return real global
variables.
(2) Introduce more hashtables. The element creation code and the parser are the two primary
problems. I think starting with the element creation code will be helpful.
Created attachment 2842[details]
Ready for review. Now faster than trunk.
This patch adds a lot of hashing, inlines and provides globals for HTMLNames
access, inlines the QName comparison operator, optimizes lowercasing in the
tokenizer, and is now faster than the trunk.
0.141 compared with 0.145.
Ready for an initial review.
Never mind. I had been testing with harrison's regression when I established the baseline. The real # to
beat is 0.139. So the patch is still slightly slower. Fixing handleError and getNode should get the speed
back.
Just a little nitpick. I noticed in a few places that 'else' and 'else if' statements use braces on new lines
such as
+ else if (localName == starAtom && cs->attr == ATTR_CLASS && cs->match == CSSSelector::Class)
{
as per style guidelines, should it not be like: } else if (condition) { and } else {
And also some lines like + if ( prof == DefaultGUI ) . Should this not be like if (prof == DefaultGUI),
without the spaces.
Sorry for being a pain.
Created attachment 2856[details]
Convert getNode to use a hash. Convert some QPtrDicts to HashSets.
This patch scores 0.140-0.141, compared with the trunk's 0.138-0.139. The
reason why is pretty clear on the profile. AtomicString::add is about 1%, and
0.6% of that is the equality check that has to be done because it is not a
perfect hash like the old gperf was.
The equal method unfortunately is almost always having to scan the entire
string, because most of the time, what you find in the hash table *is* equal,
and so you hit the worst-case scenario of that function most of the time.
I am unsure how to proceed now, other than to find another speedup somewhere to
counteract the slowdown from not having a perfect hash.
Actually, the perfect hash has to do an equality check too. While there are no collisions for things in the
hash, you can still miss entirely. However, I can find ways to micro-optimize the equality check.
Patch is missing htmlfactory.{cpp,h} though so I can't build it.
Created attachment 2860[details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds
Maciej has ideas for how to further improve equality testing (and landed a
patch that helps). I implemented three additional optimizations to gain 2ms of
PLT time.
(1) updatePseudoChild in RenderContainer is better about bailing early now when
no :before/:after content is used. getPseudoStyle in RenderStyle has been
inlined as well (and is also smarter about short-circuiting).
(2) isBreakable no longer checks direction() for ASCII characters. This check
was always returning false anyway.
(3) QFont now caches whether or not it is fixed pitch to avoid a dictionary
lookup and a call across the bridge.
Comment on attachment 2860[details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds
I'm missing some files. Need to add them.
Created attachment 2873[details]
Patch that addresses some of maciej's comments and includes his hash fixes
This patch includes some micro-optimizations Maciej made to the hashing done by
AtomicString, and it also addresses his review comments (eliminating
XHTML_NAMESPACE and renaming GetSetInfo to Accessors in kjs_html.h/.cpp).
Comment on attachment 2873[details]
Patch that addresses some of maciej's comments and includes his hash fixes
r=me with the following caveats:
1) I think the implicit conversions on QualifiedName are dangerous and could
lead to people getting themselves in trouble.
2) Please add a layout test for the behavior change on tag name validation
(maybe something that uses a tag name wiht _ or * in it).
2005-06-09 22:13 PDT, Dave Hyatt
2005-06-14 15:46 PDT, Dave Hyatt
2005-06-14 17:39 PDT, Dave Hyatt
2005-06-15 12:45 PDT, Dave Hyatt
2005-06-16 16:26 PDT, Dave Hyatt
2005-06-16 18:50 PDT, Dave Hyatt
2005-06-20 14:12 PDT, Dave Hyatt
2005-06-22 16:41 PDT, Dave Hyatt
2005-06-22 18:27 PDT, Dave Hyatt
2005-06-23 15:10 PDT, Dave Hyatt
2005-06-23 20:28 PDT, Dave Hyatt
2005-07-05 16:41 PDT, Dave Hyatt
2005-07-06 23:18 PDT, Dave Hyatt
2005-07-07 20:23 PDT, Dave Hyatt
2005-07-07 23:08 PDT, Dave Hyatt
2005-07-07 23:14 PDT, Dave Hyatt
2005-07-08 20:01 PDT, Dave Hyatt