Bug 3405 - Replace Id for tag names with QualifiedName
Summary: Replace Id for tag names with QualifiedName
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 412
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Dave Hyatt
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-09 22:12 PDT by Dave Hyatt
Modified: 2019-02-06 09:03 PST (History)
5 users (show)

See Also:


Attachments
Work in progress (does not compile) (351.25 KB, patch)
2005-06-09 22:13 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags (2.75 KB, patch)
2005-06-14 15:46 PDT, Dave Hyatt
sullivan: review+
Details | Formatted Diff | Diff
Add initial implementations of QualifiedName and the HTMLNames classes. (29.43 KB, patch)
2005-06-14 17:39 PDT, Dave Hyatt
darin: review-
Details | Formatted Diff | Diff
Second pass at QualifiedName and HTMLNames classes (28.90 KB, patch)
2005-06-15 12:45 PDT, Dave Hyatt
darin: review+
Details | Formatted Diff | Diff
More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed. (434.56 KB, patch)
2005-06-16 16:26 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
This patch converts all of the parser's error handling logic. (469.54 KB, patch)
2005-06-16 18:50 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Next round. Converts much more of the code. (546.29 KB, patch)
2005-06-20 14:12 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues). (578.86 KB, patch)
2005-06-22 16:41 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Factor out element creation code and get dom_docimpl.cpp compiling (609.92 KB, patch)
2005-06-22 18:27 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Complete the tagpriority/tagstatus movement into the element code. (621.81 KB, patch)
2005-06-23 15:10 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch that compiles and works (mostly). (671.83 KB, patch)
2005-06-23 20:28 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch that passes all layout tests (696.29 KB, patch)
2005-07-05 16:41 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Ready for review. Now faster than trunk. (768.70 KB, patch)
2005-07-06 23:18 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Convert getNode to use a hash. Convert some QPtrDicts to HashSets. (769.47 KB, patch)
2005-07-07 20:23 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds (775.47 KB, patch)
2005-07-07 23:08 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Make sure to include the HTMLElementFactory. (797.12 KB, patch)
2005-07-07 23:14 PDT, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch that addresses some of maciej's comments and includes his hash fixes (807.20 KB, patch)
2005-07-08 20:01 PDT, Dave Hyatt
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Hyatt 2005-06-09 22:12:34 PDT
This bug is tracking the eventual elimination of the gperf-based Id representation of tag names KHTML 
and its replacement with a new class called QualifiedName, a smart hashed class that hold a prefix, a 
namespace URI and a localName all as atomic strings.

This conversion will reduce the memory footprint of the DOM and simplify the DOM significantly, since 
numerous classes like XMLElementImpl and HTMLGenericElementImpl can simply go away.
Comment 1 Dave Hyatt 2005-06-09 22:13:46 PDT
Created attachment 2199 [details]
Work in progress (does not compile)

Just a snapshot of work in progress.  This does not compile.
Comment 2 Dave Hyatt 2005-06-14 15:46:41 PDT
Created attachment 2341 [details]
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags
Comment 3 John Sullivan 2005-06-14 15:49:08 PDT
Comment on attachment 2341 [details]
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags

I reviewed this with Dave. We discovered that the <pre> problem was not as
widespread as we had feared, so it's not an update candidate-worthy bug.
Comment 4 Dave Hyatt 2005-06-14 17:39:43 PDT
Created attachment 2343 [details]
Add initial implementations of QualifiedName and the HTMLNames classes.

This patch adds these two classes to the build.  They are not used by anyone
yet.
Comment 5 Darin Adler 2005-06-15 10:11:51 PDT
Comment on attachment 2343 [details]
Add initial implementations of QualifiedName and the HTMLNames classes.

Looks like this won't compile due to m_inner vs. m_impl and QualifiedNameImpl
vs. QualifiedNameInner.

The implementation file should be wrapped in namespace DOM rather than having a
using namespace DOM at the top of the file.

I suggest that m_qNameCache just be a private global inside the .cpp file
(using static) and not a member of the QualifiedName class; no need to expose
it, and then you don't have to include <qptrdict.h> in the header.

I prefer that you put spaces after the < when you have to put a space before
the > in expressions like:

QPtrDict< QPtrDict< QPtrDict<QualifiedNameInner> > >

Why all the (void*) casts? Those shouldn't be necessary. Are they for const
casting?

Can we change HTMLNames from a class to a namespace? The advantage of that is
that you can do a using on individual tags, and there's really nothing here
that calls for a class.

Other than these nitpicks, it looks great. I think I'll mark it review- just to
be pedantic.
Comment 6 Dave Hyatt 2005-06-15 12:45:59 PDT
Created attachment 2369 [details]
Second pass at QualifiedName and HTMLNames classes

Address Darin and Maciej's comments.  I do not want to make HTMLNames a
namespace just yet, since I envision it growing to include more factory-based
functionality possibly at a later date, i.e., I'm not convinced it's going to
just stay a "names" class.
Comment 7 Darin Adler 2005-06-15 14:04:27 PDT
Comment on attachment 2369 [details]
Second pass at QualifiedName and HTMLNames classes

Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just
this sort of thing.

I still think you can use a namespace, regardless of the "factory" issue.
Namespaces can have functions in them too. The only reason to have a class is
if you want to create instances. No benefit to using a class to group a set of
global functions.

Anyway, r=me.
Comment 8 Darin Adler 2005-06-15 14:04:31 PDT
Comment on attachment 2369 [details]
Second pass at QualifiedName and HTMLNames classes

Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just
this sort of thing.

I still think you can use a namespace, regardless of the "factory" issue.
Namespaces can have functions in them too. The only reason to have a class is
if you want to create instances. No benefit to using a class to group a set of
global functions.

Anyway, r=me.
Comment 9 Maciej Stachowiak 2005-06-15 15:36:42 PDT
ditto on the review
Comment 10 Dave Hyatt 2005-06-16 16:26:02 PDT
Created attachment 2407 [details]
More progress. kjs_html.cpp is done.  The htmltags and scripts to make them have been removed.

This is an update (still does not compile).  The next big structural issue is
the elimination of dtd.h and dtd.cpp completely.  Validation code is going to
move into the elements themselves and check against hashes in HTMLElement.
Comment 11 Dave Hyatt 2005-06-16 18:50:09 PDT
Created attachment 2412 [details]
This patch converts all of the parser's error handling logic. 

This patch nearly completes the parser's conversion.  I still need to handle
isInline (hopefully in terms of tag priorities, which are closely related) and
element creation.  There is also a DTD check in the table error handling code
that I will need to fix up.
Comment 12 Dave Hyatt 2005-06-20 14:12:36 PDT
Created attachment 2510 [details]
Next round. Converts much more of the code.

This patch gets the tokenizer and parser to compile, but there's still some
work to do on them.  Rendering and editing subdirectories are fully converted
in this patch and complete.
Comment 13 Dave Hyatt 2005-06-22 16:41:39 PDT
Created attachment 2558 [details]
Re-implement CSS namespace support to use QualifiedNames.  Finish the HTML parser (except for 2 remaining issues).

This patch finishes the HTML parser (only 2 open issues remain).  It also
re-implements CSS namespace support using QualifiedName (allowing another
class, XMLNamespaceTable, to be removed from the build).  Note that namespaced
attributes in CSS are now broken, but this feature is minor enough that it can
be fixed later when I do attribute names' conversion.
Comment 14 Dave Hyatt 2005-06-22 18:27:50 PDT
Created attachment 2561 [details]
Factor out element creation code and get dom_docimpl.cpp compiling

This patch factors out the element creation code into a single code path that
goes through a new object, HTMLElementFactory.
Comment 15 Dave Hyatt 2005-06-23 15:10:07 PDT
Created attachment 2608 [details]
Complete the tagpriority/tagstatus movement into the element code.

This patch implements the isInline method in the parser and stops using
renderers to make the determination (thus eliminating any potential variance in
rendering caused by deferred renderer construction).

The tag priority and tag status rules are fully implemented using functions on
the elements now.  The two tables that were used in dtd.cpp are gone.
Comment 16 Dave Hyatt 2005-06-23 20:28:03 PDT
Created attachment 2613 [details]
Patch that compiles and works (mostly).

The following patch compiles and can display some Web pages.  The following
open issues remain:

(1) Redo tables for dtd
(2) Fix the checkchild cases in parser/tokenizer/dom_element
(3) Re-implement createElementNS
(4) Re-implement createElement
(5) checkSetPrefix is now busted
(6) XMLElementImpl cloneNode issue and construction issue, move to ElementImpl
(7) getElementsByTagName needs to work again (TagNodeListImpl)
(8) Print namespace prefixes in selectortext now that CSS selectors store them
Comment 17 Dave Hyatt 2005-07-05 16:41:31 PDT
Created attachment 2823 [details]
Patch that passes all layout tests

This patch passes all layout tests (with some test results being adjusted). 
Performance analysis coming next.
Comment 18 Dave Hyatt 2005-07-05 19:16:19 PDT
Performance analysis shows:

0.145 avg. cached time without patch
0.155 avg. cached time with patch

Slowdown is about 7%.  That's the bad news.

The good news is none of the slowdown is hiding and is all clearly visible when sampling.  The 
HTMLNames accessors account for about 5-6%, and the operator == comparisons between QNames 
and AtomicStrings account for the remaining 1%.

I think there are two problems that could be fixed pretty easily:
(1) The accessors on HTMLNames need to be inlined, and that means having them return real global 
variables.
(2) Introduce more hashtables.  The element creation code and the parser are the two primary 
problems.  I think starting with the element creation code will be helpful.
Comment 19 Dave Hyatt 2005-07-06 23:18:28 PDT
Created attachment 2842 [details]
Ready for review.  Now faster than trunk.

This patch adds a lot of hashing, inlines and provides globals for HTMLNames
access, inlines the QName comparison operator, optimizes lowercasing in the
tokenizer, and is now faster than the trunk.

0.141 compared with 0.145.

Ready for an initial review.
Comment 20 Dave Hyatt 2005-07-06 23:39:47 PDT
Never mind.  I had been testing with harrison's regression when I established the baseline.  The real # to 
beat is 0.139.  So the patch is still slightly slower.  Fixing handleError and getNode should get the speed 
back.
Comment 21 David Storey 2005-07-07 09:02:23 PDT
Just a little nitpick.  I noticed in a few places that 'else' and 'else if' statements use braces on new lines 
such as   
+    else if (localName == starAtom && cs->attr == ATTR_CLASS && cs->match == CSSSelector::Class)
     {

as per style guidelines, should it not be like:  } else if  (condition) { and } else {

And also some lines like  +  if ( prof == DefaultGUI ) .  Should this not be like if (prof == DefaultGUI), 
without the spaces.  

Sorry for being a pain.  


Comment 22 Dave Hyatt 2005-07-07 20:23:32 PDT
Created attachment 2856 [details]
Convert getNode to use a hash.  Convert some QPtrDicts to HashSets.

This patch scores 0.140-0.141, compared with the trunk's 0.138-0.139.  The
reason why is pretty clear on the profile.  AtomicString::add is about 1%, and
0.6% of that is the equality check that has to be done because it is not a
perfect hash like the old gperf was.

The equal method unfortunately is almost always having to scan the entire
string, because most of the time, what you find in the hash table *is* equal,
and so you hit the worst-case scenario of that function most of the time.

I am unsure how to proceed now, other than to find another speedup somewhere to
counteract the slowdown from not having a perfect hash.
Comment 23 Maciej Stachowiak 2005-07-07 21:22:06 PDT
Actually, the perfect hash has to do an equality check too. While there are no collisions for things in the 
hash, you can still miss entirely. However, I can find ways to micro-optimize the equality check.

Patch is missing htmlfactory.{cpp,h} though so I can't build it.
Comment 24 Dave Hyatt 2005-07-07 23:08:54 PDT
Created attachment 2860 [details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds

Maciej has ideas for how to further improve equality testing (and landed a
patch that helps).  I implemented three additional optimizations to gain 2ms of
PLT time.

(1) updatePseudoChild in RenderContainer is better about bailing early now when
no :before/:after content is used.  getPseudoStyle in RenderStyle has been
inlined as well (and is also smarter about short-circuiting).
(2) isBreakable no longer checks direction() for ASCII characters.  This check
was always returning false anyway.
(3) QFont now caches whether or not it is fixed pitch to avoid a dictionary
lookup and a call across the bridge.
Comment 25 Dave Hyatt 2005-07-07 23:11:23 PDT
Comment on attachment 2860 [details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds

I'm missing some files.  Need to add them.
Comment 26 Dave Hyatt 2005-07-07 23:14:52 PDT
Created attachment 2861 [details]
Make sure to include the HTMLElementFactory.

Same as previous patch, but includes htmlfactory.h/.cpp.
Comment 27 Dave Hyatt 2005-07-08 20:01:43 PDT
Created attachment 2873 [details]
Patch that addresses some of maciej's comments and includes his hash fixes

This patch includes some micro-optimizations Maciej made to the hashing done by
AtomicString, and it also addresses his review comments (eliminating
XHTML_NAMESPACE and renaming GetSetInfo to Accessors in kjs_html.h/.cpp).
Comment 28 Maciej Stachowiak 2005-07-09 01:12:20 PDT
Comment on attachment 2873 [details]
Patch that addresses some of maciej's comments and includes his hash fixes

r=me with the following caveats:

1) I think the implicit conversions on QualifiedName are dangerous and could
lead to people getting themselves in trouble.

2) Please add a layout test for the behavior change on tag name validation
(maybe something that uses a tag name wiht _ or * in it).
Comment 29 Dave Hyatt 2005-07-09 13:46:30 PDT
Fixed
Comment 30 Lucas Forschler 2019-02-06 09:03:38 PST
Mass moving XML DOM bugs to the "DOM" Component.