WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
3405
Replace Id for tag names with QualifiedName
https://bugs.webkit.org/show_bug.cgi?id=3405
Summary
Replace Id for tag names with QualifiedName
Dave Hyatt
Reported
2005-06-09 22:12:34 PDT
This bug is tracking the eventual elimination of the gperf-based Id representation of tag names KHTML and its replacement with a new class called QualifiedName, a smart hashed class that hold a prefix, a namespace URI and a localName all as atomic strings. This conversion will reduce the memory footprint of the DOM and simplify the DOM significantly, since numerous classes like XMLElementImpl and HTMLGenericElementImpl can simply go away.
Attachments
Work in progress (does not compile)
(351.25 KB, patch)
2005-06-09 22:13 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags
(2.75 KB, patch)
2005-06-14 15:46 PDT
,
Dave Hyatt
sullivan
: review+
Details
Formatted Diff
Diff
Add initial implementations of QualifiedName and the HTMLNames classes.
(29.43 KB, patch)
2005-06-14 17:39 PDT
,
Dave Hyatt
darin
: review-
Details
Formatted Diff
Diff
Second pass at QualifiedName and HTMLNames classes
(28.90 KB, patch)
2005-06-15 12:45 PDT
,
Dave Hyatt
darin
: review+
Details
Formatted Diff
Diff
More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed.
(434.56 KB, patch)
2005-06-16 16:26 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
This patch converts all of the parser's error handling logic.
(469.54 KB, patch)
2005-06-16 18:50 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Next round. Converts much more of the code.
(546.29 KB, patch)
2005-06-20 14:12 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues).
(578.86 KB, patch)
2005-06-22 16:41 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Factor out element creation code and get dom_docimpl.cpp compiling
(609.92 KB, patch)
2005-06-22 18:27 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Complete the tagpriority/tagstatus movement into the element code.
(621.81 KB, patch)
2005-06-23 15:10 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Patch that compiles and works (mostly).
(671.83 KB, patch)
2005-06-23 20:28 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Patch that passes all layout tests
(696.29 KB, patch)
2005-07-05 16:41 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Ready for review. Now faster than trunk.
(768.70 KB, patch)
2005-07-06 23:18 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Convert getNode to use a hash. Convert some QPtrDicts to HashSets.
(769.47 KB, patch)
2005-07-07 20:23 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds
(775.47 KB, patch)
2005-07-07 23:08 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Make sure to include the HTMLElementFactory.
(797.12 KB, patch)
2005-07-07 23:14 PDT
,
Dave Hyatt
no flags
Details
Formatted Diff
Diff
Patch that addresses some of maciej's comments and includes his hash fixes
(807.20 KB, patch)
2005-07-08 20:01 PDT
,
Dave Hyatt
mjs
: review+
Details
Formatted Diff
Diff
Show Obsolete
(16)
View All
Add attachment
proposed patch, testcase, etc.
Dave Hyatt
Comment 1
2005-06-09 22:13:46 PDT
Created
attachment 2199
[details]
Work in progress (does not compile) Just a snapshot of work in progress. This does not compile.
Dave Hyatt
Comment 2
2005-06-14 15:46:41 PDT
Created
attachment 2341
[details]
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags
John Sullivan
Comment 3
2005-06-14 15:49:08 PDT
Comment on
attachment 2341
[details]
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags I reviewed this with Dave. We discovered that the <pre> problem was not as widespread as we had feared, so it's not an update candidate-worthy bug.
Dave Hyatt
Comment 4
2005-06-14 17:39:43 PDT
Created
attachment 2343
[details]
Add initial implementations of QualifiedName and the HTMLNames classes. This patch adds these two classes to the build. They are not used by anyone yet.
Darin Adler
Comment 5
2005-06-15 10:11:51 PDT
Comment on
attachment 2343
[details]
Add initial implementations of QualifiedName and the HTMLNames classes. Looks like this won't compile due to m_inner vs. m_impl and QualifiedNameImpl vs. QualifiedNameInner. The implementation file should be wrapped in namespace DOM rather than having a using namespace DOM at the top of the file. I suggest that m_qNameCache just be a private global inside the .cpp file (using static) and not a member of the QualifiedName class; no need to expose it, and then you don't have to include <qptrdict.h> in the header. I prefer that you put spaces after the < when you have to put a space before the > in expressions like: QPtrDict< QPtrDict< QPtrDict<QualifiedNameInner> > > Why all the (void*) casts? Those shouldn't be necessary. Are they for const casting? Can we change HTMLNames from a class to a namespace? The advantage of that is that you can do a using on individual tags, and there's really nothing here that calls for a class. Other than these nitpicks, it looks great. I think I'll mark it review- just to be pedantic.
Dave Hyatt
Comment 6
2005-06-15 12:45:59 PDT
Created
attachment 2369
[details]
Second pass at QualifiedName and HTMLNames classes Address Darin and Maciej's comments. I do not want to make HTMLNames a namespace just yet, since I envision it growing to include more factory-based functionality possibly at a later date, i.e., I'm not convinced it's going to just stay a "names" class.
Darin Adler
Comment 7
2005-06-15 14:04:27 PDT
Comment on
attachment 2369
[details]
Second pass at QualifiedName and HTMLNames classes Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just this sort of thing. I still think you can use a namespace, regardless of the "factory" issue. Namespaces can have functions in them too. The only reason to have a class is if you want to create instances. No benefit to using a class to group a set of global functions. Anyway, r=me.
Darin Adler
Comment 8
2005-06-15 14:04:31 PDT
Comment on
attachment 2369
[details]
Second pass at QualifiedName and HTMLNames classes Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just this sort of thing. I still think you can use a namespace, regardless of the "factory" issue. Namespaces can have functions in them too. The only reason to have a class is if you want to create instances. No benefit to using a class to group a set of global functions. Anyway, r=me.
Maciej Stachowiak
Comment 9
2005-06-15 15:36:42 PDT
ditto on the review
Dave Hyatt
Comment 10
2005-06-16 16:26:02 PDT
Created
attachment 2407
[details]
More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed. This is an update (still does not compile). The next big structural issue is the elimination of dtd.h and dtd.cpp completely. Validation code is going to move into the elements themselves and check against hashes in HTMLElement.
Dave Hyatt
Comment 11
2005-06-16 18:50:09 PDT
Created
attachment 2412
[details]
This patch converts all of the parser's error handling logic. This patch nearly completes the parser's conversion. I still need to handle isInline (hopefully in terms of tag priorities, which are closely related) and element creation. There is also a DTD check in the table error handling code that I will need to fix up.
Dave Hyatt
Comment 12
2005-06-20 14:12:36 PDT
Created
attachment 2510
[details]
Next round. Converts much more of the code. This patch gets the tokenizer and parser to compile, but there's still some work to do on them. Rendering and editing subdirectories are fully converted in this patch and complete.
Dave Hyatt
Comment 13
2005-06-22 16:41:39 PDT
Created
attachment 2558
[details]
Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues). This patch finishes the HTML parser (only 2 open issues remain). It also re-implements CSS namespace support using QualifiedName (allowing another class, XMLNamespaceTable, to be removed from the build). Note that namespaced attributes in CSS are now broken, but this feature is minor enough that it can be fixed later when I do attribute names' conversion.
Dave Hyatt
Comment 14
2005-06-22 18:27:50 PDT
Created
attachment 2561
[details]
Factor out element creation code and get dom_docimpl.cpp compiling This patch factors out the element creation code into a single code path that goes through a new object, HTMLElementFactory.
Dave Hyatt
Comment 15
2005-06-23 15:10:07 PDT
Created
attachment 2608
[details]
Complete the tagpriority/tagstatus movement into the element code. This patch implements the isInline method in the parser and stops using renderers to make the determination (thus eliminating any potential variance in rendering caused by deferred renderer construction). The tag priority and tag status rules are fully implemented using functions on the elements now. The two tables that were used in dtd.cpp are gone.
Dave Hyatt
Comment 16
2005-06-23 20:28:03 PDT
Created
attachment 2613
[details]
Patch that compiles and works (mostly). The following patch compiles and can display some Web pages. The following open issues remain: (1) Redo tables for dtd (2) Fix the checkchild cases in parser/tokenizer/dom_element (3) Re-implement createElementNS (4) Re-implement createElement (5) checkSetPrefix is now busted (6) XMLElementImpl cloneNode issue and construction issue, move to ElementImpl (7) getElementsByTagName needs to work again (TagNodeListImpl) (8) Print namespace prefixes in selectortext now that CSS selectors store them
Dave Hyatt
Comment 17
2005-07-05 16:41:31 PDT
Created
attachment 2823
[details]
Patch that passes all layout tests This patch passes all layout tests (with some test results being adjusted). Performance analysis coming next.
Dave Hyatt
Comment 18
2005-07-05 19:16:19 PDT
Performance analysis shows: 0.145 avg. cached time without patch 0.155 avg. cached time with patch Slowdown is about 7%. That's the bad news. The good news is none of the slowdown is hiding and is all clearly visible when sampling. The HTMLNames accessors account for about 5-6%, and the operator == comparisons between QNames and AtomicStrings account for the remaining 1%. I think there are two problems that could be fixed pretty easily: (1) The accessors on HTMLNames need to be inlined, and that means having them return real global variables. (2) Introduce more hashtables. The element creation code and the parser are the two primary problems. I think starting with the element creation code will be helpful.
Dave Hyatt
Comment 19
2005-07-06 23:18:28 PDT
Created
attachment 2842
[details]
Ready for review. Now faster than trunk. This patch adds a lot of hashing, inlines and provides globals for HTMLNames access, inlines the QName comparison operator, optimizes lowercasing in the tokenizer, and is now faster than the trunk. 0.141 compared with 0.145. Ready for an initial review.
Dave Hyatt
Comment 20
2005-07-06 23:39:47 PDT
Never mind. I had been testing with harrison's regression when I established the baseline. The real # to beat is 0.139. So the patch is still slightly slower. Fixing handleError and getNode should get the speed back.
David Storey
Comment 21
2005-07-07 09:02:23 PDT
Just a little nitpick. I noticed in a few places that 'else' and 'else if' statements use braces on new lines such as + else if (localName == starAtom && cs->attr == ATTR_CLASS && cs->match == CSSSelector::Class) { as per style guidelines, should it not be like: } else if (condition) { and } else { And also some lines like + if ( prof == DefaultGUI ) . Should this not be like if (prof == DefaultGUI), without the spaces. Sorry for being a pain.
Dave Hyatt
Comment 22
2005-07-07 20:23:32 PDT
Created
attachment 2856
[details]
Convert getNode to use a hash. Convert some QPtrDicts to HashSets. This patch scores 0.140-0.141, compared with the trunk's 0.138-0.139. The reason why is pretty clear on the profile. AtomicString::add is about 1%, and 0.6% of that is the equality check that has to be done because it is not a perfect hash like the old gperf was. The equal method unfortunately is almost always having to scan the entire string, because most of the time, what you find in the hash table *is* equal, and so you hit the worst-case scenario of that function most of the time. I am unsure how to proceed now, other than to find another speedup somewhere to counteract the slowdown from not having a perfect hash.
Maciej Stachowiak
Comment 23
2005-07-07 21:22:06 PDT
Actually, the perfect hash has to do an equality check too. While there are no collisions for things in the hash, you can still miss entirely. However, I can find ways to micro-optimize the equality check. Patch is missing htmlfactory.{cpp,h} though so I can't build it.
Dave Hyatt
Comment 24
2005-07-07 23:08:54 PDT
Created
attachment 2860
[details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds Maciej has ideas for how to further improve equality testing (and landed a patch that helps). I implemented three additional optimizations to gain 2ms of PLT time. (1) updatePseudoChild in RenderContainer is better about bailing early now when no :before/:after content is used. getPseudoStyle in RenderStyle has been inlined as well (and is also smarter about short-circuiting). (2) isBreakable no longer checks direction() for ASCII characters. This check was always returning false anyway. (3) QFont now caches whether or not it is fixed pitch to avoid a dictionary lookup and a call across the bridge.
Dave Hyatt
Comment 25
2005-07-07 23:11:23 PDT
Comment on
attachment 2860
[details]
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds I'm missing some files. Need to add them.
Dave Hyatt
Comment 26
2005-07-07 23:14:52 PDT
Created
attachment 2861
[details]
Make sure to include the HTMLElementFactory. Same as previous patch, but includes htmlfactory.h/.cpp.
Dave Hyatt
Comment 27
2005-07-08 20:01:43 PDT
Created
attachment 2873
[details]
Patch that addresses some of maciej's comments and includes his hash fixes This patch includes some micro-optimizations Maciej made to the hashing done by AtomicString, and it also addresses his review comments (eliminating XHTML_NAMESPACE and renaming GetSetInfo to Accessors in kjs_html.h/.cpp).
Maciej Stachowiak
Comment 28
2005-07-09 01:12:20 PDT
Comment on
attachment 2873
[details]
Patch that addresses some of maciej's comments and includes his hash fixes r=me with the following caveats: 1) I think the implicit conversions on QualifiedName are dangerous and could lead to people getting themselves in trouble. 2) Please add a layout test for the behavior change on tag name validation (maybe something that uses a tag name wiht _ or * in it).
Dave Hyatt
Comment 29
2005-07-09 13:46:30 PDT
Fixed
Lucas Forschler
Comment 30
2019-02-06 09:03:38 PST
Mass moving XML DOM bugs to the "DOM" Component.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug