3405 – Replace Id for tag names with QualifiedName

RESOLVED FIXED 3405

Replace Id for tag names with QualifiedName

https://bugs.webkit.org/show_bug.cgi?id=3405

Summary Replace Id for tag names with QualifiedName

Dave Hyatt

Reported 2005-06-09 22:12:34 PDT

This bug is tracking the eventual elimination of the gperf-based Id representation of tag names KHTML and its replacement with a new class called QualifiedName, a smart hashed class that hold a prefix, a namespace URI and a localName all as atomic strings. This conversion will reduce the memory footprint of the DOM and simplify the DOM significantly, since numerous classes like XMLElementImpl and HTMLGenericElementImpl can simply go away.

Attachments
Work in progress (does not compile) (351.25 KB, patch) 2005-06-09 22:13 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags (2.75 KB, patch) 2005-06-14 15:46 PDT, Dave Hyatt	sullivan: review+	Details Formatted Diff Diff
Add initial implementations of QualifiedName and the HTMLNames classes. (29.43 KB, patch) 2005-06-14 17:39 PDT, Dave Hyatt	darin: review-	Details Formatted Diff Diff
Second pass at QualifiedName and HTMLNames classes (28.90 KB, patch) 2005-06-15 12:45 PDT, Dave Hyatt	darin: review+	Details Formatted Diff Diff
More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed. (434.56 KB, patch) 2005-06-16 16:26 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
This patch converts all of the parser's error handling logic. (469.54 KB, patch) 2005-06-16 18:50 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Next round. Converts much more of the code. (546.29 KB, patch) 2005-06-20 14:12 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues). (578.86 KB, patch) 2005-06-22 16:41 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Factor out element creation code and get dom_docimpl.cpp compiling (609.92 KB, patch) 2005-06-22 18:27 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Complete the tagpriority/tagstatus movement into the element code. (621.81 KB, patch) 2005-06-23 15:10 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch that compiles and works (mostly). (671.83 KB, patch) 2005-06-23 20:28 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch that passes all layout tests (696.29 KB, patch) 2005-07-05 16:41 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Ready for review. Now faster than trunk. (768.70 KB, patch) 2005-07-06 23:18 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Convert getNode to use a hash. Convert some QPtrDicts to HashSets. (769.47 KB, patch) 2005-07-07 20:23 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Implement unrelated micro-optimizations to get back the remaining couple of milliseconds (775.47 KB, patch) 2005-07-07 23:08 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Make sure to include the HTMLElementFactory. (797.12 KB, patch) 2005-07-07 23:14 PDT, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch that addresses some of maciej's comments and includes his hash fixes (807.20 KB, patch) 2005-07-08 20:01 PDT, Dave Hyatt	mjs: review+	Details Formatted Diff Diff
Show Obsolete (16) View All Add attachment proposed patch, testcase, etc.

Dave Hyatt

Comment 1 2005-06-09 22:13:46 PDT

Created attachment 2199 [details] Work in progress (does not compile) Just a snapshot of work in progress. This does not compile.

Dave Hyatt

Comment 2 2005-06-14 15:46:41 PDT

Created attachment 2341 [details] Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags

John Sullivan

Comment 3 2005-06-14 15:49:08 PDT

Comment on attachment 2341 [details] Groundwork: eliminate the need to query for FORBIDDEN by fixing our newline handing for pre and other tags I reviewed this with Dave. We discovered that the <pre> problem was not as widespread as we had feared, so it's not an update candidate-worthy bug.

Dave Hyatt

Comment 4 2005-06-14 17:39:43 PDT

Created attachment 2343 [details] Add initial implementations of QualifiedName and the HTMLNames classes. This patch adds these two classes to the build. They are not used by anyone yet.

Darin Adler

Comment 5 2005-06-15 10:11:51 PDT

Comment on attachment 2343 [details] Add initial implementations of QualifiedName and the HTMLNames classes. Looks like this won't compile due to m_inner vs. m_impl and QualifiedNameImpl vs. QualifiedNameInner. The implementation file should be wrapped in namespace DOM rather than having a using namespace DOM at the top of the file. I suggest that m_qNameCache just be a private global inside the .cpp file (using static) and not a member of the QualifiedName class; no need to expose it, and then you don't have to include <qptrdict.h> in the header. I prefer that you put spaces after the < when you have to put a space before the > in expressions like: QPtrDict< QPtrDict< QPtrDict<QualifiedNameInner> > > Why all the (void*) casts? Those shouldn't be necessary. Are they for const casting? Can we change HTMLNames from a class to a namespace? The advantage of that is that you can do a using on individual tags, and there's really nothing here that calls for a class. Other than these nitpicks, it looks great. I think I'll mark it review- just to be pedantic.

Dave Hyatt

Comment 6 2005-06-15 12:45:59 PDT

Created attachment 2369 [details] Second pass at QualifiedName and HTMLNames classes Address Darin and Maciej's comments. I do not want to make HTMLNames a namespace just yet, since I envision it growing to include more factory-based functionality possibly at a later date, i.e., I'm not convinced it's going to just stay a "names" class.

Darin Adler

Comment 7 2005-06-15 14:04:27 PDT

Comment on attachment 2369 [details] Second pass at QualifiedName and HTMLNames classes Should use hasOneRef() instead of refCount() == 1; it's supposed to be for just this sort of thing. I still think you can use a namespace, regardless of the "factory" issue. Namespaces can have functions in them too. The only reason to have a class is if you want to create instances. No benefit to using a class to group a set of global functions. Anyway, r=me.

Darin Adler

Comment 8 2005-06-15 14:04:31 PDT

Maciej Stachowiak

Comment 9 2005-06-15 15:36:42 PDT

ditto on the review

Dave Hyatt

Comment 10 2005-06-16 16:26:02 PDT

Created attachment 2407 [details] More progress. kjs_html.cpp is done. The htmltags and scripts to make them have been removed. This is an update (still does not compile). The next big structural issue is the elimination of dtd.h and dtd.cpp completely. Validation code is going to move into the elements themselves and check against hashes in HTMLElement.

Dave Hyatt

Comment 11 2005-06-16 18:50:09 PDT

Created attachment 2412 [details] This patch converts all of the parser's error handling logic. This patch nearly completes the parser's conversion. I still need to handle isInline (hopefully in terms of tag priorities, which are closely related) and element creation. There is also a DTD check in the table error handling code that I will need to fix up.

Dave Hyatt

Comment 12 2005-06-20 14:12:36 PDT

Created attachment 2510 [details] Next round. Converts much more of the code. This patch gets the tokenizer and parser to compile, but there's still some work to do on them. Rendering and editing subdirectories are fully converted in this patch and complete.

Dave Hyatt

Comment 13 2005-06-22 16:41:39 PDT

Created attachment 2558 [details] Re-implement CSS namespace support to use QualifiedNames. Finish the HTML parser (except for 2 remaining issues). This patch finishes the HTML parser (only 2 open issues remain). It also re-implements CSS namespace support using QualifiedName (allowing another class, XMLNamespaceTable, to be removed from the build). Note that namespaced attributes in CSS are now broken, but this feature is minor enough that it can be fixed later when I do attribute names' conversion.

Dave Hyatt

Comment 14 2005-06-22 18:27:50 PDT

Created attachment 2561 [details] Factor out element creation code and get dom_docimpl.cpp compiling This patch factors out the element creation code into a single code path that goes through a new object, HTMLElementFactory.

Dave Hyatt

Comment 15 2005-06-23 15:10:07 PDT

Created attachment 2608 [details] Complete the tagpriority/tagstatus movement into the element code. This patch implements the isInline method in the parser and stops using renderers to make the determination (thus eliminating any potential variance in rendering caused by deferred renderer construction). The tag priority and tag status rules are fully implemented using functions on the elements now. The two tables that were used in dtd.cpp are gone.

Dave Hyatt

Comment 16 2005-06-23 20:28:03 PDT

Created attachment 2613 [details] Patch that compiles and works (mostly). The following patch compiles and can display some Web pages. The following open issues remain: (1) Redo tables for dtd (2) Fix the checkchild cases in parser/tokenizer/dom_element (3) Re-implement createElementNS (4) Re-implement createElement (5) checkSetPrefix is now busted (6) XMLElementImpl cloneNode issue and construction issue, move to ElementImpl (7) getElementsByTagName needs to work again (TagNodeListImpl) (8) Print namespace prefixes in selectortext now that CSS selectors store them

Dave Hyatt

Comment 17 2005-07-05 16:41:31 PDT

Created attachment 2823 [details] Patch that passes all layout tests This patch passes all layout tests (with some test results being adjusted). Performance analysis coming next.

Dave Hyatt

Comment 18 2005-07-05 19:16:19 PDT

Performance analysis shows: 0.145 avg. cached time without patch 0.155 avg. cached time with patch Slowdown is about 7%. That's the bad news. The good news is none of the slowdown is hiding and is all clearly visible when sampling. The HTMLNames accessors account for about 5-6%, and the operator == comparisons between QNames and AtomicStrings account for the remaining 1%. I think there are two problems that could be fixed pretty easily: (1) The accessors on HTMLNames need to be inlined, and that means having them return real global variables. (2) Introduce more hashtables. The element creation code and the parser are the two primary problems. I think starting with the element creation code will be helpful.

Dave Hyatt

Comment 19 2005-07-06 23:18:28 PDT

Created attachment 2842 [details] Ready for review. Now faster than trunk. This patch adds a lot of hashing, inlines and provides globals for HTMLNames access, inlines the QName comparison operator, optimizes lowercasing in the tokenizer, and is now faster than the trunk. 0.141 compared with 0.145. Ready for an initial review.

Dave Hyatt

Comment 20 2005-07-06 23:39:47 PDT

Never mind. I had been testing with harrison's regression when I established the baseline. The real # to beat is 0.139. So the patch is still slightly slower. Fixing handleError and getNode should get the speed back.

David Storey

Comment 21 2005-07-07 09:02:23 PDT

Just a little nitpick. I noticed in a few places that 'else' and 'else if' statements use braces on new lines such as + else if (localName == starAtom && cs->attr == ATTR_CLASS && cs->match == CSSSelector::Class) { as per style guidelines, should it not be like: } else if (condition) { and } else { And also some lines like + if ( prof == DefaultGUI ) . Should this not be like if (prof == DefaultGUI), without the spaces. Sorry for being a pain.

Dave Hyatt

Comment 22 2005-07-07 20:23:32 PDT

Created attachment 2856 [details] Convert getNode to use a hash. Convert some QPtrDicts to HashSets. This patch scores 0.140-0.141, compared with the trunk's 0.138-0.139. The reason why is pretty clear on the profile. AtomicString::add is about 1%, and 0.6% of that is the equality check that has to be done because it is not a perfect hash like the old gperf was. The equal method unfortunately is almost always having to scan the entire string, because most of the time, what you find in the hash table *is* equal, and so you hit the worst-case scenario of that function most of the time. I am unsure how to proceed now, other than to find another speedup somewhere to counteract the slowdown from not having a perfect hash.

Maciej Stachowiak

Comment 23 2005-07-07 21:22:06 PDT

Actually, the perfect hash has to do an equality check too. While there are no collisions for things in the hash, you can still miss entirely. However, I can find ways to micro-optimize the equality check. Patch is missing htmlfactory.{cpp,h} though so I can't build it.

Dave Hyatt

Comment 24 2005-07-07 23:08:54 PDT

Created attachment 2860 [details] Implement unrelated micro-optimizations to get back the remaining couple of milliseconds Maciej has ideas for how to further improve equality testing (and landed a patch that helps). I implemented three additional optimizations to gain 2ms of PLT time. (1) updatePseudoChild in RenderContainer is better about bailing early now when no :before/:after content is used. getPseudoStyle in RenderStyle has been inlined as well (and is also smarter about short-circuiting). (2) isBreakable no longer checks direction() for ASCII characters. This check was always returning false anyway. (3) QFont now caches whether or not it is fixed pitch to avoid a dictionary lookup and a call across the bridge.

Dave Hyatt

Comment 25 2005-07-07 23:11:23 PDT

Comment on attachment 2860 [details] Implement unrelated micro-optimizations to get back the remaining couple of milliseconds I'm missing some files. Need to add them.

Dave Hyatt

Comment 26 2005-07-07 23:14:52 PDT

Created attachment 2861 [details] Make sure to include the HTMLElementFactory. Same as previous patch, but includes htmlfactory.h/.cpp.

Dave Hyatt

Comment 27 2005-07-08 20:01:43 PDT

Created attachment 2873 [details] Patch that addresses some of maciej's comments and includes his hash fixes This patch includes some micro-optimizations Maciej made to the hashing done by AtomicString, and it also addresses his review comments (eliminating XHTML_NAMESPACE and renaming GetSetInfo to Accessors in kjs_html.h/.cpp).

Maciej Stachowiak

Comment 28 2005-07-09 01:12:20 PDT

Comment on attachment 2873 [details] Patch that addresses some of maciej's comments and includes his hash fixes r=me with the following caveats: 1) I think the implicit conversions on QualifiedName are dangerous and could lead to people getting themselves in trouble. 2) Please add a layout test for the behavior change on tag name validation (maybe something that uses a tag name wiht _ or * in it).

Dave Hyatt

Comment 29 2005-07-09 13:46:30 PDT

Fixed

Lucas Forschler

Comment 30 2019-02-06 09:03:38 PST

Mass moving XML DOM bugs to the "DOM" Component.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 412

Hardware Mac

OS OS X 10.4

Product WebKit

Component DOM

Assignee

Dave Hyatt

Reported

2005-06-09 22:12 PDT

Modified

2019-02-06 09:03 PST History

CC List

5 users Show

URL

Keywords

Depends on

Blocks