40554 – Rename HTMLParser to LegacyHTMLTreeConstructor

RESOLVED FIXED

https://bugs.webkit.org/show_bug.cgi?id=40554

Summary Rename HTMLParser to LegacyHTMLTreeConstructor

Eric Seidel (no email)

Reported 2010-06-13 18:31:03 PDT

Rename HTMLParser to HTMLTreeBuilder

Attachments
Patch (60.55 KB, patch) 2010-06-13 18:39 PDT, Eric Seidel (no email), no flags
Patch for landing (51.61 KB, patch) 2010-06-13 21:09 PDT, Eric Seidel (no email), no flags
Patch for landing (120.50 KB, patch) 2010-06-13 21:19 PDT, Eric Seidel (no email), eric: commit-queue+

Eric Seidel (no email)

Comment 1 2010-06-13 18:39:28 PDT

Created attachment 58613 [details] Patch

Adam Barth

Comment 2 2010-06-13 19:05:51 PDT

Comment on attachment 58613 [details] Patch

Is TreeBuilder a name we made up, or is it something from the spec? In any case, this class is only part of the parser, so HTMLParser is/was a misnomer.

Eric Seidel (no email)

Comment 3 2010-06-13 19:09:23 PDT

I thought it was in the spec, but I guess the spec calls it "tree construction": http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tree-construction We could call it the TreeConstructor?

Darin Adler

Comment 4 2010-06-13 19:39:28 PDT

(In reply to comment #3)
> We could call it the TreeConstructor?

That seems like a good choice; the name is both clear and matches the specification. We could also break out the tokenizer part of the parser into a separate class at some point.

Eric Seidel (no email)

Comment 5 2010-06-13 20:43:26 PDT

In the new world, the Tokenizer (named HTML5Lexer) is a separate class. I don't think anyone is going to be hacking on the old HTMLDocumentParser (formerly HTMLTokenizer) anytime soon. I'm happy to regen this patch as HTMLTreeConstructor.

Darin Adler

Comment 6 2010-06-13 20:58:26 PDT

I think it’s slightly unfortunate that the future parser is named HTML5Parser, since I assume we will be using it even once HTML becomes HTML6. And the old parser is named HTMLParser, but really it soon will be obsolete and perhaps even deleted. I suppose that it’s too late to discuss this since we’ve already done the renaming, though. Generally speaking the new stuff should have the most beautiful names and the old stuff needs to have some sort of old prefix or name that points to a specific point in time.

Eric Seidel (no email)

Comment 7 2010-06-13 21:02:57 PDT

My vision was that we would rename HTMLParser to LegacyHTMLDocumentParser and HTML5DocumentParser to HTMLDocumentParser once we've switched and stayed switched. Seemed premature at this point. Then again, the renamings may have been premature. They were prompted by our attempts at writing emails to explain what we've done and the class names being horribly confusing. :)

Adam Barth

Comment 8 2010-06-13 21:08:24 PDT

I might be overly optimistic, but I'm hopeful we'll eventually be able to remove the legacy parsing classes.

Eric Seidel (no email)

Comment 9 2010-06-13 21:09:22 PDT

Created attachment 58619 [details] Patch for landing

Darin Adler

Comment 10 2010-06-13 21:10:59 PDT

I expect we will be able to remove them too. So I think with planning we would want the new parser to have a name that reflects that optimism so we can avoid a rename later. Similarly if we are doing a rename for clarity I suggest going straight to that “Legacy” naming. Anyway, I think we’re good the way we are now, but the general principle that things should not have a “New” prefix or things that are “HTML5 and beyond” should not be named “HTML5” is what I wanted to mention. In case it informs future decisions.

Adam Barth

Comment 11 2010-06-13 21:15:17 PDT

I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.

Eric Seidel (no email)

Comment 12 2010-06-13 21:19:26 PDT

Created attachment 58620 [details] Patch for landing

Darin Adler

Comment 13 2010-06-13 21:23:01 PDT

(In reply to comment #11)
> I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.

My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later. I don’t think we’re “bike shedding” yet.

Adam Barth

Comment 14 2010-06-13 21:30:53 PDT

> My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later.

My guess is the next phase of development is going to involve a lot of discussion comparing the behavior of the legacy parser with the HTML5 parser. For example, here's a doc we wrote up recently entitled "HTML5 parser vs WebKit legacy parser":
https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

As the new parsing algorithm is commonly known in the industry as the "HTML5 parser" (e.g., http://hacks.mozilla.org/2010/05/firefox-4-the-html5-parser-inline-svg-speed-and-more/comment-page-1/), that seems like reasonable terminology to use, at least in the interim. In the long term, however, I agree that we'll want to remove the "5" from the class names. My guess is a good time to do that will be once we're happy with its behavior and are no longer comparing it with the old parser in detail.

Eric Seidel (no email)

Comment 15 2010-06-13 22:20:13 PDT

Committed r61107: <http://trac.webkit.org/changeset/61107>

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee Eric Seidel (no email)

Reported 2010-06-13 18:31 PDT

Modified 2010-06-13 22:20 PDT
