Bug 40554 (RESOLVED FIXED)
Rename HTMLParser to LegacyHTMLTreeConstructor
https://bugs.webkit.org/show_bug.cgi?id=40554
Summary Rename HTMLParser to LegacyHTMLTreeConstructor
Eric Seidel (no email)
Reported 2010-06-13 18:31:03 PDT
Rename HTMLParser to HTMLTreeBuilder
Attachments
Patch (60.55 KB, patch)
2010-06-13 18:39 PDT, Eric Seidel (no email)
no flags
Patch for landing (51.61 KB, patch)
2010-06-13 21:09 PDT, Eric Seidel (no email)
no flags
Patch for landing (120.50 KB, patch)
2010-06-13 21:19 PDT, Eric Seidel (no email)
eric: commit-queue+
Eric Seidel (no email)
Comment 1 2010-06-13 18:39:28 PDT
Adam Barth
Comment 2 2010-06-13 19:05:51 PDT
Comment on attachment 58613
Patch

Is TreeBuilder a name we made up, or is it something from the spec? In any case, this class is only part of the parser, so HTMLParser is/was a misnomer.
Eric Seidel (no email)
Comment 3 2010-06-13 19:09:23 PDT
I thought it was in the spec, but I guess the spec calls it "tree construction": http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tree-construction

We could call it the TreeConstructor?
Darin Adler
Comment 4 2010-06-13 19:39:28 PDT
(In reply to comment #3)
> We could call it the TreeConstructor?

That seems like a good choice; the name is both clear and matches the specification. We could also break out the tokenizer part of the parser into a separate class at some point.
Eric Seidel (no email)
Comment 5 2010-06-13 20:43:26 PDT
In the new world, the Tokenizer (named HTML5Lexer) is a separate class. I don't think anyone is going to be hacking on the old HTMLDocumentParser (formerly HTMLTokenizer) anytime soon. I'm happy to regenerate this patch as HTMLTreeConstructor.
Darin Adler
Comment 6 2010-06-13 20:58:26 PDT
I think it’s slightly unfortunate that the future parser is named HTML5Parser, since I assume we will be using it even once HTML becomes HTML6. And the old parser is named HTMLParser, but really it soon will be obsolete and perhaps even deleted. I suppose that it’s too late to discuss this since we’ve already done the renaming, though. Generally speaking the new stuff should have the most beautiful names and the old stuff needs to have some sort of old prefix or name that points to a specific point in time.
Eric Seidel (no email)
Comment 7 2010-06-13 21:02:57 PDT
My vision was that we would rename HTMLParser to LegacyHTMLDocumentParser and HTML5DocumentParser to HTMLDocumentParser once we've switched and stayed switched. That seemed premature at this point. Then again, the renamings may have been premature. They were prompted by our attempts at writing emails to explain what we've done, and finding the class names horribly confusing. :)
Adam Barth
Comment 8 2010-06-13 21:08:24 PDT
I might be overly optimistic, but I'm hopeful we'll eventually be able to remove the legacy parsing classes.
Eric Seidel (no email)
Comment 9 2010-06-13 21:09:22 PDT
Created attachment 58619
Patch for landing
Darin Adler
Comment 10 2010-06-13 21:10:59 PDT
I expect we will be able to remove them too. So I think, with planning, we would want the new parser to have a name that reflects that optimism so we can avoid a rename later. Similarly, if we are doing a rename for clarity, I suggest going straight to the “Legacy” naming. Anyway, I think we’re good the way we are now, but I wanted to mention the general principle: things should not have a “New” prefix, and things that are “HTML5 and beyond” should not be named “HTML5”, in case it informs future decisions.
Adam Barth
Comment 11 2010-06-13 21:15:17 PDT
I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.
Eric Seidel (no email)
Comment 12 2010-06-13 21:19:26 PDT
Created attachment 58620
Patch for landing
Darin Adler
Comment 13 2010-06-13 21:23:01 PDT
(In reply to comment #11)
> I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.

My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later. I don’t think we’re “bike shedding” yet.
Adam Barth
Comment 14 2010-06-13 21:30:53 PDT
> My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later.

My guess is the next phase of development is going to involve a lot of discussion comparing the behavior of the legacy parser with the HTML5 parser. For example, here's a doc we wrote up recently entitled "HTML5 parser vs WebKit legacy parser": https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

As the new parsing algorithm is commonly known in the industry as the "HTML5 parser" (e.g., http://hacks.mozilla.org/2010/05/firefox-4-the-html5-parser-inline-svg-speed-and-more/comment-page-1/), that seems like reasonable terminology to use, at least in the interim. In the long term, however, I agree that we'll want to remove the "5" from the class names. My guess is a good time to do that will be once we're happy with its behavior and are no longer comparing it with the old parser in detail.
Eric Seidel (no email)
Comment 15 2010-06-13 22:20:13 PDT