RESOLVED FIXED
40554
Rename HTMLParser to LegacyHTMLTreeConstructor
https://bugs.webkit.org/show_bug.cgi?id=40554
Eric Seidel (no email)
Reported
2010-06-13 18:31:03 PDT
Rename HTMLParser to HTMLTreeBuilder
Attachments
Patch (60.55 KB, patch), 2010-06-13 18:39 PDT, Eric Seidel (no email), no flags
Patch for landing (51.61 KB, patch), 2010-06-13 21:09 PDT, Eric Seidel (no email), no flags
Patch for landing (120.50 KB, patch), 2010-06-13 21:19 PDT, Eric Seidel (no email), eric: commit-queue+
Eric Seidel (no email)
Comment 1
2010-06-13 18:39:28 PDT
Created attachment 58613: Patch
Adam Barth
Comment 2
2010-06-13 19:05:51 PDT
Comment on attachment 58613: Patch
Is TreeBuilder a name we made up, or is it something from the spec? In any case, this class is only part of the parser, so HTMLParser is/was a misnomer.
Eric Seidel (no email)
Comment 3
2010-06-13 19:09:23 PDT
I thought it was in the spec, but I guess the spec calls it "tree construction":
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tree-construction
We could call it the TreeConstructor?
Darin Adler
Comment 4
2010-06-13 19:39:28 PDT
(In reply to comment #3)
> We could call it the TreeConstructor?
That seems like a good choice; the name is both clear and matches the specification. We could also break out the tokenizer part of the parser into a separate class at some point.
Eric Seidel (no email)
Comment 5
2010-06-13 20:43:26 PDT
In the new world, the Tokenizer (named HTML5Lexer) is a separate class. I don't think anyone is going to be hacking on the old HTMLDocumentParser (formerly HTMLTokenizer) anytime soon. I'm happy to regenerate this patch as HTMLTreeConstructor.
Darin Adler
Comment 6
2010-06-13 20:58:26 PDT
I think it’s slightly unfortunate that the future parser is named HTML5Parser, since I assume we will be using it even once HTML becomes HTML6. And the old parser is named HTMLParser, but really it soon will be obsolete and perhaps even deleted. I suppose that it’s too late to discuss this since we’ve already done the renaming, though. Generally speaking the new stuff should have the most beautiful names and the old stuff needs to have some sort of old prefix or name that points to a specific point in time.
Eric Seidel (no email)
Comment 7
2010-06-13 21:02:57 PDT
My vision was that we would rename HTMLParser to LegacyHTMLDocumentParser and HTML5DocumentParser to HTMLDocumentParser once we've switched and stayed switched. Seemed premature at this point. Then again, the renamings may have been premature. They were prompted by our attempts at writing emails to explain what we've done and the class names being horribly confusing. :)
Adam Barth
Comment 8
2010-06-13 21:08:24 PDT
I might be overly optimistic, but I'm hopeful we'll eventually be able to remove the legacy parsing classes.
Eric Seidel (no email)
Comment 9
2010-06-13 21:09:22 PDT
Created attachment 58619: Patch for landing
Darin Adler
Comment 10
2010-06-13 21:10:59 PDT
I expect we will be able to remove them too. So I think with planning we would want the new parser to have a name that reflects that optimism so we can avoid a rename later. Similarly if we are doing a rename for clarity I suggest going straight to that “Legacy” naming. Anyway, I think we’re good the way we are now, but the general principle that things should not have a “New” prefix or things that are “HTML5 and beyond” should not be named “HTML5” is what I wanted to mention. In case it informs future decisions.
Adam Barth
Comment 11
2010-06-13 21:15:17 PDT
I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.
Eric Seidel (no email)
Comment 12
2010-06-13 21:19:26 PDT
Created attachment 58620: Patch for landing
Darin Adler
Comment 13
2010-06-13 21:23:01 PDT
(In reply to comment #11)
> I'm slightly sad we're bike shedding about naming for classes we're going to delete, but I can see how going straight to "legacy" might be clearer to some.
My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later. I don’t think we’re “bike shedding” yet.
Adam Barth
Comment 14
2010-06-13 21:30:53 PDT
> My primary comments are about the names of the new classes. My primary concern is that we’re giving the old classes the names that the new classes should have later.
My guess is the next phase of development is going to involve a lot of discussion comparing the behavior of the legacy parser with the HTML5 parser. For example, here's a doc we wrote up recently entitled "HTML5 parser vs WebKit legacy parser"
https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
As the new parsing algorithm is commonly known in the industry as the "HTML5 parser" (e.g., http://hacks.mozilla.org/2010/05/firefox-4-the-html5-parser-inline-svg-speed-and-more/comment-page-1/), that seems like reasonable terminology to use, at least in the interim. In the long term, however, I agree that we'll want to remove the "5" from the class names. My guess is a good time to do that will be once we're happy with its behavior and are no longer comparing it with the old parser in detail.
Eric Seidel (no email)
Comment 15
2010-06-13 22:20:13 PDT
Committed r61107: <http://trac.webkit.org/changeset/61107>