Bug 45649 - Move adjustLexerState to the HTMLTokenizer
: Move adjustLexerState to the HTMLTokenizer
Status: RESOLVED FIXED
Product: WebKit
Classification: Unclassified
Component: New Bugs
: 528+ (Nightly build)
: Other Mac OS X 10.5
: P2 Normal
Assigned To: Adam Barth
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-09-13 03:23 PDT by Adam Barth
Modified: 2010-09-14 09:14 PDT (History)
5 users (show)

See Also:


Attachments
Patch (7.94 KB, patch)
2010-09-13 03:25 PDT, Adam Barth
no flags Details | Formatted Diff | Diff
Patch (7.87 KB, patch)
2010-09-13 11:53 PDT, Adam Barth
no flags Details | Formatted Diff | Diff
Patch for landing (11.95 KB, patch)
2010-09-13 12:29 PDT, Adam Barth
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Barth 2010-09-13 03:23:40 PDT
Move adjustLexerState to the HTMLTokenizer
Comment 1 Adam Barth 2010-09-13 03:25:54 PDT
Created attachment 67391 [details]
Patch
Comment 2 Eric Seidel 2010-09-13 10:05:06 PDT
Comment on attachment 67391 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=67391&action=prettypatch

Seems we could remove includes of TreeBuilder.h now, no?

> WebCore/html/parser/HTMLTokenizer.h:142
> +    // Updates the tokenizer's state according to the given tag name.  This is
> +    // an approxmation of how the tree builder would update the tokenizer's
> +    // state.  This method is useful for approximating HTML tokenization.  To
> +    // get exactly the correct tokenization, you need the real tree builder.
> +    void updateState(const AtomicString& tagName, Frame*);
If this is an approximation, can we name it to say such?  approximateStateForTag or something?
Comment 3 Adam Barth 2010-09-13 11:53:31 PDT
Created attachment 67446 [details]
Patch
Comment 4 Darin Adler 2010-09-13 12:01:14 PDT
Comment on attachment 67446 [details]
Patch

> +    // Updates the tokenizer's state according to the given tag name.  This is
> +    // an approxmation of how the tree builder would update the tokenizer's
> +    // state.  This method is useful for approximating HTML tokenization.  To
> +    // get exactly the correct tokenization, you need the real tree builder.

Typo "approxmation"
Two spaces after periods.

I am uncomfortable with the use of the verb "approximate" here. First, it’s easy to confuse the verb with the adjective. And if this was an adjectival form then it would have a return value.

Your comment uses the verb "update" and I think that should be in the function name. Maybe updateStateForFakeElement? Or something else like that? Probably not because the element is not fake.

Another problem with "approximate" is that it doesn’t say what’s wrong. To decide if it’s OK to use the function you need to know how far off “correct” it is, not just that it’s not perfect. It’s not enough to say that it’s not 100% correct; that’s enough to get someone off the hook in a court of law or mathematical proof but is not so great for high-quality software development.

commit-queue- to give you a chance to consider my comments
Comment 5 Adam Barth 2010-09-13 12:05:41 PDT
> Your comment uses the verb "update" and I think that should be in the function name. Maybe updateStateForFakeElement? Or something else like that? Probably not because the element is not fake.

:)

I originally called it "updateState" but Eric asked me to change it.

> Another problem with "approximate" is that it doesn’t say what’s wrong. To decide if it’s OK to use the function you need to know how far off “correct” it is, not just that it’s not perfect. It’s not enough to say that it’s not 100% correct; that’s enough to get someone off the hook in a court of law or mathematical proof but is not so great for high-quality software development.

We might eventually need something fancier here where we run the tree builder in a side-effect free way.  The main thing this function doesn't understand is foreign content.  I'll add more detail to the comment.
Comment 6 Adam Barth 2010-09-13 12:29:07 PDT
Created attachment 67454 [details]
Patch for landing
Comment 7 WebKit Commit Bot 2010-09-14 07:26:07 PDT
Comment on attachment 67454 [details]
Patch for landing

Clearing flags on attachment: 67454

Committed r67467: <http://trac.webkit.org/changeset/67467>
Comment 8 WebKit Commit Bot 2010-09-14 07:26:12 PDT
All reviewed patches have been landed.  Closing bug.
Comment 9 WebKit Review Bot 2010-09-14 09:14:14 PDT
http://trac.webkit.org/changeset/67467 might have broken GTK Linux 32-bit Debug
The following changes are on the blame list:
http://trac.webkit.org/changeset/67465
http://trac.webkit.org/changeset/67466
http://trac.webkit.org/changeset/67467
http://trac.webkit.org/changeset/67468