Created attachment 59318 [details] screenshot showing bad layout The MacRuby website does not layout correctly on r61502. The main body of the website is shifted to the right and lines up with the red boxes on the right instead of the left edge of the black navigation bar. See attached screenshot. The website looks correct on stock 10.6.4 WebKit.
Regressed in r61285.
Created attachment 59397 [details] reduced test case This test passes in Safari 5 and in Firefox 3.6.3, but not in IE8. The site is also happily broken in IE.
Thanks. I'll look into this today.
We seem to be doing what the spec says to do here (and we match Minefield). I've sent an email to the HTML working group. This issue is on our "top five" list of behavioral differences like to cause compatibility problems: https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en Let's give the working group a couple days to respond before taking action.
> This test passes in Safari 5 and in Firefox 3.6.3, but not in IE8. The site is also happily broken in IE. The test also fails in Opera 10.54.
I filed a MacRuby bug with the fix for them: https://www.macruby.org/trac/ticket/759
See http://www.w3.org/Bugs/Public/show_bug.cgi?id=9985
*** Bug 41069 has been marked as a duplicate of this bug. ***
Applications linked against the shipping version of WebKit rely on the old parser behavior. There needs to be a way to give that behavior to applications (and perhaps websites) that rely on it. One such application is AIM for Mac.
> Applications linked against the shipping version of WebKit rely on the old parser behavior. Perhaps you'd be interested in <http://trac.webkit.org/browser/trunk/WebKit/mac/WebView/WebPreferenceKeysPrivate.h#L96>? > There needs to be a way to give that behavior to applications (and perhaps websites) that rely on it. One such application is AIM for Mac. Is there a sunset for how long the Mac port is required to support the old parser? It's quite complex, and it's going to bitrot.
(In reply to comment #10) > > Applications linked against the shipping version of WebKit rely on the old parser behavior. > > Perhaps you'd be interested in <http://trac.webkit.org/browser/trunk/WebKit/mac/WebView/WebPreferenceKeysPrivate.h#L96>? I am aware of it, but keeping the legacy parser around is not desirable for obvious reasons. > > There needs to be a way to give that behavior to applications (and perhaps websites) that rely on it. One such application is AIM for Mac. > > Is there a sunset for how long the Mac port is required to support the old parser? It's quite complex, and it's going to bitrot. The Mac port is not required to support the old parser, and in fact I just said that was un undesirble way to address the issue. Perhaps what caused the confusion was that I used the phrase “the old parser behavior”. I was referring specifically to the one aspect of the old parser that this bug is about. I think a good solution would be to add this quirk to the parser and a setting for enabling the quirk.
> Perhaps what caused the confusion was that I used the phrase “the old parser behavior”. I was referring specifically to the one aspect of the old parser that this bug is about. I think a good solution would be to add this quirk to the parser and a setting for enabling the quirk. Here's a document describing the parser changes that we're worried might case compatibility problems for WebKit-only content: https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en Do you think it would be better to have one setting that enabled all the quirks we need, or should we have a separate setting for each one? Same question: do we have a plan for sunsetting these quirks? For example, we could make them available only to applications linked before a certain date and then remove the setting them when that date is sufficiently far in the past. Also, we'll need to test the performance impact of adding these branches to the tokenizer. In general, you can see many of the branches in the benchmark score, so we might need to do something clever.
> Same question: do we have a plan for sunsetting these quirks? > For example, we could make them available only to applications > linked before a certain date and then remove the setting them when > that date is sufficiently far in the past. I don't think we have an equivalent mechanism for Dashboard widgets, so the quirk will likely have to be permanent.
(In reply to comment #12) > > Perhaps what caused the confusion was that I used the phrase “the old parser behavior”. I was referring specifically to the one aspect of the old parser that this bug is about. I think a good solution would be to add this quirk to the parser and a setting for enabling the quirk. > > Here's a document describing the parser changes that we're worried might case compatibility problems for WebKit-only content: > > https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en > > Do you think it would be better to have one setting that enabled all the quirks we need, or should we have a separate setting for each one? I don’t have a strong opinion on this. It might depend on the context in which we’d want to enable the quirks. For now, I can’t think of a reason to separate them. > Same question: do we have a plan for sunsetting these quirks? For example, we could make them available only to applications linked before a certain date and then remove the setting them when that date is sufficiently far in the past. For application compatibility, the quirks should only be enabled for select application bundle IDs (currently the set contains only AIM) and only if they were not linked against WebKit-534.1 or later. There is currently no plan to ever break AIM. For website compatibilty, if site-specific hacks are deployed, they can be revisited at any time. If specific websites are targeted, and those websites are updated to not require the quirk anymore, then it can be removed. If the quirk targets something different (for example, a specific version of some JavaScript library), then it’s harder to tell when to remove it. > Also, we'll need to test the performance impact of adding these branches to the tokenizer. In general, you can see many of the branches in the benchmark score, so we might need to do something clever. Of course. In the worst case, and if we opt for a single “quirks” switch, then templatizing a sufficiently large portion of the tokenizer should eliminate branches on any hot paths.
Sounds like a plan. I might need some help with the patch because I'm not overly familiar with the quirks infrastructure.
We’ve seen several more cases internal to Apple where this change in behavior has impacted existing content. They’ve primarily been web sites, but also email messages generated by reporting systems.
(In reply to comment #16) > We’ve seen several more cases internal to Apple where this change in behavior has impacted existing content. They’ve primarily been web sites, but also email messages generated by reporting systems. That's not overly surprising given that internal Apple content is more likely to be consumed with WebKit than with IE. Presumably these pieces of content are also broken in IE and Firefox 4.
<rdar://problem/8154379>
I offered to write a patch for this bug, but Maciej thought it would be better if someone else wrote the patch to get some experience with how this code works. One approach to fixing this bug is to add a Setting that can be used by the embedder for app-specific compatibility. The setting instructs the HTMLTokenizer to recognize '<' in the TagNameState and to re-consume it in the data state. One subtly is that you'll probably need to emit the current token before re-consuming the '<' character.
(In reply to comment #19) > I offered to write a patch for this bug, but Maciej thought it would be better if someone else wrote the patch to get some experience with how this code works. > > One approach to fixing this bug is to add a Setting that can be used by the embedder for app-specific compatibility. The setting instructs the HTMLTokenizer to recognize '<' in the TagNameState and to re-consume it in the data state. One subtly is that you'll probably need to emit the current token before re-consuming the '<' character. Adam, thanks for the pointer in the right direction. In addition to TagNameState, there were a few other states that needed this quirk to handle cases such as <tag attr<tag> and <tag <tag>, but it is essentially the same change in each state (for <tag attr<tag>, endAttributeName() needs to be called before reconsuming the < in DataState). I have the WebCore part of this patch written, but I still need to actually enable the quirk in WebKit and also write a layout test, which probably means also modifying DRT so that the quirk can be enabled from the test. I'll upload the WebCore part now if you want to take a look at it.
(In reply to comment #20) > I have the WebCore part of this patch written, but I still need to actually enable the quirk in WebKit and also write a layout test, which probably means also modifying DRT so that the quirk can be enabled from the test. On second thought, enabling this quirk is probably outside the scope of this bug. I'll just focus on adding the quirk capability and open other bugs for enabling it in specific cases.
Quirks like this should be easy to test using some iframes and layoutTestController.overridePreference
Created attachment 67516 [details] Patch
Comment on attachment 67516 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67516&action=prettypatch Overall, looks spot on. Thanks for working on this patch. > WebCore/html/parser/HTMLDocumentParser.cpp:95 > - , m_tokenizer(HTMLTokenizer::create()) > + , m_tokenizer(HTMLTokenizer::create(fragment->document()->settings())) Be sure to add a test case that uses fragment parsing to make sure the setting properly propagates to the fragment parsing code. > WebCore/html/parser/HTMLTokenizer.cpp:106 > -HTMLTokenizer::HTMLTokenizer() > +HTMLTokenizer::HTMLTokenizer(Settings* settings) Do you think passing in the settings object is better than passing in the Document? It's slightly unclear to me which is better. > WebCore/html/parser/HTMLTokenizer.cpp:108 > + , m_useLegacyParserQuirks(settings && settings->useLegacyParserQuirks()) useLegacyTokenizerQuirks? I guess it depends on what granularity we want to be able to control the quirks... Are you sure that frameless documents don't need this quirk? > WebCore/html/parser/HTMLTokenizer.cpp:440 > + else if (cc == '<' && m_useLegacyParserQuirks) I bet you should put m_useLegacyParserQuirks first. That branch will be very well predicted. > WebCore/html/parser/HTMLTokenizer.cpp:919 > + } else if (cc == '<' && m_useLegacyParserQuirks) { > + m_token->endAttributeName(source.numberOfCharactersConsumed()); > + return emitAndReconsumeIn(source, DataState); It would be nice if we could skip this one. The characters used to terminate unquoted attributes are often security sensitive because folks sometimes allow untrusted content in unquoted attributes (even thought that's super scary). By adding "<" as a terminator character, "foo<script>alert(1)</script>" becomes an XSS vector in an unquoted attribute when it otherwise wouldn't be.
Comment on attachment 67516 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67516&action=prettypatch > WebCore/html/parser/HTMLTokenizer.cpp:108 > + , m_useLegacyParserQuirks(settings && settings->useLegacyParserQuirks()) I disagree. One flag is scary enough. Having N different parser modes is bad news bears. Andy is right to just have one flag IMO.
(In reply to comment #24) > (From update of attachment 67516 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=67516&action=prettypatch > > Overall, looks spot on. Thanks for working on this patch. > > > WebCore/html/parser/HTMLDocumentParser.cpp:95 > > - , m_tokenizer(HTMLTokenizer::create()) > > + , m_tokenizer(HTMLTokenizer::create(fragment->document()->settings())) > Be sure to add a test case that uses fragment parsing to make sure the setting properly propagates to the fragment parsing code. Will do. > > > WebCore/html/parser/HTMLTokenizer.cpp:106 > > -HTMLTokenizer::HTMLTokenizer() > > +HTMLTokenizer::HTMLTokenizer(Settings* settings) > Do you think passing in the settings object is better than passing in the Document? It's slightly unclear to me which is better. My thought was that Settings was more specific to what HTMLTokenizer needs than Document. > > > WebCore/html/parser/HTMLTokenizer.cpp:108 > > + , m_useLegacyParserQuirks(settings && settings->useLegacyParserQuirks()) > useLegacyTokenizerQuirks? I guess it depends on what granularity we want to be able to control the quirks... I'd like to use this same quirk to handle the <script /> case, and I believe that will need to be patched in HTMLTreeBuilder. So, using the work tokenizer might make it too specific. > > Are you sure that frameless documents don't need this quirk? Ahh, so since document()->settings() is really document()->m_frame->settings(), frameless documents would return a NULL Settings. I didn't consider this, so I'll think about the right thing to do. I don't know a lot about frameless documents, but I don't see why they wouldn't want the quirk. > > > WebCore/html/parser/HTMLTokenizer.cpp:440 > > + else if (cc == '<' && m_useLegacyParserQuirks) > I bet you should put m_useLegacyParserQuirks first. That branch will be very well predicted. Will do. > > > WebCore/html/parser/HTMLTokenizer.cpp:919 > > + } else if (cc == '<' && m_useLegacyParserQuirks) { > > + m_token->endAttributeName(source.numberOfCharactersConsumed()); > > + return emitAndReconsumeIn(source, DataState); > It would be nice if we could skip this one. The characters used to terminate unquoted attributes are often security sensitive because folks sometimes allow untrusted content in unquoted attributes (even thought that's super scary). By adding "<" as a terminator character, "foo<script>alert(1)</script>" becomes an XSS vector in an unquoted attribute when it otherwise wouldn't be. This can probably be skipped for now, but I wouldn't be surprised if we find this case in the wild. The markup used by AIM comes close to hitting this case, but has whitespace after the end of the attribute (the attribute value is also quoted). Thanks for the feedback!
(In reply to comment #24) > > WebCore/html/parser/HTMLTokenizer.cpp:919 > > + } else if (cc == '<' && m_useLegacyParserQuirks) { > > + m_token->endAttributeName(source.numberOfCharactersConsumed()); > > + return emitAndReconsumeIn(source, DataState); > It would be nice if we could skip this one. The characters used to terminate unquoted attributes are often security sensitive because folks sometimes allow untrusted content in unquoted attributes (even thought that's super scary). By adding "<" as a terminator character, "foo<script>alert(1)</script>" becomes an XSS vector in an unquoted attribute when it otherwise wouldn't be. If this turns out to be necessary, maybe it could be restricted to quoted attributes.
(In reply to comment #27) > (In reply to comment #24) > > > WebCore/html/parser/HTMLTokenizer.cpp:919 > > > + } else if (cc == '<' && m_useLegacyParserQuirks) { > > > + m_token->endAttributeName(source.numberOfCharactersConsumed()); > > > + return emitAndReconsumeIn(source, DataState); > > It would be nice if we could skip this one. The characters used to terminate unquoted attributes are often security sensitive because folks sometimes allow untrusted content in unquoted attributes (even thought that's super scary). By adding "<" as a terminator character, "foo<script>alert(1)</script>" becomes an XSS vector in an unquoted attribute when it otherwise wouldn't be. > > If this turns out to be necessary, maybe it could be restricted to quoted attributes. I'm confused. Do you mean this case: <div foo="hello<script>alert(1)</script>there"> or this case: <div foo="hello"<script>alert(1)</script> ?
> m_useLegacyParserQuirks If this is going to be a single switch, it still needs a more specific name (maybe PreHTML5?). We won't want to add HTML5 behaviors here when HTML5v2 comes out, but this name will still be suggesting that we should.
(In reply to comment #28) > (In reply to comment #27) > > (In reply to comment #24) > > > > WebCore/html/parser/HTMLTokenizer.cpp:919 > > > > + } else if (cc == '<' && m_useLegacyParserQuirks) { > > > > + m_token->endAttributeName(source.numberOfCharactersConsumed()); > > > > + return emitAndReconsumeIn(source, DataState); > > > It would be nice if we could skip this one. The characters used to terminate unquoted attributes are often security sensitive because folks sometimes allow untrusted content in unquoted attributes (even thought that's super scary). By adding "<" as a terminator character, "foo<script>alert(1)</script>" becomes an XSS vector in an unquoted attribute when it otherwise wouldn't be. > > > > If this turns out to be necessary, maybe it could be restricted to quoted attributes. > > I'm confused. Do you mean this case: > > <div foo="hello<script>alert(1)</script>there"> > > or this case: > > <div foo="hello"<script>alert(1)</script> > > ? The second case. I don't believe the legacy parser emitted a script token in the first case.
> The second case. I don't believe the legacy parser emitted a script token in the first case. Ok. :)
Created attachment 67897 [details] Patch
(In reply to comment #32) > Created an attachment (id=67897) [details] > Patch A few notes: - I couldn't decide how to handle the case of frameless document so I didn't. I'm not aware of a case where this will cause a regression, but now that I think of it I should probably add a FIXME. - I made my best effort to add the new WebPreference to all the ports so they can run the layout tests, but I can't be sure they're all going to compile. I'll try to land this during a quiet period so I can clean up any breakage without causing too much trouble.
Attachment 67897 [details] did not build on gtk: Build output: http://queues.webkit.org/results/4012057
Created attachment 67899 [details] Patch
Attachment 67899 [details] did not build on gtk: Build output: http://queues.webkit.org/results/4043039
Created attachment 67900 [details] Patch
Attachment 67897 [details] did not build on win: Build output: http://queues.webkit.org/results/4029046
Created attachment 67902 [details] Patch
Attachment 67900 [details] did not build on win: Build output: http://queues.webkit.org/results/4063037
Comment on attachment 67902 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch > WebCore/html/parser/HTMLTokenizer.h:122 > - static PassOwnPtr<HTMLTokenizer> create() { return adoptPtr(new HTMLTokenizer); } > + static PassOwnPtr<HTMLTokenizer> create(Settings* settings) { return adoptPtr(new HTMLTokenizer(settings)); } It’s quite inelegant to tie the tokenizer and parser to document settings in this fashion. It would be quite a bit better to have the parser and tokenizer publish some sort of object or flags word that expresses the quirks to be used, then we could have the code that creates the parser convert the settings into that form. Having the tokenizer read the settings directly is inelegant and I think too big a change to the class. > WebCore/html/parser/HTMLTokenizer.h:268 > + inline bool usePreHTML5ParserQuirks(); Not sure why the existing code includes these “inline” keywords. They don’t do any good that I can see. I don’t think that all pre-HTML5 parser quirks should be grouped in a single flag. I think we should have an explicit flag for each quirk.
Comment on attachment 67902 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch > WebCore/html/parser/HTMLTokenizer.cpp:1085 > + else if (usePreHTML5ParserQuirks() && cc == '<') > + return emitAndReconsumeIn(source, DataState); It's slightly unclear to me why you chose these states for this quirk. I would have expected it in these states: TagNameState BeforeAttributeNameState AttributeNameState * AfterAttributeNameState * AfterAttributeValueQuotedState I've marked the ones you didn't add with a *. You can test those as follows: <foo bar<qux> <foo bar <qux> > WebCore/html/parser/HTMLTokenizer.cpp:1679 > +inline bool HTMLTokenizer::usePreHTML5ParserQuirks() > +{ > + return (m_settings && m_settings->usePreHTML5ParserQuirks()); > +} Have you run the HTML parsing benchmark on this code? You can run the benchmark by opening the file WebCore/benchmarks/parser/html-parser.html. I suspect you'll get a measurable slowdown from this patch. It's probably faster to read the setting at construction time and put it into a member variable than it is to take the extra m_settings null check all the time.
(In reply to comment #42) > Have you run the HTML parsing benchmark on this code? You can run the benchmark by opening the file WebCore/benchmarks/parser/html-parser.html. I suspect you'll get a measurable slowdown from this patch. It's probably faster to read the setting at construction time and put it into a member variable than it is to take the extra m_settings null check all the time. Reversing the check so the '<' test comes first before checking the setting would probably eliminate the slowdown. But I agree that putting the value of the setting into a data member is probably the right design.
> Reversing the check so the '<' test comes first before checking the setting would probably eliminate the slowdown. But I agree that putting the value of the setting into a data member is probably the right design. Our earlier thinking was that putting the quirk check first would be better since it should be a very well-predicted branch. However, instead of speculating, we can just run the benchmark.
(In reply to comment #41) > (From update of attachment 67902 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch > > > WebCore/html/parser/HTMLTokenizer.h:122 > > - static PassOwnPtr<HTMLTokenizer> create() { return adoptPtr(new HTMLTokenizer); } > > + static PassOwnPtr<HTMLTokenizer> create(Settings* settings) { return adoptPtr(new HTMLTokenizer(settings)); } > > It’s quite inelegant to tie the tokenizer and parser to document settings in this fashion. > > It would be quite a bit better to have the parser and tokenizer publish some sort of object or flags word that expresses the quirks to be used, then we could have the code that creates the parser convert the settings into that form. Having the tokenizer read the settings directly is inelegant and I think too big a change to the class. This was what I did in my first patch, but the issue was that layout tests enable the quirk after the parser is created, hence the current design. I probably shouldn't have designed for the pathological case, and now that I understand better how WebPreferences work, I think I can write a test that sets the preference then does its work in an iframe to ensure the right ordering. > > > WebCore/html/parser/HTMLTokenizer.h:268 > > + inline bool usePreHTML5ParserQuirks(); > > Not sure why the existing code includes these “inline” keywords. They don’t do any good that I can see. > > I don’t think that all pre-HTML5 parser quirks should be grouped in a single flag. I think we should have an explicit flag for each quirk. There seemed to be a consensus that a single quirk was a better approach. Would there be a case where we'd want to enable the quirk for <tag<tag> but not for <tag <tag>? To me, the three changes to the tokenizer handle variants of a single issue so a single quirk settings makes sense.
(In reply to comment #42) > (From update of attachment 67902 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch > > > WebCore/html/parser/HTMLTokenizer.cpp:1085 > > + else if (usePreHTML5ParserQuirks() && cc == '<') > > + return emitAndReconsumeIn(source, DataState); > > It's slightly unclear to me why you chose these states for this quirk. I would have expected it in these states: > > TagNameState > BeforeAttributeNameState > AttributeNameState * > AfterAttributeNameState * > AfterAttributeValueQuotedState > > I've marked the ones you didn't add with a *. You can test those as follows: > > <foo bar<qux> I thought you cited a security concern with unquoted attributes? Were you referring specifically to unquoted attribute values? > <foo bar <qux> You're right, I missed this one. > > > WebCore/html/parser/HTMLTokenizer.cpp:1679 > > +inline bool HTMLTokenizer::usePreHTML5ParserQuirks() > > +{ > > + return (m_settings && m_settings->usePreHTML5ParserQuirks()); > > +} > > Have you run the HTML parsing benchmark on this code? You can run the benchmark by opening the file WebCore/benchmarks/parser/html-parser.html. I suspect you'll get a measurable slowdown from this patch. It's probably faster to read the setting at construction time and put it into a member variable than it is to take the extra m_settings null check all the time. That's the design I'm going back to.
Comment on attachment 67902 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch >>> WebCore/html/parser/HTMLTokenizer.cpp:1085 >>> + return emitAndReconsumeIn(source, DataState); >> >> It's slightly unclear to me why you chose these states for this quirk. I would have expected it in these states: >> >> TagNameState >> BeforeAttributeNameState >> AttributeNameState * >> AfterAttributeNameState * >> AfterAttributeValueQuotedState >> >> I've marked the ones you didn't add with a *. You can test those as follows: >> >> <foo bar<qux> >> <foo bar <qux> > > I thought you cited a security concern with unquoted attributes? Were you referring specifically to unquoted attribute values? Yes. Unquoted attribute values are the problematic ones from a security point of view.
(In reply to comment #47) > (From update of attachment 67902 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=67902&action=prettypatch > > >>> WebCore/html/parser/HTMLTokenizer.cpp:1085 > >>> + return emitAndReconsumeIn(source, DataState); > >> > >> It's slightly unclear to me why you chose these states for this quirk. I would have expected it in these states: > >> > >> TagNameState > >> BeforeAttributeNameState > >> AttributeNameState * > >> AfterAttributeNameState * > >> AfterAttributeValueQuotedState > >> > >> I've marked the ones you didn't add with a *. You can test those as follows: > >> > >> <foo bar<qux> > >> <foo bar <qux> > > > > I thought you cited a security concern with unquoted attributes? Were you referring specifically to unquoted attribute values? > > Yes. Unquoted attribute values are the problematic ones from a security point of view. I guess there's no such thing as a quoted attribute name :) Nevermind...fixed both cases.
Created attachment 67984 [details] Patch
(In reply to comment #44) > > Reversing the check so the '<' test comes first before checking the setting would probably eliminate the slowdown. But I agree that putting the value of the setting into a data member is probably the right design. > > Our earlier thinking was that putting the quirk check first would be better since it should be a very well-predicted branch. However, instead of speculating, we can just run the benchmark. I'll run the benchmark against the latest patch and report what I find.
(In reply to comment #50) > (In reply to comment #44) > > > Reversing the check so the '<' test comes first before checking the setting would probably eliminate the slowdown. But I agree that putting the value of the setting into a data member is probably the right design. > > > > Our earlier thinking was that putting the quirk check first would be better since it should be a very well-predicted branch. However, instead of speculating, we can just run the benchmark. > > I'll run the benchmark against the latest patch and report what I find. I ran the benchmark a total of 6 times in a 64-bit production build of Safari, once with my patch applied and once without. Here is what I found: With patch: avg: 2741.5 stdev: 20.5 avg: 2739.2 stdev: 20.4 avg: 2753.2 stdev: 45.8 Without patch: avg: 2773.3 stdev: 20 avg: 2775.9 stdev: 21.4 avg: 2770.2 stdev: 20.5
Looks like you made it faster by a hair. :)
Comment on attachment 67984 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67984&action=prettypatch > WebCore/html/parser/HTMLTokenizer.h:123 > - static PassOwnPtr<HTMLTokenizer> create() { return adoptPtr(new HTMLTokenizer); } > + static PassOwnPtr<HTMLTokenizer> create(Settings* settings) { return adoptPtr(new HTMLTokenizer(settings)); } This should take a usePreHTML5ParserQuirks boolean, not a Settings*. I don’t think it’s good layer for HTMLTokenizer to have a dependency on Settings at all. It’s the same kind of layering mistake as taking a Frame* in the updateStateFor function. And the forward declaration of class Element here is unneeded. I suppose we can live with this, but I don’t like it.
Comment on attachment 67984 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=67984&action=prettypatch >> WebCore/html/parser/HTMLTokenizer.h:123 >> + static PassOwnPtr<HTMLTokenizer> create(Settings* settings) { return adoptPtr(new HTMLTokenizer(settings)); } > > This should take a usePreHTML5ParserQuirks boolean, not a Settings*. I don’t think it’s good layer for HTMLTokenizer to have a dependency on Settings at all. > > It’s the same kind of layering mistake as taking a Frame* in the updateStateFor function. And the forward declaration of class Element here is unneeded. > > I suppose we can live with this, but I don’t like it. I'll post a patch to remove Frame from updateStateFor.
> I'll post a patch to remove Frame from updateStateFor. I looked at this, but it seems more error prone / verbose. :(
Attachment 67984 [details] did not build on win: Build output: http://queues.webkit.org/results/4009066
(In reply to comment #55) > > I'll post a patch to remove Frame from updateStateFor. > > I looked at this, but it seems more error prone / verbose. :( Yes, I noticed the same thing. Frame doesn’t express the set of tags allowed in a way that can easily be passed in, so this convenience function is only convenient with Frame*. I still think we might some day figure out a way to fix that, though. Or maybe may desire for “layering” here is misguided.
Created attachment 68055 [details] Patch
(In reply to comment #53) > (From update of attachment 67984 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=67984&action=prettypatch > > > WebCore/html/parser/HTMLTokenizer.h:123 > > - static PassOwnPtr<HTMLTokenizer> create() { return adoptPtr(new HTMLTokenizer); } > > + static PassOwnPtr<HTMLTokenizer> create(Settings* settings) { return adoptPtr(new HTMLTokenizer(settings)); } > > This should take a usePreHTML5ParserQuirks boolean, not a Settings*. I don’t think it’s good layer for HTMLTokenizer to have a dependency on Settings at all. > > It’s the same kind of layering mistake as taking a Frame* in the updateStateFor function. And the forward declaration of class Element here is unneeded. > > I suppose we can live with this, but I don’t like it. The latest patch moves the Settings* dependency up to HTMLDocumentParser.
Attachment 68055 [details] did not build on chromium: Build output: http://queues.webkit.org/results/4092005
Comment on attachment 68055 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68055&action=review You'll want Abarth to review the tokenizer changes. He's more familiar with that part of the parser than I am. > WebCore/html/parser/HTMLDocumentParser.cpp:83 > + , m_tokenizer(HTMLTokenizer::create(document->settings() && document->settings()->usePreHTML5ParserQuirks())) Seems this should be a helper function which takes a Document*. We used this pattern when the HTML5 parser was optional too. > WebCore/html/parser/HTMLDocumentParser.cpp:96 > + , m_tokenizer(HTMLTokenizer::create(fragment->document()->settings() && fragment->document()->settings()->usePreHTML5ParserQuirks())) Would be re-used here. > WebCore/html/parser/HTMLPreloadScanner.cpp:128 > + m_tokenizer = HTMLTokenizer::create(document->settings() && document->settings()->usePreHTML5ParserQuirks()); And here. This seems wrong. We don't need to move the initializer out of the normal c++ pattern. The ASSERT only helps in debug mode. ASSERTing after we've hosed ourselves is just as good as before. > WebCore/html/parser/HTMLViewSourceParser.cpp:37 > + , m_tokenizer(HTMLTokenizer::create(document->settings() && document->settings()->usePreHTML5ParserQuirks())) You want your inline again here. :) Although it's possible you won't make another copy of the inline in this file since it's only used once. When we had the HTML5 parser optional, we had a: static inline useLegacyHTMLParser(Document*) function which did similar to this check and handled the null frame case. WE had two copies. one in the treebuilder and one in the DocumentParser. Seems you need to do the check more often than that, so maybe it needs to be a class method/static on DocumentParser or HTMLDocumentParser? > WebKit/gtk/webkit/webkitwebsettings.cpp:111 > + gboolean enable_pre_html5_parser_quirks; I'm not sure any other port besides Mac wants this. I think this could be a mac-only quirk with mac-only layout tests. We really want this quirk to die ASAP (yes, I know that could be years).
Attachment 68055 [details] did not build on win: Build output: http://queues.webkit.org/results/4082033
(In reply to comment #61) > (From update of attachment 68055 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=68055&action=review > > You'll want Abarth to review the tokenizer changes. He's more familiar with that part of the parser than I am. > > > WebCore/html/parser/HTMLDocumentParser.cpp:83 > > + , m_tokenizer(HTMLTokenizer::create(document->settings() && document->settings()->usePreHTML5ParserQuirks())) > > Seems this should be a helper function which takes a Document*. We used this pattern when the HTML5 parser was optional too. Done. > > > WebKit/gtk/webkit/webkitwebsettings.cpp:111 > > + gboolean enable_pre_html5_parser_quirks; > > I'm not sure any other port besides Mac wants this. I think this could be a mac-only quirk with mac-only layout tests. We really want this quirk to die ASAP (yes, I know that could be years). I guess skipping it on non-Mac ports makes sense since only the Mac port will enable it, although at the time I wrote it I didn't see how having all platforms run the test could hurt. Now that various iterations of this patch are breaking on the EWS bots, I see it as a maintenance burden.
Created attachment 68134 [details] Patch
Comment on attachment 68134 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68134&action=review > WebCore/html/parser/HTMLDocumentParser.h:145 > +bool usePreHTML5ParserQuirks(Document*); Generally, we prefer to have these sorts of functions be static methods of a class: HTMLDocumentParser::usePreHTML5Quirks(...) That makes it easier to find the implementation of the function when you see it called in a random file. > LayoutTests/fast/parser/resources/pre-html5-parser-quirk-after-attribute-value-quoted-state.html:4 > + <p></p attr="value"</div> Can you add a test for <p attr=value</ ?
Committed r67890: <http://trac.webkit.org/changeset/67890>
Created attachment 68198 [details] Patch
Reopening to add additional test coverage.
Comment on attachment 68198 [details] Patch Thanks.
Comment on attachment 68198 [details] Patch Clearing flags on attachment: 68198 Committed r67936: <http://trac.webkit.org/changeset/67936>
All reviewed patches have been landed. Closing bug.