RESOLVED FIXED47748
[chromium] WebPageSerializerImpl duplicates (poorly) a lot of code from MarkupAccumulator
https://bugs.webkit.org/show_bug.cgi?id=47748
Summary [chromium] WebPageSerializerImpl duplicates (poorly) a lot of code from Marku...
Adam Barth
Reported 2010-10-15 15:52:23 PDT
WebPageSerializerImpl duplicates (poorly) a lot of code from MarkupAccumulator
Attachments
work in progress (13.14 KB, patch)
2010-10-15 15:53 PDT, Adam Barth
no flags
more code (25.94 KB, patch)
2010-10-15 16:47 PDT, Adam Barth
no flags
mostly there (32.87 KB, patch)
2010-10-15 17:32 PDT, Adam Barth
no flags
what-what the compile (34.20 KB, patch)
2010-10-18 14:00 PDT, Adam Barth
no flags
patch (seems to work) (36.72 KB, patch)
2010-10-18 15:05 PDT, Adam Barth
abarth: review-
abarth: commit-queue-
Adam Barth
Comment 1 2010-10-15 15:53:08 PDT
Created attachment 70910 [details] work in progress
Adam Barth
Comment 2 2010-10-15 15:55:54 PDT
Whoever write this code didn't understand how the web works. You can see them digging themselves deeper and deeper with each layer of complexity they added to try solve there problems. This is something of a re-write. The major design differences are as follows: 1) Start serializing from document instead of documentElement. This avoids the need to manufacture a bunch of fake markup. 2) Always serialize in UTF-8. This avoid a bunch of complexity about guess the charset wrong. UTF-8 is the one true charset of the web. 3) Instead of trying to produce markup, use the editing code to do the heavy lifting. This code should have no concept of a angle braket.
Adam Barth
Comment 3 2010-10-15 16:47:17 PDT
Created attachment 70918 [details] more code
Adam Barth
Comment 4 2010-10-15 17:32:33 PDT
Created attachment 70929 [details] mostly there
Ryosuke Niwa
Comment 5 2010-10-15 21:08:23 PDT
Comment on attachment 70929 [details] mostly there View in context: https://bugs.webkit.org/attachment.cgi?id=70929&action=review > WebCore/editing/MarkupAccumulator.cpp:342 > - if (shouldResolveURLs() && !element->document()->url().isLocalFile()) > - appendQuotedURLAttributeValue(out, element->document()->completeURL(attribute.value()).string()); > + if (shouldResolveURLs()) > + appendQuotedURLAttributeValue(out, resolveURL(element, attribute)); > else > appendQuotedURLAttributeValue(out, attribute.value()); You may combine these lines and just do: appendQuotedURLAttributeValue(out, shouldResolveURLs() ? resolveURL(element, attribute.value()) : attribute.value()); > WebKit/chromium/src/WebPageSerializerImpl.cpp:153 > + if (element->hasTagName(metaTag) && isCharsetDeclaration(element)) > + return Now you can move this to serializeNodes!
Adam Barth
Comment 6 2010-10-18 12:51:42 PDT
(In reply to comment #5) > (From update of attachment 70929 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=70929&action=review > > > WebCore/editing/MarkupAccumulator.cpp:342 > > - if (shouldResolveURLs() && !element->document()->url().isLocalFile()) > > - appendQuotedURLAttributeValue(out, element->document()->completeURL(attribute.value()).string()); > > + if (shouldResolveURLs()) > > + appendQuotedURLAttributeValue(out, resolveURL(element, attribute)); > > else > > appendQuotedURLAttributeValue(out, attribute.value()); > > You may combine these lines and just do: > appendQuotedURLAttributeValue(out, shouldResolveURLs() ? resolveURL(element, attribute.value()) : attribute.value()); Done. > > WebKit/chromium/src/WebPageSerializerImpl.cpp:153 > > + if (element->hasTagName(metaTag) && isCharsetDeclaration(element)) > > + return > > Now you can move this to serializeNodes! Investigating now that I've rebased to top-of-tree where serializeNodes exists.
Adam Barth
Comment 7 2010-10-18 12:58:07 PDT
Comment on attachment 70929 [details] mostly there View in context: https://bugs.webkit.org/attachment.cgi?id=70929&action=review >>> WebKit/chromium/src/WebPageSerializerImpl.cpp:153 >>> + virtual void appendElement(Vector<UChar>& out, Element* element, Namespaces* namespaces) >>> + { >>> + if (element->hasTagName(metaTag) && isCharsetDeclaration(element)) >>> + return >> >> Now you can move this to serializeNodes! > > Investigating now that I've rebased to top-of-tree where serializeNodes exists. So, that would work well for the <meta> case, but it doesn't work for the <base> case. In the <base> case, we actually and to re-write the serialization of the tag instead of omitting it. I'm not sure its worth adding that complexity to the machine just for the <meta> case.
Ryosuke Niwa
Comment 8 2010-10-18 13:28:49 PDT
(In reply to comment #7) > (From update of attachment 70929 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=70929&action=review > > >>> WebKit/chromium/src/WebPageSerializerImpl.cpp:153 > >>> + virtual void appendElement(Vector<UChar>& out, Element* element, Namespaces* namespaces) > >>> + { > >>> + if (element->hasTagName(metaTag) && isCharsetDeclaration(element)) > >>> + return > >> > >> Now you can move this to serializeNodes! > > > > Investigating now that I've rebased to top-of-tree where serializeNodes exists. > > So, that would work well for the <meta> case, but it doesn't work for the <base> case. In the <base> case, we actually and to re-write the serialization of the tag instead of omitting it. I'm not sure its worth adding that complexity to the machine just for the <meta> case. I see. I guess it's to okay to add it to appendElement then. But we might want to comment that this is fine only because meta & base do not have end tags.
Adam Barth
Comment 9 2010-10-18 14:00:50 PDT
Created attachment 71078 [details] what-what the compile
Adam Barth
Comment 10 2010-10-18 15:05:34 PDT
Created attachment 71090 [details] patch (seems to work)
Ryosuke Niwa
Comment 11 2010-10-18 15:16:32 PDT
Comment on attachment 71090 [details] patch (seems to work) View in context: https://bugs.webkit.org/attachment.cgi?id=71090&action=review Great to see a lot of code being removed! > WebCore/editing/MarkupAccumulator.cpp:367 > + appendQuotedURLAttributeValue(out, shouldResolveURLs() ? resolveURL(element, attribute) : attribute.value().string()); Nice change. > WebKit/chromium/src/WebPageSerializerImpl.cpp:117 > + bool contains(const String& url) Can you make contains const? > WebKit/chromium/src/WebPageSerializerImpl.cpp:161 > + if (element->hasTagName(metaTag) && isCharsetDeclaration(static_cast<HTMLMetaElement*>(element))) > + return; // Notice that this works because <meta> is self-closing! Why did you remove assertion here? > WebKit/chromium/src/WebPageSerializerImpl.cpp:207 > + String markOfTheWeb = WebPageSerializer::generateMarkOfTheWebDeclaration(url); You can probably move generateMarkOfTheWebDeclaration into this class in the future. In fact, everything in WebPageSerializer.cpp seems to belong to this file. I don't even understand why we have interface + impl for this class given there's only one implementation of the interface. > WebKit/chromium/src/WebPageSerializerImpl.cpp:305 > + KURL mainURL = m_webFrame->frame()->document()->url(); I don't like the name mainURL but I guess we should change that later.
Adam Barth
Comment 12 2010-10-18 15:27:13 PDT
Comment on attachment 71090 [details] patch (seems to work) View in context: https://bugs.webkit.org/attachment.cgi?id=71090&action=review >> WebKit/chromium/src/WebPageSerializerImpl.cpp:117 >> + bool contains(const String& url) > > Can you make contains const? Sure. >> WebKit/chromium/src/WebPageSerializerImpl.cpp:161 >> + return; // Notice that this works because <meta> is self-closing! > > Why did you remove assertion here? Because it failed in testing. I think I was calling the wrong function to tell if <meta> gets a close tag, or maybe I was calling it in the wrong way? >> WebKit/chromium/src/WebPageSerializerImpl.cpp:207 >> + String markOfTheWeb = WebPageSerializer::generateMarkOfTheWebDeclaration(url); > > You can probably move generateMarkOfTheWebDeclaration into this class in the future. In fact, everything in WebPageSerializer.cpp seems to belong to this file. I don't even understand why we have interface + impl for this class given there's only one implementation of the interface. The interface is part of the Chromium WebKit API. I'm not sure why we have an cpp file for it though. >> WebKit/chromium/src/WebPageSerializerImpl.cpp:305 >> + KURL mainURL = m_webFrame->frame()->document()->url(); > > I don't like the name mainURL but I guess we should change that later. I'm happy to change it now if you have a suggestion for a better name.
Ryosuke Niwa
Comment 13 2010-10-18 15:33:13 PDT
(In reply to comment #12) > >> WebKit/chromium/src/WebPageSerializerImpl.cpp:161 > >> + return; // Notice that this works because <meta> is self-closing! > > > > Why did you remove assertion here? > > Because it failed in testing. I think I was calling the wrong function to tell if <meta> gets a close tag, or maybe I was calling it in the wrong way? Try ASSERT(!node->hasChildNodes() && elementCannotHaveEndTag(node)); shouldSelfClose checks a bunch of extra conditions. Still, it's weird that we the assertion failed. We might have some bug there... > The interface is part of the Chromium WebKit API. I'm not sure why we have an cpp file for it though. I see. I think we can still get rid of WebPageSerializer.cpp later. > >> WebKit/chromium/src/WebPageSerializerImpl.cpp:305 > >> + KURL mainURL = m_webFrame->frame()->document()->url(); > > > > I don't like the name mainURL but I guess we should change that later. > > I'm happy to change it now if you have a suggestion for a better name. urlOfMainFrame or urlOfMainDocument?
Adam Barth
Comment 14 2010-10-18 15:42:50 PDT
> Try ASSERT(!node->hasChildNodes() && elementCannotHaveEndTag(node)); shouldSelfClose checks a bunch of extra conditions. Still, it's weird that we the assertion failed. We might have some bug there... That assert will fail if someone inserts an element as a child of <meta> using the DOM. Will we serialize an end tag in that case? > > >> WebKit/chromium/src/WebPageSerializerImpl.cpp:305 > > >> + KURL mainURL = m_webFrame->frame()->document()->url(); > > > > > > I don't like the name mainURL but I guess we should change that later. > > > > I'm happy to change it now if you have a suggestion for a better name. > > urlOfMainFrame or urlOfMainDocument? Ok. I'll try something like that.
Ryosuke Niwa
Comment 15 2010-10-18 15:43:45 PDT
(In reply to comment #14) > > urlOfMainFrame or urlOfMainDocument? > > Ok. I'll try something like that. How about urlOfRootDocument?
Eric Seidel (no email)
Comment 16 2010-10-28 13:45:54 PDT
Comment on attachment 71090 [details] patch (seems to work) LGTM.
Adam Barth
Comment 17 2010-11-07 16:28:01 PST
Comment on attachment 71090 [details] patch (seems to work) I'm going to land this by hand.
Eric Seidel (no email)
Comment 18 2010-12-20 22:42:41 PST
@abarth: ping?
Adam Barth
Comment 19 2010-12-20 23:40:02 PST
Comment on attachment 71090 [details] patch (seems to work) This patch turned out to be subtly wrong. I've lost context about how, sadly.
Eric Seidel (no email)
Comment 20 2012-02-08 16:08:47 PST
Sad to see this old patch die.
Ryosuke Niwa
Comment 21 2012-02-08 16:10:00 PST
(In reply to comment #20) > Sad to see this old patch die. I think abarth fixed this bug elsewhere. I see MarkupAccumulator used in WebPageSerializerImpl now.
Note You need to log in before you can comment on or make changes to this bug.