WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
58947
Add a way to serialize a WebView back to HTML
https://bugs.webkit.org/show_bug.cgi?id=58947
Summary
Add a way to serialize a WebView back to HTML
Jay Civelli
Reported
2011-04-19 17:23:37 PDT
We need an API that serializes the content of a WebView (all frames) back to HTML and also retrieves the resources for that page. That would allow to save a page to a self-contained file (ex: MHTML)
Attachments
Initial patch
(53.96 KB, patch)
2011-04-19 17:34 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Fixing style
(53.89 KB, patch)
2011-04-19 18:28 PDT
,
Jay Civelli
abarth
: review-
Details
Formatted Diff
Diff
Moved code to WebCore.
(53.33 KB, patch)
2011-04-21 11:27 PDT
,
Jay Civelli
abarth
: review-
Details
Formatted Diff
Diff
Addressed Adam's and Alexey's comments
(55.72 KB, patch)
2011-04-21 14:59 PDT
,
Jay Civelli
abarth
: review-
Details
Formatted Diff
Diff
Now keeping original files encoding
(59.23 KB, patch)
2011-04-22 13:49 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Only specifying the charset if the page does not do so.
(61.06 KB, patch)
2011-04-22 17:43 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Addressing Alexey's comments
(66.05 KB, patch)
2011-04-25 12:29 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(65.61 KB, patch)
2011-04-26 01:09 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(64.49 KB, patch)
2011-04-26 11:03 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(64.50 KB, patch)
2011-04-26 11:36 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(64.50 KB, patch)
2011-04-26 12:09 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(64.50 KB, patch)
2011-04-26 17:33 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch
(64.44 KB, patch)
2011-04-26 17:45 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch for landing
(64.44 KB, patch)
2011-04-26 23:00 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch for landing
(64.43 KB, patch)
2011-04-26 23:04 PDT
,
Jay Civelli
no flags
Details
Formatted Diff
Diff
Patch for landing
(64.43 KB, patch)
2011-04-27 08:51 PDT
,
Jay Civelli
commit-queue
: commit-queue-
Details
Formatted Diff
Diff
patch (updated to avoid Android.mk)
(63.96 KB, patch)
2011-04-28 02:13 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(16)
View All
Add attachment
proposed patch, testcase, etc.
Jay Civelli
Comment 1
2011-04-19 17:34:29 PDT
Created
attachment 90280
[details]
Initial patch
WebKit Review Bot
Comment 2
2011-04-19 17:46:00 PDT
Attachment 90280
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebKit/chromium/tests/data/htmlserializer/top_frame.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Source/WebKit/chromium/tests/data/htmlserializer/blank_frames.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Source/WebKit/chromium/tests/data/htmlserializer/iframe2.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Total errors found: 58 in 18 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 3
2011-04-19 18:07:16 PDT
Attachment 90280
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/8471501
Jay Civelli
Comment 4
2011-04-19 18:28:49 PDT
Created
attachment 90288
[details]
Fixing style
WebKit Review Bot
Comment 5
2011-04-19 20:28:14 PDT
Attachment 90288
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8475511
Adam Barth
Comment 6
2011-04-19 22:28:31 PDT
Comment on
attachment 90288
[details]
Fixing style View in context:
https://bugs.webkit.org/attachment.cgi?id=90288&action=review
This looks great. The comments below are just a ton of nits. Once we get this working, we should remove the Save Web Page As code in favor of this code path.
> Source/WebCore/editing/MarkupAccumulator.h:82 > + virtual void appendCustomAttributes(Vector<UChar>&, Element*, Namespaces*) { }
Please don't put implementations of virtual methods in headers. That just makes the linker work hard.
> Source/WebKit/chromium/public/WebHTMLSerializer.h:46 > +// In the serialized version, any JavaScript tag or link is removed (event
JavaScript tag -> script tags link -> ???
> Source/WebKit/chromium/public/WebHTMLSerializer.h:57 > + SerializedResource() { } > + SerializedResource(const WebURL& url, const WebCString& mimeType, const WebCString& data) > + : url(url), mimeType(mimeType), data(data) { }
This isn't proper WebKit style.
> Source/WebKit/chromium/public/WebHTMLSerializer.h:62 > + WEBKIT_API static void serialize(WebView*, WebVector<SerializedResource>* resources);
This feels like a method of WebView rather than a static, but those are questions for fishd.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:61 > +static const WebCore::QualifiedName* ignoredTags[] = { > + &HTMLNames::scriptTag, > + &HTMLNames::noscriptTag, > +};
These could be references instead of pointers.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:72 > +class WebHTMLSerializerImpl::SerializerMarkupAccumulator : public MarkupAccumulator {
I'd just use a normal class name and put all this stuff in an anonymous namespace.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:99 > + : MarkupAccumulator(nodes, AbsoluteURLs), m_serializer(serializer)
This should be on two lines for proper WebKit style.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:118 > + // FIXME: for object (plugins) tags and video tag we could replace them by an image fo their current contents.
fo -> of
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:124 > + if (!element->hasTagName(HTMLNames::iframeTag) && !element->hasTagName(HTMLNames::frameTag)) > + return;
What about other FrameOwnerElement subclasses, like <object> ?
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:129 > + if (url.isValid() && url != blankURL()) > + return;
comparing against blankURL isn't right. You should look to see if it has the scheme "about"
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:132 > + url = m_serializer->getURLForBlankFrame(frameElement->contentFrame());
getURLForBlankFrame ->urlForBlankFrame (or something... we don't start function names with "get").
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:133 > + RefPtr<Attribute> attribute = Attribute::create(HTMLNames::srcAttr, AtomicString(url.string()));
You don't need to call AtomicString explicitly.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:144 > + : m_resources(resources), m_blankFrameCounter(0)
Same style problem.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:154 > + if (!url.isValid() || url == blankURL()) > + // For blank frames we generate a fake URL so they can be referenced by their containing frame. > + url = getURLForBlankFrame(frame);
Multi-line ifs should have { } even if they only have one statement. Also, same blankURL bug.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:180 > + KURL url = document->completeURL(imageElem->getAttribute(HTMLNames::srcAttr));
This won't be the URL of the image in the case where there's a redirect.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:183 > + } else if (element->hasTagName(HTMLNames::linkTag) && equalIgnoringCase(element->getAttribute(HTMLNames::typeAttr), "text/css")) {
This is wrong. You want to know whether the rel attribute contains stylesheet. You should be able to ask the HTMLLinkElement if it's a stylesheet.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:186 > + if (sheet->isCSSStyleSheet()) {
Yeah, looks like you've already do that. You should remove the typeAttr check.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:189 > + // Note that serializeCSSStyleSheet took care of adding the style sheet to the resources vector.
Can you change this into an ASSERT?
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:200 > + Frame* childFrame = frame->tree()->firstChild(); > + while (childFrame) {
This looks like a for loop in disguise.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:247 > + String mimeType = MIMETypeRegistry::getMIMETypeForExtension(image->image()->filenameExtension());
This is wrong.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:297 > + String url("
http://chromium_phony_frame/
");
Oh man. That's not a good idea. How about "wyciwyg:" as the stem?
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.h:60 > + class SerializerMarkupAccumulator; > + friend class SerializerMarkupAccumulator;
This doen't necessarily need to be a friend. We already have the API layer as a shield.
> Source/WebKit/chromium/src/WebHTMLSerializerImpl.h:62 > + // Serializes the style-sheet back to text and adds it to the resources if URL is not-empty.
style-sheet => stylesheet
> Source/WebKit/chromium/tests/WebHTMLSerializerTest.cpp:61 > + WebHTMLSerializerTest() : m_htmlMimeType(WebString::fromUTF8("text/html")), m_cssMimeType(WebString::fromUTF8("text/css")), m_pngMimeType(WebString::fromUTF8("img/png")) { }
Same style problem.
Alexey Proskuryakov
Comment 7
2011-04-19 23:55:34 PDT
Subclassing WebCore classes is not how we customize behavior from WebKit. This should be a client or a strategy call. Can you explain what the end goal here is? One that I can think of sounds questionable, but maybe it's just me.
Adam Barth
Comment 8
2011-04-20 00:04:48 PDT
> Subclassing WebCore classes is not how we customize behavior from WebKit. This should be a client or a strategy call.
That's not really what's going on in this patch. No WebCore behavior is being customized. There's a larger question of whether this code should be in WebCore or WebKit. In discussing this with Jay, I think moving more of it to WebCore would make sense as this logic isn't overly specific to Chromium.
> Can you explain what the end goal here is? One that I can think of sounds questionable, but maybe it's just me.
This code supports a browser feature much like the "Save as web archive" feature in Safari. Currently Chrome has a "Save as web page (complete)", but this is a different approach to that feature. I'd like to replace the existing approach (see
http://trac.webkit.org/browser/trunk/Source/WebKit/chromium/src/WebPageSerializerImpl.cpp
but please don't cry) with this approach, but that might or might not happen.
Alexey Proskuryakov
Comment 9
2011-04-20 09:07:31 PDT
> That's not really what's going on in this patch. No WebCore behavior is being customized.
If you mean that nothing observable from DOM is being customized, then sure. But that's not the point. Having a virtual method that's being overridden from WebKit (instead of using a client call) is exactly what is wrong about this patch technically - there is no exception like "we can avoid using a client-based design if we're implementing a feature that's only accessible from browser UI". I suspect that you may feel uneasy about clients since they are commonly per-frame. Well, since we're exposing serializer as a separate WebCore "API", then there can be per-serializer clients, too. Another approach is of course to have the logic in WebCore and to set a boolean flag in serializer instance.
> This code supports a browser feature much like the "Save as web archive" feature in Safari. Currently Chrome has a "Save as web page (complete)", but this is a different approach to that feature.
Can you describe what the difference with the desired Chromium behavior is? Can that be something that all WebKit-based browsers want?
Adam Barth
Comment 10
2011-04-20 09:28:32 PDT
> If you mean that nothing observable from DOM is being customized, then sure. But that's not the point. Having a virtual method that's being overridden from WebKit (instead of using a client call) is exactly what is wrong about this patch technically
That's exactly what a client is, right? The only difference is that clients are pure virtual.
> Can you describe what the difference with the desired Chromium behavior is?
Your question isn't well-posed. The difference between the desired Chromium behavior and what?
> Can that be something that all WebKit-based browsers want?
Potentially. There isn't really a "correct" way to save a live web page into a file. There are a bunch of trade-offs. This approach represents one set of trade-offs. The existing code represents a different set of trade-offs. Many browsers have this sort of save-as feature, so it might well be valuable to other WebKit-based browsers, depending on how they view the various trade-offs.
Alexey Proskuryakov
Comment 11
2011-04-20 10:10:54 PDT
> That's exactly what a client is, right? The only difference is that clients are pure virtual.
Yes, it's a major distinction. If you don't like this design principle, this bug is not a place to discuss it.
> Your question isn't well-posed. The difference between the desired Chromium behavior and what?
I can't do anything to make the question any more clear. You said that "this is a different approach", I asked what the difference was, and now you are asking "difference with what?"
> Potentially. There isn't really a "correct" way to save a live web page into a file.
Agreed. We have tons of customizations of this kind in WebCore, sometimes implemented with client calls, other times with preferences. See e.g. platform editing behaviors, where WebCore is customized to do different things for the same user actions. I'm asking someone to describe the desired behavior in plain English, to avoid trying to guess the purpose from poorly commented and admittedly bad ("but please don't cry ") code. I presume this is not a trade secret.
Adam Barth
Comment 12
2011-04-20 11:05:19 PDT
One major difference between WebPageSerializer and WebHTMLSerializer is that WebPageSerializer includes inline script whereas WebHTMLSerializer does not. Also, WebHTMLSerializer might replace a canvas with a static image whereas WebPageSerializer would not. Generalizing WebHTMLSerializer's perspective is that you want a static version of the page (but not a screenshot because you want text-reflow, etc, to work). WebPageSerializer is trying to create a local version of the page that works as closely as possible to the hosted version. WebPageSerializer roughly uses the same approach as Firefox's "save complete web page", which has a number of unfixable bugs: <script> document.write("This text should appear once"); </script> In WebPageSerializer, the text will appear twice but with WebHTMLSerializer the text appears only once. <button>Hello</button> <script> document.getElementsByTagName("button")[0].addEventListender("click", function () { alert("Hi there"); }, false); </script> In WebPageSerializer, the button works but with WebHTMLSerializer the button does not. Now, aside from the differences in approach, WebPageSerializer has very poor code quality and lots of bugs. I wasn't involved in reviewing the patch that added it, and I think the project should not have accepted it at that level of quality. I re-wrote it from scratch a number of months ago, but never got around to finishing it because other higher-priority things came up. Do you have other questions?
Alexey Proskuryakov
Comment 13
2011-04-20 11:27:13 PDT
Thank you - that's a great explanation, and I think that it's good idea to consider keeping more of the code in WebCore.
Jay Civelli
Comment 14
2011-04-21 11:25:34 PDT
Comment on
attachment 90288
[details]
Fixing style View in context:
https://bugs.webkit.org/attachment.cgi?id=90288&action=review
>> Source/WebCore/editing/MarkupAccumulator.h:82 >> + virtual void appendCustomAttributes(Vector<UChar>&, Element*, Namespaces*) { } > > Please don't put implementations of virtual methods in headers. That just makes the linker work hard.
Done.
>> Source/WebKit/chromium/public/WebHTMLSerializer.h:46 >> +// In the serialized version, any JavaScript tag or link is removed (event > > JavaScript tag -> script tags > > link -> ???
Corrected.
>> Source/WebKit/chromium/public/WebHTMLSerializer.h:57 >> + : url(url), mimeType(mimeType), data(data) { } > > This isn't proper WebKit style.
Move to WebPageSerialization.h, constructor actually removed.
>> Source/WebKit/chromium/public/WebHTMLSerializer.h:62 >> + WEBKIT_API static void serialize(WebView*, WebVector<SerializedResource>* resources); > > This feels like a method of WebView rather than a static, but those are questions for fishd.
Per fishd, moved it to WebPageSerializer.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:61 >> +}; > > These could be references instead of pointers.
makes compiler unhappy: "Array of references are illegal"
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:72 >> +class WebHTMLSerializerImpl::SerializerMarkupAccumulator : public MarkupAccumulator { > > I'd just use a normal class name and put all this stuff in an anonymous namespace.
OK, done.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:99 >> + : MarkupAccumulator(nodes, AbsoluteURLs), m_serializer(serializer) > > This should be on two lines for proper WebKit style.
Done.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:118 >> + // FIXME: for object (plugins) tags and video tag we could replace them by an image fo their current contents. > > fo -> of
Done.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:124 >> + return; > > What about other FrameOwnerElement subclasses, like <object> ?
Ah, good point. Now dealing with that.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:129 >> + return; > > comparing against blankURL isn't right. You should look to see if it has the scheme "about"
Done.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:132 >> + url = m_serializer->getURLForBlankFrame(frameElement->contentFrame()); > > getURLForBlankFrame ->urlForBlankFrame (or something... we don't start function names with "get").
Done
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:133 >> + RefPtr<Attribute> attribute = Attribute::create(HTMLNames::srcAttr, AtomicString(url.string())); > > You don't need to call AtomicString explicitly.
Removed.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:144 >> + : m_resources(resources), m_blankFrameCounter(0) > > Same style problem.
Fixed.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:154 >> + url = getURLForBlankFrame(frame); > > Multi-line ifs should have { } even if they only have one statement. Also, same blankURL bug.
Done.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:180 >> + KURL url = document->completeURL(imageElem->getAttribute(HTMLNames::srcAttr)); > > This won't be the URL of the image in the case where there's a redirect.
I think that's OK. All that matters is that the image URL in the serialized frame matches the URL for the image we store. We don't have to resolve the redirects.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:183 >> + } else if (element->hasTagName(HTMLNames::linkTag) && equalIgnoringCase(element->getAttribute(HTMLNames::typeAttr), "text/css")) { > > This is wrong. You want to know whether the rel attribute contains stylesheet. You should be able to ask the HTMLLinkElement if it's a stylesheet.
Removed.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:189 >> + // Note that serializeCSSStyleSheet took care of adding the style sheet to the resources vector. > > Can you change this into an ASSERT?
OK.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:200 >> + while (childFrame) { > > This looks like a for loop in disguise.
Disguise removed.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:247 >> + String mimeType = MIMETypeRegistry::getMIMETypeForExtension(image->image()->filenameExtension()); > > This is wrong.
OK, now using the mime type from the response that created the resource. Is that better?
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.cpp:297 >> + String url("
http://chromium_phony_frame/
"); > > Oh man. That's not a good idea. How about "wyciwyg:" as the stem?
OK.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.h:60 >> + friend class SerializerMarkupAccumulator; > > This doen't necessarily need to be a friend. We already have the API layer as a shield.
Not a friend anymore.
>> Source/WebKit/chromium/src/WebHTMLSerializerImpl.h:62 >> + // Serializes the style-sheet back to text and adds it to the resources if URL is not-empty. > > style-sheet => stylesheet
Done.
>> Source/WebKit/chromium/tests/WebHTMLSerializerTest.cpp:61 >> + WebHTMLSerializerTest() : m_htmlMimeType(WebString::fromUTF8("text/html")), m_cssMimeType(WebString::fromUTF8("text/css")), m_pngMimeType(WebString::fromUTF8("img/png")) { } > > Same style problem.
Fixed.
Jay Civelli
Comment 15
2011-04-21 11:27:05 PDT
Created
attachment 90562
[details]
Moved code to WebCore. As discussed, I moved this code to WebCore/page.
Adam Barth
Comment 16
2011-04-21 12:04:52 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
This looks great. A few minor comments.
> Source/WebCore/page/PageSerializer.cpp:56 > +namespace {
Apparently we're supposed to use the static keyword in favor of anonymous namespaces, but I see that you're using this because you want the SerializerMarkupAccumulator class to have internal linkage.
> Source/WebCore/page/PageSerializer.cpp:127 > + url = m_serializer->urlForBlankFrame(frame); > + RefPtr<Attribute> attribute = Attribute::create(HTMLNames::srcAttr, url.string()); > + appendAttribute(out, element, *attribute, namespaces);
This isn't quite correct in the case of an <object> tag, which uses the data attribute for the frame's URL. HTMLFrameElementBase::isURLAttribute exists to solve this problem in other places. See also HTMLObjectElement::imageSourceAttributeName for a slightly different solution to a similar problem.
> Source/WebCore/page/PageSerializer.cpp:146 > +
extra blank line here
> Source/WebCore/page/PageSerializer.cpp:163 > + if (!url.isValid() || url == blankURL()) {
This should use the protocolIs check instead of comparing against blankURL.
> Source/WebCore/page/PageSerializer.cpp:168 > + if (m_resourceURLs.contains(url)) {
Maybe we should convert all the URLs be unique?
> Source/WebCore/page/PageSerializer.cpp:191 > + HTMLImageElement* imageElem = static_cast<HTMLImageElement*>(element);
imageElem => imageElement (we don't like abbreviations)
> Source/WebCore/page/PageSerializer.cpp:193 > + CachedImage* cachedImg = imageElem->cachedImage();
cachedImage.
> Source/WebCore/page/PageSerializer.cpp:199 > + KURL url = document->completeURL(linkElement->getAttribute(HTMLNames::hrefAttr));
This isn't really correct. There might have been a redirect. You probably want StyleSheet::finalURL, but that might cause problems when loading the style sheet... I guess it depends on how we deserialize these. I'm not sure what to do here.
> Source/WebCore/page/PageSerializer.cpp:207 > + serializeCSSStyleSheet(static_cast<CSSStyleSheet*>(sheet), KURL());
What about style attributes? Seems like you'll want to grab the images and such out of those as well.
> Source/WebCore/page/PageSerializer.cpp:217 > + String cssText;
We should use StringBuilder instead. Calling append a lot on String is slow.
> Source/WebCore/page/PageSerializer.cpp:307 > + String url("wyciwyg://frame/"); > + url.append(String::number(m_blankFrameCounter++)); > + KURL fakeURL(ParsedURLString, url);
You should use makeString instead of calling append. I'm not sure it matters, but it's the new hotness.
> Source/WebCore/page/PageSerializer.h:92 > + // The list of resources retreived so far. > + Vector<Resource>* m_resources; > + > + // The list of URLs we have retrieved so far, use to ensure we only retrieved each resource once. > + ListHashSet<KURL> m_resourceURLs; > + > + // The URLs used to identify blank frames. > + HashMap<Frame*, KURL> m_blankFrameURLs; > + > + // Used to generate unique fake URLs for blank frames. > + unsigned m_blankFrameCounter;
We usually skip these comments and remove all the blank lines between member variables.
> Source/WebKit/chromium/src/WebPageSerializer.cpp:195 > + resource.data = WebCString(iter->data->data(), iter->data->size());
This creates a copy of all the images? There isn't a way to just take a reference to the SharedData ?
Adam Barth
Comment 17
2011-04-21 12:06:16 PDT
You should add the new WebCore files to all the other build systems.
Alexey Proskuryakov
Comment 18
2011-04-21 12:26:27 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
Looks pretty good to me, too. The new serializer uses utf-8 for all content, correct? That seems less than perfect, since page behavior does depend on encoding (most commonly to encode forms and request URLs). Is the encoding transferred out of band somewhere?
>> Source/WebCore/page/PageSerializer.cpp:56 >> +namespace { > > Apparently we're supposed to use the static keyword in favor of anonymous namespaces, but I see that you're using this because you want the SerializerMarkupAccumulator class to have internal linkage.
We usually don't care about linkage utility classes in WebCore namespace. So I suggest just removing the anonymous namespace.
> Source/WebCore/page/PageSerializer.cpp:138 > +PageSerializer::Resource::Resource() { }
Braces should be on separate lines.
> Source/WebCore/page/PageSerializer.cpp:171 > + // FIXME: we could have 2 frame with the same URL but which were dynamically changed and have now > + // different content. So we should serialize both and somehow rename the frame src in the > + // containing frame. Arg!
We don't align code and comments like that, please just use one space.
> Source/WebCore/page/PageSerializer.cpp:235 > + // FIXME: add support for font face rule. It is not clear to me at this point if the actual otf/eot file can > + // be retrieved from the CSSFontFaceRule object.
Same comment about indentation.
> Source/WebCore/page/PageSerializer.cpp:305 > + String url("wyciwyg://frame/");
What will WebCore do with this wyciwyg scheme later? The result of this serialization seems useless to non-Chromium ports, and this should either be addressed, or made clear through naming and use of #ifs.
> Source/WebCore/page/PageSerializer.cpp:316 > +} // namespace WebCore > + > + > +
Lots of empty lines here. I'm not a fan of comments like "namespace WebCore" - most of the time you don't question file structure, so they are useless and add visual noise. And when you do, they are likely to be misleading - the only way to match braces is to use your text editor features, comments can't tell the truth.
> Source/WebCore/page/PageSerializer.h:51 > +// It serializes all the page frames and retreives resources such as images and CSS stylesheets.
Typo: "retreives".
> Source/WebCore/page/PageSerializer.h:59 > + Resource(const KURL&, const CString& mimeType, PassRefPtr<SharedBuffer> data);
Please don't use CString here. It's a fairly strange a dangerous class.
> Source/WebCore/page/PageSerializer.h:82 > + // The list of resources retreived so far.
Typo: "retreived". I'm not sure if this comment adds any information.
Adam Barth
Comment 19
2011-04-21 13:43:49 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
>> Source/WebCore/page/PageSerializer.cpp:305 >> + String url("wyciwyg://frame/"); > > What will WebCore do with this wyciwyg scheme later? The result of this serialization seems useless to non-Chromium ports, and this should either be addressed, or made clear through naming and use of #ifs.
wyciwyg doesn't mean anything special to Chromium either. I'm not sure how this is supposed to work. Maybe the code that uses the serialized version is supposed to re-write these URLs to be local to the disk? I just recommended that Jay use wyciwyg because Firefox uses that scheme for a similar purpose.
Jay Civelli
Comment 20
2011-04-21 14:58:02 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
>> Source/WebCore/page/PageSerializer.cpp:127 >> + appendAttribute(out, element, *attribute, namespaces); > > This isn't quite correct in the case of an <object> tag, which uses the data attribute for the frame's URL. HTMLFrameElementBase::isURLAttribute exists to solve this problem in other places. See also HTMLObjectElement::imageSourceAttributeName for a slightly different solution to a similar problem.
I see. Since there is no getURLAttribute method I ended up iterating over the attributes and calling isURLAttribute to figure out the right one. Less than ideal, but I am not sure what else to do (short of introducing a new getURLAttribute method to Element).
>> Source/WebCore/page/PageSerializer.cpp:146 >> + > > extra blank line here
Removed.
>> Source/WebCore/page/PageSerializer.cpp:163 >> + if (!url.isValid() || url == blankURL()) { > > This should use the protocolIs check instead of comparing against blankURL.
Switched to protocolIs("about"). Do you know if there could be cases where an frame loads something other than about:blank?
>> Source/WebCore/page/PageSerializer.cpp:168 >> + if (m_resourceURLs.contains(url)) { > > Maybe we should convert all the URLs be unique?
Yeah, we could unique frame URLs but that would require some more changes to MarkupAccumulator. I'll do that in a subsequent patch if that's OK with you.
>> Source/WebCore/page/PageSerializer.cpp:191 >> + HTMLImageElement* imageElem = static_cast<HTMLImageElement*>(element); > > imageElem => imageElement (we don't like abbreviations)
Done.
>> Source/WebCore/page/PageSerializer.cpp:193 >> + CachedImage* cachedImg = imageElem->cachedImage(); > > cachedImage.
Done.
>> Source/WebCore/page/PageSerializer.cpp:199 >> + KURL url = document->completeURL(linkElement->getAttribute(HTMLNames::hrefAttr)); > > This isn't really correct. There might have been a redirect. You probably want StyleSheet::finalURL, but that might cause problems when loading the style sheet... I guess it depends on how we deserialize these. I'm not sure what to do here.
I am not sure I understand the problem. Let's say the page has: <link rel="stylesheet" type="text/css" href="styles.css" /> and styles.css is a redirect to something else. As long as the link is serialized as is (without it's href being resolved), then its URL will match the URL we are using for the CSS resource. And during deserialization, they'll still match.
>> Source/WebCore/page/PageSerializer.cpp:207 >> + serializeCSSStyleSheet(static_cast<CSSStyleSheet*>(sheet), KURL()); > > What about style attributes? Seems like you'll want to grab the images and such out of those as well.
It does it as part of the serialization process. (this is validated by the WebPageNewSerializeTest::CSSResources test).
>> Source/WebCore/page/PageSerializer.cpp:217 >> + String cssText; > > We should use StringBuilder instead. Calling append a lot on String is slow.
Done.
>> Source/WebCore/page/PageSerializer.cpp:307 >> + KURL fakeURL(ParsedURLString, url); > > You should use makeString instead of calling append. I'm not sure it matters, but it's the new hotness.
Done, I totally want to use the hippest APIs.
>> Source/WebCore/page/PageSerializer.h:92 >> + unsigned m_blankFrameCounter; > > We usually skip these comments and remove all the blank lines between member variables.
Removed the comments.
>> Source/WebKit/chromium/src/WebPageSerializer.cpp:195 >> + resource.data = WebCString(iter->data->data(), iter->data->size()); > > This creates a copy of all the images? There isn't a way to just take a reference to the SharedData ?
Yes, it copies all resources. There is no way in the current Chromium WebKit API to create a wrapper around SharedData. I guess I could do that if you think that's necessary.
Jay Civelli
Comment 21
2011-04-21 14:59:57 PDT
Created
attachment 90610
[details]
Addressed Adam's and Alexey's comments
Jay Civelli
Comment 22
2011-04-21 15:02:17 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
>>> Source/WebCore/page/PageSerializer.cpp:56 >>> +namespace { >> >> Apparently we're supposed to use the static keyword in favor of anonymous namespaces, but I see that you're using this because you want the SerializerMarkupAccumulator class to have internal linkage. > > We usually don't care about linkage utility classes in WebCore namespace. So I suggest just removing the anonymous namespace.
Removed the anonymous namespace.
>> Source/WebCore/page/PageSerializer.cpp:138 >> +PageSerializer::Resource::Resource() { } > > Braces should be on separate lines.
Done.
>> Source/WebCore/page/PageSerializer.cpp:171 >> + // containing frame. Arg! > > We don't align code and comments like that, please just use one space.
Done.
>> Source/WebCore/page/PageSerializer.cpp:235 >> + // be retrieved from the CSSFontFaceRule object. > > Same comment about indentation.
Done.
>> Source/WebCore/page/PageSerializer.cpp:316 >> + > > Lots of empty lines here. > > I'm not a fan of comments like "namespace WebCore" - most of the time you don't question file structure, so they are useless and add visual noise. And when you do, they are likely to be misleading - the only way to match braces is to use your text editor features, comments can't tell the truth.
Removed empty lines and comment.
>> Source/WebCore/page/PageSerializer.h:51 >> +// It serializes all the page frames and retreives resources such as images and CSS stylesheets. > > Typo: "retreives".
Fixed.
>> Source/WebCore/page/PageSerializer.h:59 >> + Resource(const KURL&, const CString& mimeType, PassRefPtr<SharedBuffer> data); > > Please don't use CString here. It's a fairly strange a dangerous class.
Now using String.
>> Source/WebCore/page/PageSerializer.h:82 >> + // The list of resources retreived so far. > > Typo: "retreived". I'm not sure if this comment adds any information.
Removed comments.
WebKit Review Bot
Comment 23
2011-04-21 15:03:37 PDT
Attachment 90610
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/editing/MarkupAccumulator.h:96: The parameter name "element" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 19 files If any of these errors are false positives, please file a bug against check-webkit-style.
Jay Civelli
Comment 24
2011-04-21 15:04:09 PDT
On patch number 4 I fixed the encoding by intercepting the content meta attribute and replacing it with UTF-8 (as well as the CSS charset) per Adam suggestion. I still need to add the new files to all build systems.
Adam Barth
Comment 25
2011-04-21 15:06:11 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
>>> Source/WebCore/page/PageSerializer.cpp:127 >>> + appendAttribute(out, element, *attribute, namespaces); >> >> This isn't quite correct in the case of an <object> tag, which uses the data attribute for the frame's URL. HTMLFrameElementBase::isURLAttribute exists to solve this problem in other places. See also HTMLObjectElement::imageSourceAttributeName for a slightly different solution to a similar problem. > > I see. Since there is no getURLAttribute method I ended up iterating over the attributes and calling isURLAttribute to figure out the right one. Less than ideal, but I am not sure what else to do (short of introducing a new getURLAttribute method to Element).
You can add the method to HTMLFrameOwnerElement, right?
>>> Source/WebCore/page/PageSerializer.cpp:163 >>> + if (!url.isValid() || url == blankURL()) { >> >> This should use the protocolIs check instead of comparing against blankURL. > > Switched to protocolIs("about"). > Do you know if there could be cases where an frame loads something other than about:blank?
Sure. WebKit treats about:foo just like about:blank. <iframe src="about:foo"></script>
>>> Source/WebCore/page/PageSerializer.cpp:168 >>> + if (m_resourceURLs.contains(url)) { >> >> Maybe we should convert all the URLs be unique? > > Yeah, we could unique frame URLs but that would require some more changes to MarkupAccumulator. > I'll do that in a subsequent patch if that's OK with you.
Sounds good.
>>> Source/WebCore/page/PageSerializer.cpp:199 >>> + KURL url = document->completeURL(linkElement->getAttribute(HTMLNames::hrefAttr)); >> >> This isn't really correct. There might have been a redirect. You probably want StyleSheet::finalURL, but that might cause problems when loading the style sheet... I guess it depends on how we deserialize these. I'm not sure what to do here. > > I am not sure I understand the problem. > Let's say the page has: > <link rel="stylesheet" type="text/css" href="styles.css" /> > and styles.css is a redirect to something else. > As long as the link is serialized as is (without it's href being resolved), then its URL will match the URL we are using for the CSS resource. And during deserialization, they'll still match.
Right, the danger is that you're taking bytes that you got from one URL and saying that they came from another URL. That's the first step towards a security vulnerability. Once we re-write all the URLs to use dummy URLs, we won't have that problem.
>>> Source/WebCore/page/PageSerializer.cpp:207 >>> + serializeCSSStyleSheet(static_cast<CSSStyleSheet*>(sheet), KURL()); >> >> What about style attributes? Seems like you'll want to grab the images and such out of those as well. > > It does it as part of the serialization process. (this is validated by the WebPageNewSerializeTest::CSSResources test).
Ah, great. I missed that.
>>> Source/WebKit/chromium/src/WebPageSerializer.cpp:195 >>> + resource.data = WebCString(iter->data->data(), iter->data->size()); >> >> This creates a copy of all the images? There isn't a way to just take a reference to the SharedData ? > > Yes, it copies all resources. > There is no way in the current Chromium WebKit API to create a wrapper around SharedData. > I guess I could do that if you think that's necessary.
Sounds like a performance optimization we could consider in the future. Perhaps a FIXME comment?
> Source/WebKit/chromium/tests/WebPageNewSerializerTest.cpp:183 > +TEST_F(WebPageNewSerializeTest, CSSResources)
I guess I don't see how this tests the case I was thinking about. What about: <div style="width:10; heigh:10; background-image: url(blue_background.png)">Hi</div> ?
WebKit Review Bot
Comment 26
2011-04-21 15:26:30 PDT
Attachment 90610
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8498121
Adam Barth
Comment 27
2011-04-21 16:21:22 PDT
Comment on
attachment 90610
[details]
Addressed Adam's and Alexey's comments View in context:
https://bugs.webkit.org/attachment.cgi?id=90610&action=review
I feel like I'm nit-picking at this point. The main things to address are moving the function to HTMLFrameOwnerElement and style attributes.
>> Source/WebCore/editing/MarkupAccumulator.h:96 >> void appendOpenTag(Vector<UChar>& out, Element* element, Namespaces*); >> void appendCloseTag(Vector<UChar>& out, Element* element); >> - void appendAttribute(Vector<UChar>& out, Element* element, const Attribute&, Namespaces*); >> + virtual void appendAttribute(Vector<UChar>& out, Element* element, const Attribute&, Namespaces*); > > The parameter name "element" adds no information, so it should be removed. [readability/parameter_name] [5]
I'd fix all the instance of this issue in this block of code.
> Source/WebCore/page/PageSerializer.cpp:65 > +static const WebCore::QualifiedName* frameOwnerSrcAttributes[] = {
This array does not appear to be used.
> Source/WebCore/page/PageSerializer.cpp:81 > + // FIXME: ideally HTMLFrameOwnerElement would have a method to return the URL attribute name.
I'd just add this method to HTMLFrameOwnerElement
> Source/WebCore/page/PageSerializer.cpp:133 > + if (element->hasTagName(HTMLNames::metaTag) && attribute.name() == HTMLNames::contentAttr) {
Do we want to make sure the http-equiv attribute of this element is Content-Type ? Otherwise, you could be munging other http-equiv meta tags.
> Source/WebCore/page/PageSerializer.cpp:136 > + if (charset != "utf-8" && charset != "utf8") {
Doe we need a case-insensitive compare here? If not, we should ASSERT that charset is lowercase.
> Source/WebCore/page/PageSerializer.cpp:142 > + }
This isn't the only case. There's also: <meta charset="utf-8">
> Source/WebCore/page/PageSerializer.cpp:166 > + if (!urlAttribute) { > + LOG_ERROR("No URL attribute found for frame owner."); > + return; > + }
Why is this an error? It can definitely happen.
> Source/WebCore/page/PageSerializer.cpp:219 > + Vector<Node*> nodes; > + SerializerMarkupAccumulator accumulator(this, &nodes); > + CString frameHTML = accumulator.serializeNodes(document->documentElement(), 0, IncludeNode).utf8(); > + m_resources->append(Resource(url, String("text/html"), SharedBuffer::create(frameHTML.data(), frameHTML.length()))); > + m_resourceURLs.add(url);
I wonder if you want to add a <meta charset="UTF-8"> to the serialization just in case there's no meta element. Also, is document always text/html? What if it's application/svg+xml ?
WebKit Review Bot
Comment 28
2011-04-21 16:32:52 PDT
Attachment 90610
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8497170
Alexey Proskuryakov
Comment 29
2011-04-21 16:58:33 PDT
> On patch number 4 I fixed the encoding by intercepting the content meta attribute and replacing > it with UTF-8 (as well as the CSS charset) per Adam suggestion.
What I was saying was that ideally, one should preserve the original encoding if they want to restore the page later. Encoding affects the behavior, so changing it to utf-8 will break some pages, even though they will be decoded without problem.
Adam Barth
Comment 30
2011-04-21 17:35:58 PDT
(In reply to
comment #29
)
> > On patch number 4 I fixed the encoding by intercepting the content meta attribute and replacing > > it with UTF-8 (as well as the CSS charset) per Adam suggestion. > > What I was saying was that ideally, one should preserve the original encoding if they want to restore the page later. Encoding affects the behavior, so changing it to utf-8 will break some pages, even though they will be decoded without problem.
Did you have something specific in mind? This approach changes a lot of things that are observable to the page, so some number of pages will break.
Alexey Proskuryakov
Comment 31
2011-04-21 19:54:35 PDT
From the top of my head, I have two variants: - encode documents using document.charset, add a META to make sure that we decode correctly (least confusing); - encode using utf-8, change META if it exists, store the encoding out of band (or perhaps in a private attribute), so that it could be restored after parsing. The first option seems preferable.
> This approach changes a lot of things that are observable to the page, so some number of pages will break.
Yes, I'm not pushing this as a pre-requisite for landing this patch. But encoding is something that may be easier to get right from the start than to change later.
Adam Barth
Comment 32
2011-04-21 20:13:43 PDT
(In reply to
comment #31
)
> From the top of my head, I have two variants: > - encode documents using document.charset, add a META to make sure that we decode correctly (least confusing); > - encode using utf-8, change META if it exists, store the encoding out of band (or perhaps in a private attribute), so that it could be restored after parsing. > > The first option seems preferable.
The approach in Jay's previous patch is like (2) but it doesn't restore the encoding after parsing. What goes wrong if we don't restore the encoding?
Alexey Proskuryakov
Comment 33
2011-04-21 20:42:03 PDT
I briefly mentioned that in
comment 18
. Some uses I could remember or easily find: - for encoding form data; - by XSSFilter, basically for the same purpose; - by Document::completeURL(); - as a default encoding for some text-based subresources (which would obviously affect new one that are dynamically added after restoring); - it changes rendering of a backslash/Yen character in Japanese. Losing the encoding would cause a number of subtle bugs.
Adam Barth
Comment 34
2011-04-21 22:12:58 PDT
Ok. Jay, what do you think about option (1): - encode documents using document.charset, add a META to make sure that we decode correctly
Jay Civelli
Comment 35
2011-04-22 08:50:59 PDT
Option 1 sounds good, I'll go ahead and implement it this way. I guess it probably does not matter for CSS subresources whether we enforce the charset or whether we force UTF-8, but I guess for consistency I'll preserve the encoding for these as well.
Jay Civelli
Comment 36
2011-04-22 13:49:26 PDT
Created
attachment 90753
[details]
Now keeping original files encoding With this new patch we now keep the original charset of the page and append our own meta tag to specify it (and we remove any meta tag specifying the charset). Also addressed other comments from Adam and Alexey.
Jay Civelli
Comment 37
2011-04-22 13:52:00 PDT
Comment on
attachment 90562
[details]
Moved code to WebCore. View in context:
https://bugs.webkit.org/attachment.cgi?id=90562&action=review
>>>> Source/WebCore/page/PageSerializer.cpp:127 >>>> + appendAttribute(out, element, *attribute, namespaces); >>> >>> This isn't quite correct in the case of an <object> tag, which uses the data attribute for the frame's URL. HTMLFrameElementBase::isURLAttribute exists to solve this problem in other places. See also HTMLObjectElement::imageSourceAttributeName for a slightly different solution to a similar problem. >> >> I see. Since there is no getURLAttribute method I ended up iterating over the attributes and calling isURLAttribute to figure out the right one. Less than ideal, but I am not sure what else to do (short of introducing a new getURLAttribute method to Element). > > You can add the method to HTMLFrameOwnerElement, right?
OK, done. Had to name it getFrameURLAttribute() as there is already an Element::getURLAttribute method.
>>>> Source/WebKit/chromium/src/WebPageSerializer.cpp:195 >>>> + resource.data = WebCString(iter->data->data(), iter->data->size()); >>> >>> This creates a copy of all the images? There isn't a way to just take a reference to the SharedData ? >> >> Yes, it copies all resources. >> There is no way in the current Chromium WebKit API to create a wrapper around SharedData. >> I guess I could do that if you think that's necessary. > > Sounds like a performance optimization we could consider in the future. Perhaps a FIXME comment?
Added FIXME comment.
>> Source/WebKit/chromium/tests/WebPageNewSerializerTest.cpp:183 >> +TEST_F(WebPageNewSerializeTest, CSSResources) > > I guess I don't see how this tests the case I was thinking about. What about: > > <div style="width:10; heigh:10; background-image: url(blue_background.png)">Hi</div> > > ?
I does it, see css_test_page.html line 80.
Jay Civelli
Comment 38
2011-04-22 13:53:22 PDT
Comment on
attachment 90610
[details]
Addressed Adam's and Alexey's comments View in context:
https://bugs.webkit.org/attachment.cgi?id=90610&action=review
>>> Source/WebCore/editing/MarkupAccumulator.h:96 >>> + virtual void appendAttribute(Vector<UChar>& out, Element* element, const Attribute&, Namespaces*); >> >> The parameter name "element" adds no information, so it should be removed. [readability/parameter_name] [5] > > I'd fix all the instance of this issue in this block of code.
Done.
>> Source/WebCore/page/PageSerializer.cpp:65 >> +static const WebCore::QualifiedName* frameOwnerSrcAttributes[] = { > > This array does not appear to be used.
Ah right. Removed.
>> Source/WebCore/page/PageSerializer.cpp:81 >> + // FIXME: ideally HTMLFrameOwnerElement would have a method to return the URL attribute name. > > I'd just add this method to HTMLFrameOwnerElement
Done.
>> Source/WebCore/page/PageSerializer.cpp:133 >> + if (element->hasTagName(HTMLNames::metaTag) && attribute.name() == HTMLNames::contentAttr) { > > Do we want to make sure the http-equiv attribute of this element is Content-Type ? Otherwise, you could be munging other http-equiv meta tags.
Good point. Now that we are keeping the original encoding I am skipping any <meta http-equiv="Content-Type"...
>> Source/WebCore/page/PageSerializer.cpp:136 >> + if (charset != "utf-8" && charset != "utf8") { > > Doe we need a case-insensitive compare here? If not, we should ASSERT that charset is lowercase.
This code is gone.
>> Source/WebCore/page/PageSerializer.cpp:142 >> + } > > This isn't the only case. There's also: > > <meta charset="utf-8">
Also excluding these now.
>> Source/WebCore/page/PageSerializer.cpp:166 >> + } > > Why is this an error? It can definitely happen.
OK, removed the LOG_ERROR. I was not sure which kind of frame owner would not have a src (since frame, object, embed would).
>> Source/WebCore/page/PageSerializer.cpp:219 >> + m_resourceURLs.add(url); > > I wonder if you want to add a <meta charset="UTF-8"> to the serialization just in case there's no meta element. > > Also, is document always text/html? What if it's application/svg+xml ?
I added a method to Document to get the contentType and it deals with xhtml and svg, and use that now.
WebKit Review Bot
Comment 39
2011-04-22 14:22:46 PDT
Attachment 90753
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8495506
Adam Barth
Comment 40
2011-04-22 14:40:07 PDT
Comment on
attachment 90753
[details]
Now keeping original files encoding View in context:
https://bugs.webkit.org/attachment.cgi?id=90753&action=review
> Source/WebCore/dom/Document.cpp:1128 > +String Document::contentType() > +{ > + String content; > + if (m_document->isXHTMLDocument()) > + content = "application/xhtml+xml"; > + else if (m_document->isSVGDocument()) > + content = "image/svg+xml"; > + else > + content = "text/html"; > + > + return makeString(content, "; charset=", charset()); > +}
This doesn't seem right. For example, what if the Content-Type is text/plain? You need to ask the ResourceResponse that contained the document originally. You probably need to ask DocumentLoader.
Alexey Proskuryakov
Comment 41
2011-04-22 14:47:29 PDT
I don't see a problem with that. Current document encoding is authoritative, as that's what is used in all the cases I mentioned. An HTTP response may not have any charset in Content-Type, or it may have an invalid one. On the other hand, I don't think that this function is needed. Can't we just use <meta charset>, not <meta http-equiv>? Also, <meta>s of all sorts only work in HTML. You should be adding an encoding to <?xml?> for XML, and there is nothing that can be done for text.
Jay Civelli
Comment 42
2011-04-22 14:57:22 PDT
The reason I added this function is because in order to add the meta tag as we discussed before, I removed any existing meta-tags with a content-type. May be that's a bad idea? It seemed just trickier to add the meta only if it's not there. (I would have to wait for the end of the header, and I assumed such a tag should probably be one of the first tags in teh header as <title> and friends do need the encoding). As a result of removing the existing the content-type meta tags, I removed the mime type of the document which is what I am attempting to regenerate. My approach seems broken. May be I should preparse the header before I even start serializing the DOM to figure-out if I need to specify the charset myself? What do you guys think?
Simon Fraser (smfr)
Comment 43
2011-04-22 15:37:39 PDT
What is the serialization format? Is it something that should be standardized?
WebKit Review Bot
Comment 44
2011-04-22 16:38:48 PDT
Attachment 90753
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8495597
Adam Barth
Comment 45
2011-04-22 16:44:05 PDT
> What is the serialization format? Is it something that should be standardized?
The serialization format is HTML.
WebKit Review Bot
Comment 46
2011-04-22 17:39:32 PDT
Attachment 90753
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8487928
Jay Civelli
Comment 47
2011-04-22 17:43:55 PDT
Created
attachment 90815
[details]
Only specifying the charset if the page does not do so. I now only add the charset to pages that don't specify it already. Also I specify the XML charset for XML documents (including XHTML and SVG) and ignore other types.
WebKit Review Bot
Comment 48
2011-04-22 20:04:09 PDT
Attachment 90815
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8497594
Alexey Proskuryakov
Comment 49
2011-04-22 21:36:44 PDT
Comment on
attachment 90815
[details]
Only specifying the charset if the page does not do so. View in context:
https://bugs.webkit.org/attachment.cgi?id=90815&action=review
I have a number of comments, but I only looked into a few aspects of the patch.
> Source/WebCore/ChangeLog:13 > + * Android.mk: > + * CMakeLists.txt: > + * WebCore.gypi:
Still need many more build systems.
> Source/WebCore/ChangeLog:26 > + * page/PageSerializer.h: Added.
Despite trying to follow this patch for a while now, I'm still not sure how to explain the relation of new code to what we have now (e.g. Web archives). I don't have any specific suggestions or requests, but if you could make it more clear, that would be great.
> Source/WebCore/dom/Document.cpp:1116 > +String Document::mimeType() const
This should be more explicit about what it returns. Like maybe suggestedMIMETypeForSerialization?
> Source/WebCore/html/HTMLFrameOwnerElement.cpp:114 > + for (size_t i = 0; i < attributeMap->length(); ++i) { > + Attribute* attribute = attributeMap->attributeItem(i); > + if (isURLAttribute(attribute))
I bet we can do much better than iterate. Also, it's really confusing what this function even does. getSourceURLAttributeNameIfPresent? From briefly looking at call sites, it appears that those should check that the elements are instances of HTMLFrameElementBase somehow, and then just use src. They won't know what to do with potential future frame owner elements anyway, so being future-proof beyond adding assertions doesn't really help.
> Source/WebCore/page/PageSerializer.cpp:91 > + if (element->hasAttribute(HTMLNames::charsetAttr)) > + return true; > + > + String attributeValue; > + if (!(attributeValue = element->getAttribute(HTMLNames::http_equivAttr)).isNull() > + && attributeValue.lower().stripWhiteSpace() == "content-type") > + return true;
This logic doesn't match HTML5. Actual parsing logic lives in HTMLMetaCharsetParser, and I think that this check should be unified with it.
> Source/WebCore/page/PageSerializer.cpp:119 > + // MarkupAccumulator does not serialie the <?xml ... line, so we add it explicitely to ensure the right encoding is specified.
Typos: serialie, explicitely. I suspect that it's a bug that MarkupAccumulator doesn't do that. Specifically,
bug 25206
tracks a consequence of that. You should add guards (at least in the form of regression tests) to avoid getting two XML declarations - currently, this patch doesn't include any XML tests AFAICS. Obviously, it would be even better if you'd be willing to investigate and fix this in a prequel patch.
> Source/WebCore/page/PageSerializer.cpp:157 > + String metaStart("<meta charset=\"");
I've lost track of what the best way to build strings is today. StringBuilder perhaps? Or just "<meta charset=\"" + m_document->charset() + "\">"?
> Source/WebCore/page/PageSerializer.cpp:237 > + TextEncoding& textEncoding = TextEncoding(document->charset());
TextEncoding textEncoding(document->charset());
> Source/WebCore/page/PageSerializer.cpp:243 > + CString frameHTML = textEncoding.encode(text.characters(), text.length(), QuestionMarksForUnencodables);
Why QuestionMarksForUnencodables here and elsewhere? The page may have characters outside the original encoding repertoire inserted via DOM. It's very easy to avoid losing information by using other options here.
> Source/WebCore/page/PageSerializer.cpp:307 > + TextEncoding& textEncoding = TextEncoding(styleSheet->charset());
TextEncoding textEncoding(styleSheet->charset());
> Source/WebCore/page/PageSerializer.cpp:314 > + m_resources->append(Resource(url, String("text/css"), SharedBuffer::create(text.data(), text.length())));
We probably need need to ensure @charset for those. Thankfully, it's much easier than HTML! It's also less important, and could easily wait for a follow-up patch.
Adam Barth
Comment 50
2011-04-22 22:25:24 PDT
> Despite trying to follow this patch for a while now, I'm still not sure how to explain the relation of new code to what we have now (e.g. Web archives). I don't have any specific suggestions or requests, but if you could make it more clear, that would be great.
Web archive is an Apple-proprietary format, which is coupled to the Objective-C ecosystem (e.g., via the use of NSKeyedEncoder). MHTML is another serialization format, which is used by IE and Opera. One perspective is that this mechanism is at a lower layer than the particular final serialization format. This code serializes the DOM to HTML and encapsulates some subresources (such as images). You could take the output of this class an complete the serialization to either web archive, MHTML, or some other on-disk format. There might be some opportunity to re-use some of this mechanism in WebKit's web archive codepath, but the clients of that code path might want to make different trade-offs (e.g., about inline scripts, as discussed above). Another perspective is that MarkupAccumulator is doing most of the heavy lifting here and is the most valuable component to re-use.
WebKit Review Bot
Comment 51
2011-04-22 22:44:55 PDT
Attachment 90815
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8495723
WebKit Review Bot
Comment 52
2011-04-22 23:45:54 PDT
Attachment 90815
[details]
did not build on chromium: Build output:
http://queues.webkit.org/results/8503062
Simon Fraser (smfr)
Comment 53
2011-04-23 09:17:23 PDT
FYI one requirement we've found necessary that is poorly met by WebArchives is to be able to explode the serialized form into the original files, in order to edit those files to create bug reductions. Would the current code be usable if we wanted, for example, to create a bundle (directory-based) archive, or a zip archive?
Adam Barth
Comment 54
2011-04-23 09:20:57 PDT
> Would the current code be usable if we wanted, for example, to create a bundle (directory-based) archive, or a zip archive?
Yep. Chrome's current "Save As (complete)" feature produces a directory-based representation of the page. The main HTML file is stored in next to a folder containing the subresources as individual files. In this model, the exact on-disk representation is chosen by the embedder.
Jay Civelli
Comment 55
2011-04-25 12:28:32 PDT
Comment on
attachment 90815
[details]
Only specifying the charset if the page does not do so. View in context:
https://bugs.webkit.org/attachment.cgi?id=90815&action=review
>> Source/WebCore/ChangeLog:13 >> + * WebCore.gypi: > > Still need many more build systems.
OK, I tried to add all of them. I am still missing the xcode one, I'll do it next (I don't think I can do it in a text editor without screwing things up, so I'll have to move my patch to my Mac).
>> Source/WebCore/ChangeLog:26 >> + * page/PageSerializer.h: Added. > > Despite trying to follow this patch for a while now, I'm still not sure how to explain the relation of new code to what we have now (e.g. Web archives). I don't have any specific suggestions or requests, but if you could make it more clear, that would be great.
As Adam said, the goal is to provide another way to save a page to disk. One of my objective is to have a subsequent patch that adds MHTML generation support (by leveraging the Archive/ArchiveFactory classes).
>> Source/WebCore/dom/Document.cpp:1116 >> +String Document::mimeType() const > > This should be more explicit about what it returns. Like maybe suggestedMIMETypeForSerialization?
I am happy to change the name, but is the MIME type returned only specific to serialization? Are there cases where the document MIME type would not be considered to be one of these?
>> Source/WebCore/html/HTMLFrameOwnerElement.cpp:114 >> + if (isURLAttribute(attribute)) > > I bet we can do much better than iterate. Also, it's really confusing what this function even does. getSourceURLAttributeNameIfPresent? > > From briefly looking at call sites, it appears that those should check that the elements are instances of HTMLFrameElementBase somehow, and then just use src. They won't know what to do with potential future frame owner elements anyway, so being future-proof beyond adding assertions doesn't really help.
I believe since object and embed tags can be used as frames, then we cannot assume these tags are going to be HTMLFrameElementBase. And in the object tag case, the attribute to look at is not src but data. I think ideally HTMLFrameOwnerElement would have a method that returns the actual URL attribute instead of isURLAttribute(). Implementing this (if this is actually the right think to do) might impact several more files, and I'd rather not do it as part this patch if that's OK with you.
>> Source/WebCore/page/PageSerializer.cpp:91 >> + return true; > > This logic doesn't match HTML5. Actual parsing logic lives in HTMLMetaCharsetParser, and I think that this check should be unified with it.
I tried to unify the code, but it turned out I had to use a Vector to unify the attributes which are of different types in the 2 locations. Please let me know if you can think of a better way to do this.
>> Source/WebCore/page/PageSerializer.cpp:119 >> + // MarkupAccumulator does not serialie the <?xml ... line, so we add it explicitely to ensure the right encoding is specified. > > Typos: serialie, explicitely. > > I suspect that it's a bug that MarkupAccumulator doesn't do that. Specifically,
bug 25206
tracks a consequence of that. > > You should add guards (at least in the form of regression tests) to avoid getting two XML declarations - currently, this patch doesn't include any XML tests AFAICS. Obviously, it would be even better if you'd be willing to investigate and fix this in a prequel patch.
Typo fixed. OK, I added a regression test for now. I think I have I know how to fix the MarkupAccumulator bug, I'll try to do that in a follow-up patch.
>> Source/WebCore/page/PageSerializer.cpp:157 >> + String metaStart("<meta charset=\""); > > I've lost track of what the best way to build strings is today. StringBuilder perhaps? Or just "<meta charset=\"" + m_document->charset() + "\">"?
Switched to use makeString.
>> Source/WebCore/page/PageSerializer.cpp:237 >> + TextEncoding& textEncoding = TextEncoding(document->charset()); > > TextEncoding textEncoding(document->charset());
Fixed.
>> Source/WebCore/page/PageSerializer.cpp:243 >> + CString frameHTML = textEncoding.encode(text.characters(), text.length(), QuestionMarksForUnencodables); > > Why QuestionMarksForUnencodables here and elsewhere? The page may have characters outside the original encoding repertoire inserted via DOM. It's very easy to avoid losing information by using other options here.
OK, I switched to using EntitiesForUnencodables that will encode characters outside of the encoding repertoire to XML entity.
>> Source/WebCore/page/PageSerializer.cpp:307 >> + TextEncoding& textEncoding = TextEncoding(styleSheet->charset()); > > TextEncoding textEncoding(styleSheet->charset());
Done.
>> Source/WebCore/page/PageSerializer.cpp:314 >> + m_resources->append(Resource(url, String("text/css"), SharedBuffer::create(text.data(), text.length()))); > > We probably need need to ensure @charset for those. Thankfully, it's much easier than HTML! It's also less important, and could easily wait for a follow-up patch.
OK, I added a FIXME and will keep that for a follow-up patch.
Jay Civelli
Comment 56
2011-04-25 12:29:40 PDT
Created
attachment 90935
[details]
Addressing Alexey's comments
Early Warning System Bot
Comment 57
2011-04-25 12:49:43 PDT
Attachment 90935
[details]
did not build on qt: Build output:
http://queues.webkit.org/results/8506427
Adam Barth
Comment 58
2011-04-25 14:12:38 PDT
Comment on
attachment 90935
[details]
Addressing Alexey's comments View in context:
https://bugs.webkit.org/attachment.cgi?id=90935&action=review
I feel like we're getting the point where we should land this patch and iterate with incremental patches. (Minor nits below.)
> Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp:118 > +TextEncoding HTMLMetaCharsetParser::getEncodingFromMetaAttributes(const Vector<pair<AtomicString, String> >& attributes)
We usually don't start functions with the word "get".
> Source/WebCore/html/parser/HTMLMetaCharsetParser.h:52 > + static TextEncoding getEncodingFromMetaAttributes(const Vector<pair<AtomicString, String> >&);
const Vector<pair<AtomicString, String> > is not a very semantic type...
> Source/WebCore/page/PageSerializer.cpp:57 > +
We shouldn't have this blank line here.
> Source/WebCore/page/PageSerializer.cpp:128 > + if (head) {
Prefer early return.
Alexey Proskuryakov
Comment 59
2011-04-25 15:30:03 PDT
>> This should be more explicit about what it returns. Like maybe suggestedMIMETypeForSerialization?
>
> I am happy to change the name, but is the MIME type returned only specific to serialization? Are there > cases where the document MIME type would not be considered to be one of these?
Maybe suggestedMIMEType() is OK indeed. mimeType() feels like it is "real", original or authoritative in some way, which is probably more than what is offered here. Another case where we need a MIME type is to fix
bug 11049
. I haven't checked if what you add matches what we need there (we'll need the test both spec and important browsers). Maybe you'll want to fix that bug as a follow-up, too?
> I tried to unify the code, but it turned out I had to use a Vector to unify the attributes which are of > different types in the 2 locations. Please let me know if you can think of a better way to do this.
This is not exactly what I had in mind. I've been thinking about just having a check for every single <meta> be unified with HTMLCharsetParser, not about passing all metas to it. Not sure which is better.
Alexey Proskuryakov
Comment 60
2011-04-25 15:48:09 PDT
Comment on
attachment 90935
[details]
Addressing Alexey's comments View in context:
https://bugs.webkit.org/attachment.cgi?id=90935&action=review
> This is not exactly what I had in mind
Please ignore this comment - I've now looked at the code, and understood what it's doing. I agree that this should be landed sooner rather than later, and I avoided nitpicking on actual serialization code. r- for still not having all project file changes, and I think that my code quality comments below need to be addressed, too. But we are getting close, hopefully the next iteration will be landable.
> Source/WebCore/html/HTMLFrameOwnerElement.cpp:116 > + for (size_t i = 0; i < attributeMap->length(); ++i) { > + Attribute* attribute = attributeMap->attributeItem(i); > + if (isURLAttribute(attribute)) > + return &(attribute->name()); > + }
This still stands out to me as something that should be fixed in initial patch. Is there some difficulty getting rid of the loop? The only non-null result this functions can ever return is srcAttr, so why does it even exist?
>> Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp:118 >> +TextEncoding HTMLMetaCharsetParser::getEncodingFromMetaAttributes(const Vector<pair<AtomicString, String> >& attributes) > > We usually don't start functions with the word "get".
Yes, in WebKit coding style, "get..." is for functions that return a result via a reference (or pointer) argument.
> Source/WebCore/page/PageSerializer.cpp:74 > + for (size_t i = 0; i < WTF_ARRAY_LENGTH(ignoredTags); ++i) { > + if (element->hasTagName(*ignoredTags[i])) > + return true; > + } > + return false;
I suggest changing this to: return element->hasTagName(HTMLNames::scriptTag) || element->hasTagName(HTMLNames:: noscriptTag); It's much shorter, and the current code isn't scalable anyway (you'd need to start using a HashMap with many ignored tags).
> Source/WebCore/page/PageSerializer.cpp:107 > + PageSerializer* m_serializer;
What guarantees that this doesn't become a dangling pointer?
> Source/WebCore/page/PageSerializer.cpp:108 > + Document* m_document;
And this?
> Source/WebCore/page/PageSerializer.cpp:124 > + // HTML might or not have their encoding specified, we'll add it if it's missing.
This is not sufficient. It is very common to have a <meta> that is wrong - HTTP headers override <meta>, so people leave broken ones in files all the time.
> Source/WebCore/page/PageSerializer.cpp:183 > + RefPtr<Attribute> attribute = Attribute::create(HTMLNames::srcAttr, url.string()); > + appendAttribute(out, element, *attribute, namespaces);
Do you really need to use the internal Attribute class directly? What about Element::setAttribute()?
> Source/WebCore/page/PageSerializer.cpp:233 > + if (!textEncoding.isValid()) {
You can just ASSERT, document->encoding() is never invalid.
Jay Civelli
Comment 61
2011-04-26 00:37:45 PDT
Comment on
attachment 90935
[details]
Addressing Alexey's comments View in context:
https://bugs.webkit.org/attachment.cgi?id=90935&action=review
>> Source/WebCore/html/HTMLFrameOwnerElement.cpp:116 >> + } > > This still stands out to me as something that should be fixed in initial patch. Is there some difficulty getting rid of the loop? The only non-null result this functions can ever return is srcAttr, so why does it even exist?
As I mentioned in my previous comment, the problem is with object tags that I believe don't use src but data. SInce I believe this is the only weird case, I simplified the method.
>>> Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp:118 >>> +TextEncoding HTMLMetaCharsetParser::getEncodingFromMetaAttributes(const Vector<pair<AtomicString, String> >& attributes) >> >> We usually don't start functions with the word "get". > > Yes, in WebKit coding style, "get..." is for functions that return a result via a reference (or pointer) argument.
Done.
>> Source/WebCore/html/parser/HTMLMetaCharsetParser.h:52 >> + static TextEncoding getEncodingFromMetaAttributes(const Vector<pair<AtomicString, String> >&); > > const Vector<pair<AtomicString, String> > is not a very semantic type...
Now typedefed.
>> Source/WebCore/page/PageSerializer.cpp:57 >> + > > We shouldn't have this blank line here.
Removed.
>> Source/WebCore/page/PageSerializer.cpp:74 >> + return false; > > I suggest changing this to: > > return element->hasTagName(HTMLNames::scriptTag) || element->hasTagName(HTMLNames:: noscriptTag); > > It's much shorter, and the current code isn't scalable anyway (you'd need to start using a HashMap with many ignored tags).
Good idea, done.
>> Source/WebCore/page/PageSerializer.cpp:107 >> + PageSerializer* m_serializer; > > What guarantees that this doesn't become a dangling pointer?
It is an internal class instanciated by PageSerializer that never outlives the PageSerializer or Document.
>> Source/WebCore/page/PageSerializer.cpp:124 >> + // HTML might or not have their encoding specified, we'll add it if it's missing. > > This is not sufficient. It is very common to have a <meta> that is wrong - HTTP headers override <meta>, so people leave broken ones in files all the time.
OK, reverted to what I was doing previously, which is to always remove meta charset tags and always add our own.
>> Source/WebCore/page/PageSerializer.cpp:128 >> + if (head) { > > Prefer early return.
Done.
>> Source/WebCore/page/PageSerializer.cpp:183 >> + appendAttribute(out, element, *attribute, namespaces); > > Do you really need to use the internal Attribute class directly? What about Element::setAttribute()?
I have tried hard in this patch to do the serialization without modifying the DOM to avoid any unintended consequences. Wouldn't setting the src attribute on the frame cause it to load the fake URL?
>> Source/WebCore/page/PageSerializer.cpp:233 >> + if (!textEncoding.isValid()) { > > You can just ASSERT, document->encoding() is never invalid.
Done.
Jay Civelli
Comment 62
2011-04-26 00:52:02 PDT
(In reply to
comment #59
)
> >> This should be more explicit about what it returns. Like maybe suggestedMIMETypeForSerialization? > > > > I am happy to change the name, but is the MIME type returned only specific to serialization? Are there > > cases where the document MIME type would not be considered to be one of these? > > Maybe suggestedMIMEType() is OK indeed. mimeType() feels like it is "real", original or authoritative in some way, which is probably more than what is offered here.
OK, changed to sugestedMimeType.
> Another case where we need a MIME type is to fix
bug 11049
. I haven't checked if what you add matches what we need there (we'll need the test both spec and important browsers). Maybe you'll want to fix that bug as a follow-up, too?
I'll try to take a look as a follow up.
> > I tried to unify the code, but it turned out I had to use a Vector to unify the attributes which are of > > different types in the 2 locations. Please let me know if you can think of a better way to do this. > > This is not exactly what I had in mind. I've been thinking about just having a check for every single <meta> be unified with HTMLCharsetParser, not about passing all metas to it. Not sure which is better.
Jay Civelli
Comment 63
2011-04-26 01:09:36 PDT
Created
attachment 91081
[details]
Patch
Alexey Proskuryakov
Comment 64
2011-04-26 08:35:15 PDT
Comment on
attachment 91081
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=91081&action=review
Looks good to me.
> Source/WebCore/html/HTMLFrameOwnerElement.cpp:112 > + return hasTagName(HTMLNames::objectTag) ? HTMLNames::dataAttr : HTMLNames::srcAttr;
I like this much better without a loop. This returns a correct result for <embed>, but what about <applet>? As long as this is a method on HTMLFrameOwnerElement, it needs to do the right thing in all cases, and assert that an unexpected case didn't arise. In page serialization code, it would be more acceptable to only handle the cases you support. Perhaps this can be moved to PageSerializer.cpp.
> Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp:108 > + for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); > + iter != tokenAttributes.end(); ++iter) {
This line is sufficiently short, no need to wrap it.
> Source/WebCore/page/PageSerializer.cpp:74 > + attributes.append(make_pair(item->name().toString(), item->value()));
Will this code be fooled by namespaced attributes?
> Source/WebCore/page/PageSerializer.cpp:271 > + if (!textEncoding.isValid()) { > + LOG_ERROR("Unsupported charset '%s', defaulting to UTF-8.", styleSheet->charset().ascii().data());
Missed this last time. I don't think that a stylesheet charset can be invalid either.
Jay Civelli
Comment 65
2011-04-26 11:01:41 PDT
Comment on
attachment 91081
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=91081&action=review
>> Source/WebCore/html/HTMLFrameOwnerElement.cpp:112 >> + return hasTagName(HTMLNames::objectTag) ? HTMLNames::dataAttr : HTMLNames::srcAttr; > > I like this much better without a loop. > > This returns a correct result for <embed>, but what about <applet>? As long as this is a method on HTMLFrameOwnerElement, it needs to do the right thing in all cases, and assert that an unexpected case didn't arise. In page serialization code, it would be more acceptable to only handle the cases you support. Perhaps this can be moved to PageSerializer.cpp.
Moved that code back to PageSerializer (where it was originally)
>> Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp:108 >> + iter != tokenAttributes.end(); ++iter) { > > This line is sufficiently short, no need to wrap it.
Done.
>> Source/WebCore/page/PageSerializer.cpp:74 >> + attributes.append(make_pair(item->name().toString(), item->value())); > > Will this code be fooled by namespaced attributes?
Added a FIXME to look into this as a follow up patch if that's OK with you.
>> Source/WebCore/page/PageSerializer.cpp:271 >> + LOG_ERROR("Unsupported charset '%s', defaulting to UTF-8.", styleSheet->charset().ascii().data()); > > Missed this last time. I don't think that a stylesheet charset can be invalid either.
Fixed.
Jay Civelli
Comment 66
2011-04-26 11:03:40 PDT
Created
attachment 91128
[details]
Patch
Early Warning System Bot
Comment 67
2011-04-26 11:21:32 PDT
Attachment 91128
[details]
did not build on qt: Build output:
http://queues.webkit.org/results/8508715
Jay Civelli
Comment 68
2011-04-26 11:36:16 PDT
Created
attachment 91131
[details]
Patch
Early Warning System Bot
Comment 69
2011-04-26 11:57:21 PDT
Attachment 91131
[details]
did not build on qt: Build output:
http://queues.webkit.org/results/8509709
Jay Civelli
Comment 70
2011-04-26 12:09:04 PDT
Created
attachment 91142
[details]
Patch
WebKit Review Bot
Comment 71
2011-04-26 13:31:31 PDT
Attachment 91128
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/8507716
Adam Barth
Comment 72
2011-04-26 17:05:26 PDT
Comment on
attachment 91142
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=91142&action=review
> Source/WebCore/page/PageSerializer.cpp:74 > + // FIXME: we should deal appropriately with the attribute if they have a namespace.
we => We
> Source/WebCore/page/PageSerializer.cpp:88 > + // FIXME: we should support all frame owners including applets.
we => We
Jay Civelli
Comment 73
2011-04-26 17:33:02 PDT
Created
attachment 91194
[details]
Patch
Jay Civelli
Comment 74
2011-04-26 17:45:10 PDT
Created
attachment 91196
[details]
Patch
WebKit Review Bot
Comment 75
2011-04-26 22:27:40 PDT
Attachment 91196
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/8508862
Jay Civelli
Comment 76
2011-04-26 23:00:10 PDT
Created
attachment 91236
[details]
Patch for landing
Jay Civelli
Comment 77
2011-04-26 23:04:55 PDT
Created
attachment 91237
[details]
Patch for landing
WebKit Review Bot
Comment 78
2011-04-26 23:26:39 PDT
Attachment 91196
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/8513237
WebKit Commit Bot
Comment 79
2011-04-27 05:55:08 PDT
Comment on
attachment 91237
[details]
Patch for landing Rejecting
attachment 91237
[details]
from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sf', 'la..." exit_code: 1 Last 500 characters of output: n/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc ... Currently at 85045 = c90ce6df6de66c3e8aa4e4647defcc79fbeb71d8
r85046
= 56faf6b1e02973a2afa610db422557d9b295b465
r85047
= 36d92b65972e8c122cdf958588b4d52a4d988a12
r85048
= 9f2306577fb4fcb9c7c377fcc4873aaa774e2e1a Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc First, rewinding head to replay your work on top of it... Fast-forwarded master to refs/remotes/origin/master. Full output:
http://queues.webkit.org/results/8508948
Jay Civelli
Comment 80
2011-04-27 08:51:41 PDT
Created
attachment 91293
[details]
Patch for landing
WebKit Commit Bot
Comment 81
2011-04-27 20:36:54 PDT
Comment on
attachment 91293
[details]
Patch for landing Rejecting
attachment 91293
[details]
from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-4', 'apply-..." exit_code: 2 Last 500 characters of output: hing file Source/WebKit/chromium/tests/data/pageserializer/import_style_from_link.css patching file Source/WebKit/chromium/tests/data/pageserializer/import_styles.css patching file Source/WebKit/chromium/tests/data/pageserializer/link_styles.css patching file Source/WebKit/chromium/tests/data/pageserializer/simple.xhtml patching file Source/WebKit/chromium/tests/data/pageserializer/top_frame.html Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--force']" exit_code: 1 Full output:
http://queues.webkit.org/results/8518206
Alexey Proskuryakov
Comment 82
2011-04-27 21:02:54 PDT
Probably failing on CRLF line endings in vcproj, and will need to be landed manually. By the way, xcodeproj changes are not in ChangeLog.
Adam Barth
Comment 83
2011-04-28 02:13:46 PDT
Created
attachment 91452
[details]
patch (updated to avoid Android.mk)
Adam Barth
Comment 84
2011-04-28 15:04:09 PDT
Comment on
attachment 91452
[details]
patch (updated to avoid Android.mk) Clearing flags on attachment: 91452 Committed
r85244
: <
http://trac.webkit.org/changeset/85244
>
Adam Barth
Comment 85
2011-04-28 15:04:20 PDT
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 86
2011-04-28 15:52:47 PDT
http://trac.webkit.org/changeset/85244
might have broken WinCE Release (Build)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug