Bug 31737

Summary: Chromium WebKit API WebPageSerializer
Product: WebKit Reporter: Yaar Schnitman <yaar>
Component: WebKit APIAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, eric, fishd, justin.garcia, webkit.review.bot, yaar
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
none
WebPageSerializer
none
Patch 3
none
Added WebPageSerializer to WebKit API.
none
Added WebPageSerializer to WebKit API.
none
Added WebPageSerializer to WebKit API.
none
Added WebPageSerializer to WebKit API.
none
Added WebPageSerializer to WebKit API.
none
Added WebPageSerializer to WebKit API. fishd: review+, commit-queue: commit-queue-

Description Yaar Schnitman 2009-11-20 12:02:22 PST
Created attachment 43602 [details]
Patch

Upstreaming of webkit/glue/dom_serializer.
Comment 1 Yaar Schnitman 2009-11-20 12:44:11 PST
Almost compiles. Note a few FIXME with questions for you.
Comment 2 Eric Seidel (no email) 2009-11-21 07:22:27 PST
I guess we would unify some of this with markup.h/cpp once it's upstream?
Comment 3 Darin Fisher (:fishd, Google) 2009-11-23 15:54:40 PST
Eric, yeah... that's the idea!
Comment 4 Darin Fisher (:fishd, Google) 2009-11-24 13:21:57 PST
Comment on attachment 43602 [details]
Patch

The StringBuilder additions seem quite handy.


> +++ b/WebKit/chromium/public/WebPageSerializer.h
...
> +// Get html data by serializing all frames of current page with lists
> +// which contain all resource links that have local copy.
> +// contain all saved auxiliary files included all sub frames and resources.

did you mean to delete this second sentence?


> +// This function will find out all frames and serialize them to HTML data.
> +// We have a data buffer to temporary saving generated html data. We will
> +// sequentially call WebViewDelegate::SendSerializedHtmlData once the data
> +// buffer is full. See comments of WebViewDelegate::SendSerializedHtmlData

nit: please fix these references to WebViewDelegate since that interface no
longer exists.


> +class WebPageSerializer {
> +public:
> +

nit: ^^^ kill this new line

> +    // Do serialization action. Return false means no available frame has been
> +    // serialized, otherwise return true.
> +    // The parameter specifies which frame need to be serialized.
> +    // The parameter recursive specifies whether we need to
> +    // serialize all sub frames of the specified frame or not.
> +    // The parameter client specifies the pointer of interface

nit: ^^^ add a trailing period


> +    // WebPageSerializerClient provide sink interface which can receive the
> +    // individual chunks of data to be saved.
> +    // The parameter links contain original URLs of all saved links.
> +    // The parameter localPaths contain corresponding local file paths of all
> +    // saved links, which matched with vector:links one by one.
> +    // The parameter localDirectoryName is relative path of directory which
> +    // contain all saved auxiliary files included all sub frames and resources.
> +    WEBKIT_API static bool serialize(WebFrame* frame,
> +                                     bool recursive,
> +                                     WebPageSerializerClient* client,
> +                                     const WebVector<WebURL>& links,
> +                                     const WebVector<WebString>& localPaths,
> +                                     const WebString& localDirectoryName);
> +
> +    // FIXME: The following are here for unit testing purposes. Consider
> +    // fixing the unit tests instead.

nit: s/fixing/changing/


> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +class WebPageSerializerClient {
> +public:
> +    // This enum indicates  This sink interface can receive the individual chunks
> +    // of serialized data to be saved, so we use values of following enum
> +    // definition to indicate the serialization status of serializing all html
> +    // content. If current frame is not complete serialized, call
> +    // DidSerializeDataForFrame with URL of current frame, data, data length and

nit: DidSerializeDataForFrame -> didSerializeDataForFrame 


> +    // flag CurrentFrameIsNotFinished.
> +    // If current frame is complete serialized, call DidSerializeDataForFrame

ditto


> +    // with URL of current frame, data, data length and flag
> +    // CurrentFrameIsFinished.
> +    // If all frames of page are complete serialized, call
> +    // DidSerializeDataForFrame with empty URL, empty data, 0 and flag

ditto


> +    enum PageSerializationStatus {
> +        // Current frame is not finished saving.
> +        CurrentFrameIsNotFinished = 0,
> +        // Current frame is finished saving.
> +        CurrentFrameIsFinished,
> +        // All frame are finished saving.
> +        AllFramesAreFinished,
> +    };

nit: ^^^ those comments are just restating the name of the enum with the exception
of the "saving" word.  can we delete the comments?


> +    // Receive the individual chunks of serialized data to be saved.
> +    // The parameter frameURL specifies what frame the data belongs. The
> +    // parameter data contains the available data for saving. The parameter
> +    // status indicates the status of data serialization.
> +    virtual void didSerializeDataForFrame(const WebURL& frameURL,
> +                                          const WebString& data,
> +                                          PageSerializationStatus status) = 0;
> +
> +    WebPageSerializerClient() { }
> +    virtual ~WebPageSerializerClient() { }

^^^ can this destructor be made protected instead?  i think we can do this
to make it clear that WebPageSerializer does not delete the given client.


> +#endif  // WEBKIT_GLUE_DOM_SERIALIZER_DELEGATE_H__

^^^ please fix that comment


> +++ b/WebKit/chromium/src/DOMUtilitiesPrivate.cpp
...
> +    if (attrName == HTMLNames::srcAttr) {
> +        // Check src attribute.
> +        if (element->hasTagName(HTMLNames::imgTag) ||
> +            element->hasTagName(HTMLNames::scriptTag) ||
> +            element->hasTagName(HTMLNames::iframeTag) ||

nit: in webkit style, the "||" should go at the start of the next line


> +++ b/WebKit/chromium/src/WebEntities.cpp
...
> +#include "config.h"
> +#include "WebEntities.h"
> +
> +#include <string.h>
> +// Note that this file is also included by HTMLTokenizer.cpp so we are getting

nit: ^^^ please add a new line after #include <string.h>


> +// two copies of the data in memory.  We can fix this by changing the script
> +// that generated the array to create a static const that is its length, but
> +// this is low priority since the data is less than 4K.
> +#include "HTMLEntityNames.c"
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include <wtf/HashMap.h>
> +
> +#undef LOG

nit: ^^^ i think can delete this #undef now.


> +
> +#include "WebString.h"
> +
> +using namespace WebCore;
> +
> +namespace WebKit {
> +
> +void populateMap(EntitiesMapType& map,
> +                 const Entity* entities,
> +                 int entitiesCount,

nit: the type of entitiesCount should probably be size_t since you
pass the result of sizeof to it.


> +                 bool standardHTML)
> +{
> +    ASSERT(map.isEmpty());
> +    const Entity* entity = &entities[0];
> +    for (int i = 0; i < entitiesCount; i++, entity++) {

^^^ loop index, i, should then also be size_t.


> +        int code = entity->code;
> +        String name = entity->name;
> +        // For consistency, use case insensitive compare for entity codes that have both.
> +        if (map.contains(code)
> +            && equalIgnoringCase(map.get(code), name))

nit: ^^^ given the long comment, it seems odd to line break here.


> +static const Entity xmlBuiltInEntityCodes[] = {
> +    {"lt", 0x003c},
> +    {"gt", 0x003e},
> +    {"amp", 0x0026},
> +    {"apos", 0x0027},
> +    {"quot", 0x0022}

nit: ^^^ please insert a space after "{" and before "}"


> +String WebEntities::convertEntitiesInString(const String& value) const
...
> +        } else {
> +            curPos++;
> +        }

nit: ^^^ no brackets


> +++ b/WebKit/chromium/src/WebEntities.h

> +#include "WebCommon.h"

nit: WebCommon.h doesn't seem to be needed here.


> +namespace WebKit {
> +typedef HashMap<int, WebCore::String> EntitiesMapType;

can you move this typedef into the WebEntities class?  maybe as a MapType typedef?


> +class WebEntities {
> +public:
...
> +    WebEntities(bool xmlEntities);

nit: ^^^ can this ctor be made explicit?


> +
> +    // Check whether specified unicode has corresponding html or xml built-in
> +    // entity name. If yes, return the entity notation. If not, returns an
> +    // empty string. Parameter isHTML indicates check the code in html entity
> +    // map or in xml entity map.
> +    WebCore::String getEntityNameByCode(int code) const;

nit: I think it would be more conventional to name this method "entityNameByCode"


> +++ b/WebKit/chromium/src/WebFrameImpl.cpp

> +WebFrameImpl* WebFrameImpl::fromFrameOwnerElement(Element* element)
> +{
> +    if (!element
> +        || !element->isFrameOwnerElement()
> +        || (!element->hasTagName(WebCore::HTMLNames::iframeTag) 
> +            && !element->hasTagName(WebCore::HTMLNames::frameTag)))
> +        return 0;
> +
> +    WebCore::HTMLFrameOwnerElement* frameElement =
> +        static_cast<WebCore::HTMLFrameOwnerElement*>(element);

nit: I think you can drop the WebCore:: prefixes above.


> +++ b/WebKit/chromium/src/WebPageSerializer.cpp
...
> +// FIXME: verify unicode conversions below with Darin.

please remove this comment once settled here.


> +WebString WebPageSerializer::generateMetaCharsetDeclaration(const WebString& charset)
> +{
> +    return String::format("<META http-equiv=\"Content-Type\" content=\"text/html; charset=%ls\">",
> +                          charset.utf8().data());

I think %ls is wrong since you are passing in a UTF-8 string.


> +WebString WebPageSerializer::generateMarkOfTheWebDeclaration(const WebURL& url)
> +{
> +    KURL kurl = url;
> +    return String::format("\n<!-- saved from url=(%04d)%s -->\n",
> +                          kurl.string().length(),
> +                          kurl.string().utf8().data());

do you really mean to report the number of UTF-16 code points here?  also, since
you are printing the UTF-8 string, it seems like you would be better off just getting
the string value of the WebURL, which is a WebCString, which is already UTF-8.


> +WebString WebPageSerializer::generateBaseTagDeclaration(const WebString& baseTarget)
> +{
> +    String targetDeclaration;
> +    if (!baseTarget.isEmpty())
> +        targetDeclaration = String::format(" target=\"%ls\"", baseTarget.utf8().data());
> +    return String::format("<BASE href=\".\"%ls>", targetDeclaration.utf8().data());

same issue with %ls.  i think you want %s.


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.cpp
...

In the ChangeLog, I think you should attribute Johnny Ding as the original author
of this code.

> +namespace WebKit {
...
> +static const WebEntities htmlEntities(false);
> +static const WebEntities xmlEntities(true);

please avoid file static instances of non-POD types.  these should be changed
to either be local variables or function statics that get lazily initialized.


> +// SerializeDomParam Constructor.

this comment isn't useful, so we should just delete it.


> +String WebPageSerializerImpl::preActionBeforeSerializeEndTag(
...
> +    if (param->skipMetaElement == element) {
> +        *needSkip = true;
> +    } else if (element->hasTagName(HTMLNames::scriptTag) ||
> +               element->hasTagName(HTMLNames::styleTag)) {

nits: ^^^ no brackets around single line statements and "||"
should go at the start of the next line.


> +void WebPageSerializerImpl::openTagToString(const Element* element,
...
> +                    if (attrValue.startsWith("javascript:", false)) {
> +                        result += attrValue;
> +                    } else {

nit: ^^^ no brackets around single line statments



> +                       // Get the absolute link
> +                        String completeURL = param->doc->completeURL(attrValue);

nit: fix indentation of comment


> +void WebPageSerializerImpl::buildContentForNode(const Node* node,
> +                                                SerializeDomParam* param)
> +{
> +    switch (node->nodeType()) {
> +        case Node::ELEMENT_NODE: {

nit: case should not be indented inside a switch


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h

> +#include "config.h"

do not include config.h in header files.
Comment 5 Yaar Schnitman 2009-11-25 10:48:23 PST
Created attachment 43858 [details]
WebPageSerializer
Comment 6 Yaar Schnitman 2009-11-25 10:56:52 PST
Created attachment 43859 [details]
Patch 3

Further changes in the API were required to support downstream dom_serializer_unittest.cc.
Comment 7 Darin Fisher (:fishd, Google) 2009-11-25 12:12:32 PST
Comment on attachment 43859 [details]
Patch 3

> +++ b/WebKit/chromium/ChangeLog

^^^ I think this ChangeLog needs to be updated to include WebDocument.


> +++ b/WebKit/chromium/public/WebDocument.h
...
> +// Provides readonly access to some properties of a DOM element node.

nit: did you mean "some properties of a DOM document"?


> +class WebDocument : public WebNode {
> +
> +public:

^^^ no new line before "public:"


> +    WebDocument() : WebNode() { }

nit: no need to call WebNode() explicitly since that happens implicitly.
it is more common to not mention the base class constructor in such cases.


> +    template<typename T> T toElementConst() const
> +    {
> +        T res;
> +        res.m_private = m_private;
> +        return res;
> +    }

^^^ Shouldn't toElementConst return 'const T' to preserve constness?

Also, a naming nit:  You have constUnwrap, which uses const as a prefix,
but here you have it as a suffix.   maybe constUnwrap should be renamed
to unwrapConst for consistency?


> +++ b/WebKit/chromium/public/WebElement.h
...
>          WebElement& operator=(const WTF::PassRefPtr<WebCore::Element>&);
>          operator WTF::PassRefPtr<WebCore::Element>() const;
>  #endif
> +        WEBKIT_API bool hasTagName(const WebString&) const;
> +        WEBKIT_API bool hasAttribute(const WebString&) const;
> +        WEBKIT_API WebString getAttribute(const WebString&) const;

We usually put the WEBKIT_IMPLEMENTATION helpers at the bottom of the
public section.  So, I think you should move these accessors up above
the WEBKIT_IMPLEMENTATION section.


> +++ b/WebKit/chromium/public/WebFrame.h
...
> +    // Return the frame's encoding.
> +    virtual WebString encoding() const = 0;

We already have WebView::pageEncoding().  Should we deprecate that in
favor of this?


> +    // Returns the page's document element.
> +    virtual WebDocument document() const = 0;

WebDocument is not an element.  A "document element" is something else.

I think you can just delete this comment.


> +    // Returns the frame inside a given frame or iframe element. Returns 0 if
> +    // the given element is not a frame, iframe or if the frame is empty.
> +    virtual WebFrame* fromFrameElement(WebElement&) const = 0;

^^^ This should be a static function, and it should probably be named
fromFrameOwnerElement.  I think the parameter should be a "const WebElement&"

I would move this to the top-most section of WebFrame, where have other static
methods that return a WebFrame pointer.


> +++ b/WebKit/chromium/public/WebNode.h
...
>      WEBKIT_API WebFrame* frame() const;

I think we should deprecate the "frame()" accessor and replace it with a
document() accessor.  WebDocument should then get a frame() accessor.  This way
it better matches WebCore.

The container for nodes is a document and the container for a document is a
frame.  It seems cleaner for the API to just expose the next containing layer
for any given layer.  Makes sense?


> +++ b/WebKit/chromium/public/WebNodeCollection.h
...
> +#if WEBKIT_IMPLEMENTATION
> +    WebNodeCollection(const WTF::PassRefPtr<WebCore::HTMLCollection>&);
> +#endif
> +
> +    WEBKIT_API unsigned length() const;
> +    WEBKIT_API WebNode nextItem() const;
> +    WEBKIT_API WebNode firstItem() const;

Same nit as before.  Please push the WEBKIT_IMPLEMENTATION section to
the bottom of the public section.


> +protected:
> +    void assign(WebCore::HTMLCollection*);
> +    WebCore::HTMLCollection* m_private;
> +};

^^^ why protected and not private?  Do you anticipate subclassing?


> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +#ifndef WebDomSerializerClient_h
> +#define WebDomSerializerClient_h

Please fix this include guard.


> +    WebPageSerializerClient() { }
> +protected:
> +    virtual ~WebPageSerializerClient() { }
> +};

^^^ nit: please add a new line above "protected:"


> +++ b/WebKit/chromium/public/WebView.h
...
> +    // Returns the frame with the given url. Returns 0 if no frame has this url.
> +    // If more than one frame has this url, returns the first one.
> +    virtual WebFrame* findFrameByURL(const WebURL& url) = 0;
>  
>      // Focus ---------------------------------------------------------------

^^^ nit: please preserve the two blank line separator between sections


> +++ b/WebKit/chromium/src/WebEntities.cpp

> +    // Optimization: Create StringBuilder only if value has any entities.
> +/*    while (len--) {
> +        if (m_entitiesMap.contains(*curPos)){
> +            len++;
> +            break;
> +        }
> +        curPos++;
> +    }
> +    if (!len)
> +        return value; */

^^^ delete this commented-out code?


> +++ b/WebKit/chromium/src/WebFrameImpl.cpp

> +WebFrame* WebFrameImpl::fromFrameElement(WebElement& element) const
> +{
> +    RefPtr<Element> e = element.operator WTF::PassRefPtr<WebCore::Element>();
> +    return fromFrameOwnerElement(e.get());

it is a bit ugly to call that operator directly like this.  i think it
is better to write it like so:

  return fromFrameOwnerElement(PassRefPtr<Element>(element).get());

Or, perhaps the unwrap and constUnwrap methods should be made public.


> +++ b/WebKit/chromium/src/WebFrameImpl.h
...
> @@ -157,7 +160,7 @@ public:
>      virtual WebURL completeURL(const WebString& url) const;
>      virtual WebString contentAsText(size_t maxChars) const;
>      virtual WebString contentAsMarkup() const;
> -
> +    
>      static PassRefPtr<WebFrameImpl> create(WebFrameClient* client);

^^^ nit: please avoid adding unnecessary whitespace



> +++ b/WebKit/chromium/src/WebNode.cpp

> +WebVector<WebNode> WebNode::childNodes()
> +{
> +    WTF::Vector<WebNode> res;
> +    WTF::PassRefPtr<NodeList> nodeList = m_private->childNodes();
> +    for (unsigned i = 0; i < nodeList->length(); i++)
> +        res.append(nodeList->item(i));
> +    return WebVector<WebNode>(res);

^^^ no need to specify WTF:: in .cpp files.

Also, should we have a WebNodeList type?


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.cpp
> +    , isInScriptOrStyleTag(false)
> +    , hasDocDeclaration(false)
> +{
> +    // Cache the value since we check it lots of times.
> +   isHTMLDocument = doc->isHTMLDocument();

nit: ^^^ please indent by 4 spaces


> +        // See http://bugs.webkit.org/show_bug.cgi?id=16621.
> +        // First we generate new content for writing correct META element.
> +        result.append(WebPageSerializer::generateMetaCharsetDeclaration(
> +            String(param->textEncoding.name())));
> +
> +        param->hasAddedContentsBeforeEnd = true;
> +        // Will search each META which has charset declaration, and skip them all
> +        // in PreActionBeforeSerializeOpenTag.
> +    } else if (element->hasTagName(HTMLNames::scriptTag) ||
> +               element->hasTagName(HTMLNames::styleTag)) {

nit: the || should be at the beginning of the next line


> +    switch (node->nodeType()) {
> +    case Node::ELEMENT_NODE: {
> +        // Process open tag of element.
> +        openTagToString(static_cast<const Element*>(node), param);
> +        // Walk through the children nodes and process it.
> +        for (const Node *child = node->firstChild(); child != NULL;
> +             child = child->nextSibling())
> +            buildContentForNode(child, param);
> +        // Process end tag of element.
> +        endTagToString(static_cast<const Element*>(node), param);
> +        break;
> +    }
> +    case Node::TEXT_NODE: {
> +        saveHTMLContentToBuffer(createMarkup(node), param);
> +        break;
> +    }
> +    case Node::ATTRIBUTE_NODE:
> +    case Node::DOCUMENT_NODE:
> +    case Node::DOCUMENT_FRAGMENT_NODE: {
> +        // Should not exist.
> +        ASSERT(false);
> +        break;
> +    }
> +        // Document type node can be in DOM?
> +    case Node::DOCUMENT_TYPE_NODE:
> +        param->hasDoctype = true;
> +    default: {
> +        // For other type node, call default action.
> +        saveHTMLContentToBuffer(createMarkup(node), param);
> +        break;
> +    }
> +    }

^^^ the brackets for each case are not needed.  you only need
those if there are local variables.

nit: ASSERT(false) should be changed to ASSERT_NOT_REACHED.


> +    // Go through all frames for serializing DOM for whole page, include
> +    // sub-frames.
> +    for (int i = 0; i < static_cast<int>(m_frames.size()); ++i) {

nit: use size_t for i instead of int so you can avoid the static_cast?


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h
> +public:
> +    // Do serialization action. Return false means no available frame has been
> +    // serialized, otherwise return true.
> +    bool serialize();
> +    // The parameter specifies which frame need to be serialized.

nit: please add a new line after 'serialize()'


> +++ b/WebKit/chromium/src/WebViewImpl.cpp

> +WebFrame* WebViewImpl::findFrameByURL(const WebURL& url)
> +{
> +    if (!mainFrameImpl())
> +        return 0;
> +    KURL kurl = url;
> +
> +    WTF::Vector<WebFrameImpl*> frames;

nit: please drop the WTF:: prefix in .cpp files


> +    // First, process main frame.
> +    frames.append(mainFrameImpl());
> +    // Collect all frames inside the specified frame.
> +    for (int i = 0; i < static_cast<int>(frames.size()); ++i) {

nit: please use size_t instead of int for i


> +        WebFrameImpl* currentFrame = frames[i];
> +        // Check whether current frame has the desired url or not.
> +        const KURL& currentFrameURL =
> +            currentFrame->frame()->loader()->url();
> +        if (kurl == currentFrameURL)
> +            return currentFrame;
> +        // Go through nodes and look for sub-frames.
> +        RefPtr<HTMLCollection> all = currentFrame->frame()->document()->all();
> +        for (Node* node = all->firstItem(); node; node = all->nextItem()) {
> +            if (!node->isHTMLElement())
> +                continue;
> +            Element* element = static_cast<WebCore::Element*>(node);
> +        
> +            // Check frame tag and iframe tag.
> +            WebFrameImpl* webFrame = WebFrameImpl::fromFrameOwnerElement(element);
> +            if (webFrame)
> +                frames.append(webFrame);
> +        }
> +    }
> +
> +    return 0;
> +}

Given all of the new additions to the WebKit API, I don't think you need
to expose findFrameByURL on WebViewImpl.  We can write that function in
the Chromium code base without any troubles.  Let's keep the WebKit API
as minimal as possible.
Comment 8 Yaar Schnitman 2009-11-30 09:18:35 PST
Created attachment 44025 [details]
Added WebPageSerializer to WebKit API.
Comment 9 Darin Fisher (:fishd, Google) 2009-11-30 10:00:44 PST
Comment on attachment 44025 [details]
Added WebPageSerializer to WebKit API.

> +++ b/WebKit/chromium/public/WebDocument.h
...
> +class WebDocument : public WebNode {
> +public:
> +    WebDocument() { }
> +    WebDocument(const WebDocument& e) : WebNode(e) { }
> +
> +    WebDocument& operator=(const WebDocument& e) { WebNode::assign(e); return *this; }
> +    WEBKIT_API void assign(const WebDocument& e) { WebNode::assign(e); }

since this assign method is implemented inline, it should not be
exported from a WebKit.dll, so drop the WEBKIT_API prefix.


> +    WEBKIT_API WebFrame* frame() const;

perhaps we should document the fact that document's frame might be null?


> +++ b/WebKit/chromium/public/WebElement.h
...
>          WEBKIT_API void assign(const WebElement& e) { WebNode::assign(e); }

ditto.  drop the WEBKIT_API prefix.


> +++ b/WebKit/chromium/public/WebFrame.h
...
> +    // Returns the frame inside a given frame or iframe element. Returns 0 if
> +    // the given element is not a frame, iframe or if the frame is empty.
> +    WEBKIT_API static WebFrame* fromFrameOwnerElement(WebElement&);

the parameter should be a |const WebElement&|


> +++ b/WebKit/chromium/public/WebNodeCollection.h
...
> +// Provides readonly access to some properties of a DOM node.
> +class WebNodeCollection {
> +public:
> +    virtual ~WebNodeCollection() { reset(); }

does this destructor need to be virtual?  this class doesn't have
any other virtual methods.  if we can avoid a vtable, then we should.


> +++ b/WebKit/chromium/public/WebNodeList.h
...
> +    virtual ~WebNodeList() { reset(); }

ditto.  seems that we should drop the virtual keyword here.


> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +    enum PageSerializationStatus {
> +        CurrentFrameIsNotFinished = 0,

nit: no need for "= 0" since that is the default.


> +++ b/WebKit/chromium/public/WebString.h

this patch has grown quite large.  i think in the future it would be much
better to break it up into multiple patches.


> @@ -87,6 +87,9 @@ public:
>      WEBKIT_API static WebString fromUTF8(const char* data, size_t length);
>      WEBKIT_API static WebString fromUTF8(const char* data);
>  
> +    WebString(const char* data);
> +    WebString& operator=(const char* s);

given the implicit WebString(const char*) constructor, I think we can do
without the operator=.  We already have operator=(const WebString&), so
the compiler should invoke the WebString(const char*) constructor when
you write expressions like this:

  WebString s;
  s = "foo";

also, please remember that any public method not implemented inline needs
WEBKIT_API prefix.  however, please avoid adding that to constructors.
i've tried to not need to export constructors.

also, see WebData which has a fancy constructor for initialization from
a string literal that avoids a strlen call.

i think i'd change to this:

  template <int N>
  WebString(const char (&data)[N]) : m_private(0) { assign(fromUTF8(data, N - 1)); }

we could optimize this later to avoid the fromUTF8.


> +++ b/WebKit/chromium/public/WebView.h
> @@ -44,6 +44,7 @@ class WebFrameClient;
>  class WebNode;
>  class WebSettings;
>  class WebString;
> +class WebURL;
>  class WebViewClient;
>  struct WebMediaPlayerAction;
>  struct WebPoint;

^^^ is this change needed?


> +++ b/WebKit/chromium/src/DOMUtilitiesPrivate.cpp
> +++ b/WebKit/chromium/src/WebDocument.cpp
...
> +WebDocument::WebDocument(const WTF::PassRefPtr<WebCore::Document>& elem)
> +    : WebNode(elem.releaseRef())
> +{
> +}
> +
> +WebDocument& WebDocument::operator=(const WTF::PassRefPtr<WebCore::Document>& elem)

please drop the WTF:: prefix and the WebCore:: prefix throughout this file.


> +++ b/WebKit/chromium/src/WebFrameImpl.cpp
...
> +WebFrameImpl* WebFrameImpl::fromFrameOwnerElement(Element* element)
> +{
> +    if (!element
> +        || !element->isFrameOwnerElement()
> +        || (!element->hasTagName(HTMLNames::iframeTag)
> +            && !element->hasTagName(HTMLNames::frameTag)))
> +        return 0;
> +
> +    WebCore::HTMLFrameOwnerElement* frameElement =
> +        static_cast<WebCore::HTMLFrameOwnerElement*>(element);

drop the WebCore:: prefix


> +++ b/WebKit/chromium/src/WebNode.cpp

> +WebNode::NodeType WebNode::nodeType() const
> +{
> +    return static_cast<WebNode::NodeType>(m_private->nodeType());
> +}

nit: no need for the WebNode:: prefix within the implementation of
a WebNode method.


>  WebNode::WebNode(const WTF::PassRefPtr<WebCore::Node>& node)
>      : m_private(static_cast<WebNodePrivate*>(node.releaseRef()))
>  {

nit: please avoid the WTF:: and WebCore:: prefixes in .cpp files.


> +++ b/WebKit/chromium/src/WebNodeCollection.cpp
> @@ -0,0 +1,86 @@
> +/*
> + * Copyright (C) 2009 Google Inc. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following disclaimer
> + * in the documentation and/or other materials provided with the
> + * distribution.
> + *     * Neither the name of Google Inc. nor the names of its
> + * contributors may be used to endorse or promote products derived from
> + * this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include "config.h"
> +#include "WebNodeCollection.h"
> +
> +#include "HTMLCollection.h"
> +#include "Node.h"
> +
> +#include "WebNode.h"
> +
> +#include <wtf/PassRefPtr.h>

nit: please place the wtf include in the same group as the WebCore headers.
in this case just below Node.h


> +WebNodeCollection::WebNodeCollection(const WTF::PassRefPtr<HTMLCollection>& col)
> +    : m_private(static_cast<HTMLCollection*>(col.releaseRef()))

drop WTF:: prefix


> +++ b/WebKit/chromium/src/WebNodeList.cpp
...
> +#include "NodeList.h"
> +#include "Node.h"
> +
> +#include "WebNode.h"
> +
> +#include <wtf/PassRefPtr.h>

nit: please place the wtf include in the same group as the WebCore headers.
in this case just below Node.h


> +WebNodeList::WebNodeList(const WTF::PassRefPtr<NodeList>& col)
> +    : m_private(static_cast<NodeList*>(col.releaseRef()))

drop WTF:: prefix


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h

> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include "StringHash.h"
> +
> +#include "WebEntities.h"
> +#include "WebPageSerializer.h"
> +#include "WebPageSerializerClient.h"
> +#include "WebString.h"
> +#include "WebURL.h"
> +#include <wtf/HashMap.h>
> +#include <wtf/Vector.h>

nit: place the wtf headers up in the same section as the WebCore includes.


> +++ b/WebKit/chromium/src/WebViewImpl.cpp
> @@ -50,6 +50,7 @@
>  #include "FrameView.h"
>  #include "GraphicsContext.h"
>  #include "HitTestResult.h"
> +#include "HTMLAllCollection.h"
>  #include "HTMLInputElement.h"
>  #include "HTMLMediaElement.h"
>  #include "HTMLNames.h"

^^^ is this include needed?
Comment 10 Yaar Schnitman 2009-11-30 10:24:08 PST
Created attachment 44029 [details]
Added WebPageSerializer to WebKit API.
Comment 11 Darin Fisher (:fishd, Google) 2009-11-30 12:50:37 PST
Comment on attachment 44029 [details]
Added WebPageSerializer to WebKit API.

> +++ b/WebKit/chromium/public/WebNodeCollection.h
...
> +class WebNodeCollection {
> +public:
> +    ~WebNodeCollection() { reset(); }
> +
> +    WebNodeCollection(const WebNodeCollection& n) : m_private(0) { assign(n); }
> +    WebNodeCollection& operator=(const WebNodeCollection& n)
> +    {
> +        assign(n);
> +        return *this;
> +    }
> +
> +    WEBKIT_API void reset();
> +    WEBKIT_API void assign(const WebNodeCollection&);
> +
> +    WEBKIT_API unsigned length() const;
> +    WEBKIT_API WebNode nextItem() const;
> +    WEBKIT_API WebNode firstItem() const;
> +
> +#if WEBKIT_IMPLEMENTATION
> +    WebNodeCollection(const WTF::PassRefPtr<WebCore::HTMLCollection>&);
> +#endif
> +
> +private:
> +    void assign(WebCore::HTMLCollection*);
> +    WebCore::HTMLCollection* m_private;
> +};

It looks like this class is missing a default constructor.  I think that'll
cause some crashes.


> +++ b/WebKit/chromium/public/WebNodeList.h
...
> +class WebNodeList {
> +public:
> +    ~WebNodeList() { reset(); }
> +
> +    WebNodeList(const WebNodeList& n) : m_private(0) { assign(n); }
> +    WebNodeList& operator=(const WebNodeList& n)
> +    {
> +        assign(n);
> +        return *this;
> +    }
> +
> +    WEBKIT_API void reset();
> +    WEBKIT_API void assign(const WebNodeList&);
> +
> +    WEBKIT_API unsigned length() const;
> +    WEBKIT_API WebNode item(size_t) const;
> +
> +#if WEBKIT_IMPLEMENTATION
> +    WebNodeList(const WTF::PassRefPtr<WebCore::NodeList>&);
> +#endif
> +
> +private:
> +    void assign(WebCore::NodeList*);
> +    WebCore::NodeList* m_private;
> +};

ditto



> +++ b/WebKit/chromium/public/WebPageSerializerClient.h
...
> +    enum PageSerializationStatus {
> +        CurrentFrameIsNotFinished = 0,

See previous nit about removing the "= 0"...


> +++ b/WebKit/chromium/public/WebString.h
> @@ -87,6 +87,9 @@ public:
>      WEBKIT_API static WebString fromUTF8(const char* data, size_t length);
>      WEBKIT_API static WebString fromUTF8(const char* data);
>  
> +    WebString(const char* data);
> +    WebString& operator=(const char* s);

looks like this is the same as before.  did you update the wrong patch?


> +++ b/WebKit/chromium/src/WebNode.cpp
...
> +WebNode::NodeType WebNode::nodeType() const
> +{
> +    return static_cast<WebNode::NodeType>(m_private->nodeType());
> +}

see previous nit about dropping the WebNode:: prefix within the body
of a WebNode method.


> +++ b/WebKit/chromium/src/WebNodeList.cpp
...
> +#include "NodeList.h"
> +#include "Node.h"
> +
> +#include "WebNode.h"
> +
> +#include <wtf/PassRefPtr.h>

see previous nit about moving the wtf header up to the section
including Node.h

[stopping here since i'm repeating myself... is this the right patch?]
Comment 12 Yaar Schnitman 2009-11-30 17:57:24 PST
Created attachment 44049 [details]
Added WebPageSerializer to WebKit API.
Comment 13 WebKit Review Bot 2009-11-30 17:58:35 PST
Attachment 44049 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
Done processing WebCore/platform/text/StringBuilder.cpp
WebKit/chromium/src/WebNodeList.cpp:35:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebNodeList.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebNodeList.cpp
Done processing WebKit/chromium/src/WebString.cpp
Done processing WebKit/chromium/public/WebPageSerializer.h
Done processing WebKit/chromium/src/AssertMatchingEnums.cpp
Done processing WebKit/chromium/src/WebFrameImpl.h
Done processing WebKit/chromium/public/WebElement.h
WebKit/chromium/src/WebPageSerializerImpl.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebPageSerializerImpl.h
WebKit/chromium/src/WebNode.cpp:41:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebNode.cpp
Done processing WebKit/chromium/src/WebElement.cpp
WebKit/chromium/src/WebPageSerializerImpl.cpp:92:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:96:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:201:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:235:  Use 0 instead of NULL.  [readability/null] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:329:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:401:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
WebKit/chromium/src/WebPageSerializerImpl.cpp:401:  Use 0 instead of NULL.  [readability/null] [5]
WebKit/chromium/src/WebPageSerializerImpl.cpp:442:  Use 0 instead of NULL.  [readability/null] [4]
WebKit/chromium/src/WebPageSerializerImpl.cpp:473:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
WebKit/chromium/src/WebPageSerializerImpl.cpp:473:  Use 0 instead of NULL.  [readability/null] [5]
Done processing WebKit/chromium/src/WebPageSerializerImpl.cpp
Done processing WebKit/chromium/public/WebPageSerializerClient.h
WebKit/chromium/src/WebEntities.cpp:36:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebEntities.cpp:40:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebEntities.cpp:50:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebEntities.cpp:125:  One line control clauses should not use braces.  [whitespace/braces] [4]
Done processing WebKit/chromium/src/WebEntities.cpp
Done processing WebKit/chromium/public/WebNodeCollection.h
WebKit/chromium/src/WebDocument.cpp:43:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebDocument.cpp
Done processing WebKit/chromium/public/WebNodeList.h
Done processing WebCore/platform/text/StringBuilder.h
Done processing WebKit/chromium/src/WebEntities.h
WebKit/chromium/src/WebPageSerializer.cpp:35:  Alphabetical sorting problem.  [build/include_order] [4]
WebKit/chromium/src/WebPageSerializer.cpp:42:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebPageSerializer.cpp
Done processing WebKit/chromium/src/DOMUtilitiesPrivate.cpp
Done processing WebKit/chromium/src/DOMUtilitiesPrivate.h
WebKit/chromium/public/WebDocument.h:54:  More than one command on the same line  [whitespace/newline] [4]
Done processing WebKit/chromium/public/WebDocument.h
Done processing WebKit/chromium/src/WebFrameImpl.cpp
Done processing WebKit/chromium/public/WebNode.h
WebKit/chromium/public/WebFormElement.h:53:  More than one command on the same line  [whitespace/newline] [4]
Done processing WebKit/chromium/public/WebFormElement.h
Done processing WebKit/chromium/public/WebFrame.h
Done processing WebKit/chromium/public/WebString.h
WebKit/chromium/src/WebNodeCollection.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Done processing WebKit/chromium/src/WebNodeCollection.cpp
Total errors found: 24
Comment 14 Darin Fisher (:fishd, Google) 2009-12-01 19:55:03 PST
The WebKit Review Bot seems to have spotted a number of valid issues.  Can you please address those?
Comment 15 Darin Fisher (:fishd, Google) 2009-12-01 19:57:29 PST
Comment on attachment 44049 [details]
Added WebPageSerializer to WebKit API.

This patch is still missing default constructors for WebNodeCollection and WebNodeList.  Uninitialized memory equals crashiness. R-
Comment 16 Yaar Schnitman 2009-12-11 12:49:20 PST
Created attachment 44703 [details]
Added WebPageSerializer to WebKit API.
Comment 17 Yaar Schnitman 2009-12-11 12:52:20 PST
> Created an attachment (id=44703) [details]
> Added WebPageSerializer to WebKit API.
This patch fixes regression crashes.

Main change is in didSerializeDataForFrame(... WebString ...) which I changed to WebCString, to prevent double encoding.
Comment 18 Darin Fisher (:fishd, Google) 2009-12-11 13:04:02 PST
Comment on attachment 44703 [details]
Added WebPageSerializer to WebKit API.

Please fix these nits and then R=me.


> +++ b/WebKit/chromium/public/WebNodeCollection.h
...
> +    WebNodeCollection() : m_private(0) { };

^^^ nit: delete the trailing semicolon


> +++ b/WebKit/chromium/public/WebNodeList.h
...
> +    WebNodeList() : m_private(0) { };

ditto


> +    WEBKIT_API WebString& operator=(const char* s);

why not implement this using the same template approach?  it would be
nice for the conversion constructor to be similar to the operator=.

> +++ b/WebKit/chromium/src/DOMUtilitiesPrivate.cpp
...
>  }
>  
> +
> +bool elementHasLegalLinkAttribute(const Element* element,

^^^ nit: no extra new line above.


> +++ b/WebKit/chromium/src/WebDocument.cpp

> +#include "Document.h"
> +#include "Element.h"
> +#include "HTMLAllCollection.h"
> +#include "HTMLBodyElement.h"
> +#include "HTMLCollection.h"
> +#include "HTMLElement.h"
> +#include "HTMLHeadElement.h"
> +
> +#include "WebElement.h"
> +#include "WebFrameImpl.h"
> +#include "WebNodeCollection.h"
> +#include "WebURL.h"
> +
> +#include <wtf/PassRefPtr.h>

^^^ nit: please move wtf/ includes up with the WebCore includes.


> +++ b/WebKit/chromium/src/WebEntities.cpp
...
> +#include "config.h"
> +#include "WebEntities.h"
> +
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +
> +#include "WebString.h"
> +
> +#include <string.h>
> +#include <wtf/HashMap.h>

the above should be:

#include "config.h"
#include "WebEntities.h"

#include <string.h>

#include "PlatformString.h"
#include "StringBuilder.h"
#include <wtf/HashMap.h>

#include "WebString.h"




> +
> +
> +using namespace WebCore;
> +
> +namespace {
> +// Note that this file is also included by HTMLTokenizer.cpp so we are getting
> +// two copies of the data in memory.  We can fix this by changing the script
> +// that generated the array to create a static const that is its length, but
> +// this is low priority since the data is less than 4K. We use anonymous
> +// namespace to prevent name collisions.
> +#include "HTMLEntityNames.c" // NOLINT

nit: what is the NOLINT comment doing here?  i don't think we want to use that
in the webkit repository.


> +++ b/WebKit/chromium/src/WebNodeCollection.cpp
...
> +#include "config.h"
> +#include "WebNodeCollection.h"
> +
> +#include "HTMLCollection.h"
> +#include "Node.h"
> +
> +#include "WebNode.h"
> +
> +#include <wtf/PassRefPtr.h>

^^^ same nit about wtf/ header placement.  please group with webcore headers.


> +++ b/WebKit/chromium/src/WebNodeList.cpp
...
> +#include "config.h"
> +#include "WebNodeList.h"
> +
> +#include "Node.h"
> +#include "NodeList.h"
> +
> +#include "WebNode.h"
> +
> +#include <wtf/PassRefPtr.h>

ditto


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.cpp
...
> +#include "config.h"
> +#include "WebPageSerializerImpl.h"
> +
> +#include "DOMUtilitiesPrivate.h"
> +#include "Document.h"
> +#include "DocumentType.h"
> +#include "Element.h"
> +#include "FrameLoader.h"
> +#include "HTMLAllCollection.h"
> +#include "HTMLElement.h"
> +#include "HTMLFormElement.h"
> +#include "HTMLMetaElement.h"
> +#include "HTMLNames.h"
> +#include "KURL.h"
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include "TextEncoding.h"
> +
> +#include "WebFrameImpl.h"
> +#include "WebURL.h"
> +#include "WebVector.h"
> +
> +#include "markup.h"

^^^ markup.h should be grouped with the other webcore headers.


> +++ b/WebKit/chromium/src/WebPageSerializerImpl.h
...
> +#include "PlatformString.h"
> +#include "StringBuilder.h"
> +#include "StringHash.h"
> +
> +#include "WebEntities.h"
> +#include "WebPageSerializer.h"
> +#include "WebPageSerializerClient.h"
> +#include "WebString.h"
> +#include "WebURL.h"
> +
> +#include <wtf/HashMap.h>
> +#include <wtf/Vector.h>

^^^ nit: please group wtf/ headers with the webcore headers.
Comment 19 Yaar Schnitman 2009-12-11 15:07:50 PST
Created attachment 44712 [details]
Added WebPageSerializer to WebKit API.
Comment 20 Darin Fisher (:fishd, Google) 2009-12-14 12:49:07 PST
Comment on attachment 44712 [details]
Added WebPageSerializer to WebKit API.

Did you upload the wrong patch?  This looks like the previous one.
Comment 21 Yaar Schnitman 2009-12-14 19:26:52 PST
Created attachment 44834 [details]
Added WebPageSerializer to WebKit API.
Comment 22 Darin Fisher (:fishd, Google) 2009-12-14 20:53:21 PST
Comment on attachment 44834 [details]
Added WebPageSerializer to WebKit API.

R=me :)
Comment 23 WebKit Commit Bot 2009-12-15 10:43:50 PST
Comment on attachment 44834 [details]
Added WebPageSerializer to WebKit API.

Rejecting patch 44834 from commit-queue.

Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Darin Fisher', '--force']" exit_code: 1
Last 500 characters of output:
t/chromium/src/WebEntities.h
patching file WebKit/chromium/src/WebFrameImpl.cpp
Hunk #5 succeeded at 1649 (offset -1 lines).
patching file WebKit/chromium/src/WebFrameImpl.h
patching file WebKit/chromium/src/WebNode.cpp
patching file WebKit/chromium/src/WebNodeCollection.cpp
patching file WebKit/chromium/src/WebNodeList.cpp
patching file WebKit/chromium/src/WebPageSerializer.cpp
patching file WebKit/chromium/src/WebPageSerializerImpl.cpp
patching file WebKit/chromium/src/WebPageSerializerImpl.h
Comment 24 Yaar Schnitman 2009-12-17 16:27:53 PST
Landed: http://trac.webkit.org/changeset/52268/trunk