Bug 62112

Summary: Context-aware HTML paste for Chromium
Product: WebKit Reporter: Daniel Cheng <dcheng>
Component: PlatformAssignee: Daniel Cheng <dcheng>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, darin, dbates, dglazkov, enrica, eric, fishd, jberlin, rniwa, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 60620    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
gyuyoung.kim: commit-queue-
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Daniel Cheng 2011-06-05 22:38:06 PDT
We need this so we can support pastes of table cells without the surrounding table; otherwise, the HTML5 parser will normally strip them out. We plan to support this by passing the entire context with the fragment range--that way, we can recreate the entire document and then extract just the markup we care about. If we pass the fragment directly to the HTML5 parser, elements that are 'impossible' in the current context just get removed.
Comment 1 Daniel Cheng 2011-06-05 22:43:19 PDT
Created attachment 96064 [details]
Patch
Comment 2 Daniel Cheng 2011-09-29 16:34:50 PDT
Created attachment 109219 [details]
Patch
Comment 3 Daniel Cheng 2011-09-29 16:39:58 PDT
This patch isn't quite in final condition yet, but it does work. I have a few questions that I'm looking for answers to, in addition to comments on the code itself:

1) Is markup.cpp the appropriate place for this new functionality?
2) How should I handle LayoutTest diffs? Because the parsing algorithm is different, layout tests that do pasting could have different results now as well.
3) Should we just use the comment markers that CF_HTML provides (<!--StartFragment--> and <!--EndFragment-->) or is there a possibility we'll want to use this on non-Windows platforms? For example, I think it might be useful to support HTML paste as described in the Clipboards Events working draft.
Comment 4 WebKit Review Bot 2011-09-29 18:11:44 PDT
Comment on attachment 109219 [details]
Patch

Attachment 109219 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9899262

New failing tests:
editing/inserting/insert-3907422-fix.html
editing/pasteboard/5065605.html
editing/pasteboard/4922709.html
editing/pasteboard/4944770-2.html
editing/inserting/insert-paste-bidi-control.html
editing/pasteboard/5247341.html
editing/inserting/insert-3786362-fix.html
editing/pasteboard/19644-1.html
editing/pasteboard/4242293.html
editing/execCommand/4128080-2.html
editing/pasteboard/4944770-1.html
editing/pasteboard/4989774.html
editing/pasteboard/5478250.html
editing/pasteboard/4242293-1.html
editing/pasteboard/5071074.html
editing/pasteboard/5028447.html
http/tests/misc/copy-resolves-urls.html
editing/pasteboard/5075944.html
editing/pasteboard/19644-2.html
editing/pasteboard/4641033.html
Comment 5 Ryosuke Niwa 2011-10-03 11:24:35 PDT
Comment on attachment 109219 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review

I'm sad that there's no way for us to test this.

> Source/WebCore/editing/markup.cpp:50
> +#include "FileChooser.h"
> +#include "FormState.h"

Why are these headers required here?

> Source/WebCore/editing/markup.cpp:54
> +#include "HTMLDocument.h"

Ditto.

> Source/WebCore/editing/markup.cpp:56
> +#include "HTMLFormElement.h"

Ditto.

> Source/WebCore/editing/markup.cpp:77
> +namespace {

Why do we need an anonymous namespace? We don't normally do this in WebCore.

> Source/WebCore/editing/markup.cpp:80
> +Page* pageForParsing = 0;
> +
> +bool ensurePageForParsingInitialized()

I'd prefer this function returned a page so rename it to something like "Page* pageForPaste()".

> Source/WebCore/editing/markup.cpp:85
> +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;

You need to use DEFINE_STATIC_LOCAL. And please use OwnPtr.

> Source/WebCore/editing/markup.cpp:763
> +    // We use comment tags to help us extract the appropriate DocumentFragment.
> +    const char beginTag[] = "3f81c6fb-e92e-417d-a05e-fd6f7a6c883c";
> +    const char endTag[] = "3879efc4-7f0a-4352-ba73-61158bbecd70";

I'd prefer using webkit prefix e.g. <!--webkit-fragment-marker--> or<?-webkit-fragment-marker?>  Also we should make sure that this string doesn't exist inside the markup and if it does, we need to append appropriate text to distinguish ours from them.

> Source/WebCore/editing/markup.cpp:783
> +    // TODO: We need to do some cleanup before loading the new doc... figure out what.

We don't use TODO in WebKit. It should be FIXME.

> Source/WebCore/editing/markup.cpp:795
> +            if (static_cast<CharacterData*>(n)->data() == endTag)
> +                fragmentEndNode = n;
> +            if (fragmentStartNode && fragmentEndNode)
> +                break;

So there's a case where fragmentStartNode appears after fragmentStartNode?

> Source/WebCore/editing/markup.cpp:801
> +                                        Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> +                                        Position(fragmentEndNode, Position::PositionIsBeforeAnchor));

Nit: indentation. Position should be indented by exactly 8 spaces and should NOT be assigned with pageForParsing or RefPtr<Range>.

> Source/WebKit/chromium/public/WebClipboard.h:69
> +    virtual WebString readHTML(
> +        Buffer buffer, WebURL* pageURL, int* fragmentStart,
> +        int* fragmentEnd) { return WebString(); }

It seems like we should put all of the them in one line. As I understand it, Source/WebKit/chromium follows WebKit coding guideline so there's no 80-column restriction.
Comment 6 Ryosuke Niwa 2011-10-03 16:57:15 PDT
Comment on attachment 109219 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review

> Source/WebCore/ChangeLog:3
> +        Context-aware HTML paste

Why is the bug title here doesn't match the one on Bugzilla?

> Source/WebCore/ChangeLog:12
> +        No new tests. (OOPS!)

You should explain why we can't have a test.
Comment 7 Ryosuke Niwa 2011-10-03 16:58:06 PDT
Since this code can be used by Apple's Windows port as well, it's more appropriate to label it with [Win].
Comment 8 Daniel Cheng 2011-10-05 22:16:17 PDT
Created attachment 109914 [details]
Patch
Comment 9 WebKit Review Bot 2011-10-05 22:18:42 PDT
Please wait for approval from fishd@chromium.org before submitting because this patch contains changes to the Chromium public API.
Comment 10 Daniel Cheng 2011-10-05 22:22:46 PDT
(In reply to comment #5)
> (From update of attachment 109219 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review
> 
> I'm sad that there's no way for us to test this.
> 
> > Source/WebCore/editing/markup.cpp:50
> > +#include "FileChooser.h"
> > +#include "FormState.h"
> 
> Why are these headers required here?
> 
> > Source/WebCore/editing/markup.cpp:54
> > +#include "HTMLDocument.h"
> 
> Ditto.
> 
> > Source/WebCore/editing/markup.cpp:56
> > +#include "HTMLFormElement.h"
> 
> Ditto.
> 

Removed. They were leftover from some other approaches I had tried.

> > Source/WebCore/editing/markup.cpp:77
> > +namespace {
> 
> Why do we need an anonymous namespace? We don't normally do this in WebCore.
> 

Done. Made it a static function.

> > Source/WebCore/editing/markup.cpp:80
> > +Page* pageForParsing = 0;
> > +
> > +bool ensurePageForParsingInitialized()
> 
> I'd prefer this function returned a page so rename it to something like "Page* pageForPaste()".
> 

Done. I'm looking for naming suggestions. pageForPaste() doesn't seem entirely appropriate here, as there's no other references to pasting in this file.

> > Source/WebCore/editing/markup.cpp:85
> > +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> 
> You need to use DEFINE_STATIC_LOCAL. And please use OwnPtr.
> 

This code was copied from SVGImage.cpp. How would I use OwnPtr with DEFINE_STATIC_LOCAL though?

> > Source/WebCore/editing/markup.cpp:763
> > +    // We use comment tags to help us extract the appropriate DocumentFragment.
> > +    const char beginTag[] = "3f81c6fb-e92e-417d-a05e-fd6f7a6c883c";
> > +    const char endTag[] = "3879efc4-7f0a-4352-ba73-61158bbecd70";
> 
> I'd prefer using webkit prefix e.g. <!--webkit-fragment-marker--> or<?-webkit-fragment-marker?>  Also we should make sure that this string doesn't exist inside the markup and if it does, we need to append appropriate text to distinguish ours from them.
> 

Done, though I haven't figured out the proper approach to prevent duplicates.

> > Source/WebCore/editing/markup.cpp:783
> > +    // TODO: We need to do some cleanup before loading the new doc... figure out what.
> 
> We don't use TODO in WebKit. It should be FIXME.
> 

Done.

> > Source/WebCore/editing/markup.cpp:795
> > +            if (static_cast<CharacterData*>(n)->data() == endTag)
> > +                fragmentEndNode = n;
> > +            if (fragmentStartNode && fragmentEndNode)
> > +                break;
> 
> So there's a case where fragmentStartNode appears after fragmentStartNode?
> 

It would be an abnormality. I've changed it so we use the same tag to mark the start and the end though.

> > Source/WebCore/editing/markup.cpp:801
> > +                                        Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> > +                                        Position(fragmentEndNode, Position::PositionIsBeforeAnchor));
> 
> Nit: indentation. Position should be indented by exactly 8 spaces and should NOT be assigned with pageForParsing or RefPtr<Range>.
> 

Done.

> > Source/WebKit/chromium/public/WebClipboard.h:69
> > +    virtual WebString readHTML(
> > +        Buffer buffer, WebURL* pageURL, int* fragmentStart,
> > +        int* fragmentEnd) { return WebString(); }
> 
> It seems like we should put all of the them in one line. As I understand it, Source/WebKit/chromium follows WebKit coding guideline so there's no 80-column restriction.

I didn't change this for consistency with the other lines in this file.

(In reply to comment #6)
> (From update of attachment 109219 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review
> 
> > Source/WebCore/ChangeLog:3
> > +        Context-aware HTML paste
> 
> Why is the bug title here doesn't match the one on Bugzilla?
> 

Done. I tagged it as [chromium] for now since it will be enabled on all Chromium ports. I'm happy to create another bug for the Safari win port.

> > Source/WebCore/ChangeLog:12
> > +        No new tests. (OOPS!)
> 
> You should explain why we can't have a test.

I updated it. It's covered by layout tests. As expected, enabling the new parsing mode results in ~20 layout test diffs that I will have to go through and resolve.
Comment 11 Daniel Cheng 2011-10-05 22:25:04 PDT
Created attachment 109916 [details]
Patch
Comment 12 Ryosuke Niwa 2011-10-05 22:25:34 PDT
Comment on attachment 109914 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109914&action=review

> Source/WebCore/editing/markup.cpp:74
> +static bool pageForParsing()

The return type is wrong.

> Source/WebCore/editing/markup.cpp:76
> +    static Page* page = 0;

Please use DEFINE_STATIC_LOCAL.

> Source/WebCore/editing/markup.cpp:99
> +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> +    Page::PageClients pageClients;
> +    static ChromeClient* dummyChromeClient = new EmptyChromeClient;
> +    pageClients.chromeClient = dummyChromeClient;
> +#if ENABLE(CONTEXT_MENUS)
> +    static ContextMenuClient* dummyContextMenuClient = new EmptyContextMenuClient;
> +    pageClients.contextMenuClient = dummyContextMenuClient;
> +#endif
> +    static EditorClient* dummyEditorClient = new EmptyEditorClient;
> +    pageClients.editorClient = dummyEditorClient;
> +#if ENABLE(DRAG_SUPPORT)
> +    static DragClient* dummyDragClient = new EmptyDragClient;
> +    pageClients.dragClient = dummyDragClient;
> +#endif
> +    static InspectorClient* dummyInspectorClient = new EmptyInspectorClient;
> +    pageClients.inspectorClient = dummyInspectorClient;
> +#if ENABLE(DEVICE_ORIENTATION)
> +    static DeviceMotionClient* dummyDeviceMotionClient = new EmptyDeviceMotionClient;
> +    pageClients.deviceMotionClient = dummyDeviceMotionClient;
> +    static DeviceOrientationClient* dummyDeviceOrientationClient = new EmptyDeviceOrientationClient;

They all ought to use DEFINE_STATIC_LOCAL also please use OwnPtr, adopt, and leakPtr.

> Source/WebCore/editing/markup.cpp:751
> +    // This method isn't really meant to be called directly by web apps so it's assumed we're passing in something sane.

I don't think this comment adds any useful information here. Please remove.

> Source/WebCore/editing/markup.cpp:755
> +    // We use comment tags to help us extract the appropriate DocumentFragment.

I think this is pretty self-evident from the code. Please remove the comment.

> Source/WebCore/editing/markup.cpp:779
> +    RefPtr<Node> fragmentStartNode = 0;
> +    RefPtr<Node> fragmentEndNode = 0;

Why are they not Node*? I don't think they need to be RefPtr.

> Source/WebCore/editing/markup.cpp:780
> +    for (Node* n = pageForParsing->mainFrame()->document()->firstChild(); n; n = n->traverseNextNode()) {

Nit: please don't use abbreviations such as n.

> Source/WebCore/editing/markup.cpp:786
> +                else
> +                    fragmentEndNode = n;

It seems like you can just break here.

> Source/WebCore/editing/markup.cpp:789
> +            if (fragmentStartNode && fragmentEndNode)
> +                break;

instead of having a separate if-statement here.

> Source/WebCore/editing/markup.cpp:795
> +            Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> +            Position(fragmentEndNode, Position::PositionIsBeforeAnchor));

Nit: wrong indentation. It needs to look like:
    RefPtr<Range> range = Range::create(pageForParsing->mainFrame()->document(),
        Position(fragmentStartNode, Position::PositionIsAfterAnchor),
        Position(fragmentEndNode, Position::PositionIsBeforeAnchor));

> Source/WebCore/editing/markup.cpp:798
> +    const String& fragmentMarkup = createMarkup(range.get(), 0, AnnotateForInterchange);
> +
> +    return createFragmentFromMarkup(document, fragmentMarkup, baseURL, scriptingPermission);

It seems like this can be done in one line.
Comment 13 Gyuyoung Kim 2011-10-05 22:29:52 PDT
Comment on attachment 109914 [details]
Patch

Attachment 109914 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/9981104
Comment 14 WebKit Review Bot 2011-10-05 22:30:49 PDT
Comment on attachment 109914 [details]
Patch

Attachment 109914 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9968129
Comment 15 Ryosuke Niwa 2011-10-05 22:33:31 PDT
Renaming the bug summary since this patch modifies Source/WebCore/editing/markup.cpp.

(In reply to comment #10)
> > > Source/WebCore/editing/markup.cpp:80
> > > +Page* pageForParsing = 0;
> > > +
> > > +bool ensurePageForParsingInitialized()
> > 
> > I'd prefer this function returned a page so rename it to something like "Page* pageForPaste()".
> > 
> 
> Done. I'm looking for naming suggestions. pageForPaste() doesn't seem entirely appropriate here, as there's no other references to pasting in this file.

pateForPaste sounds okay to me but darin (@apple) or enrica might know a better name.

> > > Source/WebCore/editing/markup.cpp:85
> > > +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> > 
> > You need to use DEFINE_STATIC_LOCAL. And please use OwnPtr.
> > 
> 
> This code was copied from SVGImage.cpp. How would I use OwnPtr with DEFINE_STATIC_LOCAL though?

e.g. http://trac.webkit.org/browser/trunk/Source/WebCore/editing/EditingStyle.cpp#L641 and http://trac.webkit.org/browser/trunk/Source/WebKit/chromium/src/WebDataSourceImpl.cpp#L45

> > I'd prefer using webkit prefix e.g. <!--webkit-fragment-marker--> or<?-webkit-fragment-marker?>  Also we should make sure that this string doesn't exist inside the markup and if it does, we need to append appropriate text to distinguish ours from them.
> > 
> 
> Done, though I haven't figured out the proper approach to prevent duplicates.

I think we can be done as a follow up.

> > > Source/WebCore/editing/markup.cpp:795
> > > +            if (static_cast<CharacterData*>(n)->data() == endTag)
> > > +                fragmentEndNode = n;
> > > +            if (fragmentStartNode && fragmentEndNode)
> > > +                break;
> > 
> > So there's a case where fragmentStartNode appears after fragmentStartNode?
> > 
> 
> It would be an abnormality. I've changed it so we use the same tag to mark the start and the end though.

Great. Looks much better.

> (In reply to comment #6)
> > (From update of attachment 109219 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review
> > 
> > > Source/WebCore/ChangeLog:3
> > > +        Context-aware HTML paste
> > 
> > Why is the bug title here doesn't match the one on Bugzilla?
> > 
> 
> Done. I tagged it as [chromium] for now since it will be enabled on all Chromium ports. I'm happy to create another bug for the Safari win port.

Per discussion on webkit-dev, I don't think we should be tagging with [chromium] given that this patch adds code to markup.cpp

> > > Source/WebCore/ChangeLog:12
> > > +        No new tests. (OOPS!)
> > 
> > You should explain why we can't have a test.
> 
> I updated it. It's covered by layout tests. As expected, enabling the new parsing mode results in ~20 layout test diffs that I will have to go through and resolve.
Comment 16 Darin Fisher (:fishd, Google) 2011-10-05 22:35:07 PDT
Comment on attachment 109916 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109916&action=review

> Source/WebKit/chromium/public/WebClipboard.h:68
> +        Buffer buffer, WebURL* pageURL, unsigned* fragmentStart,

it would be helpful to define what fragmentStart and fragmentEnd mean.  they
look like offsets into some buffer.  what buffer?  do they refer to the
string that readHTML returns?
Comment 17 Ryosuke Niwa 2011-10-05 22:43:12 PDT
Comment on attachment 109916 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109916&action=review

>> Source/WebKit/chromium/public/WebClipboard.h:68
>> +        Buffer buffer, WebURL* pageURL, unsigned* fragmentStart,
> 
> it would be helpful to define what fragmentStart and fragmentEnd mean.  they
> look like offsets into some buffer.  what buffer?  do they refer to the
> string that readHTML returns?

fragment start and fragment end are terms used in CF_HTML format: http://msdn.microsoft.com/en-us/library/aa767917(v=vs.85).aspx
Comment 18 Darin Fisher (:fishd, Google) 2011-10-05 22:51:14 PDT
(In reply to comment #17)
> fragment start and fragment end are terms used in CF_HTML format: http://msdn.microsoft.com/en-us/library/aa767917(v=vs.85).aspx

Ah, that's very helpful to know.  The API docs should probably reference that URL.
Comment 19 Daniel Cheng 2011-10-05 23:18:13 PDT
Created attachment 109919 [details]
Patch
Comment 20 Daniel Cheng 2011-10-05 23:18:33 PDT
(In reply to comment #12)
> (From update of attachment 109914 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109914&action=review
> 
> > Source/WebCore/editing/markup.cpp:74
> > +static bool pageForParsing()
> 
> The return type is wrong.

Done.

> 
> > Source/WebCore/editing/markup.cpp:76
> > +    static Page* page = 0;
> 
> Please use DEFINE_STATIC_LOCAL.
> 

DEFINE_STATIC_LOCAL doesn't fit the pattern here well, since it expects to construct the object at the site of the call. We can't do that, since we don't have a pageClients yet here. I could move it down below the pageClients initialization, but then we'll always needlessly reinitialize the pageClients struct.

> > Source/WebCore/editing/markup.cpp:99
> > +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> > +    Page::PageClients pageClients;
> > +    static ChromeClient* dummyChromeClient = new EmptyChromeClient;
> > +    pageClients.chromeClient = dummyChromeClient;
> > +#if ENABLE(CONTEXT_MENUS)
> > +    static ContextMenuClient* dummyContextMenuClient = new EmptyContextMenuClient;
> > +    pageClients.contextMenuClient = dummyContextMenuClient;
> > +#endif
> > +    static EditorClient* dummyEditorClient = new EmptyEditorClient;
> > +    pageClients.editorClient = dummyEditorClient;
> > +#if ENABLE(DRAG_SUPPORT)
> > +    static DragClient* dummyDragClient = new EmptyDragClient;
> > +    pageClients.dragClient = dummyDragClient;
> > +#endif
> > +    static InspectorClient* dummyInspectorClient = new EmptyInspectorClient;
> > +    pageClients.inspectorClient = dummyInspectorClient;
> > +#if ENABLE(DEVICE_ORIENTATION)
> > +    static DeviceMotionClient* dummyDeviceMotionClient = new EmptyDeviceMotionClient;
> > +    pageClients.deviceMotionClient = dummyDeviceMotionClient;
> > +    static DeviceOrientationClient* dummyDeviceOrientationClient = new EmptyDeviceOrientationClient;
> 
> They all ought to use DEFINE_STATIC_LOCAL also please use OwnPtr, adopt, and leakPtr.
> 

I took a look at the examples you listed. I can understand using DEFINE_STATIC_LOCAL here, but using DEFINE_STATIC_LOCAL with OwnPtr essentially creates two layers of checks that we have to go through each time instead of one--the compiler check that the static is initialized, and then a second check we have to add to allocate the object if the OwnPtr is still null. Why do we need this complexity for something we are statically allocating? What does using OwnPtr here give us that a raw pointer does not?

> > Source/WebCore/editing/markup.cpp:751
> > +    // This method isn't really meant to be called directly by web apps so it's assumed we're passing in something sane.
> 
> I don't think this comment adds any useful information here. Please remove.
> 

Done.

> > Source/WebCore/editing/markup.cpp:755
> > +    // We use comment tags to help us extract the appropriate DocumentFragment.
> 
> I think this is pretty self-evident from the code. Please remove the comment.
> 

Done.

> > Source/WebCore/editing/markup.cpp:779
> > +    RefPtr<Node> fragmentStartNode = 0;
> > +    RefPtr<Node> fragmentEndNode = 0;
> 
> Why are they not Node*? I don't think they need to be RefPtr.
> 

Done.

> > Source/WebCore/editing/markup.cpp:780
> > +    for (Node* n = pageForParsing->mainFrame()->document()->firstChild(); n; n = n->traverseNextNode()) {
> 
> Nit: please don't use abbreviations such as n.
> 

Done.

> > Source/WebCore/editing/markup.cpp:786
> > +                else
> > +                    fragmentEndNode = n;
> 
> It seems like you can just break here.
> 

Done.

> > Source/WebCore/editing/markup.cpp:789
> > +            if (fragmentStartNode && fragmentEndNode)
> > +                break;
> 
> instead of having a separate if-statement here.
> 

Done.

> > Source/WebCore/editing/markup.cpp:795
> > +            Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> > +            Position(fragmentEndNode, Position::PositionIsBeforeAnchor));
> 
> Nit: wrong indentation. It needs to look like:
>     RefPtr<Range> range = Range::create(pageForParsing->mainFrame()->document(),
>         Position(fragmentStartNode, Position::PositionIsAfterAnchor),
>         Position(fragmentEndNode, Position::PositionIsBeforeAnchor));
> 

Done. Sorry I thought you meant it should be indented 8 spaces.

> > Source/WebCore/editing/markup.cpp:798
> > +    const String& fragmentMarkup = createMarkup(range.get(), 0, AnnotateForInterchange);
> > +
> > +    return createFragmentFromMarkup(document, fragmentMarkup, baseURL, scriptingPermission);
> 
> It seems like this can be done in one line.

Done.

(In reply to comment #16)
> (From update of attachment 109916 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109916&action=review
> 
> > Source/WebKit/chromium/public/WebClipboard.h:68
> > +        Buffer buffer, WebURL* pageURL, unsigned* fragmentStart,
> 
> it would be helpful to define what fragmentStart and fragmentEnd mean.  they
> look like offsets into some buffer.  what buffer?  do they refer to the
> string that readHTML returns?

Done. Is the link to CF_HTML appropriate here even though it's Windows specific? At this point, there is no knowledge of the original CF_HTML format (on Windows).(In reply to comment #15)
> Renaming the bug summary since this patch modifies Source/WebCore/editing/markup.cpp.
> 
> (In reply to comment #10)
> > > > Source/WebCore/editing/markup.cpp:80
> > > > +Page* pageForParsing = 0;
> > > > +
> > > > +bool ensurePageForParsingInitialized()
> > > 
> > > I'd prefer this function returned a page so rename it to something like "Page* pageForPaste()".
> > > 
> > 
> > Done. I'm looking for naming suggestions. pageForPaste() doesn't seem entirely appropriate here, as there's no other references to pasting in this file.
> 
> pateForPaste sounds okay to me but darin (@apple) or enrica might know a better name.
> 
> > > > Source/WebCore/editing/markup.cpp:85
> > > > +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> > > 
> > > You need to use DEFINE_STATIC_LOCAL. And please use OwnPtr.
> > > 
> > 
> > This code was copied from SVGImage.cpp. How would I use OwnPtr with DEFINE_STATIC_LOCAL though?
> 
> e.g. http://trac.webkit.org/browser/trunk/Source/WebCore/editing/EditingStyle.cpp#L641 and http://trac.webkit.org/browser/trunk/Source/WebKit/chromium/src/WebDataSourceImpl.cpp#L45
> 
> > > I'd prefer using webkit prefix e.g. <!--webkit-fragment-marker--> or<?-webkit-fragment-marker?>  Also we should make sure that this string doesn't exist inside the markup and if it does, we need to append appropriate text to distinguish ours from them.
> > > 
> > 
> > Done, though I haven't figured out the proper approach to prevent duplicates.
> 
> I think we can be done as a follow up.
> 

Done. Added a FIXME.

> > > > Source/WebCore/editing/markup.cpp:795
> > > > +            if (static_cast<CharacterData*>(n)->data() == endTag)
> > > > +                fragmentEndNode = n;
> > > > +            if (fragmentStartNode && fragmentEndNode)
> > > > +                break;
> > > 
> > > So there's a case where fragmentStartNode appears after fragmentStartNode?
> > > 
> > 
> > It would be an abnormality. I've changed it so we use the same tag to mark the start and the end though.
> 
> Great. Looks much better.
> 
> > (In reply to comment #6)
> > > (From update of attachment 109219 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=109219&action=review
> > > 
> > > > Source/WebCore/ChangeLog:3
> > > > +        Context-aware HTML paste
> > > 
> > > Why is the bug title here doesn't match the one on Bugzilla?
> > > 
> > 
> > Done. I tagged it as [chromium] for now since it will be enabled on all Chromium ports. I'm happy to create another bug for the Safari win port.
> 
> Per discussion on webkit-dev, I don't think we should be tagging with [chromium] given that this patch adds code to markup.cpp
> 

Done, though I changed the title to remove "win" since it's going to be enabled on all platforms unless there's a reason we shouldn't.

> > > > Source/WebCore/ChangeLog:12
> > > > +        No new tests. (OOPS!)
> > > 
> > > You should explain why we can't have a test.
> > 
> > I updated it. It's covered by layout tests. As expected, enabling the new parsing mode results in ~20 layout test diffs that I will have to go through and resolve.
Comment 21 Ryosuke Niwa 2011-10-05 23:24:53 PDT
Comment on attachment 109914 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109914&action=review

>>> Source/WebCore/editing/markup.cpp:99
>>> +    static DeviceOrientationClient* dummyDeviceOrientationClient = new EmptyDeviceOrientationClient;
>> 
>> They all ought to use DEFINE_STATIC_LOCAL also please use OwnPtr, adopt, and leakPtr.
> 
> I took a look at the examples you listed. I can understand using DEFINE_STATIC_LOCAL here, but using DEFINE_STATIC_LOCAL with OwnPtr essentially creates two layers of checks that we have to go through each time instead of one--the compiler check that the static is initialized, and then a second check we have to add to allocate the object if the OwnPtr is still null. Why do we need this complexity for something we are statically allocating? What does using OwnPtr here give us that a raw pointer does not?

You should do
static DeviceMotionClient* dummyDeviceMotionClient = OwnPtr::adopt(new EmptyDeviceMotionClient).leakPtr();
for example. See http://trac.webkit.org/browser/trunk/Source/WebCore/platform/ColorChooser.cpp#L49 for an example.
Comment 22 Ryosuke Niwa 2011-10-05 23:32:45 PDT
Comment on attachment 109919 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109919&action=review

> Source/WebCore/editing/markup.cpp:80
> +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;

new needs to be wrapped by adoptPtr(~).leakPtr(). r- because of this.

> Source/WebCore/editing/markup.cpp:116
> +    loader->activeDocumentLoader()->writer()->setMIMEType("text/html");

Are you sure we can always assume the content is text/html? no application/xml?

> Source/WebCore/editing/markup.cpp:763
> +    taggedMarkup.append("<!--");
> +    taggedMarkup.append(fragmentMarkerTag);
> +    taggedMarkup.append("-->");
> +    taggedMarkup.append(markup.substring(fragmentStart, fragmentEnd - fragmentStart));
> +    taggedMarkup.append("<!--");
> +    taggedMarkup.append(fragmentMarkerTag);
> +    taggedMarkup.append("-->");
> +    taggedMarkup.append(markup.substring(fragmentEnd));

It'll be really nice if we could share code with MarkupAccumulator::appendComment.
http://trac.webkit.org/browser/trunk/Source/WebCore/editing/MarkupAccumulator.cpp#L301

> Source/WebCore/editing/markup.cpp:778
> +        if (node->nodeType() == Node::COMMENT_NODE) {
> +            if (static_cast<CharacterData*>(node)->data() == fragmentMarkerTag) {

It seems like these conditions can be checked in one if-statement.

> Source/WebCore/editing/markup.cpp:791
> +        Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> +        Position(fragmentEndNode, Position::PositionIsBeforeAnchor));

Oops, these lines are actually wrong. These positions aren't range-compliant. Furthermore, you should make use of positionBeforeNode and positionAfterNode so do:
positionAfterNode(fragmentStartNode).parentAnchoredEquivalent(), positionBeforeNode(fragmentEndNode).parentAnchoredEquivalent(),
Comment 23 Daniel Cheng 2011-10-05 23:47:07 PDT
Created attachment 109926 [details]
Patch
Comment 24 Daniel Cheng 2011-10-05 23:53:07 PDT
Thanks for the review.

(In reply to comment #22)
> (From update of attachment 109919 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109919&action=review
> 
> > Source/WebCore/editing/markup.cpp:80
> > +    static FrameLoaderClient* dummyFrameLoaderClient = new EmptyFrameLoaderClient;
> 
> new needs to be wrapped by adoptPtr(~).leakPtr(). r- because of this.
> 

Done. I ended up finding some context for this, if this issue comes up, maybe just ask the developer to search for "naked new considered harmful" on webkit-dev. Sorry about the confusion though.

> > Source/WebCore/editing/markup.cpp:116
> > +    loader->activeDocumentLoader()->writer()->setMIMEType("text/html");
> 
> Are you sure we can always assume the content is text/html? no application/xml?
> 

Yes, that's all this is currently designed for. If we want to extend it in the future, this would need to be updated of course.

> > Source/WebCore/editing/markup.cpp:763
> > +    taggedMarkup.append("<!--");
> > +    taggedMarkup.append(fragmentMarkerTag);
> > +    taggedMarkup.append("-->");
> > +    taggedMarkup.append(markup.substring(fragmentStart, fragmentEnd - fragmentStart));
> > +    taggedMarkup.append("<!--");
> > +    taggedMarkup.append(fragmentMarkerTag);
> > +    taggedMarkup.append("-->");
> > +    taggedMarkup.append(markup.substring(fragmentEnd));
> 
> It'll be really nice if we could share code with MarkupAccumulator::appendComment.
> http://trac.webkit.org/browser/trunk/Source/WebCore/editing/MarkupAccumulator.cpp#L301
> 

It'd be nice but I don't see a good way =/

> > Source/WebCore/editing/markup.cpp:778
> > +        if (node->nodeType() == Node::COMMENT_NODE) {
> > +            if (static_cast<CharacterData*>(node)->data() == fragmentMarkerTag) {
> 
> It seems like these conditions can be checked in one if-statement.
> 

Done.

> > Source/WebCore/editing/markup.cpp:791
> > +        Position(fragmentStartNode, Position::PositionIsAfterAnchor),
> > +        Position(fragmentEndNode, Position::PositionIsBeforeAnchor));
> 
> Oops, these lines are actually wrong. These positions aren't range-compliant. Furthermore, you should make use of positionBeforeNode and positionAfterNode so do:
> positionAfterNode(fragmentStartNode).parentAnchoredEquivalent(), positionBeforeNode(fragmentEndNode).parentAnchoredEquivalent(),

Done. What does it mean for something to be range-compliant though?

Finally, I think I might have an idea of what's causing the layout test diffs, but I'll have to wait till I'm in the office tomorrow to try it.
Comment 25 Ryosuke Niwa 2011-10-05 23:53:43 PDT
Comment on attachment 109926 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109926&action=review

> Source/WebCore/ChangeLog:9
> +        This change allows us to create an HTML fragment taking into account the context around the
> +        markup. This preserves <style> declarations and other HTML elements that are

Nit: please move "the" to the second line to avoid the awkward wrapping.

> Source/WebCore/ChangeLog:12
> +        Covered by existing tests.

So this patch doesn't require any rebaselines?

> Source/WebCore/editing/markup.cpp:111
> +    // Disable scrolling. This prevents a crash... though we should really figure out why it crashes.
> +    frame->view()->setCanHaveScrollbars(false);

Do you have a stack trace for the crash when you enabled scrolling?
Comment 26 Ryosuke Niwa 2011-10-05 23:58:14 PDT
(In reply to comment #24)
> > > Source/WebCore/editing/markup.cpp:763
> > > +    taggedMarkup.append("<!--");
> > > +    taggedMarkup.append(fragmentMarkerTag);
> > > +    taggedMarkup.append("-->");
> > > +    taggedMarkup.append(markup.substring(fragmentStart, fragmentEnd - fragmentStart));
> > > +    taggedMarkup.append("<!--");
> > > +    taggedMarkup.append(fragmentMarkerTag);
> > > +    taggedMarkup.append("-->");
> > > +    taggedMarkup.append(markup.substring(fragmentEnd));
> > 
> > It'll be really nice if we could share code with MarkupAccumulator::appendComment.
> > http://trac.webkit.org/browser/trunk/Source/WebCore/editing/MarkupAccumulator.cpp#L301
> > 
> 
> It'd be nice but I don't see a good way =/

I guess we can make MarkupAccumulator::appendComment static & public since it doesn't do anything with member variables.
Comment 27 WebKit Review Bot 2011-10-06 01:05:06 PDT
Comment on attachment 109926 [details]
Patch

Attachment 109926 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9978122

New failing tests:
editing/inserting/insert-3907422-fix.html
editing/pasteboard/5065605.html
editing/pasteboard/4922709.html
editing/pasteboard/4944770-2.html
editing/inserting/insert-paste-bidi-control.html
editing/pasteboard/5247341.html
editing/inserting/insert-3786362-fix.html
editing/pasteboard/19644-1.html
editing/pasteboard/4242293.html
editing/execCommand/4128080-2.html
editing/pasteboard/4944770-1.html
editing/pasteboard/4989774.html
editing/pasteboard/5478250.html
editing/pasteboard/4242293-1.html
editing/pasteboard/5071074.html
editing/pasteboard/5028447.html
http/tests/misc/copy-resolves-urls.html
editing/pasteboard/5075944.html
editing/pasteboard/19644-2.html
editing/pasteboard/4641033.html
Comment 28 Tony Chang 2011-10-06 10:17:03 PDT
Comment on attachment 109926 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109926&action=review

LGTM, but I would be interested in any test changes.

> Source/WebCore/editing/markup.cpp:752
> +    // FIXME: Need to handle the case where the markup already contains these markers.
> +    const char fragmentMarkerTag[] = "webkit-fragment-marker";

Seems like we can just strip them in the markup.  A follow up patch for this seems fine.

> Source/WebCore/editing/markup.cpp:773
> +    // Now spelunk through the DOM and extract the beginning and the end of the range using the setinels.

Nit: typo sentinels

> Source/WebCore/editing/markup.h:48
> +    PassRefPtr<DocumentFragment> createFragmentFromMarkupWithContext(Document*, const String& markup, unsigned fragmentStart, unsigned fragmentEnd, const String& baseURL, FragmentScriptingPermission = FragmentScriptingAllowed);

Does FragmentScriptingPermission need to be optional?  Seems like the one caller of this function sets it.
Comment 29 Tony Chang 2011-10-06 10:17:44 PDT
Comment on attachment 109926 [details]
Patch

r- due to the missing test changes
Comment 30 Daniel Cheng 2011-10-07 13:41:50 PDT
Created attachment 110209 [details]
Patch
Comment 31 Daniel Cheng 2011-10-07 13:45:36 PDT
Comment on attachment 110209 [details]
Patch

CQ- since I'm still working through a few layout test issues. The major change is using DoNotAnnotateForInterchange.

(In reply to comment #25)
> (From update of attachment 109926 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109926&action=review
> 
> > Source/WebCore/ChangeLog:9
> > +        This change allows us to create an HTML fragment taking into account the context around the
> > +        markup. This preserves <style> declarations and other HTML elements that are
> 
> Nit: please move "the" to the second line to avoid the awkward wrapping.

Done.

> 
> > Source/WebCore/ChangeLog:12
> > +        Covered by existing tests.
> 
> So this patch doesn't require any rebaselines?
> 

There are 11 tests which fail on Linux, 1 which fails on Mac, and 1 test case that crashes (not the scrollbar crash) if it's run in a certain order. I'm still investigating them.

> > Source/WebCore/editing/markup.cpp:111
> > +    // Disable scrolling. This prevents a crash... though we should really figure out why it crashes.
> > +    frame->view()->setCanHaveScrollbars(false);
> 
> Do you have a stack trace for the crash when you enabled scrolling?

(In reply to comment #26)
> (In reply to comment #24)
> > > > Source/WebCore/editing/markup.cpp:763
> > > > +    taggedMarkup.append("<!--");
> > > > +    taggedMarkup.append(fragmentMarkerTag);
> > > > +    taggedMarkup.append("-->");
> > > > +    taggedMarkup.append(markup.substring(fragmentStart, fragmentEnd - fragmentStart));
> > > > +    taggedMarkup.append("<!--");
> > > > +    taggedMarkup.append(fragmentMarkerTag);
> > > > +    taggedMarkup.append("-->");
> > > > +    taggedMarkup.append(markup.substring(fragmentEnd));
> > > 
> > > It'll be really nice if we could share code with MarkupAccumulator::appendComment.
> > > http://trac.webkit.org/browser/trunk/Source/WebCore/editing/MarkupAccumulator.cpp#L301
> > > 
> > 
> > It'd be nice but I don't see a good way =/
> 
> I guess we can make MarkupAccumulator::appendComment static & public since it doesn't do anything with member variables.

Done.

(In reply to comment #28)
> (From update of attachment 109926 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109926&action=review
> 
> LGTM, but I would be interested in any test changes.
> 
> > Source/WebCore/editing/markup.cpp:752
> > +    // FIXME: Need to handle the case where the markup already contains these markers.
> > +    const char fragmentMarkerTag[] = "webkit-fragment-marker";
> 
> Seems like we can just strip them in the markup.  A follow up patch for this seems fine.
> 

Without parsing, we won't know if they are part of a comment-node or not, so it might not be safe to strip them.

> > Source/WebCore/editing/markup.cpp:773
> > +    // Now spelunk through the DOM and extract the beginning and the end of the range using the setinels.
> 
> Nit: typo sentinels
> 

Done.

> > Source/WebCore/editing/markup.h:48
> > +    PassRefPtr<DocumentFragment> createFragmentFromMarkupWithContext(Document*, const String& markup, unsigned fragmentStart, unsigned fragmentEnd, const String& baseURL, FragmentScriptingPermission = FragmentScriptingAllowed);
> 
> Does FragmentScriptingPermission need to be optional?  Seems like the one caller of this function sets it.

Done. Removed the default argument.
Comment 32 Ryosuke Niwa 2011-10-07 14:22:18 PDT
Comment on attachment 109926 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109926&action=review

>>> Source/WebCore/ChangeLog:12
>>> +        Covered by existing tests.
>> 
>> So this patch doesn't require any rebaselines?
> 
> There are 11 tests which fail on Linux, 1 which fails on Mac, and 1 test case that crashes (not the scrollbar crash) if it's run in a certain order. I'm still investigating them.

You should include rebaselines if there are any.
Comment 33 Ryosuke Niwa 2011-10-07 14:26:09 PDT
Comment on attachment 110209 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110209&action=review

r- due to the comment that we have failing tests.

>> Source/WebCore/ChangeLog:12
>> +        Covered by existing tests.
> 
> There are 11 tests which fail on Linux, 1 which fails on Mac, and 1 test case that crashes (not the scrollbar crash) if it's run in a certain order. I'm still investigating them.

I can't r+ this patch until I see those failures.

> Source/WebCore/editing/markup.cpp:786
> +    RefPtr<Range> range = Range::create(pageForPaste()->mainFrame()->document(),
> +        positionAfterNode(fragmentStartNode).parentAnchoredEquivalent(),
> +        positionBeforeNode(fragmentEndNode).parentAnchoredEquivalent());

Knowing there was some crash, is it possible that fragmentStart and fragmentEnd are never found? Also is it possible that fragment start and fragment ends are before and after html element respectively? In those cases, these lines might blow up so you may want to have some nullity check here.
Comment 34 Daniel Cheng 2011-10-07 15:01:31 PDT
The crash that I get on the second paste if I don't disable scrollbars:
ASSERTION FAILED: documentWrapper == m_document || m_document.IsEmpty()
third_party/WebKit/Source/WebCore/bindings/v8/V8DOMWindowShell.cpp(485) : void WebCore::V8DOMWindowShell::updateDocumentWrapperCache()
#0  0x00007ffff38dfd1e in WebCore::V8DOMWindowShell::updateDocumentWrapperCache (
    this=0x7fffd74d7a20) at third_party/WebKit/Source/WebCore/bindings/v8/V8DOMWindowShell.cpp:485
#1  0x00007ffff38e0180 in WebCore::V8DOMWindowShell::updateDocument (this=0x7fffd74d7a20)
    at third_party/WebKit/Source/WebCore/bindings/v8/V8DOMWindowShell.cpp:561
#2  0x00007ffff38c2d42 in WebCore::ScriptController::updateDocument (this=0x7fffdeb7d5c0)
    at third_party/WebKit/Source/WebCore/bindings/v8/ScriptController.cpp:422
#3  0x00007ffff3b9a090 in WebCore::Frame::setDocument (this=0x7fffdeb7d000, newDoc=...)
    at third_party/WebKit/Source/WebCore/page/Frame.cpp:308
#4  0x00007ffff3af53f6 in WebCore::DocumentWriter::begin (this=0x7fffdeb7dd20, urlReference=...,
    dispatch=true, originSource=WebCore::DocumentWriter::CreateNewSecurityOrigin)
    at third_party/WebKit/Source/WebCore/loader/DocumentWriter.cpp:136#5  0x00007ffff3a53f50 in WebCore::createFragmentFromMarkupWithContext (document=0x7fffd6fd3800,
    markup=..., fragmentStart=0, fragmentEnd=145995, baseURL=...,
    scriptingPermission=WebCore::FragmentScriptingNotAllowed)
    at third_party/WebKit/Source/WebCore/editing/markup.cpp:765
#6  0x00007ffff377e6c7 in WebCore::Pasteboard::documentFragment (this=0x7fffd7425fb0,
    frame=0x7fffd6fcf000, context=..., allowPlainText=true, chosePlainText=@0x7fffdb7f4a5f)
    at third_party/WebKit/Source/WebCore/platform/chromium/PasteboardChromium.cpp:181
#7  0x00007ffff39fc9e4 in WebCore::Editor::pasteWithPasteboard (this=0x7fffd6fcf630,
    pasteboard=0x7fffd7425fb0, allowPlainText=true)
    at third_party/WebKit/Source/WebCore/editing/Editor.cpp:375
#8  0x00007ffff3a0107e in WebCore::Editor::paste (this=0x7fffd6fcf630)
    at third_party/WebKit/Source/WebCore/editing/Editor.cpp:1198
#9  0x00007ffff3a0face in WebCore::executePaste (frame=0x7fffd6fcf000,
    source=WebCore::CommandFromMenuOrKeyBinding)

(In reply to comment #33)
> (From update of attachment 110209 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110209&action=review
> 
> r- due to the comment that we have failing tests.
> 
> >> Source/WebCore/ChangeLog:12
> >> +        Covered by existing tests.
> > 
> > There are 11 tests which fail on Linux, 1 which fails on Mac, and 1 test case that crashes (not the scrollbar crash) if it's run in a certain order. I'm still investigating them.
> 
> I can't r+ this patch until I see those failures.

They look like whitespace diffs in Apple-converted-space blocks, but I do not understand why I'm not getting the same results on Chrome Mac and Chrome Linux, so I haven't included the rebaseline here because I'm still investigating.

> 
> > Source/WebCore/editing/markup.cpp:786
> > +    RefPtr<Range> range = Range::create(pageForPaste()->mainFrame()->document(),
> > +        positionAfterNode(fragmentStartNode).parentAnchoredEquivalent(),
> > +        positionBeforeNode(fragmentEndNode).parentAnchoredEquivalent());
> 
> Knowing there was some crash, is it possible that fragmentStart and fragmentEnd are never found? Also is it possible that fragment start and fragment ends are before and after html element respectively? In those cases, these lines might blow up so you may want to have some nullity check here.

The crash is much before this, inside DocumentWriter::begin():

ASSERTION FAILED: documentWrapper == m_document || m_document.IsEmpty()
third_party/WebKit/Source/WebCore/bindings/v8/V8DOMWindowShell.cpp(485) : void WebCore::V8DOMWindowShell::updateDocumentWrapperCache()
[23152:23152:599754171953:ERROR:process_util_posix.cc(134)] Received signal 11
        base::debug::StackTrace::StackTrace() [0x6fa3e6]
        base::(anonymous namespace)::StackDumpSignalHandler() [0x6b7805]
        0x7ff819b82af0
        WebCore::V8DOMWindowShell::updateDocumentWrapperCache() [0xf96456]
        WebCore::V8DOMWindowShell::updateDocument() [0xf968b8]
        WebCore::ScriptController::updateDocument() [0xf79976]
        WebCore::Frame::setDocument() [0x124a85c]
        WebCore::DocumentWriter::begin() [0x11a6372]
        WebCore::createFragmentFromMarkupWithContext() [0x1107c32]
        WebCore::Pasteboard::documentFragment() [0xe2f16b]
        WebCore::Editor::pasteWithPasteboard() [0x10b0698]
        WebCore::Editor::paste() [0x10b4d32]
        WebCore::executePaste() [0x10c37a4]

The crashing test is editing/pasteboard/4922709.html. I don't understand why it crashes, but it only appears to happen if I run 4922709.html after 4641033.html.
Comment 35 Ryosuke Niwa 2011-10-07 16:26:34 PDT
(In reply to comment #34)
> ASSERTION FAILED: documentWrapper == m_document || m_document.IsEmpty()
> third_party/WebKit/Source/WebCore/bindings/v8/V8DOMWindowShell.cpp(485) : void WebCore::V8DOMWindowShell::updateDocumentWrapperCache()
> [23152:23152:599754171953:ERROR:process_util_posix.cc(134)] Received signal 11
>         base::debug::StackTrace::StackTrace() [0x6fa3e6]
>         base::(anonymous namespace)::StackDumpSignalHandler() [0x6b7805]
>         0x7ff819b82af0
>         WebCore::V8DOMWindowShell::updateDocumentWrapperCache() [0xf96456]
>         WebCore::V8DOMWindowShell::updateDocument() [0xf968b8]
>         WebCore::ScriptController::updateDocument() [0xf79976]
>         WebCore::Frame::setDocument() [0x124a85c]
>         WebCore::DocumentWriter::begin() [0x11a6372]
>         WebCore::createFragmentFromMarkupWithContext() [0x1107c32]
>         WebCore::Pasteboard::documentFragment() [0xe2f16b]
>         WebCore::Editor::pasteWithPasteboard() [0x10b0698]
>         WebCore::Editor::paste() [0x10b4d32]
>         WebCore::executePaste() [0x10c37a4]
> 
> The crashing test is editing/pasteboard/4922709.html. I don't understand why it crashes, but it only appears to happen if I run 4922709.html after 4641033.html.

Thanks for the stack trace. This looks like a real bug. Maybe fishd or dglazkov would know what's going wrong here.
Comment 36 Adam Barth 2011-10-07 16:43:13 PDT
Creating another use of EmptyClient is a very bad idea.
Comment 37 Ryosuke Niwa 2011-10-07 17:42:48 PDT
Per IRC discussion, we've concluded that the right approach is to let the embedder supply the shared page (real page instead of one created by empty clients) and pass it to createFragmentFromMarkupWithContext.
Comment 38 Daniel Cheng 2011-10-11 11:01:22 PDT
More followup. The following approaches to solving this problem have been considered:

Parsing using a fake page using empty clients:
It works but is considered a giant hack. Also it encountered assertions on the second paste (assertion when updating the document wrapper and when updating the scroll bar).

Parsing using a real page from the WebKit layer:
It works but encounters the same problems as above. jamers also pointed out that we need to make sure to block external resource loads. abarth suspected that the assertions are due to using DocumentWriter directly instead of using FrameLoader; unfortunately FrameLoader does not offer a suitable sync API for loading for us to use. If we ever consider this approach again, there are several other things we must remember:
- Scripting should not execute
- Media should not play
- Plugins should not be loaded
Basically, all 'active' content needs to be disabled.

Using the HTML5 tree builder:
abarth suggested this approach. Basically, feed the parser the content from the beginning of the doc to the start of the fragment. Figure out what context node the tree is in, and create a fragment parser using the correct context node. Unfortunately this does not work in the case of an unbalanced selection, such as:
<table><tr><!--StartFragment--><td>Data</td></tr></table><div>Hello</div><!--EndFragment-->, as the fragment will not have the TABLE or TR elements inserted.

This leads to the new approach being tried:
Parse the fragment with tags using the fragment parser instead of a fake page. Locate the fragment markers, then use some magic to build a copy of that document fragment with the appropriate context nodes. For example, if the fragment includes TD, we must make it a child of TABLE > TR. I hope to be able to share some code with createMarkup() which also needs to do that.

Implications:
We will still fail to inline styles correctly on pastes.
Comment 39 Daniel Cheng 2011-10-12 16:16:29 PDT
Created attachment 110767 [details]
Patch
Comment 40 Ryosuke Niwa 2011-10-12 16:26:57 PDT
Comment on attachment 110767 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110767&action=review

> Source/WebCore/editing/markup.cpp:474
> +static Node* ancestorToRetainStructureAndAppearanceHACK(Node* commonAncestor)
> +{
> +    if (commonAncestor->isTextNode())
> +        return 0;

Instead of this, we probably need some helper function that finds a block element without access to renderer.

> Source/WebCore/editing/markup.cpp:760
> +    // This loop is structured similarly to traverseNodesForSerialization().
> +    for (Node* n = startNode; n != pastEnd; n = next) {

If we're doing this, then we should probably templatize traverseNodesForSerialization and share code. But I think you can just use Range::commonAncestor instead.

> Source/WebCore/editing/markup.cpp:793
> +            RefPtr<Node> clone = ancestor->cloneNode(false);

You shouldn't clone. You should be removing this node from the fragment instead.
Comment 41 Early Warning System Bot 2011-10-12 16:43:41 PDT
Comment on attachment 110767 [details]
Patch

Attachment 110767 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10041072
Comment 42 WebKit Review Bot 2011-10-12 17:33:18 PDT
Comment on attachment 110767 [details]
Patch

Attachment 110767 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10046083
Comment 43 Daniel Cheng 2011-10-13 14:54:06 PDT
Created attachment 110914 [details]
Patch
Comment 44 Daniel Cheng 2011-10-13 14:57:05 PDT
(In reply to comment #40)
> (From update of attachment 110767 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110767&action=review
> 
> > Source/WebCore/editing/markup.cpp:474
> > +static Node* ancestorToRetainStructureAndAppearanceHACK(Node* commonAncestor)
> > +{
> > +    if (commonAncestor->isTextNode())
> > +        return 0;
> 
> Instead of this, we probably need some helper function that finds a block element without access to renderer.
> 

I agree. But for a first iteration, this might be good enough I think. I could add some other "obvious" things like spans I guess. Is there somewhere in WebKit source to derive a list of elements are considered block?

> > Source/WebCore/editing/markup.cpp:760
> > +    // This loop is structured similarly to traverseNodesForSerialization().
> > +    for (Node* n = startNode; n != pastEnd; n = next) {
> 
> If we're doing this, then we should probably templatize traverseNodesForSerialization and share code. But I think you can just use Range::commonAncestor instead.
> 
> > Source/WebCore/editing/markup.cpp:793
> > +            RefPtr<Node> clone = ancestor->cloneNode(false);
> 
> You shouldn't clone. You should be removing this node from the fragment instead.

I'm using an alternate approach now where we move the entire block we think we're interested in and then trim elements we don't care about.
Comment 45 Ryosuke Niwa 2011-10-13 15:15:39 PDT
Comment on attachment 110914 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110914&action=review

The patch looks much nicer!

> Source/WebCore/editing/markup.cpp:460
> +static Node* ancestorToRetainStructureAndAppearance(Node* commonAncestor)

Please add inline keyword here.

> Source/WebCore/editing/markup.cpp:465
> +static Node* ancestorToRetainStructureAndAppearanceHACK(Node* commonAncestor)

We definitely don't want the name HACK here. r- because of this.

> Source/WebCore/editing/markup.cpp:474
> +    if (commonAncestor->isTextNode())
> +        return 0;
> +
> +    // FIXME: Do we want to handle mail quotes correctly? It might be useful...
> +    // We cheat here since we don't have a render tree.
> +    if (commonAncestor->hasTagName(tdTag)
> +        || commonAncestor->hasTagName(liTag))
> +        commonAncestor = commonAncestor->parentNode();

My suggestion is to extract a function out of ancestorToRetainStructureAndAppearanceForBlock (lines 444-454) and add td, tr, table, blockquote to the list.
And call enclosingNodeOfType with that.

> Source/WebCore/editing/markup.cpp:736
> +    if (!startNode || !endNode)

You should also check that startNode and endNode have parents.

> Source/WebCore/editing/markup.cpp:757
> +    for (Node* n = fragment->firstChild(); n; n = next) {

Please don't use one letter abbreviation like n.

> Source/WebCore/editing/markup.cpp:759
> +            next = n->traverseNextNode();

You probably want to call traverseNextSibling here.

> Source/WebCore/editing/markup.cpp:764
> +        next = n->traverseNextNode();

Ditto.

> Source/WebCore/editing/markup.cpp:766
> +        // FIXME: We're comparing a deleted pointer. It works, but it seems a little dangerous.

You should make n ref-counted.

> Source/WebCore/editing/markup.cpp:770
> +    for (Node* n = endNode; n; n = next) {

Ditto about one letter variable.

> Source/WebCore/editing/markup.cpp:773
> +        if (n->childNodeCount())
> +            static_cast<ContainerNode*>(n)->removeAllChildren();
> +        next = n->traverseNextNode();

No need to remove all children if you just call traverseNextSibling.
Comment 46 Daniel Cheng 2011-10-13 15:52:18 PDT
(In reply to comment #45)
> (From update of attachment 110914 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110914&action=review
> 
> The patch looks much nicer!
> 
> > Source/WebCore/editing/markup.cpp:460
> > +static Node* ancestorToRetainStructureAndAppearance(Node* commonAncestor)
> 
> Please add inline keyword here.
> 

Why not just leave it up to the compiler?

> > Source/WebCore/editing/markup.cpp:465
> > +static Node* ancestorToRetainStructureAndAppearanceHACK(Node* commonAncestor)
> 
> We definitely don't want the name HACK here. r- because of this.
> 
> > Source/WebCore/editing/markup.cpp:474
> > +    if (commonAncestor->isTextNode())
> > +        return 0;
> > +
> > +    // FIXME: Do we want to handle mail quotes correctly? It might be useful...
> > +    // We cheat here since we don't have a render tree.
> > +    if (commonAncestor->hasTagName(tdTag)
> > +        || commonAncestor->hasTagName(liTag))
> > +        commonAncestor = commonAncestor->parentNode();
> 
> My suggestion is to extract a function out of ancestorToRetainStructureAndAppearanceForBlock (lines 444-454) and add td, tr, table, blockquote to the list.
> And call enclosingNodeOfType with that.
> 

I don't think that will work, because it might go too far up the tree. For example, if we had:
TABLE
>TR
>>TD
>>>DIV

If commonAncestor is the DIV, then I don't think we want to go up any higher, since we don't want the table in this case (I think).

> > Source/WebCore/editing/markup.cpp:736
> > +    if (!startNode || !endNode)
> 
> You should also check that startNode and endNode have parents.

Is it possible for them to be parentless at this point. At minimum, we know they are rooted somewhere under the Document.

> 
> > Source/WebCore/editing/markup.cpp:757
> > +    for (Node* n = fragment->firstChild(); n; n = next) {
> 
> Please don't use one letter abbreviation like n.
> 

Done.

> > Source/WebCore/editing/markup.cpp:759
> > +            next = n->traverseNextNode();
> 
> You probably want to call traverseNextSibling here.
> 
> > Source/WebCore/editing/markup.cpp:764
> > +        next = n->traverseNextNode();
> 
> Ditto.
> 

We're not necessarily deleting siblings though. We also need to traverse down into its children potentially, or possibly traverse to one of its ancestor's children.

> > Source/WebCore/editing/markup.cpp:766
> > +        // FIXME: We're comparing a deleted pointer. It works, but it seems a little dangerous.
> 
> You should make n ref-counted.
> 

Done.

> > Source/WebCore/editing/markup.cpp:770
> > +    for (Node* n = endNode; n; n = next) {
> 
> Ditto about one letter variable.
> 

Done.

> > Source/WebCore/editing/markup.cpp:773
> > +        if (n->childNodeCount())
> > +            static_cast<ContainerNode*>(n)->removeAllChildren();
> > +        next = n->traverseNextNode();
> 
> No need to remove all children if you just call traverseNextSibling.
Comment 47 Daniel Cheng 2011-10-13 15:53:14 PDT
Created attachment 110919 [details]
Patch
Comment 48 Daniel Cheng 2011-10-13 16:20:48 PDT
Created attachment 110925 [details]
Patch
Comment 49 Ryosuke Niwa 2011-10-13 16:22:03 PDT
(In reply to comment #46)
> (In reply to comment #45)
> > > Source/WebCore/editing/markup.cpp:460
> > > +static Node* ancestorToRetainStructureAndAppearance(Node* commonAncestor)
> > 
> > Please add inline keyword here.
> 
> Why not just leave it up to the compiler?

I think adding inline is a good way to document that this function should be short and static.

> > My suggestion is to extract a function out of ancestorToRetainStructureAndAppearanceForBlock (lines 444-454) and add td, tr, table, blockquote to the list.
> > And call enclosingNodeOfType with that.
> > 
> 
> I don't think that will work, because it might go too far up the tree. For example, if we had:
> TABLE
> >TR
> >>TD
> >>>DIV
> 
> If commonAncestor is the DIV, then I don't think we want to go up any higher, since we don't want the table in this case (I think).

And enclosingNodeOfType stops at td. I don't know what problem/concern you have.

> > > Source/WebCore/editing/markup.cpp:736
> > > +    if (!startNode || !endNode)
> > 
> > You should also check that startNode and endNode have parents.
> 
> Is it possible for them to be parentless at this point. At minimum, we know they are rooted somewhere under the Document.

That's a good point.

> > > Source/WebCore/editing/markup.cpp:759
> > > +            next = n->traverseNextNode();
> > 
> > You probably want to call traverseNextSibling here.
> > 
> We're not necessarily deleting siblings though. We also need to traverse down into its children potentially, or possibly traverse to one of its ancestor's children.

> > > Source/WebCore/editing/markup.cpp:764
> > > +        next = n->traverseNextNode();
> > 
> > Ditto.

This should still be traverseNextSibling.
Comment 50 Daniel Cheng 2011-10-13 16:22:28 PDT
This version adds better support for finding an enclosing block without a render tree. It uses what the HTML5 spec considers block-level elements, but I'm working on making sure the list is canonical.
Comment 51 Ryosuke Niwa 2011-10-13 16:25:55 PDT
Comment on attachment 110925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110925&action=review

> Source/WebCore/ChangeLog:-975
> -

Please revert this.

> Source/WebCore/editing/markup.cpp:496
> +    return node->hasTagName(addressTag)
> +        || node->hasTagName(articleTag)
> +        || node->hasTagName(asideTag)
> +        || node->hasTagName(audioTag)
> +        || node->hasTagName(blockquoteTag)
> +        || node->hasTagName(canvasTag)
> +        || node->hasTagName(ddTag)
> +        || node->hasTagName(divTag)
> +        || node->hasTagName(dlTag)
> +        || node->hasTagName(fieldsetTag)
> +        || node->hasTagName(figcaptionTag)
> +        || node->hasTagName(figureTag)
> +        || node->hasTagName(footerTag)
> +        || node->hasTagName(formTag)
> +        || node->hasTagName(headerTag)
> +        || node->hasTagName(hgroupTag)
> +        || node->hasTagName(hrTag)
> +        || node->hasTagName(noscriptTag)
> +        || node->hasTagName(olTag)
> +        || node->hasTagName(outputTag)
> +        || node->hasTagName(pTag)
> +        || node->hasTagName(preTag)
> +        || node->hasTagName(sectionTag)
> +        || node->hasTagName(tableTag)
> +        || node->hasTagName(tfootTag)
> +        || node->hasTagName(ulTag)
> +        || node->hasTagName(videoTag);

Where did you get this list? We should use the same list of nodes as ancestorToRetainStructureAndAppearanceForBlock. e.g. this list is missing h1 and including random elements like video and audio. r- because of this code duplication.

> Source/WebCore/editing/markup.cpp:502
> +    Node* commonAncestorBlock = enclosingNodeOfType(
> +        firstPositionInOrBeforeNode(commonAncestor), isHtmlBlockLevelElement);

Nit: It seems like this can fit in one line.

> Source/WebCore/editing/markup.cpp:749
> +    RefPtr<Node> startNode;
> +    RefPtr<Node> endNode;

Come to think of it, these noes should probably be renamed to nodeBeforePastedContent and nodeAfterPastedContent.

> Source/WebCore/editing/markup.cpp:789
> +        if (node->childNodeCount())
> +            static_cast<ContainerNode*>(node.get())->removeAllChildren();
> +        next = node->traverseNextNode();

Again, next = node->traverseNextSibling(); since you've deleted all children.
Comment 52 Daniel Cheng 2011-10-14 00:05:23 PDT
Created attachment 110971 [details]
Patch
Comment 53 Daniel Cheng 2011-10-14 00:07:52 PDT
(In reply to comment #51)
> (From update of attachment 110925 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110925&action=review
> 
> > Source/WebCore/ChangeLog:-975
> > -
> 
> Please revert this.
> 

Done.

> > Source/WebCore/editing/markup.cpp:496
> > +    return node->hasTagName(addressTag)
> > +        || node->hasTagName(articleTag)
> > +        || node->hasTagName(asideTag)
> > +        || node->hasTagName(audioTag)
> > +        || node->hasTagName(blockquoteTag)
> > +        || node->hasTagName(canvasTag)
> > +        || node->hasTagName(ddTag)
> > +        || node->hasTagName(divTag)
> > +        || node->hasTagName(dlTag)
> > +        || node->hasTagName(fieldsetTag)
> > +        || node->hasTagName(figcaptionTag)
> > +        || node->hasTagName(figureTag)
> > +        || node->hasTagName(footerTag)
> > +        || node->hasTagName(formTag)
> > +        || node->hasTagName(headerTag)
> > +        || node->hasTagName(hgroupTag)
> > +        || node->hasTagName(hrTag)
> > +        || node->hasTagName(noscriptTag)
> > +        || node->hasTagName(olTag)
> > +        || node->hasTagName(outputTag)
> > +        || node->hasTagName(pTag)
> > +        || node->hasTagName(preTag)
> > +        || node->hasTagName(sectionTag)
> > +        || node->hasTagName(tableTag)
> > +        || node->hasTagName(tfootTag)
> > +        || node->hasTagName(ulTag)
> > +        || node->hasTagName(videoTag);
> 
> Where did you get this list? We should use the same list of nodes as ancestorToRetainStructureAndAppearanceForBlock. e.g. this list is missing h1 and including random elements like video and audio. r- because of this code duplication.
> 

Updated per offline discussion.

> > Source/WebCore/editing/markup.cpp:502
> > +    Node* commonAncestorBlock = enclosingNodeOfType(
> > +        firstPositionInOrBeforeNode(commonAncestor), isHtmlBlockLevelElement);
> 
> Nit: It seems like this can fit in one line.
> 

Done.

> > Source/WebCore/editing/markup.cpp:749
> > +    RefPtr<Node> startNode;
> > +    RefPtr<Node> endNode;
> 
> Come to think of it, these noes should probably be renamed to nodeBeforePastedContent and nodeAfterPastedContent.
> 

Done.

> > Source/WebCore/editing/markup.cpp:789
> > +        if (node->childNodeCount())
> > +            static_cast<ContainerNode*>(node.get())->removeAllChildren();
> > +        next = node->traverseNextNode();
> 
> Again, next = node->traverseNextSibling(); since you've deleted all children.

Done. I misunderstood what traverseNextSibling() did. This makes the patch a bit simpler in the end too.
Comment 54 Ryosuke Niwa 2011-10-14 08:56:04 PDT
Comment on attachment 110971 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110971&action=review

> Source/WebCore/ChangeLog:10
> +        Covered by existing layout tests.

No rebaselines?

> Source/WebCore/editing/markup.cpp:431
> +bool isSpeciallyStyledBlock(const Node* node)

The word "specially" doesn't add any semantics here. I would merge this function and isHTMLBlockElement, and make it take an enum that indicates whether we include table cell (td/th) or not.

Alternatively, we can name this isNonTableCellHTMLBlockElement.

> Source/WebCore/editing/markup.cpp:444
> +        || node->hasTagName(h6Tag);

Let's not add h6 now. We need a test for this. r- because of this.

> Source/WebCore/editing/markup.cpp:449
> +    return node->hasTagName(tdTag)

You're missing th here.

> Source/WebCore/editing/markup.cpp:450
> +        || node->hasTagName(trTag)

I don't think we should have tr here.

> Source/WebCore/editing/markup.cpp:451
> +        || node->hasTagName(blockquoteTag)

blockquote probably belongs to isSpeciallyStyledBlock but let's not make that change either. We need a test.

> Source/WebCore/editing/markup.cpp:455
> +// FIXME: Do we want to handle mail quotes here instead? It might be useful...

Please remove "It might be useful..." It doesn't add any semantics.

> Source/WebCore/editing/markup.cpp:456
> +// This is currently handled one level about ancestorToRetainStructureAndAppearance.

I don't understand this comment.

> Source/WebCore/editing/markup.cpp:481
> +static Node* ancestorToRetainStructureAndAppearance(Node* commonAncestor)
> +{
> +    return ancestorToRetainStructureAndAppearanceForBlock(enclosingBlock(commonAncestor));
> +}
> +
> +static Node* ancestorToRetainStructureAndAppearanceWithNoRenderer(Node* commonAncestor)

Please add inline keywords to these functions.

> Source/WebCore/editing/markup.cpp:725
> +    // We move the resulting DOM tree into a Document so we can create a proper Range.

This comment seems unnecessary.

> Source/WebCore/editing/markup.cpp:747
> +    for (Node* node = taggedDocument->firstChild(); node; node = node->traverseNextNode()) {
> +        if (node->nodeType() == Node::COMMENT_NODE && static_cast<CharacterData*>(node)->data() == fragmentMarkerTag) {
> +            if (!nodeBeforeContext)
> +                nodeBeforeContext = node;
> +            else {
> +                nodeAfterContext = node;
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (!(nodeBeforeContext && nodeAfterContext))
> +        return 0;
> +
> +    RefPtr<Range> range = Range::create(taggedDocument.get(),
> +        positionAfterNode(nodeBeforeContext.get()).parentAnchoredEquivalent(),
> +        positionBeforeNode(nodeAfterContext.get()).parentAnchoredEquivalent());

We can probably extract this segment as a function.

> Source/WebCore/editing/markup.cpp:754
> +    // Set up the initial range we're interested in.

This comment repeats what the code says. Please remove.

> Source/WebCore/editing/markup.cpp:760
> +        fragment->appendChild(specialCommonAncestor, ec);
> +        ASSERT(!ec);
> +    } else
> +        fragment->takeAllChildrenFrom(static_cast<ContainerNode*>(commonAncestor));

I would rather see a comment on why we call appendChild for specialCommonAncestor and takeAllChildrenFrom for commonAncestor.

> Source/WebCore/editing/markup.cpp:779
> +    Node* next;
> +    for (RefPtr<Node> node = fragment->firstChild(); node; node = next) {
> +        if (nodeBeforeContext->isDescendantOf(node.get())) {
> +            next = node->traverseNextNode();
> +            continue;
> +        }
> +        next = node->traverseNextSibling();
> +        node->parentNode()->removeChild(node.get(), ec);
> +        if (nodeBeforeContext == node)
> +            break;
> +    }
> +
> +    ASSERT(nodeAfterContext->parentNode());
> +    for (Node* node = nodeAfterContext.get(); node; node = next) {
> +        next = node->traverseNextSibling();
> +        node->parentNode()->removeChild(node, ec);
> +        ASSERT(!ec);
> +    }

I feel like this should be extracted as a helper function.
Comment 55 Daniel Cheng 2011-10-14 11:38:09 PDT
Created attachment 111035 [details]
Patch
Comment 56 Daniel Cheng 2011-10-14 11:38:46 PDT
Comment on attachment 110971 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110971&action=review

>> Source/WebCore/ChangeLog:10
>> +        Covered by existing layout tests.
> 
> No rebaselines?

No, assuming everything is correct, the resulting DOM tree in the "normal case" should be identical.

>> Source/WebCore/editing/markup.cpp:431
>> +bool isSpeciallyStyledBlock(const Node* node)
> 
> The word "specially" doesn't add any semantics here. I would merge this function and isHTMLBlockElement, and make it take an enum that indicates whether we include table cell (td/th) or not.
> 
> Alternatively, we can name this isNonTableCellHTMLBlockElement.

I chose to split them up but use your naming suggestion.

>> Source/WebCore/editing/markup.cpp:444
>> +        || node->hasTagName(h6Tag);
> 
> Let's not add h6 now. We need a test for this. r- because of this.

Done.

>> Source/WebCore/editing/markup.cpp:449
>> +    return node->hasTagName(tdTag)
> 
> You're missing th here.

Done.

>> Source/WebCore/editing/markup.cpp:450
>> +        || node->hasTagName(trTag)
> 
> I don't think we should have tr here.

I don't know that I agree. Why do we check for trTag in ancestorToRetainStructureAndAppearanceForBlock then?
Since I don't think it should make a difference, I removed it for now.

>> Source/WebCore/editing/markup.cpp:451
>> +        || node->hasTagName(blockquoteTag)
> 
> blockquote probably belongs to isSpeciallyStyledBlock but let's not make that change either. We need a test.

Done. Removed.

>> Source/WebCore/editing/markup.cpp:455
>> +// FIXME: Do we want to handle mail quotes here instead? It might be useful...
> 
> Please remove "It might be useful..." It doesn't add any semantics.

Done.

>> Source/WebCore/editing/markup.cpp:456
>> +// This is currently handled one level about ancestorToRetainStructureAndAppearance.
> 
> I don't understand this comment.

I updated it. I hope it makes more sense now.

>> Source/WebCore/editing/markup.cpp:481
>> +static Node* ancestorToRetainStructureAndAppearanceWithNoRenderer(Node* commonAncestor)
> 
> Please add inline keywords to these functions.

I really don't feel like the inline keyword adds or gets us anything here, but done.

>> Source/WebCore/editing/markup.cpp:725
>> +    // We move the resulting DOM tree into a Document so we can create a proper Range.
> 
> This comment seems unnecessary.

Done.

>> Source/WebCore/editing/markup.cpp:747
>> +        positionBeforeNode(nodeAfterContext.get()).parentAnchoredEquivalent());
> 
> We can probably extract this segment as a function.

Which segment and what would you suggest naming it?

>> Source/WebCore/editing/markup.cpp:754
>> +    // Set up the initial range we're interested in.
> 
> This comment repeats what the code says. Please remove.

Done.

>> Source/WebCore/editing/markup.cpp:760
>> +        fragment->takeAllChildrenFrom(static_cast<ContainerNode*>(commonAncestor));
> 
> I would rather see a comment on why we call appendChild for specialCommonAncestor and takeAllChildrenFrom for commonAncestor.

Done.

>> Source/WebCore/editing/markup.cpp:779
>> +    }
> 
> I feel like this should be extracted as a helper function.

Done... not entirely sure what you had in mind here though.
Comment 57 Ryosuke Niwa 2011-10-14 11:43:14 PDT
(In reply to comment #56)
> (From update of attachment 110971 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=110971&action=review
> 
> >> Source/WebCore/ChangeLog:10
> >> +        Covered by existing layout tests.
> > 
> > No rebaselines?
> 
> No, assuming everything is correct, the resulting DOM tree in the "normal case" should be identical.

Did you run layout tests? At least ones in editing, fast/events, and fast/forms?

> >> Source/WebCore/editing/markup.cpp:747
> >> +        positionBeforeNode(nodeAfterContext.get()).parentAnchoredEquivalent());
> > 
> > We can probably extract this segment as a function.
> 
> Which segment and what would you suggest naming it?

The part you'd find start and end nodes and create a range out of it.
Comment 58 Ryosuke Niwa 2011-10-14 11:48:54 PDT
Comment on attachment 111035 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111035&action=review

Please consider making trimFragment take PassRefPtr but this patch is probably landable as is.

> Source/WebCore/editing/markup.cpp:711
> +static void trimFragment(DocumentFragment* fragment, Node* nodeBeforeContext, Node* nodeAfterContext)

This function should probably take PassRefPtr to nodes.

> Source/WebCore/editing/markup.cpp:721
> +        node->parentNode()->removeChild(node.get(), ec);

I'd add an assertion that node doesn't contain nodeAfterContext.

> Source/WebCore/editing/markup.cpp:734
> +PassRefPtr<DocumentFragment> createFragmentFromMarkupWithContext(Document* document, const String& markup, unsigned fragmentStart, unsigned fragmentEnd, const String& baseURL, FragmentScriptingPermission scriptingPermission)

Nit: Probably split this into two lines.

> Source/WebCore/editing/markup.cpp:763
> +    if (!(nodeBeforeContext && nodeAfterContext))

I think !nodeBeforeContext || ! nodeAfterContext looked better.
Comment 59 Daniel Cheng 2011-10-14 12:16:37 PDT
Created attachment 111046 [details]
Patch
Comment 60 Daniel Cheng 2011-10-14 12:16:57 PDT
Comment on attachment 110971 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=110971&action=review

>>>> Source/WebCore/ChangeLog:10
>>>> +        Covered by existing layout tests.
>>> 
>>> No rebaselines?
>> 
>> No, assuming everything is correct, the resulting DOM tree in the "normal case" should be identical.
> 
> Did you run layout tests? At least ones in editing, fast/events, and fast/forms?

Yes.

>>>> Source/WebCore/editing/markup.cpp:747
>>>> +        positionBeforeNode(nodeAfterContext.get()).parentAnchoredEquivalent());
>>> 
>>> We can probably extract this segment as a function.
>> 
>> Which segment and what would you suggest naming it?
> 
> The part you'd find start and end nodes and create a range out of it.

Done.
Comment 61 Daniel Cheng 2011-10-14 12:17:52 PDT
Comment on attachment 111035 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111035&action=review

>> Source/WebCore/editing/markup.cpp:711
>> +static void trimFragment(DocumentFragment* fragment, Node* nodeBeforeContext, Node* nodeAfterContext)
> 
> This function should probably take PassRefPtr to nodes.

Not needed per our discussion.

>> Source/WebCore/editing/markup.cpp:721
>> +        node->parentNode()->removeChild(node.get(), ec);
> 
> I'd add an assertion that node doesn't contain nodeAfterContext.

Done.

>> Source/WebCore/editing/markup.cpp:734
>> +PassRefPtr<DocumentFragment> createFragmentFromMarkupWithContext(Document* document, const String& markup, unsigned fragmentStart, unsigned fragmentEnd, const String& baseURL, FragmentScriptingPermission scriptingPermission)
> 
> Nit: Probably split this into two lines.

Done.

>> Source/WebCore/editing/markup.cpp:763
>> +    if (!(nodeBeforeContext && nodeAfterContext))
> 
> I think !nodeBeforeContext || ! nodeAfterContext looked better.

Not needed anymore.
Comment 62 Ryosuke Niwa 2011-10-14 12:19:43 PDT
Comment on attachment 111046 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111046&action=review

> Source/WebCore/editing/markup.cpp:711
> +const char fragmentMarkerTag[] = "webkit-fragment-marker";

Nit: missing static.

> Source/WebCore/editing/markup.cpp:713
> +bool findNodesSurroundingContext(Document* document, RefPtr<Node>& nodeBeforeContext, RefPtr<Node>& nodeAfterContext)

Nit: missing static (this is going to cause a compilation error on some platforms).

> Source/WebCore/platform/chromium/ChromiumDataObject.cpp:-270
> -

Are you deleting a blank line? Or removing the last line break from the line (which is bad)?
Comment 63 Daniel Cheng 2011-10-14 12:26:06 PDT
Created attachment 111050 [details]
Patch
Comment 64 Daniel Cheng 2011-10-14 12:26:20 PDT
Comment on attachment 111046 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111046&action=review

>> Source/WebCore/editing/markup.cpp:711
>> +const char fragmentMarkerTag[] = "webkit-fragment-marker";
> 
> Nit: missing static.

Done. I always thought that const implies internal linkage in C++, but I added it to be safe.

>> Source/WebCore/editing/markup.cpp:713
>> +bool findNodesSurroundingContext(Document* document, RefPtr<Node>& nodeBeforeContext, RefPtr<Node>& nodeAfterContext)
> 
> Nit: missing static (this is going to cause a compilation error on some platforms).

Oops. Fixed.

>> Source/WebCore/platform/chromium/ChromiumDataObject.cpp:-270
>> -
> 
> Are you deleting a blank line? Or removing the last line break from the line (which is bad)?

Hm. I think my editor did that. Reverted this line.
Comment 65 Daniel Cheng 2011-10-14 12:55:01 PDT
Committed r97497: <http://trac.webkit.org/changeset/97497>
Comment 66 Adam Barth 2011-10-15 00:14:00 PDT
Comment on attachment 111050 [details]
Patch

Cleared review? from attachment 111050 [details] so that this bug does not appear in http://webkit.org/pending-review.  If you would like this patch reviewed, please attach it to a new bug (or re-open this bug before marking it for review again).