Bug 111360

Summary: After sending message, Mail changes formatting
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: HTML EditingAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ayg, dbates, dglazkov, enrica, eric, esprehn+autocc, ian, mifenton, ojan.autocc, ojan, sam, tony, webkit.review.bot
Priority: P2 Keywords: HasReduction, InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 111397, 111400, 111466, 111491, 111639    
Bug Blocks:    
Attachments:
Description Flags
Reduction
none
Reduced repro. that requires only one paste operation
none
Work in progress 1
none
Work in progress 2
none
Current test failures
none
Work in progress 3
none
Before copy & paste (paste-text-018.html)
none
After copy & paste (paste-text-018.html)
none
Fixes the bug
none
Also prevent nested h1-h6
none
Fixes the bug enrica: review+

Description Ryosuke Niwa 2013-03-04 13:39:54 PST
Created attachment 191302 [details]
Reduction

Reproduction steps:
1. Triple click the line "– 1. Copy this line after triple click" and copy
2. Paste it after inserting a new line at the end of "2. insert new line after this and paste, select all and copy"
3. Select all
4. Copy
5. Paste it in the box right beneath "3. paste it below:"

Expected result:
Source and Destination look identical

Actual result:
The line pasted in step 2 has a bigger font.

<rdar://problem/12442331>
Comment 1 Ryosuke Niwa 2013-03-04 13:43:37 PST
The problem is caused by us generating nested p's. when WebKit inserts the copied content in step 2 (pasting "1. Copy this line after triple click"), it creates the following markup:

<p class="primary_area" style="font-size: 12px;"><p><span class="bullet">–</span><span class="platforms">1. &nbsp;</span>Copy this line after triple click<br></p><div><br></div></p>

Here, p is nested so HTML5 tree builder will close the outer p "prematurely".
Comment 2 Adam Barth 2013-03-04 16:42:08 PST
I can reproduce this issue.
Comment 3 Adam Barth 2013-03-04 16:42:51 PST
> Here, p is nested so HTML5 tree builder will close the outer p "prematurely".

How is the tree builder involved in this operation?  For parsing the fragment somehow?
Comment 4 Ryosuke Niwa 2013-03-04 16:46:06 PST
(In reply to comment #3)
> > Here, p is nested so HTML5 tree builder will close the outer p "prematurely".
> 
> How is the tree builder involved in this operation?  For parsing the fragment somehow?

Right. I have an even better repro. that only require pasting once.
Comment 5 Ryosuke Niwa 2013-03-04 16:46:41 PST
Created attachment 191350 [details]
Reduced repro. that requires only one paste operation
Comment 6 Adam Barth 2013-03-04 16:51:17 PST
I don't fully understand what the editing code is doing here, but I think you're running up against a basic limitation of HTML which is that not every HTML DOM can be represented as an HTML string.  If your DOM has nested HTMLParagraphElements, you won't be able to represent that using an HTML string because the HTML parser automatically closes the first <p> when parsing the "nested" <p>.
Comment 7 Adam Barth 2013-03-04 16:51:56 PST
What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste.
Comment 8 Ryosuke Niwa 2013-03-04 19:17:13 PST
Created attachment 191377 [details]
Work in progress 1
Comment 9 Ojan Vafai 2013-03-04 20:00:46 PST
(In reply to comment #6)
> I don't fully understand what the editing code is doing here, but I think you're running up against a basic limitation of HTML which is that not every HTML DOM can be represented as an HTML string.  If your DOM has nested HTMLParagraphElements, you won't be able to represent that using an HTML string because the HTML parser automatically closes the first <p> when parsing the "nested" <p>.

(In reply to comment #7)
> What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste.

That sucks for editing. Maybe clipboard parsing on paste shouldn't go through the standard HTML parser since typically the things put on clipboards are done by the browser and not by web developers.
Comment 10 Ryosuke Niwa 2013-03-04 20:02:34 PST
(In reply to comment #9)
>
> (In reply to comment #7)
> > What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste.
> 
> That sucks for editing. Maybe clipboard parsing on paste shouldn't go through the standard HTML parser since typically the things put on clipboards are done by the browser and not by web developers.

Yeah, I initially though that might work 'til I realized that when this message is sent (on Mail, Gmail, etc…) to someone else, the recipient will use the regular HTML5 parsing algorithm to view the message :(
Comment 11 Ryosuke Niwa 2013-03-05 18:05:37 PST
Created attachment 191624 [details]
Work in progress 2
Comment 12 Ryosuke Niwa 2013-03-05 18:08:50 PST
Created attachment 191626 [details]
Current test failures

The current approach doesn't seem to work because of the cases where we want to preserve border lines and other styles in the outer block.

We need to instead let elements get nested and then remove them as needed after the insertion.
Comment 13 Ryosuke Niwa 2013-03-06 20:08:28 PST
Created attachment 191903 [details]
Work in progress 3
Comment 14 Ryosuke Niwa 2013-03-06 20:10:17 PST
Created attachment 191904 [details]
Before copy & paste (paste-text-018.html)

Right now, this patch arguably regresses the following 3 tests:
editing/pasteboard/line-feed-between-br-and-b-should-not-reorder-pasted-content.html
editing/pasteboard/paste-text-018.html
editing/pasteboard/testcase-9507.html
Comment 15 Ryosuke Niwa 2013-03-06 20:16:32 PST
Created attachment 191906 [details]
After copy & paste (paste-text-018.html)

And here are screenshots showing the behavior. In particular, the red box that surrounds two lines will surround only the first line after copy & paste.
But I'm not sure if this should be a blocker since the said tests aren't really testing whether the red boxes to stay.

If this is considered as a regression, then I don't see any other way but to abandon the current approach and add a postmortem fix up instead (probably into SimplifyMarkupCommand).
Comment 16 Aryeh Gregor 2013-03-07 04:33:20 PST
You definitely want to make sure that no editing actions create DOMs that don't round-trip through HTML serialization.  The editing spec says that at various points, newly-inserted nodes should be checked to see if they have prohibited ancestors, and if they do, they break out of their parent repeatedly until their parent is allowed:

https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#fix-disallowed-ancestors
https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#allowed-child

E.g., if you had <p>foo<p>bar</p>baz</p>, it would become <p>foo</p><p>bar</p><p>baz</p>.

There are more things to worry about than just nested <p> -- you also can't nest <h#> in each other, nor <a>, and using table-related elements incorrectly can be very exciting (they'll vanish or jump around).  If you care about corner cases, <xmp> is fun, and <plaintext> even more so.  The spec's algorithm also converts misplaced <li>/<dt>/<dd> to <p>/<div>, although that round-trips fine, and likewise doesn't allow block elements inside inline.
Comment 17 Ryosuke Niwa 2013-03-07 19:53:16 PST
Created attachment 192134 [details]
Fixes the bug
Comment 18 WebKit Review Bot 2013-03-07 20:56:27 PST
Comment on attachment 192134 [details]
Fixes the bug

Attachment 192134 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17021275

New failing tests:
editing/pasteboard/paste-text-016.html
Comment 19 Ryosuke Niwa 2013-03-07 23:33:09 PST
Created attachment 192160 [details]
Also prevent nested h1-h6
Comment 20 WebKit Review Bot 2013-03-08 00:16:34 PST
Comment on attachment 192160 [details]
Also prevent nested h1-h6

Attachment 192160 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17054402

New failing tests:
editing/pasteboard/paste-text-016.html
Comment 21 Adam Barth 2013-03-08 11:27:37 PST
Comment on attachment 192160 [details]
Also prevent nested h1-h6

View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review

Please do

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462
> +    DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ());

Please do not add static state to the tree builder.  It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread.

> Source/WebCore/html/parser/HTMLTreeBuilder.h:92
> +    static bool isProhibitedParagraphChild(const AtomicString&);

I'd wrap this in an #ifndef NDEBUG to make it clear that it's a debug-only function.
Comment 22 Adam Barth 2013-03-08 11:27:59 PST
(In reply to comment #21)
> (From update of attachment 192160 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review
> 
> Please do

^^^ Typo  :)
Comment 23 Enrica Casucci 2013-03-08 11:37:30 PST
Comment on attachment 192160 [details]
Also prevent nested h1-h6

View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review

> Source/WebCore/ChangeLog:26
> +        (ReplaceSelectionCommand):

Remove or add the list of added methods.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:562
> +// FIXME: Fix other types of DOM tree constructs like nested h1's that aren't isomortic under the HTML serialization and parsering algorithms.

typos: isomortic and parsering
Comment 24 Ryosuke Niwa 2013-03-08 11:42:49 PST
Created attachment 192257 [details]
Fixes the bug
Comment 25 Enrica Casucci 2013-03-08 11:46:51 PST
Comment on attachment 192257 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review

Looks good to me, Please add the FIXME to makeInsertedContentRoundTrippableWithHTMLTreeBuilder and address the test failures on cr-linux

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:617
> +

You removed the comment with the FIXME:
I think you should add it back.
Comment 26 Eric Seidel (no email) 2013-03-08 11:48:19 PST
Comment on attachment 192160 [details]
Also prevent nested h1-h6

View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review

>> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462
>> +    DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ());
> 
> Please do not add static state to the tree builder.  It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread.

In this particular case, I'm not sure this is a problem, or?  We'd have to initialize this at a known time, but then it should be threadsafe to read from this hash, no?

>> Source/WebCore/html/parser/HTMLTreeBuilder.h:92
>> +    static bool isProhibitedParagraphChild(const AtomicString&);
> 
> I'd wrap this in an #ifndef NDEBUG to make it clear that it's a debug-only function.

He's using it from the Editor. :)
Comment 27 Adam Barth 2013-03-08 11:55:43 PST
(In reply to comment #26)
> (From update of attachment 192160 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review
> 
> >> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462
> >> +    DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ());
> > 
> > Please do not add static state to the tree builder.  It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread.
> 
> In this particular case, I'm not sure this is a problem, or?  We'd have to initialize this at a known time, but then it should be threadsafe to read from this hash, no?

I'd rather not introduce static dependencies at all.
Comment 28 Eric Seidel (no email) 2013-03-08 12:02:22 PST
Comment on attachment 192257 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:561
> +bool isProhibitedParagraphChild(const AtomicString& name)

Is this where you're getting the list?
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody
Comment 29 Eric Seidel (no email) 2013-03-08 12:03:32 PST
(In reply to comment #28)
> (From update of attachment 192257 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review
> 
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:561
> > +bool isProhibitedParagraphChild(const AtomicString& name)
> 
> Is this where you're getting the list?
> http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody

AKA this line in our treebuidler:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L683
Comment 30 Adam Barth 2013-03-08 12:04:27 PST
I would like a chance to review this patch before it lands.
Comment 31 Eric Seidel (no email) 2013-03-08 12:04:49 PST
Comment on attachment 192257 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:563
> +    // https://dvcs.w3.org/hg/editing/raw-file/57abe6d3cb60/editing.html#prohibited-paragraph-child

If this list is based on an editing spec, then yes, this definitely belongs in Editing and not the tree-builder.  If this is trying to share spec space with the HTML5 treebuilder, then we could presumably share code.
Comment 32 Ryosuke Niwa 2013-03-08 12:12:59 PST
Committed r145253: <http://trac.webkit.org/changeset/145253>
Comment 33 Adam Barth 2013-03-08 12:14:45 PST
(In reply to comment #32)
> Committed r145253: <http://trac.webkit.org/changeset/145253>

Why did you land this patch after I explicitly asked to review it before it landed?
Comment 34 Ryosuke Niwa 2013-03-08 12:18:16 PST
(In reply to comment #33)
> (In reply to comment #32)
> > Committed r145253: <http://trac.webkit.org/changeset/145253>
> 
> Why did you land this patch after I explicitly asked to review it before it landed?

Sorry, I didn't see your comment.
Comment 35 Ryosuke Niwa 2013-03-08 12:18:55 PST
Are there any specific concerns you have with the patch? I'm more than happy to address them or roll out the patch if needed.
Comment 36 Adam Barth 2013-03-08 12:19:38 PST
Did you not see Eric's comments either?
Comment 37 Ryosuke Niwa 2013-03-08 12:21:20 PST
(In reply to comment #36)
> Did you not see Eric's comments either?

No. I was talking with him on IRC. He had a few suggestions but he told me he's also fine with having a list in editing so I went ahead with the current approach.
Comment 38 Adam Barth 2013-03-08 12:27:23 PST
(In reply to comment #35)
> Are there any specific concerns you have with the patch? I'm more than happy to address them or roll out the patch if needed.

The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already.  The current patch reads like a rushed hack that is unlikely to be maintained as HTML parsing evolves in the future.  If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall.

I'm eating lunch right now, so I don't have a large enough screen to study this question in detail.
Comment 39 Ryosuke Niwa 2013-03-08 12:33:38 PST
(In reply to comment #38)
> The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already.

The list comes from https://dvcs.w3.org/hg/editing/raw-file/57abe6d3cb60/editing.html#prohibited-paragraph-child although I suspect Aryeh grabbed it from HTML5 specifications. See his comment #16.

Unfortunately, the editing code is not at the state to implement the fix-disallowed-ancestors algorithm in the specification.

> If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall.

That'll be nice indeed.

In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements.
Comment 40 Adam Barth 2013-03-08 13:13:29 PST
> In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements.

It sounds like you're saying that the editing code is full of unmaintainable hacks and therefore you don't mind adding yet another one.  Whatever, man.  I'm sorry I tried to engage with you on this bug.
Comment 41 Ryosuke Niwa 2013-03-08 13:46:06 PST
(In reply to comment #40)
> > In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements.
> 
> It sounds like you're saying that the editing code is full of unmaintainable hacks and therefore you don't mind adding yet another one.  Whatever, man.  I'm sorry I tried to engage with you on this bug.

No, I'm not saying that. I'm saying that we need to fix it.
Comment 42 Ryosuke Niwa 2013-03-08 13:52:37 PST
That's why my initial patch attempted to add the list to HTMLTreeBuilder.cpp in the hope that people adding new stuff there could update the list.

However, you made it clear that you don't want a static variable in HTMLTreeBuilder. Given that, I see no other option but to add it to ReplaceSelectionCommand.

As I've repeatedly said, I'm all ears if someone can come up with a way to share lists with HTMLTreeBuilder since that's what I want as well but I don't think anyone has done that yet.
Comment 43 Aryeh Gregor 2013-03-13 06:50:44 PDT
(In reply to comment #38)
> The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already.  The current patch reads like a rushed hack that is unlikely to be maintained as HTML parsing evolves in the future.  If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall.

I whole-heartedly agree.  Unfortunately, the editing spec cares about a lot of stuff that nothing else in the platform does, because editing stuff deals with the DOM on a much higher level than anything else I'm aware of, so there's heavy dependency on editing-specific behavior for particular elements or lists thereof.  Things like introducing a new HTML element will need case-by-case consideration for what their impact on editing should be, in general (if the element is relevant to editable content).

Specifically, in this case, in the spec

https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#prohibited-paragraph-child-name

you'll find "View comments" links at the side.  The comment here says

"""
These are all the things that will close a <p> if found as a descendant. I think. Plus table stuff, since that can't be a descendant of a p either, although it won't auto-close it.
"""

So that tells you how I made the list.  I don't think it's redundant to anything in the HTML specification or can be constructed automatically from anything in the specification.  But if you have any suggestions to reduce redundancy or reduce the chances of the specs falling out of sync, I'd be very interested to know.