Bug 111360

Summary: After sending message, Mail changes formatting
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: HTML EditingAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ayg, dbates, dglazkov, enrica, eric, esprehn+autocc, ian, mifenton, ojan.autocc, ojan, sam, tony, webkit.review.bot
Priority: P2 Keywords: HasReduction, InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 111397, 111400, 111466, 111491, 111639    
Bug Blocks:    
Attachments:
Description Flags
Reduction
none
Reduced repro. that requires only one paste operation
none
Work in progress 1
none
Work in progress 2
none
Current test failures
none
Work in progress 3
none
Before copy & paste (paste-text-018.html)
none
After copy & paste (paste-text-018.html)
none
Fixes the bug
none
Also prevent nested h1-h6
none
Fixes the bug enrica: review+

Ryosuke Niwa
Reported 2013-03-04 13:39:54 PST
Created attachment 191302 [details] Reduction Reproduction steps: 1. Triple click the line "– 1. Copy this line after triple click" and copy 2. Paste it after inserting a new line at the end of "2. insert new line after this and paste, select all and copy" 3. Select all 4. Copy 5. Paste it in the box right beneath "3. paste it below:" Expected result: Source and Destination look identical Actual result: The line pasted in step 2 has a bigger font. <rdar://problem/12442331>
Attachments
Reduction (1.62 KB, text/html)
2013-03-04 13:39 PST, Ryosuke Niwa
no flags
Reduced repro. that requires only one paste operation (1.26 KB, text/html)
2013-03-04 16:46 PST, Ryosuke Niwa
no flags
Work in progress 1 (20.46 KB, patch)
2013-03-04 19:17 PST, Ryosuke Niwa
no flags
Work in progress 2 (37.99 KB, patch)
2013-03-05 18:05 PST, Ryosuke Niwa
no flags
Current test failures (56.82 KB, application/zip)
2013-03-05 18:08 PST, Ryosuke Niwa
no flags
Work in progress 3 (38.64 KB, patch)
2013-03-06 20:08 PST, Ryosuke Niwa
no flags
Before copy & paste (paste-text-018.html) (36.54 KB, image/png)
2013-03-06 20:10 PST, Ryosuke Niwa
no flags
After copy & paste (paste-text-018.html) (37.40 KB, image/png)
2013-03-06 20:16 PST, Ryosuke Niwa
no flags
Fixes the bug (28.63 KB, patch)
2013-03-07 19:53 PST, Ryosuke Niwa
no flags
Also prevent nested h1-h6 (31.97 KB, patch)
2013-03-07 23:33 PST, Ryosuke Niwa
no flags
Fixes the bug (26.67 KB, patch)
2013-03-08 11:42 PST, Ryosuke Niwa
enrica: review+
Ryosuke Niwa
Comment 1 2013-03-04 13:43:37 PST
The problem is caused by us generating nested p's. when WebKit inserts the copied content in step 2 (pasting "1. Copy this line after triple click"), it creates the following markup: <p class="primary_area" style="font-size: 12px;"><p><span class="bullet">–</span><span class="platforms">1. &nbsp;</span>Copy this line after triple click<br></p><div><br></div></p> Here, p is nested so HTML5 tree builder will close the outer p "prematurely".
Adam Barth
Comment 2 2013-03-04 16:42:08 PST
I can reproduce this issue.
Adam Barth
Comment 3 2013-03-04 16:42:51 PST
> Here, p is nested so HTML5 tree builder will close the outer p "prematurely". How is the tree builder involved in this operation? For parsing the fragment somehow?
Ryosuke Niwa
Comment 4 2013-03-04 16:46:06 PST
(In reply to comment #3) > > Here, p is nested so HTML5 tree builder will close the outer p "prematurely". > > How is the tree builder involved in this operation? For parsing the fragment somehow? Right. I have an even better repro. that only require pasting once.
Ryosuke Niwa
Comment 5 2013-03-04 16:46:41 PST
Created attachment 191350 [details] Reduced repro. that requires only one paste operation
Adam Barth
Comment 6 2013-03-04 16:51:17 PST
I don't fully understand what the editing code is doing here, but I think you're running up against a basic limitation of HTML which is that not every HTML DOM can be represented as an HTML string. If your DOM has nested HTMLParagraphElements, you won't be able to represent that using an HTML string because the HTML parser automatically closes the first <p> when parsing the "nested" <p>.
Adam Barth
Comment 7 2013-03-04 16:51:56 PST
What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste.
Ryosuke Niwa
Comment 8 2013-03-04 19:17:13 PST
Created attachment 191377 [details] Work in progress 1
Ojan Vafai
Comment 9 2013-03-04 20:00:46 PST
(In reply to comment #6) > I don't fully understand what the editing code is doing here, but I think you're running up against a basic limitation of HTML which is that not every HTML DOM can be represented as an HTML string. If your DOM has nested HTMLParagraphElements, you won't be able to represent that using an HTML string because the HTML parser automatically closes the first <p> when parsing the "nested" <p>. (In reply to comment #7) > What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste. That sucks for editing. Maybe clipboard parsing on paste shouldn't go through the standard HTML parser since typically the things put on clipboards are done by the browser and not by web developers.
Ryosuke Niwa
Comment 10 2013-03-04 20:02:34 PST
(In reply to comment #9) > > (In reply to comment #7) > > What that means is that you'll always be able to find examples of DOMs that cannot be round-tripped through strings via copy and paste. > > That sucks for editing. Maybe clipboard parsing on paste shouldn't go through the standard HTML parser since typically the things put on clipboards are done by the browser and not by web developers. Yeah, I initially though that might work 'til I realized that when this message is sent (on Mail, Gmail, etc…) to someone else, the recipient will use the regular HTML5 parsing algorithm to view the message :(
Ryosuke Niwa
Comment 11 2013-03-05 18:05:37 PST
Created attachment 191624 [details] Work in progress 2
Ryosuke Niwa
Comment 12 2013-03-05 18:08:50 PST
Created attachment 191626 [details] Current test failures The current approach doesn't seem to work because of the cases where we want to preserve border lines and other styles in the outer block. We need to instead let elements get nested and then remove them as needed after the insertion.
Ryosuke Niwa
Comment 13 2013-03-06 20:08:28 PST
Created attachment 191903 [details] Work in progress 3
Ryosuke Niwa
Comment 14 2013-03-06 20:10:17 PST
Created attachment 191904 [details] Before copy & paste (paste-text-018.html) Right now, this patch arguably regresses the following 3 tests: editing/pasteboard/line-feed-between-br-and-b-should-not-reorder-pasted-content.html editing/pasteboard/paste-text-018.html editing/pasteboard/testcase-9507.html
Ryosuke Niwa
Comment 15 2013-03-06 20:16:32 PST
Created attachment 191906 [details] After copy & paste (paste-text-018.html) And here are screenshots showing the behavior. In particular, the red box that surrounds two lines will surround only the first line after copy & paste. But I'm not sure if this should be a blocker since the said tests aren't really testing whether the red boxes to stay. If this is considered as a regression, then I don't see any other way but to abandon the current approach and add a postmortem fix up instead (probably into SimplifyMarkupCommand).
Aryeh Gregor
Comment 16 2013-03-07 04:33:20 PST
You definitely want to make sure that no editing actions create DOMs that don't round-trip through HTML serialization. The editing spec says that at various points, newly-inserted nodes should be checked to see if they have prohibited ancestors, and if they do, they break out of their parent repeatedly until their parent is allowed: https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#fix-disallowed-ancestors https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#allowed-child E.g., if you had <p>foo<p>bar</p>baz</p>, it would become <p>foo</p><p>bar</p><p>baz</p>. There are more things to worry about than just nested <p> -- you also can't nest <h#> in each other, nor <a>, and using table-related elements incorrectly can be very exciting (they'll vanish or jump around). If you care about corner cases, <xmp> is fun, and <plaintext> even more so. The spec's algorithm also converts misplaced <li>/<dt>/<dd> to <p>/<div>, although that round-trips fine, and likewise doesn't allow block elements inside inline.
Ryosuke Niwa
Comment 17 2013-03-07 19:53:16 PST
Created attachment 192134 [details] Fixes the bug
WebKit Review Bot
Comment 18 2013-03-07 20:56:27 PST
Comment on attachment 192134 [details] Fixes the bug Attachment 192134 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17021275 New failing tests: editing/pasteboard/paste-text-016.html
Ryosuke Niwa
Comment 19 2013-03-07 23:33:09 PST
Created attachment 192160 [details] Also prevent nested h1-h6
WebKit Review Bot
Comment 20 2013-03-08 00:16:34 PST
Comment on attachment 192160 [details] Also prevent nested h1-h6 Attachment 192160 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17054402 New failing tests: editing/pasteboard/paste-text-016.html
Adam Barth
Comment 21 2013-03-08 11:27:37 PST
Comment on attachment 192160 [details] Also prevent nested h1-h6 View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review Please do > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462 > + DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ()); Please do not add static state to the tree builder. It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread. > Source/WebCore/html/parser/HTMLTreeBuilder.h:92 > + static bool isProhibitedParagraphChild(const AtomicString&); I'd wrap this in an #ifndef NDEBUG to make it clear that it's a debug-only function.
Adam Barth
Comment 22 2013-03-08 11:27:59 PST
(In reply to comment #21) > (From update of attachment 192160 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review > > Please do ^^^ Typo :)
Enrica Casucci
Comment 23 2013-03-08 11:37:30 PST
Comment on attachment 192160 [details] Also prevent nested h1-h6 View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review > Source/WebCore/ChangeLog:26 > + (ReplaceSelectionCommand): Remove or add the list of added methods. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:562 > +// FIXME: Fix other types of DOM tree constructs like nested h1's that aren't isomortic under the HTML serialization and parsering algorithms. typos: isomortic and parsering
Ryosuke Niwa
Comment 24 2013-03-08 11:42:49 PST
Created attachment 192257 [details] Fixes the bug
Enrica Casucci
Comment 25 2013-03-08 11:46:51 PST
Comment on attachment 192257 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review Looks good to me, Please add the FIXME to makeInsertedContentRoundTrippableWithHTMLTreeBuilder and address the test failures on cr-linux > Source/WebCore/editing/ReplaceSelectionCommand.cpp:617 > + You removed the comment with the FIXME: I think you should add it back.
Eric Seidel (no email)
Comment 26 2013-03-08 11:48:19 PST
Comment on attachment 192160 [details] Also prevent nested h1-h6 View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review >> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462 >> + DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ()); > > Please do not add static state to the tree builder. It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread. In this particular case, I'm not sure this is a problem, or? We'd have to initialize this at a known time, but then it should be threadsafe to read from this hash, no? >> Source/WebCore/html/parser/HTMLTreeBuilder.h:92 >> + static bool isProhibitedParagraphChild(const AtomicString&); > > I'd wrap this in an #ifndef NDEBUG to make it clear that it's a debug-only function. He's using it from the Editor. :)
Adam Barth
Comment 27 2013-03-08 11:55:43 PST
(In reply to comment #26) > (From update of attachment 192160 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=192160&action=review > > >> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:462 > >> + DEFINE_STATIC_LOCAL(HashSet<AtomicString>, elements, ()); > > > > Please do not add static state to the tree builder. It's likely we'll want to move it onto a background thread at some point, and we can't have static state on a background thread. > > In this particular case, I'm not sure this is a problem, or? We'd have to initialize this at a known time, but then it should be threadsafe to read from this hash, no? I'd rather not introduce static dependencies at all.
Eric Seidel (no email)
Comment 28 2013-03-08 12:02:22 PST
Comment on attachment 192257 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:561 > +bool isProhibitedParagraphChild(const AtomicString& name) Is this where you're getting the list? http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody
Eric Seidel (no email)
Comment 29 2013-03-08 12:03:32 PST
(In reply to comment #28) > (From update of attachment 192257 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:561 > > +bool isProhibitedParagraphChild(const AtomicString& name) > > Is this where you're getting the list? > http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody AKA this line in our treebuidler: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L683
Adam Barth
Comment 30 2013-03-08 12:04:27 PST
I would like a chance to review this patch before it lands.
Eric Seidel (no email)
Comment 31 2013-03-08 12:04:49 PST
Comment on attachment 192257 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=192257&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:563 > + // https://dvcs.w3.org/hg/editing/raw-file/57abe6d3cb60/editing.html#prohibited-paragraph-child If this list is based on an editing spec, then yes, this definitely belongs in Editing and not the tree-builder. If this is trying to share spec space with the HTML5 treebuilder, then we could presumably share code.
Ryosuke Niwa
Comment 32 2013-03-08 12:12:59 PST
Adam Barth
Comment 33 2013-03-08 12:14:45 PST
(In reply to comment #32) > Committed r145253: <http://trac.webkit.org/changeset/145253> Why did you land this patch after I explicitly asked to review it before it landed?
Ryosuke Niwa
Comment 34 2013-03-08 12:18:16 PST
(In reply to comment #33) > (In reply to comment #32) > > Committed r145253: <http://trac.webkit.org/changeset/145253> > > Why did you land this patch after I explicitly asked to review it before it landed? Sorry, I didn't see your comment.
Ryosuke Niwa
Comment 35 2013-03-08 12:18:55 PST
Are there any specific concerns you have with the patch? I'm more than happy to address them or roll out the patch if needed.
Adam Barth
Comment 36 2013-03-08 12:19:38 PST
Did you not see Eric's comments either?
Ryosuke Niwa
Comment 37 2013-03-08 12:21:20 PST
(In reply to comment #36) > Did you not see Eric's comments either? No. I was talking with him on IRC. He had a few suggestions but he told me he's also fine with having a list in editing so I went ahead with the current approach.
Adam Barth
Comment 38 2013-03-08 12:27:23 PST
(In reply to comment #35) > Are there any specific concerns you have with the patch? I'm more than happy to address them or roll out the patch if needed. The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already. The current patch reads like a rushed hack that is unlikely to be maintained as HTML parsing evolves in the future. If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall. I'm eating lunch right now, so I don't have a large enough screen to study this question in detail.
Ryosuke Niwa
Comment 39 2013-03-08 12:33:38 PST
(In reply to comment #38) > The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already. The list comes from https://dvcs.w3.org/hg/editing/raw-file/57abe6d3cb60/editing.html#prohibited-paragraph-child although I suspect Aryeh grabbed it from HTML5 specifications. See his comment #16. Unfortunately, the editing code is not at the state to implement the fix-disallowed-ancestors algorithm in the specification. > If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall. That'll be nice indeed. In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements.
Adam Barth
Comment 40 2013-03-08 13:13:29 PST
> In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements. It sounds like you're saying that the editing code is full of unmaintainable hacks and therefore you don't mind adding yet another one. Whatever, man. I'm sorry I tried to engage with you on this bug.
Ryosuke Niwa
Comment 41 2013-03-08 13:46:06 PST
(In reply to comment #40) > > In general, we have a problem of editing code getting outdated while new features are added to HTML/CSS. We also have a numerous list of elements that I suspect are already outdated or do not contain all necessary elements. > > It sounds like you're saying that the editing code is full of unmaintainable hacks and therefore you don't mind adding yet another one. Whatever, man. I'm sorry I tried to engage with you on this bug. No, I'm not saying that. I'm saying that we need to fix it.
Ryosuke Niwa
Comment 42 2013-03-08 13:52:37 PST
That's why my initial patch attempted to add the list to HTMLTreeBuilder.cpp in the hope that people adding new stuff there could update the list. However, you made it clear that you don't want a static variable in HTMLTreeBuilder. Given that, I see no other option but to add it to ReplaceSelectionCommand. As I've repeatedly said, I'm all ears if someone can come up with a way to share lists with HTMLTreeBuilder since that's what I want as well but I don't think anyone has done that yet.
Aryeh Gregor
Comment 43 2013-03-13 06:50:44 PDT
(In reply to comment #38) > The issue I wanted to study was where this list comes from and whether it is redundant with other tag lists we have already. The current patch reads like a rushed hack that is unlikely to be maintained as HTML parsing evolves in the future. If we can build on other lists that are maintained for other purposes, we'll have less maintenance overall. I whole-heartedly agree. Unfortunately, the editing spec cares about a lot of stuff that nothing else in the platform does, because editing stuff deals with the DOM on a much higher level than anything else I'm aware of, so there's heavy dependency on editing-specific behavior for particular elements or lists thereof. Things like introducing a new HTML element will need case-by-case consideration for what their impact on editing should be, in general (if the element is relevant to editable content). Specifically, in this case, in the spec https://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#prohibited-paragraph-child-name you'll find "View comments" links at the side. The comment here says """ These are all the things that will close a <p> if found as a descendant. I think. Plus table stuff, since that can't be a descendant of a p either, although it won't auto-close it. """ So that tells you how I made the list. I don't think it's redundant to anything in the HTML specification or can be constructed automatically from anything in the specification. But if you have any suggestions to reduce redundancy or reduce the chances of the specs falling out of sync, I'd be very interested to know.
Note You need to log in before you can comment on or make changes to this bug.