Bug 182564

Summary:

REGRESSION (r223440): Copying & pasting a list from Microsoft Word to TinyMCE fails

Product:

WebKit

Reporter:

Ryosuke Niwa <rniwa>

Component:

HTML Editing

Assignee:

Ryosuke Niwa <rniwa>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

bfulgham, commit-queue, ews-watchlist, m.lewandowski, m.samsel, thespyder, wenson_hsieh

Priority:

Keywords:

InRadar

Version:

WebKit Nightly Build

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

124391

Bug Blocks:

182938, 182954

Attachments:

Description	Flags
WIP	none
WIP2	none
WIP3	none
Fixes the bug	none
Updated for ToT	none
Archive of layout-test-results from ews205 for win-future	none
Skip the test on Windows	none
Patch for landing	none
Fixes the off-by-one error	wenson_hsieh: review+
Fixed off by one error	none
Fixes Wordpress, evernote, and Gmail	bfulgham: review+
Fixes older versions of TinyMCE	none
Challenging document	none
Fixed Evernote & Wordpress.com & MS Word's compat mode	wenson_hsieh: review+
Fixed Evernote & Wordpress.com & MS Word's compat mode	none
Document with heading lists	none

Ryosuke Niwa

Reported 2018-02-06 23:53:15 PST

As reported by the developers of TinyMCE, my work to sanitize the pasted content introduced a new regression that copying & pasting a list item from Microsoft Word into TinyMCE would cause the list items to be plain text.

Attachments
WIP (9.65 KB, patch) 2018-02-06 23:55 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
WIP2 (19.41 KB, patch) 2018-02-08 00:32 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
WIP3 (19.41 KB, patch) 2018-02-08 00:42 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Fixes the bug (94.83 KB, patch) 2018-02-08 23:38 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Updated for ToT (94.82 KB, patch) 2018-02-08 23:41 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews205 for win-future (11.50 MB, application/zip) 2018-02-09 03:40 PST, EWS Watchlist	no flags	Details
Skip the test on Windows (99.31 KB, patch) 2018-02-09 14:43 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Patch for landing (99.33 KB, patch) 2018-02-09 19:45 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Fixes the off-by-one error (9.08 KB, patch) 2018-02-12 11:48 PST, Ryosuke Niwa	wenson_hsieh: review+	Details Formatted Diff Diff
Fixed off by one error (9.08 KB, patch) 2018-02-12 12:09 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Fixes Wordpress, evernote, and Gmail (27.43 KB, patch) 2018-02-12 17:32 PST, Ryosuke Niwa	bfulgham: review+	Details Formatted Diff Diff
Fixes older versions of TinyMCE (26.97 KB, patch) 2018-02-13 14:44 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Challenging document (40.50 KB, application/msword) 2018-02-13 20:13 PST, Andrew Herron	no flags	Details
Fixed Evernote & Wordpress.com & MS Word's compat mode (81.75 KB, patch) 2018-02-13 22:16 PST, Ryosuke Niwa	wenson_hsieh: review+	Details Formatted Diff Diff
Fixed Evernote & Wordpress.com & MS Word's compat mode (82.63 KB, patch) 2018-02-14 09:58 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Document with heading lists (32.31 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document) 2018-02-18 21:36 PST, Andrew Herron	no flags	Details
Show Obsolete (12) View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2018-02-06 23:55:05 PST

Created attachment 333269 [details] WIP

Ryosuke Niwa

Comment 2 2018-02-06 23:55:30 PST

<rdar://problem/36035103>

Ryosuke Niwa

Comment 3 2018-02-08 00:32:55 PST

Created attachment 333361 [details] WIP2

Ryosuke Niwa

Comment 4 2018-02-08 00:33:21 PST

The code change is done. Will write tests next.

Ryosuke Niwa

Comment 5 2018-02-08 00:42:06 PST

Created attachment 333362 [details] WIP3 Oops, there was a tiny bug there.

Andrew Herron

Comment 6 2018-02-08 22:45:19 PST

My initial impressions are that this gives us most (if not all) of what we need. It would be really nice if we could get a valid HTML document, though, instead of "<html><style></style>content" :) I'll adjust my code next week to pick up the style tag this way and see if there are any problems.

Ryosuke Niwa

Comment 7 2018-02-08 23:38:00 PST

Created attachment 333448 [details] Fixes the bug

Ryosuke Niwa

Comment 8 2018-02-08 23:41:29 PST

Created attachment 333449 [details] Updated for ToT

Ryosuke Niwa

Comment 9 2018-02-08 23:42:02 PST

(In reply to Andrew Herron from comment #6) > My initial impressions are that this gives us most (if not all) of what we > need. It would be really nice if we could get a valid HTML document, though, > instead of "<html><style></style>content" :) > > I'll adjust my code next week to pick up the style tag this way and see if > there are any problems. Oh, what kind of markup do you want to see instead? <html><body><style>?

Andrew Herron

Comment 10 2018-02-09 01:51:45 PST

I can technically handle anything, but a proper HTML document would be easiest to work with :) <html><head><style></style></head><body>content</body></html>

EWS Watchlist

Comment 11 2018-02-09 03:40:25 PST

Comment on attachment 333449 [details] Updated for ToT Attachment 333449 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/6429691 New failing tests: http/tests/security/clipboard/copy-paste-html-across-origin-strips-mso-list.html

EWS Watchlist

Comment 12 2018-02-09 03:40:35 PST

Created attachment 333475 [details] Archive of layout-test-results from ews205 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews205 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Ryosuke Niwa

Comment 13 2018-02-09 12:17:48 PST

(In reply to Andrew Herron from comment #10) > I can technically handle anything, but a proper HTML document would be > easiest to work with :) > > <html><head><style></style></head><body>content</body></html> <html><style>content is a perfectly a valid HTML5 document. Having said that, we can make that change.

Ryosuke Niwa

Comment 14 2018-02-09 12:56:53 PST

(In reply to Ryosuke Niwa from comment #13) > (In reply to Andrew Herron from comment #10) > > I can technically handle anything, but a proper HTML document would be > > easiest to work with :) > > > > <html><head><style></style></head><body>content</body></html> > > <html><style>content is a perfectly a valid HTML5 document. Having said > that, we can make that change. Never mind. This is actually tricky because we're using the fragment parsing algorithm which strips away head.

Wenson Hsieh

Comment 15 2018-02-09 13:54:47 PST

Comment on attachment 333449 [details] Updated for ToT View in context: https://bugs.webkit.org/attachment.cgi?id=333449&action=review Just some preliminary comments — I'll continue looking. > Source/WebCore/ChangeLog:25 > + that don't enable custom pasteboard data (detected by the content origin being null), and the HTML that don't => that doesn't > Source/WebCore/ChangeLog:30 > + PasteHTML.SripsMSOListWhenMissingMSOHTMLElement Srips => Strips > Source/WebCore/ChangeLog:58 > + set to Enable whenever the content COULD be the one generated by Microsoft Word. Not very intuitive, but sensible I suppose :/ Maybe MSOListQuirks should be renamed to something more like `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > Source/WebCore/ChangeLog:65 > + attributes, one containing computed styles, and another containing mso-list. Luckily, HTML parsering parsering => parsing > Source/WebCore/ChangeLog:66 > + rule dictats that the first attribute always wins when more than attributes of the same name appears, dictats => dictates more than attributes of the same name appears => multiple attributes of the same name appear > Source/WebCore/editing/markup.cpp:596 > +bool StyledMarkupAccumulator::appendNodeToPreserveMSOList(Node& node, bool& inMSOList) Maybe set `inMSOList = false` at the beginning of this function so it always has a correct value after this call?

Ryosuke Niwa

Comment 16 2018-02-09 14:04:26 PST

(In reply to Wenson Hsieh from comment #15) > Comment on attachment 333449 [details] > Updated for ToT > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333449&action=review > > Just some preliminary comments — I'll continue looking. > > > Source/WebCore/ChangeLog:25 > > + that don't enable custom pasteboard data (detected by the content origin being null), and the HTML > > that don't => that doesn't Fixed. > > Source/WebCore/ChangeLog:30 > > + PasteHTML.SripsMSOListWhenMissingMSOHTMLElement > > Srips => Strips Fixed. > > Source/WebCore/ChangeLog:58 > > + set to Enable whenever the content COULD be the one generated by Microsoft Word. > > Not very intuitive, but sensible I suppose :/ > > Maybe MSOListQuirks should be renamed to something more like > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? So the idea here is that MSOListQuirks is like a runtime feature. When the feature is enabled, we'd check for html tag, and do the quirks if needed. > > Source/WebCore/ChangeLog:65 > > + attributes, one containing computed styles, and another containing mso-list. Luckily, HTML parsering > > parsering => parsing Fixed. > > > Source/WebCore/ChangeLog:66 > > + rule dictats that the first attribute always wins when more than attributes of the same name appears, > > dictats => dictates Fixed. > more than attributes of the same name appears => multiple attributes of the > same name appear Fixed. > > > Source/WebCore/editing/markup.cpp:596 > > +bool StyledMarkupAccumulator::appendNodeToPreserveMSOList(Node& node, bool& inMSOList) > > Maybe set `inMSOList = false` at the beginning of this function so it always > has a correct value after this call? Oh, the whole point of that boolean is it changes over time during the traversal. It gets updated in this function. Made it a member variable instead to make this clear.

Ryosuke Niwa

Comment 17 2018-02-09 14:43:43 PST

Created attachment 333520 [details] Skip the test on Windows

Wenson Hsieh

Comment 18 2018-02-09 16:05:27 PST

> > Not very intuitive, but sensible I suppose :/ > > > > Maybe MSOListQuirks should be renamed to something more like > > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > > So the idea here is that MSOListQuirks is like a runtime feature. When the > feature is enabled, we'd check for html tag, and do the quirks if needed. Right, I understand that — it's just that the purpose wasn't obvious to me from the name alone (MSOListQuirks) and I was initially confused why there were 2 flags. Though, I suppose the enum values, ::Enable and ::Disable should be enough to make this clear, so it's fine. > > Maybe set `inMSOList = false` at the beginning of this function so it always > > has a correct value after this call? > > Oh, the whole point of that boolean is it changes over time during the > traversal. It gets updated in this function. Made it a member variable > instead to make this clear. Oh ok, now I see! Your new approach makes this a lot clearer. 👍🏻

Wenson Hsieh

Comment 19 2018-02-09 17:00:47 PST

Comment on attachment 333520 [details] Skip the test on Windows View in context: https://bugs.webkit.org/attachment.cgi?id=333520&action=review > Source/WebCore/editing/MarkupAccumulator.cpp:204 > + ASSERT(length <= text.data().length()); Perhaps `start + length <= text.data().length()`? > Source/WebCore/editing/markup.cpp:487 > + // Unfortunately, TinyMCE doesn't recognize mso-list inlie style if the style attribute contains properties. inlie => inline > Source/WebCore/editing/markup.cpp:600 > + Comment& commentNode = downcast<Comment>(node); Nit - auto&?

Ryosuke Niwa

Comment 20 2018-02-09 17:22:33 PST

(In reply to Wenson Hsieh from comment #19) > Comment on attachment 333520 [details] > Skip the test on Windows > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333520&action=review > > > Source/WebCore/editing/MarkupAccumulator.cpp:204 > > + ASSERT(length <= text.data().length()); > > Perhaps `start + length <= text.data().length()`? Oh oops, nice catch. Fixed. > > Source/WebCore/editing/markup.cpp:487 > > + // Unfortunately, TinyMCE doesn't recognize mso-list inlie style if the style attribute contains properties. > > inlie => inline Fixed. > > Source/WebCore/editing/markup.cpp:600 > > + Comment& commentNode = downcast<Comment>(node); > > Nit - auto&? Fixed.

Ryosuke Niwa

Comment 21 2018-02-09 19:17:31 PST

(In reply to Wenson Hsieh from comment #18) > > > Not very intuitive, but sensible I suppose :/ > > > > > > Maybe MSOListQuirks should be renamed to something more like > > > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > > > > So the idea here is that MSOListQuirks is like a runtime feature. When the > > feature is enabled, we'd check for html tag, and do the quirks if needed. > > Right, I understand that — it's just that the purpose wasn't obvious to me > from the name alone (MSOListQuirks) and I was initially confused why there > were 2 flags. Though, I suppose the enum values, ::Enable and ::Disable > should be enough to make this clear, so it's fine. Okay, I'd rename MSOListQuirks::Enabled to MSOListQuirks::CheckIfNeeded as we discussed in person.

Ryosuke Niwa

Comment 22 2018-02-09 19:45:39 PST

Created attachment 333550 [details] Patch for landing

Ryosuke Niwa

Comment 23 2018-02-09 19:56:40 PST

Comment on attachment 333550 [details] Patch for landing Wait for EWS first.

Ryosuke Niwa

Comment 24 2018-02-09 21:07:58 PST

Committed r228352: <https://trac.webkit.org/changeset/228352>

Andrew Herron

Comment 25 2018-02-11 20:48:11 PST

(In reply to Ryosuke Niwa from comment #14) > (In reply to Ryosuke Niwa from comment #13) > > > > <html><style>content is a perfectly a valid HTML5 document. Having said > > that, we can make that change. > > Never mind. This is actually tricky because we're using the fragment parsing > algorithm which strips away head. I realise it's valid, however previously we could rely on the raw clipboard being a "proper" document. But as I said it's not a huge issue it just means some adjustment to how I handle the structure before I can validate if this includes everything we need.

Ryosuke Niwa

Comment 26 2018-02-12 11:48:22 PST

Reopening to attach new patch.

Ryosuke Niwa

Comment 27 2018-02-12 11:48:23 PST

Created attachment 333610 [details] Fixes the off-by-one error

EWS Watchlist

Comment 28 2018-02-12 11:49:32 PST

Attachment 333610 [details] did not pass style-queue: ERROR: Tools/ChangeLog:8: Line contains tab character. [whitespace/tab] [5] ERROR: Tools/ChangeLog:9: Line contains tab character. [whitespace/tab] [5] Total errors found: 2 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.

Wenson Hsieh

Comment 29 2018-02-12 12:02:17 PST

Comment on attachment 333610 [details] Fixes the off-by-one error View in context: https://bugs.webkit.org/attachment.cgi?id=333610&action=review r=me > Tools/ChangeLog:8 > + Added assertions to make sure the trailing } is includd in the style. Nit - it looks like there's a tab character here.

Ryosuke Niwa

Comment 30 2018-02-12 12:07:51 PST

(In reply to Wenson Hsieh from comment #29) > Comment on attachment 333610 [details] > Fixes the off-by-one error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333610&action=review > > r=me > > > Tools/ChangeLog:8 > > + Added assertions to make sure the trailing } is includd in the style. > > Nit - it looks like there's a tab character here. Oops, yeah, will fix before landing.

Ryosuke Niwa

Comment 31 2018-02-12 12:09:47 PST

Created attachment 333614 [details] Fixed off by one error

Ryosuke Niwa

Comment 32 2018-02-12 14:07:45 PST

Comment on attachment 333614 [details] Fixed off by one error This doesn't quite cut it. The original patch broke Evernote & Wordpress.com :(

Andrew Herron

Comment 33 2018-02-12 16:37:46 PST

Ryosuke Niwa

Comment 34 2018-02-12 17:32:01 PST

Created attachment 333653 [details] Fixes Wordpress, evernote, and Gmail

Andrew Herron

Comment 35 2018-02-12 23:32:59 PST

I like where this patch is going with extra style definitions included. Those are the main thing I was concerned we might need to fully process lists :) I wonder whether the variations in behaviour across TinyMCE versions you mention in your changelog are due to plugins; only our PowerPaste plugin uses detailed list processing (it's enabled on tinymce.com), the default paste implementation has very basic word import. Treating the style tag as text may be a bug that I can address rather than needing a webkit workaround (although putting it inside a head tag definitely helps). It's most likely because the content is not inside a body tag - in the current release if a body tag isn't found it assumes the entire clipboard is content. This is related to why I asked for the structure change earlier. I'm going to need to release a plugin update to support this change anyway so if the changes here break paste on the current release that's not the end of the world. I have some feedback on the current patch: > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) This won't catch all list items in the content. Just from a quick search I'm seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as well as "MsoNormal". And I'm sure there are others. Having a whitelist for this seems like the sort of thing that could easily break between versions of word. In addition, this "MsoListParagraph" filter isn't applied to the comment nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm seeing paragraphs without style mso-list but with the "supportsLists" comments inside. I made a custom patch on my machine that relaxes the override checks a bit and I'm getting _really_ good results.

Ryosuke Niwa

Comment 36 2018-02-13 13:03:40 PST

(In reply to Andrew Herron from comment #35) > > > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) > > This won't catch all list items in the content. Just from a quick search I'm > seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as > well as "MsoNormal". And I'm sure there are others. Having a whitelist for > this seems like the sort of thing that could easily break between versions > of word. That doesn't seem to affect the end result. > In addition, this "MsoListParagraph" filter isn't applied to the comment > nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm > seeing paragraphs without style mso-list but with the "supportsLists" > comments inside. Those nodes are taken care of by m_inMSOList check. > I made a custom patch on my machine that relaxes the override checks a bit > and I'm getting _really_ good results. What are the changes you made?

Brent Fulgham

Comment 37 2018-02-13 13:43:48 PST

Comment on attachment 333653 [details] Fixes Wordpress, evernote, and Gmail View in context: https://bugs.webkit.org/attachment.cgi?id=333653&action=review r=me > Tools/ChangeLog:33 > + Did you mean to have these two separate ChangeLog entries?

Ryosuke Niwa

Comment 38 2018-02-13 14:44:05 PST

Created attachment 333727 [details] Fixes older versions of TinyMCE

Andrew Herron

Comment 39 2018-02-13 16:08:40 PST

(In reply to Ryosuke Niwa from comment #36) > (In reply to Andrew Herron from comment #35) > > > > > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) > > > > This won't catch all list items in the content. Just from a quick search I'm > > seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as > > well as "MsoNormal". And I'm sure there are others. Having a whitelist for > > this seems like the sort of thing that could easily break between versions > > of word. > > That doesn't seem to affect the end result. This issue means mso-list styles are stripped from actual list items, and every time that happens we can't process the item. If the documents you're testing with all appear fine I will try to find some more complex documents that I can share - my test data mostly comes from customers so I need to be careful. > > In addition, this "MsoListParagraph" filter isn't applied to the comment > > nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm > > seeing paragraphs without style mso-list but with the "supportsLists" > > comments inside. > > Those nodes are taken care of by m_inMSOList check. But that isn't used for the style attribute output. I think the two should be linked somehow. As I said, in some documents the HTML I get on paste has paragraphs with the supportsLists comments and related data but no mso-list style on the P tag. That's the problem. If you're ever going to strip mso-list from a paragraph (which per above is a bad idea imo) it seems like the comments should be removed as well. My code deals with these comments, but only when processing a list paragraph which is detected by the mso-list style. Without mso-list, these comments make it into customer content (as a part of fake list-like paragraphs that are just confusing). > > I made a custom patch on my machine that relaxes the override checks a bit > > and I'm getting _really_ good results. > > What are the changes you made? I targeted two checks: - Style attribute preservation relaxed to element.getAttribute(classAttr).startsWith("Mso") - The msoListMode check relaxed to originalMarkup.startsWith("<html xmlns") The second change relates to an email I sent you, the check you use to detect MS Word documents is too specific and doesn't catch everything. I realise these changes are unlikely to be suitable for release, but I wanted to verify if more complete data would work with our list import routines. At first glance the answer appears to be yes.

Ryosuke Niwa

Comment 40 2018-02-13 16:38:30 PST

(In reply to Andrew Herron from comment #39) > > - Style attribute preservation relaxed to > element.getAttribute(classAttr).startsWith("Mso") > - The msoListMode check relaxed to originalMarkup.startsWith("<html xmlns") > > The second change relates to an email I sent you, the check you use to > detect MS Word documents is too specific and doesn't catch everything. These changes are probably too generic. That would match more documents than we'd like. It would be great if you can share specific documents from which copying & pasting would fail with the latest two patches.

Andrew Herron

Comment 41 2018-02-13 20:13:10 PST

Created attachment 333761 [details] Challenging document I found a really old Ephox press release in our test data that is an extreme example of what I'm talking about, it's not handled properly even with the changes I made :) There are newer documents with similar issues but they all have private data in them. I can try to generate my own or sanitise one of them if this document isn't suitable. When pasting from Word v16.9: - the <html> tag doesn't have xmlns:o as the first attribute, failing the overall list check - none of the list paragraphs (inside the table) have a class at all, completely failing the "preserve style attribute" check As a result, even with my relaxed checks this produces paragraphs with no "mso-list" style but retaining the "if supportsLists" comments.

Ryosuke Niwa

Comment 42 2018-02-13 22:16:26 PST

Created attachment 333768 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode

Ryosuke Niwa

Comment 43 2018-02-13 22:17:54 PST

(In reply to Andrew Herron from comment #41) > Created attachment 333761 [details] > Challenging document Thanks for the document. This was super helpful! > There are newer documents with similar issues but they all have private data > in them. I can try to generate my own or sanitise one of them if this > document isn't suitable. > > When pasting from Word v16.9: > > - the <html> tag doesn't have xmlns:o as the first attribute, failing the > overall list check Indeed. This seems to only occur with the compatibility mode too. I've adjusted the check for look for xmlns:o and xmlns:w within <html ~> instead. > - none of the list paragraphs (inside the table) have a class at all, > completely failing the "preserve style attribute" check Indeed. I've fixed it to look for mso-list CSS property instead.

Andrew Herron

Comment 44 2018-02-13 22:32:43 PST

Sounds good! I'm building with the patch now. I'll try to confirm whether this appears to fix our issues before I finish work today - once this is merged I can get other developers to help verify the changes across our entire test document set. We will still need a new plugin release to be more flexible about the document structure it accepts, but other than that I think the changes on our end will be minor.

Andrew Herron

Comment 45 2018-02-13 23:21:32 PST

I pasted a few of our test documents with the patch and it looks good.

Wenson Hsieh

Comment 46 2018-02-14 07:42:16 PST

Comment on attachment 333768 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode View in context: https://bugs.webkit.org/attachment.cgi?id=333768&action=review > Source/WebCore/editing/markup.cpp:433 > + if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(styleAttr).contains(";mso-list:")) Nit - maybe add a new helper method for this called something like `shouldPreserveMSOListStyleForElement`, and turn the above into a one-liner?

Ryosuke Niwa

Comment 47 2018-02-14 09:58:35 PST

Created attachment 333810 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode

Ryosuke Niwa

Comment 48 2018-02-14 09:59:08 PST

Comment on attachment 333810 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode Already been reviewed. Waiting for EWS.

Ryosuke Niwa

Comment 49 2018-02-14 13:07:44 PST

Committed r228482: <https://trac.webkit.org/changeset/228482>

Andrew Herron

Comment 50 2018-02-15 21:47:32 PST

I've come across a document that when copied to the clipboard has a much simpler style attribute than others we've been testing with. The entire opening paragraph tag is: > <p class=Ctrl4Opsomming style='mso-list:l0 level1 lfo1'> As such it fails the ";mso-list:" check this patch uses and the style attribute is overridden. Could it be made to check if the style starts with "mso-list:" as well? :)

Andrew Herron

Comment 51 2018-02-18 21:36:52 PST

Created attachment 334136 [details] Document with heading lists I've found a new failing document that I can attach publicly. I remember thinking when I looked at the "element tag name is p" check that there would probably be cases of heading tags with mso-list attributes, but I don't think I ever mentioned it here. One of our test documents does indeed have that :) I've made some adjustments to the shouldPreserve check and verified this retains the correct style attributes both with this document and the private one mentioned in my previous comment: > bool StyledMarkupAccumulator::shouldPreserveMSOListStyleForElement(const Element& element) > { > return m_inMSOList || (m_shouldPreserveMSOList && (element.getAttribute(styleAttr).contains(";mso-list:") || element.getAttribute(styleAttr).startsWith("mso-list:"))); > } Although that's again probably too relaxed - it's really only the primary content block tags (P, H1-6) that can be lists. Not _all_ tags. And to be fair TinyMCE doesn't import this document correctly even when given the right styles. But that's a bug I plan to fix so it'd be nice to have heading list styles retained :)

Ryosuke Niwa

Comment 52 2018-02-19 13:23:53 PST

Let's track this bug in https://bugs.webkit.org/show_bug.cgi?id=182938.

Andrew Herron

Comment 53 2018-02-19 15:53:58 PST

Should I log a new bug for the heading lists as well? The document I attached was for that, not the "starts with" issue.

Ryosuke Niwa

Comment 54 2018-02-19 17:17:29 PST

(In reply to Andrew Herron from comment #53) > Should I log a new bug for the heading lists as well? The document I > attached was for that, not the "starts with" issue. Oh, okay. Please do file one.

Note You need to log in before you can comment on or make changes to this bug.