RESOLVED FIXED Bug 182564
REGRESSION (r223440): Copying & pasting a list from Microsoft Word to TinyMCE fails
https://bugs.webkit.org/show_bug.cgi?id=182564
Summary REGRESSION (r223440): Copying & pasting a list from Microsoft Word to TinyMCE...
Ryosuke Niwa
Reported 2018-02-06 23:53:15 PST
As reported by the developers of TinyMCE, my work to sanitize the pasted content introduced a new regression that copying & pasting a list item from Microsoft Word into TinyMCE would cause the list items to be plain text.
Attachments
WIP (9.65 KB, patch)
2018-02-06 23:55 PST, Ryosuke Niwa
no flags
WIP2 (19.41 KB, patch)
2018-02-08 00:32 PST, Ryosuke Niwa
no flags
WIP3 (19.41 KB, patch)
2018-02-08 00:42 PST, Ryosuke Niwa
no flags
Fixes the bug (94.83 KB, patch)
2018-02-08 23:38 PST, Ryosuke Niwa
no flags
Updated for ToT (94.82 KB, patch)
2018-02-08 23:41 PST, Ryosuke Niwa
no flags
Archive of layout-test-results from ews205 for win-future (11.50 MB, application/zip)
2018-02-09 03:40 PST, EWS Watchlist
no flags
Skip the test on Windows (99.31 KB, patch)
2018-02-09 14:43 PST, Ryosuke Niwa
no flags
Patch for landing (99.33 KB, patch)
2018-02-09 19:45 PST, Ryosuke Niwa
no flags
Fixes the off-by-one error (9.08 KB, patch)
2018-02-12 11:48 PST, Ryosuke Niwa
wenson_hsieh: review+
Fixed off by one error (9.08 KB, patch)
2018-02-12 12:09 PST, Ryosuke Niwa
no flags
Fixes Wordpress, evernote, and Gmail (27.43 KB, patch)
2018-02-12 17:32 PST, Ryosuke Niwa
bfulgham: review+
Fixes older versions of TinyMCE (26.97 KB, patch)
2018-02-13 14:44 PST, Ryosuke Niwa
no flags
Challenging document (40.50 KB, application/msword)
2018-02-13 20:13 PST, Andrew Herron
no flags
Fixed Evernote & Wordpress.com & MS Word's compat mode (81.75 KB, patch)
2018-02-13 22:16 PST, Ryosuke Niwa
wenson_hsieh: review+
Fixed Evernote & Wordpress.com & MS Word's compat mode (82.63 KB, patch)
2018-02-14 09:58 PST, Ryosuke Niwa
no flags
Document with heading lists (32.31 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-18 21:36 PST, Andrew Herron
no flags
Ryosuke Niwa
Comment 1 2018-02-06 23:55:05 PST
Ryosuke Niwa
Comment 2 2018-02-06 23:55:30 PST
Ryosuke Niwa
Comment 3 2018-02-08 00:32:55 PST
Ryosuke Niwa
Comment 4 2018-02-08 00:33:21 PST
The code change is done. Will write tests next.
Ryosuke Niwa
Comment 5 2018-02-08 00:42:06 PST
Created attachment 333362 [details] WIP3 Oops, there was a tiny bug there.
Andrew Herron
Comment 6 2018-02-08 22:45:19 PST
My initial impressions are that this gives us most (if not all) of what we need. It would be really nice if we could get a valid HTML document, though, instead of "<html><style></style>content" :) I'll adjust my code next week to pick up the style tag this way and see if there are any problems.
Ryosuke Niwa
Comment 7 2018-02-08 23:38:00 PST
Created attachment 333448 [details] Fixes the bug
Ryosuke Niwa
Comment 8 2018-02-08 23:41:29 PST
Created attachment 333449 [details] Updated for ToT
Ryosuke Niwa
Comment 9 2018-02-08 23:42:02 PST
(In reply to Andrew Herron from comment #6) > My initial impressions are that this gives us most (if not all) of what we > need. It would be really nice if we could get a valid HTML document, though, > instead of "<html><style></style>content" :) > > I'll adjust my code next week to pick up the style tag this way and see if > there are any problems. Oh, what kind of markup do you want to see instead? <html><body><style>?
Andrew Herron
Comment 10 2018-02-09 01:51:45 PST
I can technically handle anything, but a proper HTML document would be easiest to work with :) <html><head><style></style></head><body>content</body></html>
EWS Watchlist
Comment 11 2018-02-09 03:40:25 PST
Comment on attachment 333449 [details] Updated for ToT Attachment 333449 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/6429691 New failing tests: http/tests/security/clipboard/copy-paste-html-across-origin-strips-mso-list.html
EWS Watchlist
Comment 12 2018-02-09 03:40:35 PST
Created attachment 333475 [details] Archive of layout-test-results from ews205 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews205 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Ryosuke Niwa
Comment 13 2018-02-09 12:17:48 PST
(In reply to Andrew Herron from comment #10) > I can technically handle anything, but a proper HTML document would be > easiest to work with :) > > <html><head><style></style></head><body>content</body></html> <html><style>content is a perfectly a valid HTML5 document. Having said that, we can make that change.
Ryosuke Niwa
Comment 14 2018-02-09 12:56:53 PST
(In reply to Ryosuke Niwa from comment #13) > (In reply to Andrew Herron from comment #10) > > I can technically handle anything, but a proper HTML document would be > > easiest to work with :) > > > > <html><head><style></style></head><body>content</body></html> > > <html><style>content is a perfectly a valid HTML5 document. Having said > that, we can make that change. Never mind. This is actually tricky because we're using the fragment parsing algorithm which strips away head.
Wenson Hsieh
Comment 15 2018-02-09 13:54:47 PST
Comment on attachment 333449 [details] Updated for ToT View in context: https://bugs.webkit.org/attachment.cgi?id=333449&action=review Just some preliminary comments — I'll continue looking. > Source/WebCore/ChangeLog:25 > + that don't enable custom pasteboard data (detected by the content origin being null), and the HTML that don't => that doesn't > Source/WebCore/ChangeLog:30 > + PasteHTML.SripsMSOListWhenMissingMSOHTMLElement Srips => Strips > Source/WebCore/ChangeLog:58 > + set to Enable whenever the content COULD be the one generated by Microsoft Word. Not very intuitive, but sensible I suppose :/ Maybe MSOListQuirks should be renamed to something more like `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > Source/WebCore/ChangeLog:65 > + attributes, one containing computed styles, and another containing mso-list. Luckily, HTML parsering parsering => parsing > Source/WebCore/ChangeLog:66 > + rule dictats that the first attribute always wins when more than attributes of the same name appears, dictats => dictates more than attributes of the same name appears => multiple attributes of the same name appear > Source/WebCore/editing/markup.cpp:596 > +bool StyledMarkupAccumulator::appendNodeToPreserveMSOList(Node& node, bool& inMSOList) Maybe set `inMSOList = false` at the beginning of this function so it always has a correct value after this call?
Ryosuke Niwa
Comment 16 2018-02-09 14:04:26 PST
(In reply to Wenson Hsieh from comment #15) > Comment on attachment 333449 [details] > Updated for ToT > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333449&action=review > > Just some preliminary comments — I'll continue looking. > > > Source/WebCore/ChangeLog:25 > > + that don't enable custom pasteboard data (detected by the content origin being null), and the HTML > > that don't => that doesn't Fixed. > > Source/WebCore/ChangeLog:30 > > + PasteHTML.SripsMSOListWhenMissingMSOHTMLElement > > Srips => Strips Fixed. > > Source/WebCore/ChangeLog:58 > > + set to Enable whenever the content COULD be the one generated by Microsoft Word. > > Not very intuitive, but sensible I suppose :/ > > Maybe MSOListQuirks should be renamed to something more like > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? So the idea here is that MSOListQuirks is like a runtime feature. When the feature is enabled, we'd check for html tag, and do the quirks if needed. > > Source/WebCore/ChangeLog:65 > > + attributes, one containing computed styles, and another containing mso-list. Luckily, HTML parsering > > parsering => parsing Fixed. > > > Source/WebCore/ChangeLog:66 > > + rule dictats that the first attribute always wins when more than attributes of the same name appears, > > dictats => dictates Fixed. > more than attributes of the same name appears => multiple attributes of the > same name appear Fixed. > > > Source/WebCore/editing/markup.cpp:596 > > +bool StyledMarkupAccumulator::appendNodeToPreserveMSOList(Node& node, bool& inMSOList) > > Maybe set `inMSOList = false` at the beginning of this function so it always > has a correct value after this call? Oh, the whole point of that boolean is it changes over time during the traversal. It gets updated in this function. Made it a member variable instead to make this clear.
Ryosuke Niwa
Comment 17 2018-02-09 14:43:43 PST
Created attachment 333520 [details] Skip the test on Windows
Wenson Hsieh
Comment 18 2018-02-09 16:05:27 PST
> > Not very intuitive, but sensible I suppose :/ > > > > Maybe MSOListQuirks should be renamed to something more like > > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > > So the idea here is that MSOListQuirks is like a runtime feature. When the > feature is enabled, we'd check for html tag, and do the quirks if needed. Right, I understand that — it's just that the purpose wasn't obvious to me from the name alone (MSOListQuirks) and I was initially confused why there were 2 flags. Though, I suppose the enum values, ::Enable and ::Disable should be enough to make this clear, so it's fine. > > Maybe set `inMSOList = false` at the beginning of this function so it always > > has a correct value after this call? > > Oh, the whole point of that boolean is it changes over time during the > traversal. It gets updated in this function. Made it a member variable > instead to make this clear. Oh ok, now I see! Your new approach makes this a lot clearer. 👍🏻
Wenson Hsieh
Comment 19 2018-02-09 17:00:47 PST
Comment on attachment 333520 [details] Skip the test on Windows View in context: https://bugs.webkit.org/attachment.cgi?id=333520&action=review > Source/WebCore/editing/MarkupAccumulator.cpp:204 > + ASSERT(length <= text.data().length()); Perhaps `start + length <= text.data().length()`? > Source/WebCore/editing/markup.cpp:487 > + // Unfortunately, TinyMCE doesn't recognize mso-list inlie style if the style attribute contains properties. inlie => inline > Source/WebCore/editing/markup.cpp:600 > + Comment& commentNode = downcast<Comment>(node); Nit - auto&?
Ryosuke Niwa
Comment 20 2018-02-09 17:22:33 PST
(In reply to Wenson Hsieh from comment #19) > Comment on attachment 333520 [details] > Skip the test on Windows > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333520&action=review > > > Source/WebCore/editing/MarkupAccumulator.cpp:204 > > + ASSERT(length <= text.data().length()); > > Perhaps `start + length <= text.data().length()`? Oh oops, nice catch. Fixed. > > Source/WebCore/editing/markup.cpp:487 > > + // Unfortunately, TinyMCE doesn't recognize mso-list inlie style if the style attribute contains properties. > > inlie => inline Fixed. > > Source/WebCore/editing/markup.cpp:600 > > + Comment& commentNode = downcast<Comment>(node); > > Nit - auto&? Fixed.
Ryosuke Niwa
Comment 21 2018-02-09 19:17:31 PST
(In reply to Wenson Hsieh from comment #18) > > > Not very intuitive, but sensible I suppose :/ > > > > > > Maybe MSOListQuirks should be renamed to something more like > > > `MayUseMSOListQuirks { No, Yes }` to make this more obvious? > > > > So the idea here is that MSOListQuirks is like a runtime feature. When the > > feature is enabled, we'd check for html tag, and do the quirks if needed. > > Right, I understand that — it's just that the purpose wasn't obvious to me > from the name alone (MSOListQuirks) and I was initially confused why there > were 2 flags. Though, I suppose the enum values, ::Enable and ::Disable > should be enough to make this clear, so it's fine. Okay, I'd rename MSOListQuirks::Enabled to MSOListQuirks::CheckIfNeeded as we discussed in person.
Ryosuke Niwa
Comment 22 2018-02-09 19:45:39 PST
Created attachment 333550 [details] Patch for landing
Ryosuke Niwa
Comment 23 2018-02-09 19:56:40 PST
Comment on attachment 333550 [details] Patch for landing Wait for EWS first.
Ryosuke Niwa
Comment 24 2018-02-09 21:07:58 PST
Andrew Herron
Comment 25 2018-02-11 20:48:11 PST
(In reply to Ryosuke Niwa from comment #14) > (In reply to Ryosuke Niwa from comment #13) > > > > <html><style>content is a perfectly a valid HTML5 document. Having said > > that, we can make that change. > > Never mind. This is actually tricky because we're using the fragment parsing > algorithm which strips away head. I realise it's valid, however previously we could rely on the raw clipboard being a "proper" document. But as I said it's not a huge issue it just means some adjustment to how I handle the structure before I can validate if this includes everything we need.
Ryosuke Niwa
Comment 26 2018-02-12 11:48:22 PST
Reopening to attach new patch.
Ryosuke Niwa
Comment 27 2018-02-12 11:48:23 PST
Created attachment 333610 [details] Fixes the off-by-one error
EWS Watchlist
Comment 28 2018-02-12 11:49:32 PST
Attachment 333610 [details] did not pass style-queue: ERROR: Tools/ChangeLog:8: Line contains tab character. [whitespace/tab] [5] ERROR: Tools/ChangeLog:9: Line contains tab character. [whitespace/tab] [5] Total errors found: 2 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Wenson Hsieh
Comment 29 2018-02-12 12:02:17 PST
Comment on attachment 333610 [details] Fixes the off-by-one error View in context: https://bugs.webkit.org/attachment.cgi?id=333610&action=review r=me > Tools/ChangeLog:8 > + Added assertions to make sure the trailing } is includd in the style. Nit - it looks like there's a tab character here.
Ryosuke Niwa
Comment 30 2018-02-12 12:07:51 PST
(In reply to Wenson Hsieh from comment #29) > Comment on attachment 333610 [details] > Fixes the off-by-one error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333610&action=review > > r=me > > > Tools/ChangeLog:8 > > + Added assertions to make sure the trailing } is includd in the style. > > Nit - it looks like there's a tab character here. Oops, yeah, will fix before landing.
Ryosuke Niwa
Comment 31 2018-02-12 12:09:47 PST
Created attachment 333614 [details] Fixed off by one error
Ryosuke Niwa
Comment 32 2018-02-12 14:07:45 PST
Comment on attachment 333614 [details] Fixed off by one error This doesn't quite cut it. The original patch broke Evernote & Wordpress.com :(
Andrew Herron
Comment 33 2018-02-12 16:37:46 PST
(In reply to Wenson Hsieh from comment #29) > Comment on attachment 333610 [details] > Fixes the off-by-one error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333610&action=review > > r=me > > > Tools/ChangeLog:8 > > + Added assertions to make sure the trailing } is includd in the style. > > Nit - it looks like there's a tab character here. You've also missed the e in "included" :)
Ryosuke Niwa
Comment 34 2018-02-12 17:32:01 PST
Created attachment 333653 [details] Fixes Wordpress, evernote, and Gmail
Andrew Herron
Comment 35 2018-02-12 23:32:59 PST
I like where this patch is going with extra style definitions included. Those are the main thing I was concerned we might need to fully process lists :) I wonder whether the variations in behaviour across TinyMCE versions you mention in your changelog are due to plugins; only our PowerPaste plugin uses detailed list processing (it's enabled on tinymce.com), the default paste implementation has very basic word import. Treating the style tag as text may be a bug that I can address rather than needing a webkit workaround (although putting it inside a head tag definitely helps). It's most likely because the content is not inside a body tag - in the current release if a body tag isn't found it assumes the entire clipboard is content. This is related to why I asked for the structure change earlier. I'm going to need to release a plugin update to support this change anyway so if the changes here break paste on the current release that's not the end of the world. I have some feedback on the current patch: > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) This won't catch all list items in the content. Just from a quick search I'm seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as well as "MsoNormal". And I'm sure there are others. Having a whitelist for this seems like the sort of thing that could easily break between versions of word. In addition, this "MsoListParagraph" filter isn't applied to the comment nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm seeing paragraphs without style mso-list but with the "supportsLists" comments inside. I made a custom patch on my machine that relaxes the override checks a bit and I'm getting _really_ good results.
Ryosuke Niwa
Comment 36 2018-02-13 13:03:40 PST
(In reply to Andrew Herron from comment #35) > > > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) > > This won't catch all list items in the content. Just from a quick search I'm > seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as > well as "MsoNormal". And I'm sure there are others. Having a whitelist for > this seems like the sort of thing that could easily break between versions > of word. That doesn't seem to affect the end result. > In addition, this "MsoListParagraph" filter isn't applied to the comment > nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm > seeing paragraphs without style mso-list but with the "supportsLists" > comments inside. Those nodes are taken care of by m_inMSOList check. > I made a custom patch on my machine that relaxes the override checks a bit > and I'm getting _really_ good results. What are the changes you made?
Brent Fulgham
Comment 37 2018-02-13 13:43:48 PST
Comment on attachment 333653 [details] Fixes Wordpress, evernote, and Gmail View in context: https://bugs.webkit.org/attachment.cgi?id=333653&action=review r=me > Tools/ChangeLog:33 > + Did you mean to have these two separate ChangeLog entries?
Ryosuke Niwa
Comment 38 2018-02-13 14:44:05 PST
Created attachment 333727 [details] Fixes older versions of TinyMCE
Andrew Herron
Comment 39 2018-02-13 16:08:40 PST
(In reply to Ryosuke Niwa from comment #36) > (In reply to Andrew Herron from comment #35) > > > > > if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(classAttr).startsWith("MsoListParagraph")) > > > > This won't catch all list items in the content. Just from a quick search I'm > > seeing paragraphs with class "MsoHeading7" that have mso-list attributes, as > > well as "MsoNormal". And I'm sure there are others. Having a whitelist for > > this seems like the sort of thing that could easily break between versions > > of word. > > That doesn't seem to affect the end result. This issue means mso-list styles are stripped from actual list items, and every time that happens we can't process the item. If the documents you're testing with all appear fine I will try to find some more complex documents that I can share - my test data mostly comes from customers so I need to be careful. > > In addition, this "MsoListParagraph" filter isn't applied to the comment > > nodes in appendNodeToPreserveMSOList so the results are inconsistent. I'm > > seeing paragraphs without style mso-list but with the "supportsLists" > > comments inside. > > Those nodes are taken care of by m_inMSOList check. But that isn't used for the style attribute output. I think the two should be linked somehow. As I said, in some documents the HTML I get on paste has paragraphs with the supportsLists comments and related data but no mso-list style on the P tag. That's the problem. If you're ever going to strip mso-list from a paragraph (which per above is a bad idea imo) it seems like the comments should be removed as well. My code deals with these comments, but only when processing a list paragraph which is detected by the mso-list style. Without mso-list, these comments make it into customer content (as a part of fake list-like paragraphs that are just confusing). > > I made a custom patch on my machine that relaxes the override checks a bit > > and I'm getting _really_ good results. > > What are the changes you made? I targeted two checks: - Style attribute preservation relaxed to element.getAttribute(classAttr).startsWith("Mso") - The msoListMode check relaxed to originalMarkup.startsWith("<html xmlns") The second change relates to an email I sent you, the check you use to detect MS Word documents is too specific and doesn't catch everything. I realise these changes are unlikely to be suitable for release, but I wanted to verify if more complete data would work with our list import routines. At first glance the answer appears to be yes.
Ryosuke Niwa
Comment 40 2018-02-13 16:38:30 PST
(In reply to Andrew Herron from comment #39) > > - Style attribute preservation relaxed to > element.getAttribute(classAttr).startsWith("Mso") > - The msoListMode check relaxed to originalMarkup.startsWith("<html xmlns") > > The second change relates to an email I sent you, the check you use to > detect MS Word documents is too specific and doesn't catch everything. These changes are probably too generic. That would match more documents than we'd like. It would be great if you can share specific documents from which copying & pasting would fail with the latest two patches.
Andrew Herron
Comment 41 2018-02-13 20:13:10 PST
Created attachment 333761 [details] Challenging document I found a really old Ephox press release in our test data that is an extreme example of what I'm talking about, it's not handled properly even with the changes I made :) There are newer documents with similar issues but they all have private data in them. I can try to generate my own or sanitise one of them if this document isn't suitable. When pasting from Word v16.9: - the <html> tag doesn't have xmlns:o as the first attribute, failing the overall list check - none of the list paragraphs (inside the table) have a class at all, completely failing the "preserve style attribute" check As a result, even with my relaxed checks this produces paragraphs with no "mso-list" style but retaining the "if supportsLists" comments.
Ryosuke Niwa
Comment 42 2018-02-13 22:16:26 PST
Created attachment 333768 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode
Ryosuke Niwa
Comment 43 2018-02-13 22:17:54 PST
(In reply to Andrew Herron from comment #41) > Created attachment 333761 [details] > Challenging document Thanks for the document. This was super helpful! > There are newer documents with similar issues but they all have private data > in them. I can try to generate my own or sanitise one of them if this > document isn't suitable. > > When pasting from Word v16.9: > > - the <html> tag doesn't have xmlns:o as the first attribute, failing the > overall list check Indeed. This seems to only occur with the compatibility mode too. I've adjusted the check for look for xmlns:o and xmlns:w within <html ~> instead. > - none of the list paragraphs (inside the table) have a class at all, > completely failing the "preserve style attribute" check Indeed. I've fixed it to look for mso-list CSS property instead.
Andrew Herron
Comment 44 2018-02-13 22:32:43 PST
Sounds good! I'm building with the patch now. I'll try to confirm whether this appears to fix our issues before I finish work today - once this is merged I can get other developers to help verify the changes across our entire test document set. We will still need a new plugin release to be more flexible about the document structure it accepts, but other than that I think the changes on our end will be minor.
Andrew Herron
Comment 45 2018-02-13 23:21:32 PST
I pasted a few of our test documents with the patch and it looks good.
Wenson Hsieh
Comment 46 2018-02-14 07:42:16 PST
Comment on attachment 333768 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode View in context: https://bugs.webkit.org/attachment.cgi?id=333768&action=review > Source/WebCore/editing/markup.cpp:433 > + if (m_shouldPreserveMSOList && element.hasTagName(pTag) && element.getAttribute(styleAttr).contains(";mso-list:")) Nit - maybe add a new helper method for this called something like `shouldPreserveMSOListStyleForElement`, and turn the above into a one-liner?
Ryosuke Niwa
Comment 47 2018-02-14 09:58:35 PST
Created attachment 333810 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode
Ryosuke Niwa
Comment 48 2018-02-14 09:59:08 PST
Comment on attachment 333810 [details] Fixed Evernote & Wordpress.com & MS Word's compat mode Already been reviewed. Waiting for EWS.
Ryosuke Niwa
Comment 49 2018-02-14 13:07:44 PST
Andrew Herron
Comment 50 2018-02-15 21:47:32 PST
I've come across a document that when copied to the clipboard has a much simpler style attribute than others we've been testing with. The entire opening paragraph tag is: > <p class=Ctrl4Opsomming style='mso-list:l0 level1 lfo1'> As such it fails the ";mso-list:" check this patch uses and the style attribute is overridden. Could it be made to check if the style starts with "mso-list:" as well? :)
Andrew Herron
Comment 51 2018-02-18 21:36:52 PST
Created attachment 334136 [details] Document with heading lists I've found a new failing document that I can attach publicly. I remember thinking when I looked at the "element tag name is p" check that there would probably be cases of heading tags with mso-list attributes, but I don't think I ever mentioned it here. One of our test documents does indeed have that :) I've made some adjustments to the shouldPreserve check and verified this retains the correct style attributes both with this document and the private one mentioned in my previous comment: > bool StyledMarkupAccumulator::shouldPreserveMSOListStyleForElement(const Element& element) > { > return m_inMSOList || (m_shouldPreserveMSOList && (element.getAttribute(styleAttr).contains(";mso-list:") || element.getAttribute(styleAttr).startsWith("mso-list:"))); > } Although that's again probably too relaxed - it's really only the primary content block tags (P, H1-6) that can be lists. Not _all_ tags. And to be fair TinyMCE doesn't import this document correctly even when given the right styles. But that's a bug I plan to fix so it'd be nice to have heading list styles retained :)
Ryosuke Niwa
Comment 52 2018-02-19 13:23:53 PST
Andrew Herron
Comment 53 2018-02-19 15:53:58 PST
Should I log a new bug for the heading lists as well? The document I attached was for that, not the "starts with" issue.
Ryosuke Niwa
Comment 54 2018-02-19 17:17:29 PST
(In reply to Andrew Herron from comment #53) > Should I log a new bug for the heading lists as well? The document I > attached was for that, not the "starts with" issue. Oh, okay. Please do file one.
Note You need to log in before you can comment on or make changes to this bug.