Bug 124391

Summary: Cannot access images included in the content pasted from Microsoft Word
Product: WebKit Reporter: Andrew Herron <thespyder>
Component: HTML Editing Assignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, bdakin, buildbot, cdumez, commit-queue, darin, david.wood, dbates, enrica, koivisto, mihaip, m.lewandowski, m.samsel, pkoszulinski, rniwa, sam, webkit-bug-importer, wenson_hsieh
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Mac (Intel)   
OS: OS X 10.9   
Bug Depends on: 174165, 176986, 177801, 178060, 178118, 178154    
Bug Blocks: 181616, 182564    
Attachments:
Description | Flags
replication case | none
WIP patch | none
WIP patch | buildbot: commit-queue-
Archive of layout-test-results from ews103 for mac-elcapitan | none
Archive of layout-test-results from ews112 for mac-elcapitan | none
WIP - Made macOS work | none
Fixes the bug | none
Updated for ToT | none
Archive of layout-test-results from ews116 for mac-elcapitan | none
Mostly fixed builds | none
Archive of layout-test-results from ews112 for mac-elcapitan | none
Archive of layout-test-results from ews101 for mac-elcapitan | none
Fixed tests | none
Archive of layout-test-results from ews124 for ios-simulator-wk2 | none
Fixed one more test | none
Patch | none
Used CallWith=Document | none
Added a API test for WebArchive | none
Removed TestWebKitAPI.xcodeproj/project.pbxproj.orig | none
Updated for ToT | none
Patch for landing | commit-queue: commit-queue-
Document that fails to import an image using word 2016 | none

Description Andrew Herron 2013-11-14 18:29:55 PST
Created attachment 217003 [details]
replication case

When capturing a paste event, the clipboardData object claims to offer text/rtf data but the contents are (as far as I can tell) always empty. There is some additional information in RTF data that is not available through the text/html data format and I would like to include it in my paste data processing.

I have attached an HTML document which prints the RTF contents to the console on paste. There is also a sample MS Word document for creating RTF clipboard data (although on OS X simply copying content inside WebKit works just as well).

Replicated as I write this on nightly build r159308.
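
For readers without the attachment, the check described above can be sketched roughly as follows. This is an illustrative reconstruction, not the attached replication case: `ClipboardDataLike`, `RtfInspection` and `inspectRtf` are hypothetical names, and the interface only mirrors the `types` and `getData` members of the standard `DataTransfer` object so that the logic can also run outside a browser.

```typescript
// Minimal stand-in for the parts of DataTransfer used here (hypothetical name).
interface ClipboardDataLike {
  types: ReadonlyArray<string>;
  getData(format: string): string;
}

interface RtfInspection {
  advertised: boolean; // "text/rtf" is listed in clipboardData.types
  empty: boolean;      // getData("text/rtf") returned an empty string
}

// Reports whether text/rtf is advertised and whether its contents are empty.
function inspectRtf(data: ClipboardDataLike): RtfInspection {
  const advertised = data.types.indexOf("text/rtf") !== -1;
  const rtf = data.getData("text/rtf");
  return { advertised: advertised, empty: rtf.length === 0 };
}

// In a page this could be wired up along the lines of:
//   document.addEventListener("paste", (e) => {
//     if (e.clipboardData) console.log(inspectRtf(e.clipboardData));
//   });
```

The symptom reported here corresponds to a result where `advertised` is true and `empty` is also true.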
Comment 1 Andrew Herron 2016-06-17 00:05:01 PDT
This bug impacts the ability of our two widely used HTML editors (http://textbox.io and http://tinymce.com) to import content from Microsoft Word, a very popular feature among our customers. Textbox.io is in the process of being deployed to hundreds of thousands of IBM users. TinyMCE is very well known as the default Wordpress editor and sees wide use in other markets as well.

This bug is the reason our editors currently require Flash, and with the recent talk of turning off Flash by default we’ve been (unsuccessfully) trying to find a contractor to help us implement this and contribute the code. Now that we know it’s happening in Safari 10, we’d _really_ like to get this looked at. We successfully sponsored the same change in Firefox last year: https://bugzilla.mozilla.org/show_bug.cgi?id=938991

We need RTF because when content with images is pasted from Microsoft Word it includes references to those images in a temp folder on the local file system. The files can’t be read in JavaScript, but the binary contents are also contained within the text/rtf format on the clipboard.  Due to this bug we have been forced to use a small Flash object to read the RTF contents from the clipboard and gain access to the image data.

We have also logged this issue with Chrome:
https://bugs.chromium.org/p/chromium/issues/detail?id=317807
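
To illustrate why the RTF flavour matters, here is a minimal sketch of how an editor might pull embedded image bytes out of an RTF string once it is readable. It assumes images appear as hex-encoded data inside `{\pict ...}` groups tagged with `\pngblip` or `\jpegblip`, which is how RTF commonly embeds pictures; it is not the code used by Textbox.io or TinyMCE, all function and type names are hypothetical, and binary (`\bin`) pictures are not handled.

```typescript
interface EmbeddedImage {
  format: "png" | "jpeg" | "unknown";
  bytes: Uint8Array;
}

// Converts a string of hex digits into bytes (a trailing odd digit is dropped).
function hexToBytes(hex: string): Uint8Array {
  const out = new Uint8Array(Math.floor(hex.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

function isAsciiLetter(ch: string): boolean {
  return /[a-zA-Z]/.test(ch);
}

// Scans for \pict groups and collects the hex data found directly inside each
// one. Nested groups such as {\*\picprop ...} are skipped via depth tracking.
function extractRtfImages(rtf: string): EmbeddedImage[] {
  const images: EmbeddedImage[] = [];
  let searchFrom = 0;
  while (true) {
    const start = rtf.indexOf("\\pict", searchFrom);
    if (start === -1) break;
    let depth = 0; // 0 means "directly inside the \pict group"
    let hex = "";
    let format: EmbeddedImage["format"] = "unknown";
    let i = start;
    while (i < rtf.length && depth >= 0) {
      const ch = rtf.charAt(i);
      if (ch === "{") { depth++; i++; continue; }
      if (ch === "}") { depth--; i++; continue; }
      if (ch === "\\") {
        let j = i + 1;
        while (j < rtf.length && isAsciiLetter(rtf.charAt(j))) j++;
        const word = rtf.slice(i + 1, j);
        if (word.length === 0) { i += 2; continue; } // control symbol, e.g. \*
        while (j < rtf.length && /[0-9-]/.test(rtf.charAt(j))) j++; // numeric parameter
        if (rtf.charAt(j) === " ") j++; // a single space ends the control word
        if (depth === 0 && word === "pngblip") format = "png";
        if (depth === 0 && word === "jpegblip") format = "jpeg";
        i = j;
        continue;
      }
      if (depth === 0 && /[0-9a-fA-F]/.test(ch)) hex += ch;
      i++;
    }
    images.push({ format: format, bytes: hexToBytes(hex) });
    searchFrom = i;
  }
  return images;
}
```

A real importer would also need to match each extracted picture to the corresponding `<img>` in the text/html flavour; that matching is outside this sketch.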
Comment 2 Radar WebKit Bug Importer 2016-06-17 08:47:34 PDT
<rdar://problem/26862741>
Comment 3 David Wood 2017-01-26 00:24:53 PST
This issue has been fixed by the other major browsers:

- Firefox fixed it as of version 45
- Chrome fixed it as of version 54
- Edge reported a fix today in their Issue #9018384, https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/9018384/

So it's Apple's turn next, right? :)
Comment 4 Chris Dumez 2017-01-26 10:24:27 PST
We put the RTF data on the pasteboard but we do not allow WebContent to read it at the moment. In PasteboardMac.mm's cocoaTypeFromHTMLClipboardType():
    // Blacklist types that might contain subframe information.
    if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf" || lowercasedType == "com.apple.traditional-mac-plain-text")
        return String();

If you replace with:
    if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf")
        return NSRTFPboardType;

then the test case works.

However, it looks like someone intentionally blacklisted those (likely for security reasons) so we'll have to investigate why and how to better deal with RTF.
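
The difference between the two snippets can be modelled outside WebKit as below. This is a deliberately simplified TypeScript model, not the Objective-C++ in PasteboardMac.mm: the function names are hypothetical, `RTF_PASTEBOARD_TYPE` is a placeholder string standing in for the AppKit constant, and every type other than the ones quoted above is passed through unchanged purely to keep the model small (the real function has further mappings that are not shown in this thread).

```typescript
// Placeholder for the AppKit constant named above; its real value is not
// reproduced here.
const RTF_PASTEBOARD_TYPE = "NSRTFPboardType";

const RTF_TYPES = ["text/rtf", "public.rtf"];
const LEGACY_PLAIN_TEXT = "com.apple.traditional-mac-plain-text";

// Models the current check: blocked types map to an empty string,
// i.e. no pasteboard type is returned for them.
function lookupBlocking(type: string): string {
  const lowered = type.toLowerCase();
  if (RTF_TYPES.indexOf(lowered) !== -1 || lowered === LEGACY_PLAIN_TEXT) {
    return "";
  }
  return lowered; // simplification: other types pass through unchanged
}

// Models the suggested replacement: RTF types map to the RTF pasteboard type.
function lookupAllowingRtf(type: string): string {
  const lowered = type.toLowerCase();
  if (RTF_TYPES.indexOf(lowered) !== -1) {
    return RTF_PASTEBOARD_TYPE;
  }
  return lowered; // simplification: other types pass through unchanged
}
```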
Comment 5 Chris Dumez 2017-01-26 10:31:16 PST
(In reply to comment #4)
> We put the RTF data on the pasteboard but we do not allow WebContent to read
> it at the moment. In PasteboardMac.mm's cocoaTypeFromHTMLClipboardType():
>     // Blacklist types that might contain subframe information.
>     if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf" ||
> lowercasedType == "com.apple.traditional-mac-plain-text")
>         return String();
> 
> If you replace with:
>     if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf")
>         return NSRTFPboardType;
> 
> then the test case works.
> 
> However, it looks like someone intentionally blacklisted those (likely for
> security reasons) so we'll have to investigate why and how to better deal
> with RTF.

This was done in https://trac.webkit.org/changeset/115513.
Comment 6 Chris Dumez 2017-01-26 10:53:19 PST
(In reply to comment #5)
> (In reply to comment #4)
> > We put the RTF data on the pasteboard but we do not allow WebContent to read
> > it at the moment. In PasteboardMac.mm's cocoaTypeFromHTMLClipboardType():
> >     // Blacklist types that might contain subframe information.
> >     if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf" ||
> > lowercasedType == "com.apple.traditional-mac-plain-text")
> >         return String();
> > 
> > If you replace with:
> >     if (lowercasedType == "text/rtf" || lowercasedType == "public.rtf")
> >         return NSRTFPboardType;
> > 
> > then the test case works.
> > 
> > However, it looks like someone intentionally blacklisted those (likely for
> > security reasons) so we'll have to investigate why and how to better deal
> > with RTF.
> 
> This was done in https://trac.webkit.org/changeset/115513.

see <rdar://problem/10639226>
Comment 7 David Wood 2017-01-26 10:57:43 PST
I believe the security issue was leakage of local filesystem paths (e.g. in image locations). Instead of fixing the input filter to the RTF clipboard, the bug was closed with a sledge hammer by removing access to the RTF clipboard.

Chrome fixed this with code that should be quite similar to what Webkit needs. Please see https://bugs.chromium.org/p/chromium/issues/detail?id=317807
Comment 8 Andrew Herron 2017-01-26 15:45:03 PST
ah, I found that blacklist code a long time ago but had only tried commenting it out (reversing the changeset). I'll try a build with returning NSRTFPboardType and see if that works in our editors.
Comment 9 Chris Dumez 2017-01-26 19:42:36 PST
(In reply to comment #8)
> ah, I found that blacklist code a long time ago but had only tried
> commenting it out (reversing the changeset). I'll try a build with returning
> NSRTFPboardType and see if that works in our editors.

Yes, just commenting out those lines also works.
Comment 10 Ryosuke Niwa 2017-01-26 19:47:30 PST
The issue here is that:
1. It can leak private data embedded in RTF from third party applications
2. It can leak cross-origin content if the user had copied a range of content across a cross-origin iframe.

We need to solve both of these problems in order to enable this feature.

For 1, we probably need to paste RTF content into a document ourselves, and then re-generate RTF out of the said document. For 2, we probably need to stop copying contents across a cross-origin iframe.
Comment 11 Chris Dumez 2017-01-26 19:52:00 PST
(In reply to comment #10)
> The issue here is that:
> 1. It can leak private data embedded in RTF from third party applications
> 2. It can leak cross-origin content if the user had copied a range of
> content across a cross-origin iframe.
> 
> We need to solve both of these problems in order to enable this feature.
> 
> For 1, we probably need to paste RTF content into a document ourselves, and
> then re-generate RTF out of the said document. For 2, we probably need to
> stop copying contents across a cross-origin iframe.

I am not sure I understand 1. I think it would be that third party app's responsibility not to put private data in the clipboard.

I understand 2. and it is the reason RTF was blacklisted in the first place as far as I can tell. I agree with your solution, although it may be a little annoying to implement since AppKit does the RTF conversion for us.
Comment 12 Chris Dumez 2017-01-26 20:19:59 PST
Created attachment 299902 [details]
WIP patch
Comment 13 WebKit Commit Bot 2017-01-26 20:25:25 PST
Attachment 299902 [details] did not pass style-queue:


ERROR: Source/WebCore/editing/mac/EditorMac.mm:215:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Ryosuke Niwa 2017-01-26 20:39:18 PST
(In reply to comment #11)
> (In reply to comment #10)
> > The issue here is that:
> > 1. It can leak private data embedded in RTF from third party applications
> > 2. It can leak cross-origin content if the user had copied a range of
> > content across a cross-origin iframe.
> > 
> > We need to solve both of these problems in order to enable this feature.
> > 
> > For 1, we probably need to paste RTF content into a document ourselves, and
> > then re-generate RTF out of the said document. For 2, we probably need to
> > stop copying contents across a cross-origin iframe.
> 
> I am not sure I understand 1. I think it would be that third party app's
> responsibility not to put private data in the clipboard.

No, we can't do that. Third party applications aren't expecting their RTF to be exposed to a random Web page.

This is why, for example, we don't expose raw HTML, which contain sensitive information such as local file path, real user name, etc... included in link/meta elements that aren't even visible in the page.
Comment 15 Chris Dumez 2017-01-26 20:41:07 PST
(In reply to comment #14)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > The issue here is that:
> > > 1. It can leak private data embedded in RTF from third party applications
> > > 2. It can leak cross-origin content if the user had copied a range of
> > > content across a cross-origin iframe.
> > > 
> > > We need to solve both of these problems in order to enable this feature.
> > > 
> > > For 1, we probably need to paste RTF content into a document ourselves, and
> > > then re-generate RTF out of the said document. For 2, we probably need to
> > > stop copying contents across a cross-origin iframe.
> > 
> > I am not sure I understand 1. I think it would be that third party app's
> > responsibility not to put private data in the clipboard.
> 
> No, we can't do that. Third party applications aren't expecting their RTF to
> be exposed to a random Web page.
> 
> This is why, for example, we don't expose raw HTML, which contain sensitive
> information such as local file path, real user name, etc... included in
> link/meta elements that aren't even visible in the page.

Ok. I am still working on 2.

Regarding 1., your proposal was to paste RTF content into a document ourself. I am not sure what you mean by that. Can you point me in the right direction?
Comment 16 Ryosuke Niwa 2017-01-26 20:41:42 PST
Comment on attachment 299902 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299902&action=review

> Source/WebCore/editing/cocoa/HTMLConverter.mm:443
> -HTMLConverter::HTMLConverter(Range& range)
> +HTMLConverter::HTMLConverter(Range& range, IncludeSubFramesInAttributedString includeSubFrames)

I don't think we need to have this flag. We should just always not include cross-origin iframe's content.
Comment 17 Chris Dumez 2017-01-26 20:42:53 PST
Comment on attachment 299902 [details]
WIP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299902&action=review

>> Source/WebCore/editing/cocoa/HTMLConverter.mm:443
>> +HTMLConverter::HTMLConverter(Range& range, IncludeSubFramesInAttributedString includeSubFrames)
> 
> I don't think we need to have this flag. We should just always not include cross-origin iframe's content.

Ok. sounds good to me. I am currently working on checking security origins instead of skipping ALL iframes.
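
The behaviour agreed on here (always leave out cross-origin iframe content rather than skipping all iframes or adding a flag) can be modelled as a small tree walk. This is only a sketch of the idea in TypeScript, not HTMLConverter code: `FrameModel` and `collectSameOriginText` are hypothetical, and origins are compared as plain strings instead of real security-origin objects.

```typescript
// Toy model of a frame tree: each frame has an origin, some text, and subframes.
interface FrameModel {
  frameOrigin: string; // e.g. "https://a.example"
  text: string;
  subframes: FrameModel[];
}

// Collects text from the frame tree, leaving out every frame whose origin
// differs from the origin doing the copying. Anything nested inside a skipped
// frame is skipped as well, even if it is same-origin again.
function collectSameOriginText(frame: FrameModel, copyingOrigin: string): string[] {
  if (frame.frameOrigin !== copyingOrigin) {
    return [];
  }
  let parts: string[] = [frame.text];
  for (let i = 0; i < frame.subframes.length; i++) {
    parts = parts.concat(collectSameOriginText(frame.subframes[i], copyingOrigin));
  }
  return parts;
}
```

Dropping same-origin frames nested under a cross-origin one is a conservative choice made for this sketch; the thread does not say how the actual patch treats that case.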
Comment 18 Ryosuke Niwa 2017-01-26 20:48:00 PST
(In reply to comment #15)
>
> Regarding 1., your proposal was to paste RTF content into a document
> ourself. I am not sure what you mean by that. Can you point me in the right
> direction?

What I mean is that
1. Paste RTF read from the pasteboard into a fake/empty document
2. Serialize the contents into RTF again using our HTMLConverter
3. Expose RTF from 2

We already need to do this for text/html since we should be exposing that as well. We used to have code to do this for Chromium:
https://trac.webkit.org/browser/trunk/Source/WebCore/editing/markup.cpp?rev=156256#L713

We might want to resurrect that function back into life.
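
The three steps above amount to "parse into a throwaway document, then serialize only what we understand". A toy version of the serialization half is sketched below for the text/html case. It is not WebKit's serializer or the markup.cpp function linked above: the node model, the tag and attribute lists, and all names are assumptions chosen for illustration.

```typescript
// Toy document model standing in for the fake/empty document.
type MarkupNode =
  | { kind: "element"; tag: string; attrs: { [attrName: string]: string }; children: MarkupNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string };

const DROPPED_TAGS = ["meta", "link", "script"]; // invisible, may carry private data
const KEPT_ATTRS = ["href", "src", "style"];      // everything else is discarded
const VOID_TAGS = ["img", "br"];

function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s: string): string {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Re-serializes the tree, keeping only recognized elements and attributes.
function serializeSanitized(node: MarkupNode): string {
  if (node.kind === "text") return escapeText(node.value);
  if (node.kind === "comment") return "";
  if (DROPPED_TAGS.indexOf(node.tag) !== -1) return "";
  let attrs = "";
  const names = Object.keys(node.attrs).sort();
  for (let i = 0; i < names.length; i++) {
    const attrName = names[i];
    const value = node.attrs[attrName];
    if (KEPT_ATTRS.indexOf(attrName) === -1) continue;
    // Local file paths are exactly the kind of data that must not leak.
    if (attrName === "src" && value.slice(0, 5).toLowerCase() === "file:") continue;
    attrs += ` ${attrName}="${escapeAttr(value)}"`;
  }
  if (VOID_TAGS.indexOf(node.tag) !== -1) return `<${node.tag}${attrs}>`;
  let inner = "";
  for (let i = 0; i < node.children.length; i++) {
    inner += serializeSanitized(node.children[i]);
  }
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}
```

The point of the round trip is that anything the serializer does not explicitly emit (comments, meta/link elements, unknown attributes) cannot reach the page, whatever the source application put on the pasteboard.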
Comment 19 Chris Dumez 2017-01-26 20:48:52 PST
Created attachment 299904 [details]
WIP patch
Comment 20 Ryosuke Niwa 2017-01-26 20:52:18 PST
Could you add a test to LayoutTests/editing/mac/attributed-string/?
Comment 21 Chris Dumez 2017-01-26 20:54:12 PST
(In reply to comment #20)
> Could you add a test to LayoutTests/editing/mac/attributed-string/?

Oh yes, sure, I'll add a test once the patch is complete. I still need to take care of sanitizing RTF on pasting, which is not something I am super familiar with.
Comment 22 Ryosuke Niwa 2017-01-26 20:58:31 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Could you add a test to LayoutTests/editing/mac/attributed-string/?
> 
> Oh yes, sure, I'll add a test once the patch is complete. I still need to
> take care of sanitizing RTF on pasting, which is not something I am super
> familiar with.

Okay. It might be easier to start with text/html case because then you'd have the fake/empty document thing taken care of.
Comment 23 Chris Dumez 2017-01-26 21:00:21 PST
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > Could you add a test to LayoutTests/editing/mac/attributed-string/?
> > 
> > Oh yes, sure, I'll add a test once the patch is complete. I still need to
> > take care of sanitizing RTF on pasting, which is not something I am super
> > familiar with.
> 
> Okay. It might be easier to start with text/html case because then you'd
> have the fake/empty document thing taken care of.

I was not planning to work on text/html here, this bug is about rtf. I think we should file a separate bug if there is work needed for text/html.
Comment 24 Andrew Herron 2017-01-26 21:29:00 PST
I'm not really following... I wonder if there's some confusion here. What we need RTF for is not processed output, it's the raw string from the HTML5 clipboard API. My replication case only deals with text/rtf but we use text/html from the same event.

For the specific case of word import, it's critical that we get the raw clipboard string for both text/html (which we do today in Safari and other browsers) and text/rtf (which other browsers have added code to provide for us).

There is a lot of seemingly unnecessary stuff, particularly in the HTML contents, that we use to improve the quality of the pasted result.
Comment 25 Ryosuke Niwa 2017-01-26 21:35:24 PST
(In reply to comment #24)
> There is a lot of seemingly unnecessary stuff, particularly in the HTML
> contents, that we use to improve the quality of the pasted result.

The problem is that there's a privacy issue there. Applications like Microsoft Word expose privacy-sensitive information such as local file paths and the user's real name without the user even noticing, because that information is invisible. We do need to strip it to protect the user's privacy.

Since there are multiple ways HTML can contain such information, the only sure way to get the right HTML content is to re-process the content in the pasteboard using our own HTML serialization algorithm. As far as I know, that's more or less what Chrome does already or what it used to do.

There is some consideration we can make like preserving the original HTML markup when copying & pasting the content within the same origin but that's probably more of a nice-to-have since such a content can be tagged with some class/id in the content.
Comment 26 Build Bot 2017-01-26 21:50:34 PST
Comment on attachment 299904 [details]
WIP patch

Attachment 299904 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2956330

New failing tests:
fast/events/drag-and-drop-subframe-dataTransfer.html
Comment 27 Build Bot 2017-01-26 21:50:39 PST
Created attachment 299912 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 28 Chris Dumez 2017-01-26 21:56:29 PST
If I understand correctly Ryosuke's proposal for exposing RTF to the Web is that we would do a round-trip to DOM representation:
Clipboard -> RTF -> DOM -> RTF -> JS Clipboard API

The idea being that the round-tripping to DOM would avoid exposing potentially sensitive information in the original RTF.

Ryosuke, please correct me if I misunderstood.
Comment 29 Build Bot 2017-01-26 21:59:21 PST
Comment on attachment 299904 [details]
WIP patch

Attachment 299904 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2956332

New failing tests:
fast/events/drag-and-drop-subframe-dataTransfer.html
Comment 30 Build Bot 2017-01-26 21:59:25 PST
Created attachment 299914 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 31 Ryosuke Niwa 2017-01-26 22:04:02 PST
(In reply to comment #28)
> If I understand correctly Ryosuke's proposal for exposing RTF to the Web is
> that we would do a round-trip to DOM representation:
> Clipboard -> RTF -> DOM -> RTF -> JS Clipboard API
> 
> The idea being that the round-tripping to DOM would avoid exposing
> potentially sensitive information in the original RTF.
> 
> Ryosuke, please correct me if I misunderstood.

Yes, that's the idea. An alternative route is to invent some sort of RTF parser & serializer which sanitizes things but that's a lot of work!
Comment 32 Andrew Herron 2017-01-26 22:18:55 PST
> The problem is that there's a privacy issue there. Applications like Microsoft Word expose privacy-sensitive information such as local file paths and the user's real name without the user even noticing, because that information is invisible. We do need to strip it to protect the user's privacy.
> 
> Since there are multiple ways HTML can contain such information, the only sure way to get the right HTML content is to re-process the content in the pasteboard using our own HTML serialization algorithm. As far as I know, that's more or less what Chrome does already or what it used to do.

Path changes make sense; it is weird to have pasted content reference images on the file system (Chrome doesn't appear to modify the clipboard at all, for the record, at least not in v56).

I'm just concerned about parsing it to strip them. I'll watch and give feedback if it breaks our process; there are all sorts of things (style tags, comments, weird attributes) that word uses to store information we need.
Comment 33 Ryosuke Niwa 2017-01-26 22:32:29 PST
(In reply to comment #32)
> > The problem is that there's a privacy issue there. Applications like Microsoft Word expose privacy-sensitive information such as local file paths and the user's real name without the user even noticing, because that information is invisible. We do need to strip it to protect the user's privacy.
> > 
> > Since there are multiple ways HTML can contain such information, the only sure way to get the right HTML content is to re-process the content in the pasteboard using our own HTML serialization algorithm. As far as I know, that's more or less what Chrome does already or what it used to do.
> 
> Path changes make sense; it is weird to have pasted content reference images
> on the file system (Chrome doesn't appear to modify the clipboard at all,
> for the record, at least not in v56).
> 
> I'm just concerned about parsing it to strip them. I'll watch and give
> feedback if it breaks our process; there are all sorts of things (style
> tags, comments, weird attributes) that word uses to store information we
> need.

The image should most certainly be included in the re-generated RTF.

By the way, these security & privacy considerations are explicitly called out in https://www.w3.org/TR/clipboard-apis/#security although the section mainly focuses on HTML content since RTF is not currently included in the list of allowed MIME types.
Comment 34 Andrew Herron 2017-01-26 22:47:48 PST
(In reply to comment #33)
> The image should most certainly be included in the re-generated RTF.
> 
> By the way, these security & privacy considerations are explicitly called
> out in https://www.w3.org/TR/clipboard-apis/#security although the section
> mainly focuses on HTML content since RTF is not currently included in the
> list of allowed MIME types.

I'm aware of the security requirements and implications, although nothing in that section mentions sanitising private information within the clipboard data; just that clipboard data must only be available in a user-initiated paste event.

I am happy to take a nightly once all of this is done and see what impact it has on our editors. I can provide a more complex word document to test with against our real code if that would help.
Comment 35 David Wood 2017-01-27 01:27:31 PST
(In reply to comment #33)
> (In reply to comment #32)
> The image should most certainly be included in the re-generated RTF.

Yes, we need to get the image data from the RTF clipboard in order to remove Flash. We have no need of any image path data. Thanks, Ryosuke.
Comment 36 Andrew Herron 2017-01-27 05:50:17 PST
(In reply to comment #8)
> ah, I found that blacklist code a long time ago but had only tried
> commenting it out (reversing the changeset). I'll try a build with returning
> NSRTFPboardType and see if that works in our editors.

Returning NSRTFPboardType didn't work for me but deleting the RTF blacklist completely did. I'm sure I tried deleting that section of code back when I first logged this bug, but either I was compiling wrong at the time or something else has changed since then that makes it work.

Regardless, using my build I've confirmed pasting images into https://textbox.io works without Flash, so that definitely matches what we need.
Comment 37 David Wood 2017-03-07 21:56:21 PST
Microsoft Edge v15 tech preview now supports an RTF clipboard. That makes Safari/Webkit the last major browser that does not.

Since we seem to have fallen through the looking glass into a strange world where the US president reads Breitbart News, Microsoft writes Open Source Software, and China is the world leader on climate change policy, what do you say that Apple toss in a fix to this?
Comment 38 Ryosuke Niwa 2017-03-19 17:05:57 PDT
It's clear that we should fix this bug sooner or later. Unfortunately, we're currently busy with other higher priority bugs. I'll keep this bug in mind the next time I have time to work on a bug.
Comment 39 David Wood 2017-03-21 17:24:03 PDT
Thanks, Ryosuke. I guess we'll just need to drop Safari support in our products until this can be addressed :( We can't keep working around it with Flash anymore, so we have little choice.
Comment 40 Andrew Herron 2017-08-09 20:58:20 PDT
With Adobe's recent announcement that support for Flash will end in 2020, I'm wondering how that impacts this bug. It would be unfortunate if we have to wait another couple of years for a fix ;)
Comment 41 Ryosuke Niwa 2017-08-09 21:03:38 PDT
I'm going to start working on this bug shortly.
Comment 42 Andrew Herron 2017-08-09 21:15:41 PDT
Excellent, thanks Ryosuke!
Comment 43 David Wood 2017-08-09 22:43:51 PDT
Yes, thanks indeed. Much appreciated!
Comment 44 Ryosuke Niwa 2017-10-07 22:50:18 PDT
Apparently Apple's attributed string implementation doesn't support processing images in RTF. This means that we can't easily expose RTF content, because we can't perform the sanitization required for security & privacy purposes.

Instead, we're going to replace image URLs included in the pasted content by blob URLs so that editors can read & save them.

In fact, we have a related bug: if you copy & paste part of a web page that includes images in Safari, editors like TinyMCE won't have access to those images, because we simply override the document loader to expose them instead of actually placing them in the page or converting them to data URLs. This means that editors need to do some sort of server-side fetching, if that is possible at all, to save or handle images whenever Web content is copied & pasted within Safari.

The new approach of replacing every resource by blob URL should solve both of these issues, and allow editors like TinyMCE to save images.
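
From an editor's point of view, the blob URL approach would mean looking for `blob:` image sources in the pasted markup rather than parsing RTF. The sketch below shows one way to collect them from an HTML string. It assumes the blob URLs show up in `src` attributes of `<img>` elements in the text/html data; the function name is hypothetical, and the regular expression is a shortcut that a real editor would likely replace with DOM traversal.

```typescript
// Returns the distinct blob: URLs used as <img src> values in an HTML string.
function findBlobImageUrls(html: string): string[] {
  const urls: string[] = [];
  // Matches src="..." or src='...' inside an <img ...> tag.
  const pattern = /<img\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match: RegExpExecArray | null = pattern.exec(html);
  while (match !== null) {
    const src = match[1] || match[2] || "";
    if (src.slice(0, 5) === "blob:" && urls.indexOf(src) === -1) {
      urls.push(src);
    }
    match = pattern.exec(html);
  }
  return urls;
}
```

In a browser, each returned URL could then be read with the standard `fetch(url).then((r) => r.blob())` and uploaded or converted as the editor sees fit.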
Comment 45 Andrew Herron 2017-10-08 17:19:32 PDT
That's a much easier way for editors to access the image data, and we considered asking for it, but we figured there would be fewer security implications with exposing RTF data versus the browser needing to be careful which images referenced in pasted content were safe to expose as blobs.

But if that's how you want to move forward I'm happy :)
Comment 46 Ryosuke Niwa 2017-10-12 20:06:56 PDT
Created attachment 323624 [details]
WIP - Made macOS work
Comment 47 Ryosuke Niwa 2017-10-13 01:20:00 PDT
Created attachment 323648 [details]
Fixes the bug
Comment 48 Ryosuke Niwa 2017-10-13 01:28:20 PDT
Created attachment 323649 [details]
Updated for ToT
Comment 49 Build Bot 2017-10-13 03:14:28 PDT
Comment on attachment 323649 [details]
Updated for ToT

Attachment 323649 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4843299

New failing tests:
fast/events/ondrop-text-html.html
editing/pasteboard/data-transfer-get-data-on-pasting-html-uses-blob-url.html
editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin.html
editing/pasteboard/data-transfer-set-data-sanitizes-html-when-copying-in-null-origin.html
Comment 50 Build Bot 2017-10-13 03:14:30 PDT
Created attachment 323654 [details]
Archive of layout-test-results from ews116 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 51 Ryosuke Niwa 2017-10-13 09:32:03 PDT
Created attachment 323682 [details]
Mostly fixed builds
Comment 52 Build Bot 2017-10-13 11:11:10 PDT
Comment on attachment 323682 [details]
Mostly fixed builds

Attachment 323682 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4846706

New failing tests:
fast/events/ondrop-text-html.html
editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin.html
Comment 53 Build Bot 2017-10-13 11:11:12 PDT
Created attachment 323709 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 54 Build Bot 2017-10-13 11:48:50 PDT
Comment on attachment 323682 [details]
Mostly fixed builds

Attachment 323682 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4847943

New failing tests:
fast/events/ondrop-text-html.html
editing/pasteboard/data-transfer-set-data-sanitize-url-when-dragging-in-null-origin.html
Comment 55 Build Bot 2017-10-13 11:48:52 PDT
Created attachment 323716 [details]
Archive of layout-test-results from ews101 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 56 Ryosuke Niwa 2017-10-13 12:01:19 PDT
Created attachment 323721 [details]
Fixed tests
Comment 57 Build Bot 2017-10-13 16:42:35 PDT
Comment on attachment 323721 [details]
Fixed tests

Attachment 323721 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4851364

New failing tests:
editing/pasteboard/data-transfer-get-data-on-paste-rich-text.html
Comment 58 Build Bot 2017-10-13 16:42:37 PDT
Created attachment 323769 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 59 Ryosuke Niwa 2017-10-13 16:55:57 PDT
Comment on attachment 323769 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

Ugh... this test is failing due to HTML dumps with CSS style properties behaving differently on different versions of iOS. I'm gonna strip them away in the test manually.
Comment 60 Ryosuke Niwa 2017-10-13 16:56:21 PDT
Created attachment 323773 [details]
Fixed one more test
Comment 61 Ryosuke Niwa 2017-10-13 17:01:25 PDT
Created attachment 323774 [details]
Patch
Comment 62 Chris Dumez 2017-10-13 17:21:35 PDT
Comment on attachment 323774 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=323774&action=review

> Source/WebCore/dom/DataTransfer.idl:42
> +    [CallWith=ScriptExecutionContext] void setData(DOMString format, DOMString data);

Could we use CallWith=Document since this is not exposed to workers?
Comment 63 Ryosuke Niwa 2017-10-13 22:51:38 PDT
(In reply to Chris Dumez from comment #62)
> Comment on attachment 323774 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323774&action=review
> 
> > Source/WebCore/dom/DataTransfer.idl:42
> > +    [CallWith=ScriptExecutionContext] void setData(DOMString format, DOMString data);
> 
> Could we use CallWith=Document since this is not exposed to workers?

I didn't know such an option even existed!
Comment 64 Ryosuke Niwa 2017-10-14 00:26:27 PDT
Created attachment 323798 [details]
Used CallWith=Document
Comment 65 Ryosuke Niwa 2017-10-15 14:37:14 PDT
Ping reviewers.
Comment 66 Andrew Herron 2017-10-15 18:24:40 PDT
How can I do a local build with the patch? It doesn't seem to apply cleanly to either the latest rev as I write this (223330) or the one mentioned in the patch (223316).

I'd like to check whether there are changes I need to make to support this change in our editors, and I'm also curious what sanitisation you're doing. There are some weird seemingly pointless details about the word HTML format that are critical to our process :)
Comment 67 Ryosuke Niwa 2017-10-15 18:37:55 PDT
Created attachment 323853 [details]
Added a API test for WebArchive
Comment 68 Ryosuke Niwa 2017-10-15 18:40:16 PDT
Unfortunately, we can't easily test WebArchive behavior on iOS because the API isn't exposed. But the code change is done to WebContentReaderCocoa.mm, which is shared between macOS and iOS, so having a test on macOS should be good enough for now.
Comment 69 Ryosuke Niwa 2017-10-15 18:45:20 PDT
(In reply to Andrew Herron from comment #66)
> How can I do a local build with the patch? It doesn't seem to apply cleanly
> to either the latest rev as I write this (223330) or the one mentioned in
> the patch (223316).

It should apply cleanly on r223330. You'd likely have to use ./Tools/Scripts/webkit-patch apply-from-bug 124391 because of the change log files.

> I'd like to check whether there are changes I need to make to support this
> change in our editors, and I'm also curious what sanitisation you're doing.
> There are some weird seemingly pointless details about the word HTML format
> that are critical to our process :)

Once this patch is landed, your code to handle copying & pasting of HTML should be completely agnostic of whether the content comes from Microsoft Word, Mail, or Keynote, etc... because we would strip away pretty much everything that WebKit doesn't recognize as styling information. The behavior will be similar to copying & pasting content from Microsoft Word into an empty content editable element in Safari, and then copying & pasting that content again into the actual web page. What you see is the content in the second paste so everything WebKit doesn't include during copy will be removed.

Due to privacy & security constraints, we can't expose the original markup that Microsoft Word puts on the pasteboard.
Comment 70 Andrew Herron 2017-10-15 19:00:35 PDT
ah, I hadn't looked through the tool instructions far enough to see the apply patch script. Thanks!

I agree there are some things that are privacy sensitive, such as image URLs, but stripping everything the browser doesn't recognise will completely break our ability to clean up the pasted HTML. Specifically, Word adds non-standard CSS properties that describe how lists are converted into paragraphs, and we use that information to reverse the transformation. These should be completely harmless to retain.

I'm familiar with the default ContentEditable paste behaviour. If you strip those details, we will be back to a horrible guesswork standard that we left behind with IE10.
Comment 71 Ryosuke Niwa 2017-10-15 19:55:31 PDT
(In reply to Andrew Herron from comment #70)
> ah, I hadn't looked through the tool instructions far enough to see the
> apply patch script. Thanks!
> 
> I agree there are some things that are privacy sensitive such as image URLs,
> but stripping everything the browser doesn't recognise will completely break
> our ability to clean up the pasted HTML. Specifically word adds non-standard
> CSS properties that describe how lists are converted into paragraphs and we
> use that information to reverse the transformation. These should be
> completely harmless to retain.

You mean mso-list properties?

In an ideal world, WebKit would do that conversion automatically when pasting the content, since it's currently broken if you don't have custom editing code.

Having said that, we can probably preserve those properties although we should do that in a separate bug/patch.
Comment 72 Andrew Herron 2017-10-15 20:16:53 PDT
Yes mso-list is the main one, but there are also details in the document style tag that we use (including some non-standard rules and properties) to match the editing experience of word.

Making those import changes in the browser is what IE11 did, but not perfectly and Microsoft abandoned the effort for Edge. We’ve made significant investment in figuring out how to do it using the raw HTML so it’s not something I’m likely to be able to contribute.
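
For readers following along, a minimal sketch of reading the inline mso-list property from a style attribute might look like the following. The helper name `parseMsoList` and the assumed `l<N> level<N> lfo<N>` value shape are illustrative assumptions, not the import logic described in this comment.

```javascript
// Hypothetical helper (not from any patch on this bug): pull the mso-list
// declaration out of an inline style string.
function parseMsoList(styleText) {
  // Find "mso-list: <value>" at the start of the string or after a semicolon.
  const match = /(?:^|;)\s*mso-list\s*:\s*([^;]+)/i.exec(styleText);
  if (!match) return null;

  const raw = match[1].trim();

  // Assumed value shape: "l0 level1 lfo1" (list id, nesting level, override id).
  const parts = /^l(\d+)\s+level(\d+)\s+lfo(\d+)$/i.exec(raw);
  if (!parts) return { raw }; // Other values (e.g. "Ignore") are kept as-is.

  return {
    raw,
    listId: Number(parts[1]),
    level: Number(parts[2]),
    lfo: Number(parts[3]),
  };
}
```

An editor could use the `level` value to decide how deeply to nest a rebuilt list item; none of this survives if the property is stripped before the page sees the markup.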
Comment 73 Andrew Herron 2017-10-15 23:21:41 PDT
I've done a build with the latest patch, and it doesn't actually bring in the image data for my test document. The images still use local file references. But as you say it's replicating the ContentEditable paste experience in the "text/html" event.ClipboardData info which is pretty much the worst possible outcome from me logging this bug. It negates all need to expose RTF data, too, as the link between images and their RTF representation lives in all of the metadata that has now been stripped out.

This has started an interesting discussion in the office about whether we would actually consider working with Apple to contribute our word import logic, but for now if this lands (regardless of whether mso-list is restored later) we'll probably start recommending customers who want high quality word import not use Safari. This makes me very sad.
Comment 74 Ryosuke Niwa 2017-10-15 23:55:58 PDT
(In reply to Andrew Herron from comment #73)
> I've done a build with the latest patch, and it doesn't actually bring in
> the image data for my test document. The images still use local file
> references.

You have to access the html via event.clipboardData.getData on paste event instead of reading off of the pasted HTML content. I'm working on another patch to make the regular paste code use the new format so that you don't have to do this.
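
A minimal sketch of that approach, assuming a standard paste event listener; `readPastedHtml` is a hypothetical helper name used only for illustration:

```javascript
// Hypothetical helper: read the text/html flavor from a paste event.
function readPastedHtml(event) {
  const data = event.clipboardData;
  if (!data) return ''; // No clipboard data attached to this event.
  return data.getData('text/html');
}

// Only register the listener when running in a page (not in Node, etc.).
if (typeof document !== 'undefined') {
  document.addEventListener('paste', (event) => {
    // Inspect the markup handed to script, rather than the DOM that the
    // default paste inserts into the editable element.
    console.log(readPastedHtml(event));
  });
}
```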
Comment 75 Andrew Herron 2017-10-16 01:14:52 PDT
That was the first thing I checked :)

All of the images in my test doc look like this even in the ClipboardData:
<img width="96" height="120" src="file:///Users/spyder/Library/Group%20Containers/UBF8T346G9.Office/msoclip1/01/clip_image003.png" alt="manager.jpg" v:shapes="Image_x0020_3">
Comment 76 Ryosuke Niwa 2017-10-16 01:16:18 PDT
(In reply to Andrew Herron from comment #75)
> That was the first thing I checked :)
> 
> All of the images in my test doc look like this even in the ClipboardData:
> <img width="96" height="120"
> src="file:///Users/spyder/Library/Group%20Containers/UBF8T346G9.Office/
> msoclip1/01/clip_image003.png" alt="manager.jpg" v:shapes="Image_x0020_3">

That doesn't seem right. Are you using Safari to test this? If it's some other random app which uses WKWebView, it won't work because the entire feature will be disabled.
Comment 77 Ryosuke Niwa 2017-10-16 03:46:39 PDT
Created attachment 323885 [details]
Removed TestWebKitAPI.xcodeproj/project.pbxproj.orig
Comment 78 Ryosuke Niwa 2017-10-16 03:49:54 PDT
Created attachment 323886 [details]
Updated for ToT
Comment 79 Antti Koivisto 2017-10-16 04:00:15 PDT
Comment on attachment 323886 [details]
Updated for ToT

View in context: https://bugs.webkit.org/attachment.cgi?id=323886&action=review

r=me

> Source/WebCore/dom/DataTransfer.cpp:144
> +String DataTransfer::getDataForItem(Document& document, const String& type) const

Maybe Document can be const?
Comment 80 Antti Koivisto 2017-10-16 04:02:28 PDT
Comment on attachment 323886 [details]
Updated for ToT

View in context: https://bugs.webkit.org/attachment.cgi?id=323886&action=review

> Source/WebCore/platform/Pasteboard.h:280
> +    enum class ReaderResult {
> +        Read,
> +        DidNotRead,
> +        ContentChanged
> +    };

It would be good to have a comment explaining what these mean.
Comment 81 Antti Koivisto 2017-10-16 04:04:02 PDT
Comment on attachment 323886 [details]
Updated for ToT

View in context: https://bugs.webkit.org/attachment.cgi?id=323886&action=review

> Source/WebCore/platform/mac/PasteboardMac.mm:387
> +            if (m_changeCount != changeCount() || reader.readImage(buffer.releaseNonNull(), ASCIILiteral("image/png")))

There is a lot of repeated code like this. You might want to, for example, use a lambda-taking function to avoid repetition.
Comment 82 Ryosuke Niwa 2017-10-16 04:14:04 PDT
(In reply to Antti Koivisto from comment #79)
> Comment on attachment 323886 [details]
> Updated for ToT
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323886&action=review
> 
> r=me
> 
> > Source/WebCore/dom/DataTransfer.cpp:144
> > +String DataTransfer::getDataForItem(Document& document, const String& type) const
> 
> Maybe Document can be const?

Not really because we're getting frame out of it.

(In reply to Antti Koivisto from comment #80)
> Comment on attachment 323886 [details]
> Updated for ToT
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323886&action=review
> 
> > Source/WebCore/platform/Pasteboard.h:280
> > +    enum class ReaderResult {
> > +        Read,
> > +        DidNotRead,
> > +        ContentChanged
> > +    };
> 
> It would be good to have a comment explaining what these mean.

Will rename them to ReadType, DidNotReadType, and PasteboardWasChangedExternally respectively.

(In reply to Antti Koivisto from comment #81)
> Comment on attachment 323886 [details]
> Updated for ToT
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=323886&action=review
> 
> > Source/WebCore/platform/mac/PasteboardMac.mm:387
> > +            if (m_changeCount != changeCount() || reader.readImage(buffer.releaseNonNull(), ASCIILiteral("image/png")))
> 
> > There is a lot of repeated code like this. You might want to, for example,
> > use a lambda-taking function to avoid repetition.

Yeah, I tried that, but I have to interleave the check of changeCount with the calls to bufferForType and reader.readX, so you end up with pairs of lambdas and it becomes pretty gross :(
Comment 83 Antti Koivisto 2017-10-16 04:14:05 PDT
Comment on attachment 323886 [details]
Updated for ToT

View in context: https://bugs.webkit.org/attachment.cgi?id=323886&action=review

> Source/WebCore/platform/ios/PasteboardIOS.mm:207
> +        URL url = strategy.readURLFromPasteboard(itemIndex, kUTTypeURL, m_pasteboardName, title);
> +        if (m_changeCount != changeCount())
> +            return ReaderResult::ContentChanged;
> +        return !url.isNull() && reader.readURL(url, title) ? ReaderResult::Read : ReaderResult::DidNotRead;

This stuff also repeats and might be improved with some helpers.
Comment 84 Ryosuke Niwa 2017-10-16 04:15:07 PDT
Created attachment 323888 [details]
Patch for landing
Comment 85 Ryosuke Niwa 2017-10-16 04:15:25 PDT
Comment on attachment 323888 [details]
Patch for landing

Wait for EWS.
Comment 86 Ryosuke Niwa 2017-10-16 04:16:41 PDT
(In reply to Andrew Herron from comment #75)
> That was the first thing I checked :)
> 
> All of the images in my test doc look like this even in the ClipboardData:
> <img width="96" height="120"
> src="file:///Users/spyder/Library/Group%20Containers/UBF8T346G9.Office/
> msoclip1/01/clip_image003.png" alt="manager.jpg" v:shapes="Image_x0020_3">

Also, which Microsoft Word & macOS were you using?
Comment 87 WebKit Commit Bot 2017-10-16 14:33:34 PDT
Comment on attachment 323888 [details]
Patch for landing

Rejecting attachment 323888 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'land-attachment', '--force-clean', '--non-interactive', '--parent-command=commit-queue', 323888, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Last 500 characters of output:
233ac54159d9bf67d70cc377c11388d91ec 4c0127219c8fefd5995569b67f78ee349630abce M	Tools
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit
Updating OpenSource
Current branch master is up to date.
Total errors found: 0 in 3 files

Full output: http://webkit-queues.webkit.org/results/4874773
Comment 88 Ryosuke Niwa 2017-10-16 14:44:31 PDT
Committed r223440: <https://trac.webkit.org/changeset/223440>
Comment 89 Andrew Herron 2017-10-16 19:11:36 PDT
Sorry, being in Australia I was not at work when your questions came in. I've spent the morning updating and rebuilding with r223446 to check what I'm seeing.

Yes I am using Safari via build-webkit and run-safari. I was running Word 15.25, which still replicates the issue, but it turns out the auto updater was broken and it was over a year out of date. I just updated to 15.39 and it works, but that's concerning if older releases won't work with this change.

Our word import code, when it doesn't detect a raw office HTML document in the text/html data, falls back to standard CE paste and cleans up the result. So there will still be work required to adjust that to read text/html and then run through our IE10 paste routines. But looking at the difference between the two, the standard CE paste still includes a lot of the information we need to clean up and process lists etc. The new text/html data does not.

This change does give us image data, so we'll finally be able to remove flash from our products once it is released. But it's objectively worse in every other way for the quality of word import we are able to provide to our customers.
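
To illustrate the kind of branch described above, here is a rough sketch of a check for raw Office markup in the text/html data. It is a hypothetical heuristic, not our actual import code; the function names are made up, and the markers are simply ones that appear in raw Word clipboard HTML (conditional comments and Office XML elements).

```javascript
// Hypothetical heuristic: does this text/html payload look like raw Word
// markup, as opposed to markup already filtered by the browser?
function hasRawWordMarkup(html) {
  const markers = [
    /<!--\[if\s+gte\s+mso\s+\d+\]>/i,  // Office conditional comments
    /<w:WordDocument\b/i,              // Word document settings block
    /<o:OfficeDocumentSettings\b/i,    // Office document settings block
  ];
  return markers.some((marker) => marker.test(html));
}

// Pick an import path based on what the clipboard handed over.
function choosePastePath(html) {
  return hasRawWordMarkup(html) ? 'raw-word-import' : 'contenteditable-fallback';
}
```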
Comment 90 Ryosuke Niwa 2017-10-16 20:07:34 PDT
(In reply to Andrew Herron from comment #89)
> Yes I am using Safari via build-webkit and run-safari. I was running Word
> 15.25, which still replicates the issue, but it turns out the auto updater
> was broken and it was over a year out of date. I just updated to 15.39 and
> it works, but that's concerning if older releases won't work with this
> change.

Thanks for the info. I'll try to test it with 15.25.

> Our word import code, when it doesn't detect a raw office HTML document in
> the text/html data, falls back to standard CE paste and cleans up the
> result. So there will still be work required to adjust that to read
> text/html and then run through our IE10 paste routines. 

You won't have to do that, because I'm working on a patch to make content pasted into CE go through the same code path.

> But looking at the
> difference between the two, the standard CE paste still includes a lot of
> the information we need to clean up and process lists etc. The new text/html
> data does not.

Conversely, it would unfortunately mean that you'd lose access to that information as well.

> This change does give us image data, so we'll finally be able to remove
> flash from our products once it is released. But it's objectively worse in
> every other way for the quality of word import we are able to provide to our
> customers.

Would preserving -mso-list properties solve most of your problems? We could consider preserving that for now, and come up with a better fix in the future.
Comment 91 Andrew Herron 2017-10-16 20:43:38 PDT
thanks for continuing the discussion :)

The inline mso-list will give us a better chance at correct list import, but there's non-standard information in the document style tag that we've recently discovered is necessary too. We know browser CSS parsers all discard this information, and were happy when they all gave us the raw clipboard via text/html. Our import logic was created in our old Java editor where we had full clipboard access, so once we found a way to access the same data in JavaScript we were able to replicate the quality of import.

Over time we've learnt that it's really the non-standard HTML and CSS features that Word puts in the HTML that enable us to come closer to a 1:1 import quality suitable for editors. To the point where we now ship a custom parser for both HTML and CSS to deal with it as the browser models throw all that away.
Comment 92 Ryosuke Niwa 2017-10-16 20:46:38 PDT
(In reply to Andrew Herron from comment #91)
> thanks for continuing the discussion :)
> 
> The inline mso-list will give us a better chance at correct list import, but
> there's non-standard information in the document style tag that we've
> recently discovered is necessary too. We know browser CSS parsers all
> discard this information, and were happy when they all gave us the raw
> clipboard via text/html. Our import logic was created in our old Java editor
> where we had full clipboard access, so once we found a way to access the
> same data in JavaScript we were able to replicate the quality of import.
> 
> Over time we've learnt that it's really the non-standard HTML and CSS
> features that Word puts in the HTML that enable us to come closer to a 1:1
> import quality suitable for editors. To the point where we now ship a custom
> parser for both HTML and CSS to deal with it as the browser models throw all
> that away.

I don't think we can expose the full raw HTML Microsoft Word puts into the system pasteboard because it contains a bunch of other privacy sensitive information like the list of fonts the system has, etc...
Comment 93 Andrew Herron 2017-10-16 21:07:03 PDT
I understand your privacy concerns, but you've applied the complete paste filter giving us a result that we started using text/html specifically to avoid. It all but negates the point of having text/html exposed in the first place.

Having more detail in the clipboard data versus the standard paste gives us the opportunity to support highly edge case data that is complex, difficult and expensive to maintain; things that are perhaps easier handled by a company focussed on it than in each browser.
Comment 94 Ryosuke Niwa 2017-10-16 21:15:25 PDT
(In reply to Andrew Herron from comment #93)
> I understand your privacy concerns, but you've applied the complete paste
> filter giving us a result that we started using text/html specifically to
> avoid. It all but negates the point of having text/html exposed in the first
> place.

As far as I can tell, this is the only sure way to protect user's privacy.

> Having more detail in the clipboard data versus the standard paste gives us
> the opportunity to support highly edge case data that is complex, difficult
> and expensive to maintain; things that are perhaps easier handled by a
> company focussed on it than in each browser.

To that end, I think we can preserve specific HTML/CSS structures as required by your use case. We may not be able to preserve 100% of what you need, but I suspect we can preserve much of it.
Comment 95 Andrew Herron 2017-10-16 21:19:05 PDT
OK, I'm happy to open a discussion about that. I'm pretty sure this is the wrong place to have it though - what would you suggest?
Comment 96 Ryosuke Niwa 2017-10-16 21:40:22 PDT
(In reply to Andrew Herron from comment #95)
> OK, I'm happy to open a discussion about that. I'm pretty sure this is the
> wrong place to have it though - what would you suggest?

I've sent you an email from my Apple email address. Thank you so much for your early feedback & working with us. I really appreciate it. I hope we can come up with something that satisfies your needs whilst protecting our users' privacy.
Comment 97 m.samsel 2018-01-05 02:43:34 PST
Hi

I've tried to use this "fix" to get images from the clipboard, but I still get an empty string.
I prepared a simple Microsoft Word document with one picture and a few lines of text and pasted it here:
https://codepen.io/msamsel/pen/JrBOmq
I'm using MS Word 2011 on a MacBook Air with Safari Technology Preview 46.
According to the release notes, since version 42 Safari has: "Started pasting images in RTF and RTFD contents using blob URLs".

Am I doing something wrong? Is there any particular reason why the `text/rtf` clipboard data is empty? My test page works perfectly in Chrome, but in Safari it returns an empty string.
Comment 98 Andrew Herron 2018-01-05 06:53:16 PST
The fix doesn't provide access to RTF data, it was changed to load the image data referenced in the word HTML and set the src attributes to working blob URLs. If you look closely at the data in the ContentEditable after pasting into your example, the images are actually usable as is.

However I don't think this fix has been completely released into STP yet. I'm still seeing unfiltered text/html data in 46 which won't be true anymore (hence my earlier comments).
Comment 99 m.samsel 2018-01-09 08:10:37 PST
Andrew Herron, thank you for your comment. :)
It sheds more light on the case.

Unfortunately, I disagree that the current link is usable as it is now. It is a `file://...` link (this is the kind of link generated in my tests), which points to a local resource that cannot be loaded on a page. :( That's why, from my point of view, you are right that this patch is not present in Safari Technology Preview.
However, this is quite strange, because the release notes for version 42 include the following:
> Clipboard API
> * Added the support for custom pasteboard MIME types and hid unsafe MIME types
> * Fixed copying and pasting of image files on TinyMCE and GitHub
> * Fixed DataTransfer.items to expose custom pasteboard types
> * Prevented revealing the file URL when pasting an image
> * Prevented dragenter and dragleave from using the same data transfer object
> * Removed “text/html” from DataTransfer.items when performing paste and match style
> * Started pasting images in RTF and RTFD contents using blob URLs
> * Sanitized the URL in the pasteboard for other applications and cross-origin content

https://developer.apple.com/safari/technology-preview/release-notes/

That brings me back to the question from my previous comment: am I doing something wrong? I can't access the RTF clipboard data, and no blob URLs are present either. So I don't see any way to manipulate images pasted from Word with JavaScript. That is really sad, because Safari is the last major browser where handling images pasted from a Word document is impossible (Chrome, Firefox, and Edge work fine).

Could someone say when the fix will be deployed to Safari or STP?
Comment 100 Ryosuke Niwa 2018-01-09 10:32:40 PST
There was a bug that this feature wasn't turned on by default on Safari Technology Preview. It should be fixed in STP47.
Comment 101 m.lewandowski 2018-01-10 01:16:42 PST
Thanks for the info, Ryosuke. Is there any chance to enable it using some sort of flag or something like that? Just for testing purposes, to see if everything works well?
Comment 102 Ryosuke Niwa 2018-01-10 13:15:09 PST
(In reply to m.lewandowski from comment #101)
> Thanks for the info Ryosuke. Is there any chance to enable it using some
> sort of flags or sth like that? Just for testing purpose, to see if
> everything works well?

Download the latest Safari Technology Preview 47, which was released today:
https://webkit.org/blog/8060/release-notes-for-safari-technology-preview-47/
Comment 103 m.samsel 2018-01-11 02:42:43 PST
(In reply to Ryosuke Niwa from comment #102)
> (In reply to m.lewandowski from comment #101)
> > Thanks for the info Ryosuke. Is there any chance to enable it using some
> > sort of flags or sth like that? Just for testing purpose, to see if
> > everything works well?
> 
> Download the latest Safari Technology Preview 47, which was released today:
> https://webkit.org/blog/8060/release-notes-for-safari-technology-preview-47/

I installed the new STP (Release 47 (Safari 11.1, WebKit 13605.1.19.1)) and tried to check how this feature works. As I understand it, `file://...` links should be converted into blob URLs.

So I used my previous test page, but instead of reading `text/rtf`, I started reading the `text/html` clipboard data.

> evt.clipboardData.getData( 'text/html' )

but after copying and pasting the content into Safari and reading the `text/html` data, I don't see any blob URL.

> <p class=\"MsoNormal\" style=\"margin: 0cm 0cm 0.0001pt; font-size: medium; font-family: Cambria; caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;\"><img width=\"203\" height=\"203\" src=\"file:///private/var/folders/6v/719qjw4n67j2qssnj3p37hcr0000gr/T/TemporaryItems/msoclip/0/clip_image002.png\" align=\"left\" hspace=\"9\" alt=\"Description: Macintosh HD:Users:dev:Desktop:phoca_thumb_l_sample-200x200.png\" v:shapes=\"Picture_x0020_3\"><br clear=\"all\"><o:p></o:p></p><br class=\"Apple-interchange-newline\">

Could anyone explain how to get access to images that are pasted from Word? I'm currently out of ideas for how to use this feature.
Comment 104 Ryosuke Niwa 2018-01-11 13:07:02 PST
(In reply to m.samsel from comment #103)
> (In reply to Ryosuke Niwa from comment #102)
> > (In reply to m.lewandowski from comment #101)
> > > Thanks for the info Ryosuke. Is there any chance to enable it using some
> > > sort of flags or sth like that? Just for testing purpose, to see if
> > > everything works well?
> > 
> > Download the latest Safari Technology Preview 47, which was released today:
> > https://webkit.org/blog/8060/release-notes-for-safari-technology-preview-47/
> 
> I installed new STP ( Release 47 (Safari 11.1, WebKit 13605.1.19.1) ), and
> tried to check how this feature work. As I understand `file://...` links
> should be converted into `blobURL`

Yes.

> So I use my previous test page, but instead of reading `text/rtf`, I start
> to read `text/html` clipboard.

It's expected that text/rtf is not there. This is the intended behavior.

> > evt.clipboardData.getData( 'text/html' )
> 
> but after copy-paste content to Safari browser, and reading `text/html`
> content, I don't see any blob url.
> 
> > <p class=\"MsoNormal\" style=\"margin: 0cm 0cm 0.0001pt; font-size: medium; font-family: Cambria; caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;\"><img width=\"203\" height=\"203\" src=\"file:///private/var/folders/6v/719qjw4n67j2qssnj3p37hcr0000gr/T/TemporaryItems/msoclip/0/clip_image002.png\" align=\"left\" hspace=\"9\" alt=\"Description: Macintosh HD:Users:dev:Desktop:phoca_thumb_l_sample-200x200.png\" v:shapes=\"Picture_x0020_3\"><br clear=\"all\"><o:p></o:p></p><br class=\"Apple-interchange-newline\">
> 
> Could anyone explain how to get access to images which are pasted from word?
> Currently I'm out of ideas how to use this feature.

This isn't expected. img element's src attribute should be using blob URL, and it does on my machine.

On my machine, it gets HTML like this:
<p class="MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: medium; font-family: Calibri, sans-serif; caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;"><span><img width="76" height="103" src="blob:file:///fcbf5dea-0f93-469d-a4d7-28db9c1d59e3" v:shapes="Picture_x0020_1"></span>hello world<o:p></o:p></p>

Could you tell me what happens when you insert an image into TextEdit app (the one that comes with macOS), and copy & paste it with text into simple-rte.rniwa.com?

Also, what's the version of Microsoft Word you're using? I'm using 15.37.
Comment 105 m.samsel 2018-01-12 07:29:56 PST
> This isn't expected. img element's src attribute should be using blob URL,
> and it does on my machine.
> 
> On my machine, it gets HTML like this:
> <p class="MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: medium;
> font-family: Calibri, sans-serif; caret-color: rgb(0, 0, 0); color: rgb(0,
> 0, 0); font-style: normal; font-variant-caps: normal; font-weight: normal;
> letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px;
> text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;
> -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;
> text-decoration: none;"><span><img width="76" height="103"
> src="blob:file:///fcbf5dea-0f93-469d-a4d7-28db9c1d59e3"
> v:shapes="Picture_x0020_1"></span>hello world<o:p></o:p></p>
> 
> Could you tell me what happens when you insert an image into TextEdit app
> (the one that comes with macOS), and copy & paste it with text into
> simple-rte.rniwa.com?
> 
> Also, what's the version of Microsoft Word you're using? I'm using 15.37.

OK, so I did some debugging today.
I used:
MacBook Air (10.13.2)
STP Release 47 Safari 11.1, WebKit 13605.1.19.1

I tested with 3 text editors; 2 out of 3 work fine ;)
1. Word (version 15.41) pastes images as blobs properly.
2. TextEdit pastes images as blobs properly.
3. Word for Mac 2011 (14.7.3) doesn't work. `file://...` links are still pasted through the clipboard.

I had been using the old Word version for my earlier testing. So the current solution does not seem to work for legacy Word users.
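
For anyone who wants to repeat this comparison, a rough sketch of how the pasted `text/html` string could be checked for blob versus `file://` image sources is below. `classifyImageSources` is a hypothetical helper written for illustration; it uses a simple regular expression rather than a real HTML parser, so treat it as a debugging aid only.

```javascript
// Hypothetical helper: group <img> src values in an HTML string by scheme.
function classifyImageSources(html) {
  const result = { blob: [], file: [], other: [] };
  // Match src="..." or src='...' inside each <img ...> tag.
  const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = imgSrc.exec(html)) !== null) {
    const src = match[1] !== undefined ? match[1] : match[2];
    if (src.startsWith('blob:')) {
      result.blob.push(src);   // Image data is reachable from the page.
    } else if (src.startsWith('file:')) {
      result.file.push(src);   // Local reference the page cannot load.
    } else {
      result.other.push(src);
    }
  }
  return result;
}
```

Calling it on the value of `evt.clipboardData.getData('text/html')` should show at a glance whether a given Word version produced blob URLs or left the `file://` references in place.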
Comment 106 m.samsel 2018-01-12 07:57:01 PST
Also, in case you need the clipboard dump from your page, I've attached it below:

------------------------------------------------------------------

<html><head></head><body><div contenteditable="">







<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]--><!--[if gte mso 9]><xml>
 <o:OfficeDocumentSettings>
  <o:AllowPNG/>
 </o:OfficeDocumentSettings>
</xml><![endif]-->

<!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:View>Normal</w:View>
  <w:Zoom>0</w:Zoom>
  <w:TrackMoves>false</w:TrackMoves>
  <w:TrackFormatting/>
  <w:PunctuationKerning/>
  <w:ValidateAgainstSchemas/>
  <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
  <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
  <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
  <w:DoNotPromoteQF/>
  <w:LidThemeOther>EN-US</w:LidThemeOther>
  <w:LidThemeAsian>JA</w:LidThemeAsian>
  <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
  <w:Compatibility>
   <w:BreakWrappedTables/>
   <w:SnapToGridInCell/>
   <w:WrapTextWithPunct/>
   <w:UseAsianBreakRules/>
   <w:DontGrowAutofit/>
   <w:SplitPgBreakAndParaMark/>
   <w:EnableOpenTypeKerning/>
   <w:DontFlipMirrorIndents/>
   <w:OverrideTableStyleHps/>
   <w:UseFELayout/>
  </w:Compatibility>
  <m:mathPr>
   <m:mathFont m:val="Cambria Math"/>
   <m:brkBin m:val="before"/>
   <m:brkBinSub m:val="&#45;-"/>
   <m:smallFrac m:val="off"/>
   <m:dispDef/>
   <m:lMargin m:val="0"/>
   <m:rMargin m:val="0"/>
   <m:defJc m:val="centerGroup"/>
   <m:wrapIndent m:val="1440"/>
   <m:intLim m:val="subSup"/>
   <m:naryLim m:val="undOvr"/>
  </m:mathPr></w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
  DefSemiHidden="true" DefQFormat="false" DefPriority="99"
  LatentStyleCount="276">
  <w:LsdException Locked="false" Priority="0" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
  <w:LsdException Locked="false" Priority="9" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
  <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 1"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 2"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 3"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 4"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 5"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 6"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 7"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 8"/>
  <w:LsdException Locked="false" Priority="39" Name="toc 9"/>
  <w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
  <w:LsdException Locked="false" Priority="10" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Title"/>
  <w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
  <w:LsdException Locked="false" Priority="11" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
  <w:LsdException Locked="false" Priority="22" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
  <w:LsdException Locked="false" Priority="20" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
  <w:LsdException Locked="false" Priority="59" SemiHidden="false"
   UnhideWhenUsed="false" Name="Table Grid"/>
  <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
  <w:LsdException Locked="false" Priority="1" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 1"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
  <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
  <w:LsdException Locked="false" Priority="34" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
  <w:LsdException Locked="false" Priority="29" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
  <w:LsdException Locked="false" Priority="30" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 1"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 2"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 2"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 3"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 3"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 4"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 4"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 5"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 5"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
  <w:LsdException Locked="false" Priority="60" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
  <w:LsdException Locked="false" Priority="61" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light List Accent 6"/>
  <w:LsdException Locked="false" Priority="62" SemiHidden="false"
   UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
  <w:LsdException Locked="false" Priority="63" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
  <w:LsdException Locked="false" Priority="64" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
  <w:LsdException Locked="false" Priority="65" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
  <w:LsdException Locked="false" Priority="66" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
  <w:LsdException Locked="false" Priority="67" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
  <w:LsdException Locked="false" Priority="68" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
  <w:LsdException Locked="false" Priority="69" SemiHidden="false"
   UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
  <w:LsdException Locked="false" Priority="70" SemiHidden="false"
   UnhideWhenUsed="false" Name="Dark List Accent 6"/>
  <w:LsdException Locked="false" Priority="71" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
  <w:LsdException Locked="false" Priority="72" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
  <w:LsdException Locked="false" Priority="73" SemiHidden="false"
   UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
  <w:LsdException Locked="false" Priority="19" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
  <w:LsdException Locked="false" Priority="21" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
  <w:LsdException Locked="false" Priority="31" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
  <w:LsdException Locked="false" Priority="32" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
  <w:LsdException Locked="false" Priority="33" SemiHidden="false"
   UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
  <w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
  <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
 </w:LatentStyles>
</xml><![endif]-->

<!--[if gte mso 10]>
<style>
 /* Style Definitions */
table.MsoNormalTable
	{mso-style-name:"Table Normal";
	mso-tstyle-rowband-size:0;
	mso-tstyle-colband-size:0;
	mso-style-noshow:yes;
	mso-style-priority:99;
	mso-style-parent:"";
	mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
	mso-para-margin:0cm;
	mso-para-margin-bottom:.0001pt;
	mso-pagination:widow-orphan;
	font-size:12.0pt;
	font-family:Cambria;
	mso-ascii-font-family:Cambria;
	mso-ascii-theme-font:minor-latin;
	mso-hansi-font-family:Cambria;
	mso-hansi-theme-font:minor-latin;}
</style>
<![endif]-->



<!--StartFragment-->

<p class="MsoNormal"><!--[if gte vml 1]><v:shapetype
 id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t"
 path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
 <v:stroke joinstyle="miter"/>
 <v:formulas>
  <v:f eqn="if lineDrawn pixelLineWidth 0"/>
  <v:f eqn="sum @0 1 0"/>
  <v:f eqn="sum 0 0 @1"/>
  <v:f eqn="prod @2 1 2"/>
  <v:f eqn="prod @3 21600 pixelWidth"/>
  <v:f eqn="prod @3 21600 pixelHeight"/>
  <v:f eqn="sum @0 0 1"/>
  <v:f eqn="prod @6 1 2"/>
  <v:f eqn="prod @7 21600 pixelWidth"/>
  <v:f eqn="sum @8 21600 0"/>
  <v:f eqn="prod @7 21600 pixelHeight"/>
  <v:f eqn="sum @10 21600 0"/>
 </v:formulas>
 <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
 <o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype><v:shape id="Picture_x0020_1" o:spid="_x0000_i1025" type="#_x0000_t75"
 alt="Description: Macintosh HD:Users:dev:Desktop:phoca_thumb_l_sample-200x200.png"
 style='width:201pt;height:201pt;visibility:visible;mso-wrap-style:square'>
 <v:imagedata src="file://localhost/private/var/folders/6v/719qjw4n67j2qssnj3p37hcr0000gr/T/TemporaryItems/msoclip/0clip_image001.png"
  o:title="phoca_thumb_l_sample-200x200.png"/>
</v:shape><![endif]--><!--[if !vml]--><img width="203" height="203" src="file://localhost/private/var/folders/6v/719qjw4n67j2qssnj3p37hcr0000gr/T/TemporaryItems/msoclip/0clip_image002.png" alt="Description: Macintosh HD:Users:dev:Desktop:phoca_thumb_l_sample-200x200.png" v:shapes="Picture_x0020_1"><!--[endif]--><o:p></o:p></p>

<!--EndFragment--></div></body></html>
Comment 107 Ryosuke Niwa 2018-01-12 15:30:58 PST
Thanks for the clarification. Let's track that bug for Microsoft Word for Mac 2011 in https://bugs.webkit.org/show_bug.cgi?id=181616.
Comment 108 Andrew Herron 2018-01-22 23:03:15 PST
You don't need to use Word 2011 to get HTML like that. Just verified with Word 2016 v16.9 (180116).

I'm posting this update here instead of in bug 181616 because I'm testing with a document that we do successfully import the images from using RTF data on other browsers.

I don't know how your existing image import works if you're not reading files from the filesystem, but the data for these images _is_ available. Should I attach the replication doc here or log a new bug?
Comment 109 Ryosuke Niwa 2018-01-23 01:50:32 PST
(In reply to Andrew Herron from comment #108)
> You don't need to use Word 2011 to get HTML like that. Just verified with
> Word 2016 v16.9 (180116).
> 
> I'm posting this update here instead of in bug 181616 because I'm testing
> with a document that we do successfully import the images from using RTF
> data on other browsers.
> 
> I don't know how your existing image import works if you're not reading
> files from the filesystem, but the data for these images _is_ available.
> Should I attach the replication doc here or log a new bug?

No need. Yes, the image files are obviously accessible to applications such as Safari. The challenge here is that the Web content process, which runs JS, DOM code, and so on, is sandboxed and therefore doesn't have direct access to those files.

Ordinarily, the UI process grants the Web content process permission (by way of adding sandbox extensions) to access those files when the user drags and drops files or pastes a file. In those cases, the UI process knows the locations of the files the Web content process needs access to before the paste or drop happens in the Web content process. That is, even if the Web content process was compromised, it only has access to the files the UI process explicitly allowed, and it can't ask for more permissions unless there was an explicit user action to drop or paste a file.

Now, the challenge with HTML in the pasteboard that references local files is that we must parse the HTML to get the list of files the Web content process will need access to. Parsing the HTML is currently the task of the Web content process. However, we can't parse the content in the Web content process and then grant access to those files in the UI process based on what it reports, since a compromised Web content process could then ask for arbitrary files, and that would defeat the whole point of sandboxing the Web content process.

The only safe way to parse the pasted content is to start another Web content process, parse the pasted HTML there to get the list of URLs, and then grant access to those files to the Web content process in which the paste is actually happening. However, this requires a significant architectural undertaking, which we can't currently prioritize especially since pasting of images already works from Microsoft Word for Mac 2016, which has been out for more than two years now.
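
To illustrate the "get the list of URLs" step only, here is a rough TypeScript sketch. It is not WebKit code: `collectLocalFileURLs` is a hypothetical helper, and it uses a regular expression rather than a real HTML parser, so it also picks up `src` attributes inside Word's conditional comments (such as the `v:imagedata` element in the markup above). The hard part described above, deciding which process is trusted to run this step, is not addressed by the sketch.

```typescript
// Hypothetical helper for illustration; not part of WebKit.
// Lists the distinct file:// URLs referenced by src attributes in pasted HTML.
function collectLocalFileURLs(html: string): string[] {
  const urls: string[] = [];
  // Matches src="file://..." or src='file://...' on any element.
  const pattern = /\bsrc\s*=\s*(["'])(file:\/\/[^"']*)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const url = match[2];
    // Keep the first occurrence of each URL only.
    if (urls.indexOf(url) === -1) {
      urls.push(url);
    }
  }
  return urls;
}
```

For the Word markup above, this would return the two `msoclip` image URLs (`0clip_image001.png` and `0clip_image002.png`).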
Comment 110 m.lewandowski 2018-01-23 07:14:49 PST
@Ryosuke thanks for the clarifications, the security model you have outlined makes good sense to me.

> we can't currently prioritize especially since pasting of images already works from Microsoft Word for Mac 2016, which has been out for more than two years now.

Actually it is my understanding that @Andrew mentioned that he has a case that fails with Word 2016 too.

Lastly for what it's worth, I'll just note that you can access these images in the RTF format representation. If you inspect the RTF data, you'll see that it holds encoded blobs of the images. This is what, for the time being, we use in CKEditor 4 to retrieve images on browsers that do not inline images but do provide the RTF type (which Safari is not doing ATM). Actually, here we're coming back to the source of this issue, not exposing RTF, and back to the RTF privacy concerns that you mentioned earlier. The takeaway is that if you have a solid RTF lib, you could extract the images from the clipboard without needing to access the file system. Though it's still a bit of a workaroundish solution.
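
A minimal TypeScript sketch of that idea, under stated assumptions: this is not the CKEditor 4 implementation, and `RtfImage` and `extractRtfImages` are hypothetical names. It assumes each image sits in a `{\pict ...}` group as hex text tagged with `\pngblip` or `\jpegblip`; it ignores binary (`\bin`) data and other picture formats, and it does not deduplicate images that Word emits more than once.

```typescript
// Hypothetical types and helper for illustration; not CKEditor or WebKit code.
interface RtfImage {
  mimeType: string;
  bytes: Uint8Array;
}

function extractRtfImages(rtf: string): RtfImage[] {
  const images: RtfImage[] = [];
  let start = rtf.indexOf("{\\pict");
  while (start !== -1) {
    let depth = 0;
    let hex = "";
    let mimeType = "";
    let i = start;
    for (; i < rtf.length; i++) {
      const ch = rtf.charAt(i);
      if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) break; // end of this \pict group
      } else if (ch === "\\") {
        // Control word such as \pngblip or \picw203 (letters, optional number).
        const word = /^\\[a-z]+-?\d*/i.exec(rtf.slice(i, i + 64));
        if (word === null) {
          i++; // control symbol such as \*: skip the character after the backslash
          continue;
        }
        if (depth === 1 && word[0] === "\\pngblip") mimeType = "image/png";
        if (depth === 1 && word[0] === "\\jpegblip") mimeType = "image/jpeg";
        i += word[0].length - 1;
      } else if (depth === 1 && /[0-9a-f]/i.test(ch)) {
        // Hex digits directly inside the \pict group are the image payload;
        // nested groups (for example {\*\blipuid ...}) are ignored.
        hex += ch;
      }
    }
    if (mimeType !== "" && hex.length > 0 && hex.length % 2 === 0) {
      const bytes = new Uint8Array(hex.length / 2);
      for (let j = 0; j < bytes.length; j++) {
        bytes[j] = parseInt(hex.slice(j * 2, j * 2 + 2), 16);
      }
      images.push({ mimeType, bytes });
    }
    start = rtf.indexOf("{\\pict", i);
  }
  return images;
}
```

An editor could then wrap each result in a `Blob` of the reported MIME type and substitute it for the matching `file://` image in the pasted HTML; how the two are matched up is left out here.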
Comment 111 Andrew Herron 2018-01-23 16:42:18 PST
Created attachment 332098 [details]
Document that fails to import an image using word 2016

It definitely sounds like I need to attach the document here. I understand that you can't currently support it, but this is definitely a Word 2016 issue.

The image in this document fails to paste using STP 47 and Word 2016. Perhaps because it's not pasted as an img tag, similar to what was dismissed as a Word 2011 issue? This is where taking over responsibility for pasting Word content in the browser starts to need code that we build in our editors. It's a moving target.

I don't know anything about how you're obtaining the images for documents that do paste successfully in STP 47, but on Chrome and Firefox this document pastes successfully into both TinyMCE and CKEditor front page demos with the image thanks to RTF data.
Comment 112 Ryosuke Niwa 2018-01-24 01:36:48 PST
(In reply to Andrew Herron from comment #111)
> Created attachment 332098 [details]
> Document that fails to import an image using word 2016
> 
> It definitely sounds like I need to attach the document here. I understand
> that you can't currently support it, but this is definitely a Word 2016
> issue.

Thanks for the document. I can indeed reproduce the issue.

> The image in this document fails to paste using STP 47 and Word 2016.
> Perhaps because it's not pasted as an img tag, similar to what was dismissed
> as a Word 2011 issue?

Right. We're going to contact Microsoft about this issue since there isn't much we can do on our end. In fact, copying & pasting content to TextEdit and other macOS applications that don't directly support RTF also strips away the image.

> I don't know anything about how you're obtaining the images for documents
> that do paste successfully in STP 47, but on Chrome and Firefox this
> document pastes successfully into both TinyMCE and CKEditor front page demos
> with the image thanks to RTF data.

Due to security and privacy concerns, we won't be able to directly expose RTF data to the Web.
Comment 113 m.lewandowski 2018-01-24 05:48:23 PST
@Ryosuke what about

> Lastly for what it's worth, I'll just note that you can (…)

In comment #110? Do you guys have any RTF parsing lib that could do the job for you?
Comment 114 Ryosuke Niwa 2018-01-24 13:04:46 PST
(In reply to m.lewandowski from comment #113)
> @Ryosuke what about
> 
> > Lastly for what it's worth, I'll just note that you can (…)
> 
> In comment #110? Do you guys have any RTF parsing lib that could do the job
> for you?

The problem here is that Microsoft Word isn't even generating RTFD with the image data. Since other macOS applications would rely on RTFD to get the image data, pasting would fail in those applications as well. From that perspective, this is really a bug in Microsoft Word.
Comment 115 Lucas Forschler 2019-02-06 09:18:51 PST
Mass move bugs into the DOM component.
Comment 116 Ryosuke Niwa 2019-02-06 22:04:33 PST
Copy & paste is editing.