WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
190101
[macOS] Update Pasteboard::read to prioritize native representations over TIFF
https://bugs.webkit.org/show_bug.cgi?id=190101
Summary
[macOS] Update Pasteboard::read to prioritize native representations over TIFF
mitz
Reported
2018-09-28 17:04:39 PDT
Pasting a pasteboard that contains JPEG data into a web view results in a TIFF representation of the image getting pasted. This is undesirable, but it matches the historic behavior of NSTextView. As of macOS Mojave, NSTextView checks for NSImage objects on the pasteboard, and given an NSImage, it first tries to get the image data from an existing EPS, PICT, PDF or bitmap representation (of whichever format), before falling back to the TIFF representation. WebKit should replicate that behavior so that things don’t get unnecessarily transcoded (the JPEG to TIFF example is especially bad!).
Attachments
For EWS
(17.24 KB, patch)
2022-01-23 16:44 PST
,
Wenson Hsieh
no flags
Details
Formatted Diff
Diff
For EWS
(16.98 KB, patch)
2022-01-24 09:04 PST
,
Wenson Hsieh
no flags
Details
Formatted Diff
Diff
Patch
(13.78 KB, patch)
2022-01-24 12:38 PST
,
Wenson Hsieh
rniwa
: review+
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch for landing
(14.13 KB, patch)
2022-01-24 14:55 PST
,
Wenson Hsieh
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
mitz
Comment 1
2018-09-28 17:06:08 PDT
<
rdar://problem/44879244
>
Wenson Hsieh
Comment 2
2022-01-23 16:44:38 PST
Created
attachment 449772
[details]
For EWS
mitz
Comment 3
2022-01-23 18:01:49 PST
Comment on
attachment 449772
[details]
For EWS View in context:
https://bugs.webkit.org/attachment.cgi?id=449772&action=review
> Source/WebCore/ChangeLog:20 > + so, then we convert TIFF data from the pasteboard to JPEG or PNG, respectively.
This sounds like reencoding, which at least in the JPEG case can degrade quality, and that doesn’t sound like what I said AppKit did.
Wenson Hsieh
Comment 4
2022-01-23 22:18:31 PST
(In reply to mitz from
comment #3
)
> Comment on
attachment 449772
[details]
> For EWS > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=449772&action=review
> > > Source/WebCore/ChangeLog:20 > > + so, then we convert TIFF data from the pasteboard to JPEG or PNG, respectively. > > This sounds like reencoding, which at least in the JPEG case can degrade > quality, and that doesn’t sound like what I said AppKit did.
Indeed, this is still transcoding image data; note that we can't expose raw PNG and JPEG data on paste, so we need some kind of strategy to avoid leaking privacy-sensitive metadata on pasted images. I'm still exploring some strategies to achieve this without transcoding; we might be able to just strip out (most) metadata properties without having to reencode to the original format...
Wenson Hsieh
Comment 5
2022-01-23 22:21:38 PST
(In reply to Wenson Hsieh from
comment #4
)
> (In reply to mitz from
comment #3
) > > Comment on
attachment 449772
[details]
> > For EWS > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=449772&action=review
> > > > > Source/WebCore/ChangeLog:20 > > > + so, then we convert TIFF data from the pasteboard to JPEG or PNG, respectively. > > > > This sounds like reencoding, which at least in the JPEG case can degrade > > quality, and that doesn’t sound like what I said AppKit did. > > Indeed, this is still transcoding image data; note that we can't expose raw > PNG and JPEG data on paste, so we need some kind of strategy to avoid > leaking privacy-sensitive metadata on pasted images. > > I'm still exploring some strategies to achieve this without transcoding; we > might be able to just strip out (most) metadata properties without having to > reencode to the original format...
Another strategy is to skip the transcoding step just for Mail, via SPI.
Ryosuke Niwa
Comment 6
2022-01-23 23:33:41 PST
Comment on
attachment 449772
[details]
For EWS View in context:
https://bugs.webkit.org/attachment.cgi?id=449772&action=review
>>>> Source/WebCore/ChangeLog:20 >>>> + so, then we convert TIFF data from the pasteboard to JPEG or PNG, respectively. >>> >>> This sounds like reencoding, which at least in the JPEG case can degrade quality, and that doesn’t sound like what I said AppKit did. >> >> Indeed, this is still transcoding image data; note that we can't expose raw PNG and JPEG data on paste, so we need some kind of strategy to avoid leaking privacy-sensitive metadata on pasted images. >> >> I'm still exploring some strategies to achieve this without transcoding; we might be able to just strip out (most) metadata properties without having to reencode to the original format... > > Another strategy is to skip the transcoding step just for Mail, via SPI.
Skipping sanitization process for app usage seems sensible especially if we restrict the origin. So if origin of the document is local & app opts-in then don't hide content from the clipboard.
Ryosuke Niwa
Comment 7
2022-01-23 23:35:09 PST
Specifically, any app that is browser like or any document whose top-level document origin isn't local will continue to use the full sanitization path.
mitz
Comment 8
2022-01-24 07:56:01 PST
Out of interest, how is this dealt with for <input type=file>?
Wenson Hsieh
Comment 9
2022-01-24 08:00:38 PST
(In reply to mitz from
comment #8
)
> Out of interest, how is this dealt with for <input type=file>?
When using a file input, it's assumed that the user knows they are uploading the file as-is, with all metadata. For pasted files, we don't (or rather, shouldn't) make the same assumption — we only allow the web page to read data that is "visibly exposed", from the user's point of view. A scenario where this property is especially important to uphold is when copying and pasting markup, in which case comments and other non-rendered content in the fragment can hold PII (as seen when copying from some native apps, such as Microsoft Word).
mitz
Comment 10
2022-01-24 08:09:12 PST
(In reply to Wenson Hsieh from
comment #9
)
> (In reply to mitz from
comment #8
) > > Out of interest, how is this dealt with for <input type=file>? > > When using a file input, it's assumed that the user knows they are uploading > the file as-is, with all metadata. > > For pasted files, we don't (or rather, shouldn't) make the same assumption — > we only allow the web page to read data that is "visibly exposed", from the > user's point of view. > > A scenario where this property is especially important to uphold is when > copying and pasting markup, in which case comments and other non-rendered > content in the fragment can hold PII (as seen when copying from some native > apps, such as Microsoft Word).
I see. It’s worth keeping in mind that TIFF data may contain metadata as well, so it may need to be processed as well. I think there must be a better way to do this than transcoding, though (notably, sharing photos from iOS uses something to strip some or all metadata).
Wenson Hsieh
Comment 11
2022-01-24 08:14:08 PST
(In reply to mitz from
comment #10
)
> (In reply to Wenson Hsieh from
comment #9
) > > (In reply to mitz from
comment #8
) > > > Out of interest, how is this dealt with for <input type=file>? > > > > When using a file input, it's assumed that the user knows they are uploading > > the file as-is, with all metadata. > > > > For pasted files, we don't (or rather, shouldn't) make the same assumption — > > we only allow the web page to read data that is "visibly exposed", from the > > user's point of view. > > > > A scenario where this property is especially important to uphold is when > > copying and pasting markup, in which case comments and other non-rendered > > content in the fragment can hold PII (as seen when copying from some native > > apps, such as Microsoft Word). > > I see. It’s worth keeping in mind that TIFF data may contain metadata as > well, so it may need to be processed as well. I think there must be a better > way to do this than transcoding, though (notably, sharing photos from iOS > uses something to strip some or all metadata).
Yes! I have an approach that uses CG API to copy out the image data while stripping away all metadata fields except for image orientation, and it seems to be working reasonably well — I'll post a new patch here soon.
Wenson Hsieh
Comment 12
2022-01-24 09:04:40 PST
Created
attachment 449824
[details]
For EWS
Tim Horton
Comment 13
2022-01-24 11:17:46 PST
Comment on
attachment 449824
[details]
For EWS View in context:
https://bugs.webkit.org/attachment.cgi?id=449824&action=review
> Source/WebCore/platform/mac/PasteboardMac.mm:438 > +static RefPtr<SharedBuffer> imageDataAfterRemovingMetadata(Ref<SharedBuffer>&& buffer)
It seems a bit unfortunate that this very-generically-named function will break all animated images (GIF and APNG). I think your changelog suggests they should never get here; maybe an assertion or type check or something at the top? Or add all of the animation properties to your allowlist?
> Source/WebCore/platform/mac/PasteboardMac.mm:455 > + if (![name isEqualToString:(__bridge NSString *)kCGImagePropertyTIFFDictionary]) {
Not sure I follow what's going on here; you're only preserving image orientation in the TIFF properties? What about the other formats?
Wenson Hsieh
Comment 14
2022-01-24 11:40:00 PST
Comment on
attachment 449824
[details]
For EWS View in context:
https://bugs.webkit.org/attachment.cgi?id=449824&action=review
Thanks for taking a look!
>> Source/WebCore/platform/mac/PasteboardMac.mm:438 >> +static RefPtr<SharedBuffer> imageDataAfterRemovingMetadata(Ref<SharedBuffer>&& buffer) > > It seems a bit unfortunate that this very-generically-named function will break all animated images (GIF and APNG). I think your changelog suggests they should never get here; maybe an assertion or type check or something at the top? Or add all of the animation properties to your allowlist?
Yes, that's right — we'll only get here for TIFF, PNG and JPEG images. In the future, if we add support for reading image data in image/gif format or support for animated PNG, we'll need to adjust this logic to preserve animation-related properties as well. As we discussed, I'll adjust this to only add the first frame of the image, since that's the behavior we currently end up with if we have animated PNG, and leave a FIXME to track making GIF/APNG readable in this codepath.
>> Source/WebCore/platform/mac/PasteboardMac.mm:455 >> + if (![name isEqualToString:(__bridge NSString *)kCGImagePropertyTIFFDictionary]) { > > Not sure I follow what's going on here; you're only preserving image orientation in the TIFF properties? What about the other formats?
So from local testing, it seems like image orientation is a subway of the tiff image property dictionary (i.e. the dictionary labeled {TIFF} in the image properties). All images appear to have this {TIFF} dictionary…at least, from what I could tell. I'll do more testing and double check this.
Wenson Hsieh
Comment 15
2022-01-24 12:25:32 PST
So after taking a closer look — it seems that even in the current implementation where we're transcoding from PNG/JPG => TIFF, AppKit *still* preserves all metadata properties on the image, so this extra TIFF transcoding step wasn't giving us any sanitization benefit to begin with — the only thing it does is change the image format and inflate image size 🤦🏻♂️. I think we should address this potential PII leak in a followup, but at least we know that simply reading the JPEG/PNG data isn't going to introduce any new privacy issues than we have today. I'll move my WIP on `imageDataAfterRemovingMetadata()` over to <
https://webkit.org/b/235534
>, and just do the "prioritize JPG/PNG data" part in this bug.
Wenson Hsieh
Comment 16
2022-01-24 12:38:42 PST
Created
attachment 449845
[details]
Patch
Ryosuke Niwa
Comment 17
2022-01-24 13:07:31 PST
Comment on
attachment 449845
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=449845&action=review
> Source/WebCore/platform/mac/PasteboardMac.mm:442 > + Vector<PasteboardItemInfo> relevantItems;
relevant items sounds very generic vague. How about itemsForIndexByFidelity? Alternatively, it might be better to create a lambda here that each case below uses: e.g. auto platformTypesFromItems = [](auto& items) { HashSet<String> types; for (auto& item : types) { for (auto& type : item.platformTypesByFidelity) nonTranscodedTypes.add(type); } return types; } if (itemIndex) { if (auto itemInfo = strategy.informationForItemAtIndex(*itemIndex, m_pasteboardName, m_changeCount, context())) { types = itemInfo->platformTypesByFidelity; nonTranscodedTypes = platformTypesFromItems(*itemInfo); } } else { strategy.getTypes(types, m_pasteboardName, context()); if (auto allItems = strategy.allPasteboardItemInfo(m_pasteboardName, m_changeCount, context())) nonTranscodedTypes = platformTypesFromItems(*itemInfo); }
> Source/WebCore/platform/mac/PasteboardMac.mm:537 > + { String(legacyTIFFPasteboardType()), "image/tiff"_s },
Doesn't this prefer TIFF and PDF over PNG/JPEG if both are written into the pasteboard? Perhaps that warrants an explanation in the change log since this patch is about preferring PNG/JPEG.
> Source/WebCore/platform/mac/PasteboardMac.mm:571 > + // If the original data could not be read directly, fall back by trying to transcode to any of the above types.
This comment repeats what the code says. Can we instead explain when this would happen? Why may we not be able to read the original data?
Wenson Hsieh
Comment 18
2022-01-24 13:50:00 PST
Comment on
attachment 449845
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=449845&action=review
Thanks for the review!
>> Source/WebCore/platform/mac/PasteboardMac.mm:442 >> + Vector<PasteboardItemInfo> relevantItems; > > relevant items sounds very generic vague. > How about itemsForIndexByFidelity? > Alternatively, it might be better to create a lambda here > that each case below uses: > > e.g. > auto platformTypesFromItems = [](auto& items) { > HashSet<String> types; > for (auto& item : types) { > for (auto& type : item.platformTypesByFidelity) > nonTranscodedTypes.add(type); > } > return types; > } > > if (itemIndex) { > if (auto itemInfo = strategy.informationForItemAtIndex(*itemIndex, m_pasteboardName, m_changeCount, context())) { > types = itemInfo->platformTypesByFidelity; > nonTranscodedTypes = platformTypesFromItems(*itemInfo); > } > } else { > strategy.getTypes(types, m_pasteboardName, context()); > if (auto allItems = strategy.allPasteboardItemInfo(m_pasteboardName, m_changeCount, context())) > nonTranscodedTypes = platformTypesFromItems(*itemInfo); > }
Sounds good! Refactored this to pull the logic out into a lambda.
>> Source/WebCore/platform/mac/PasteboardMac.mm:537 >> + { String(legacyTIFFPasteboardType()), "image/tiff"_s }, > > Doesn't this prefer TIFF and PDF over PNG/JPEG if both are written into the pasteboard? > Perhaps that warrants an explanation in the change log since this patch is about preferring PNG/JPEG.
Ah, so this patch is more about avoiding unnecessary transcoding in the case where PNG/JPEG data is supplied to the pasteboard (before this patch, we end up forcing AppKit to transcode to TIFF anyways). I think that if TIFF is explicitly written to the pasteboard (such that reading it does not require transcoding from another format), then it's probably fine to continue choosing TIFF (even if some other formats like PNG/JPEG are also written to the pasteboard). I'll clarify this in the ChangeLog.
>> Source/WebCore/platform/mac/PasteboardMac.mm:571 >> + // If the original data could not be read directly, fall back by trying to transcode to any of the above types. > > This comment repeats what the code says. > Can we instead explain when this would happen? > Why may we not be able to read the original data?
So one scenario where we would end up here is when the pasteboard contains a format that isn't one of (TIFF, PDF, PNG, JPEG), so in order to still handle the paste we fall back by transcoding to one of these formats. A more concrete example is when the pasteboard only contains GIF data; in this case it matches none of the 4 types above, so we'll handle it by transcoding to TIFF instead. Instead of relying on comments here, I renamed `remainingImageTypesToRead` to `transcodedImageTypesToRead`, which I think better describes this behavior (without needing separate comments).
Wenson Hsieh
Comment 19
2022-01-24 14:55:09 PST
Created
attachment 449864
[details]
Patch for landing
EWS
Comment 20
2022-01-24 15:48:48 PST
Committed
r288477
(
246360@main
): <
https://commits.webkit.org/246360@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 449864
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug