Bug 220526

Summary: [Cocoa] Strip DataDetectors links when copying content to the pasteboard
Product: WebKit Reporter: Aditya Keerthi <akeerthi>
Component: HTML EditingAssignee: Aditya Keerthi <akeerthi>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, ews-watchlist, mifenton, rniwa, thorton, webkit-bug-importer, wenson_hsieh
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Aditya Keerthi 2021-01-11 15:40:02 PST
...
Comment 1 Aditya Keerthi 2021-01-11 15:40:24 PST
<rdar://problem/71045590>
Comment 2 Aditya Keerthi 2021-01-11 15:42:53 PST
Created attachment 417417 [details]
Patch
Comment 3 Ryosuke Niwa 2021-01-11 15:48:26 PST
Comment on attachment 417417 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417417&action=review

> LayoutTests/editing/pasteboard/copy-paste-data-detected-links.html:6
> +    <span>Meeting <a href="x-apple-data-detectors://0" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="calendar-event" x-apple-data-detectors-result="0" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">on Friday 11/6 at 4pm</a> at <a href="x-apple-data-detectors://1" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="address" x-apple-data-detectors-result="1" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">1 Apple Park Way, Cupertino CA</a>

If this anchor element gets, say, font-weight: bold, do we preserve that using a span?
Otherwise, we need a span conversion instead of simply removing the anchor.
Also, please add a test for this case.
Comment 4 Aditya Keerthi 2021-01-12 09:00:51 PST
Created attachment 417459 [details]
Patch
Comment 5 Aditya Keerthi 2021-01-12 09:01:15 PST
Added tests for the WebArchive and RTF cases.
Comment 6 Aditya Keerthi 2021-01-12 09:02:49 PST
(In reply to Ryosuke Niwa from comment #3)
> Comment on attachment 417417 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=417417&action=review
> 
> > LayoutTests/editing/pasteboard/copy-paste-data-detected-links.html:6
> > +    <span>Meeting <a href="x-apple-data-detectors://0" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="calendar-event" x-apple-data-detectors-result="0" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">on Friday 11/6 at 4pm</a> at <a href="x-apple-data-detectors://1" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="address" x-apple-data-detectors-result="1" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">1 Apple Park Way, Cupertino CA</a>
> 
> If this anchor element gets, say, font-weight: bold, do we preserve that
> using a span?
> Otherwise, we need a span conversion instead of simply removing the anchor.
> Also, please add a test for this case.

I'm not sure span conversion is necessary, since DD links are behind SPI, and are only used in non-editable contexts.

If we did perform span conversion, we would also preserve the "color" and "currentcolor" styles added by us when creating a DD link, which doesn't make sense to preserve.
Comment 7 Aditya Keerthi 2021-01-12 11:07:09 PST
(In reply to Aditya Keerthi from comment #6)
> (In reply to Ryosuke Niwa from comment #3)
> > Comment on attachment 417417 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=417417&action=review
> > 
> > > LayoutTests/editing/pasteboard/copy-paste-data-detected-links.html:6
> > > +    <span>Meeting <a href="x-apple-data-detectors://0" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="calendar-event" x-apple-data-detectors-result="0" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">on Friday 11/6 at 4pm</a> at <a href="x-apple-data-detectors://1" dir="ltr" x-apple-data-detectors="true" x-apple-data-detectors-type="address" x-apple-data-detectors-result="1" style="color: currentcolor; text-decoration-color: rgba(128, 128, 128, 0.38);">1 Apple Park Way, Cupertino CA</a>
> > 
> > If this anchor element gets, say, font-weight: bold, do we preserve that
> > using a span?
> > Otherwise, we need a span conversion instead of simply removing the anchor.
> > Also, please add a test for this case.
> 
> I'm not sure span conversion is necessary, since DD links are behind SPI,
> and are only used in non-editable contexts.
> 
> If we did perform span conversion, we would also preserve the "color" and
> "currentcolor" styles added by us when creating a DD link, which doesn't
> make sense to preserve.

I was incorrect about DD links being SPI only (https://developer.apple.com/documentation/webkit/wkwebviewconfiguration/1641937-datadetectortypes?language=objc).

Will update the patch to perform span conversion and add new tests.
Comment 8 Aditya Keerthi 2021-01-12 16:02:50 PST
Created attachment 417494 [details]
Patch
Comment 9 Aditya Keerthi 2021-01-12 16:04:07 PST
The new patch replaces the nodes with <span> elements, rather than simply removing them.

The tests account for style preservation with the addition of "font-weight: bold" to the style attribute.
Comment 10 Ryosuke Niwa 2021-01-12 20:17:48 PST
Comment on attachment 417494 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417494&action=review

> Source/WebCore/editing/cocoa/DataDetection.mm:650
> +    for (auto* child = ElementTraversal::firstChild(element); child; child = ElementTraversal::nextSibling(*child))
> +        convertDataDetectedLinkToSpan(*child);

Why are we recursing here? We've already found all anchor elements in replaceDataDetectedLinksInContainer.

> Source/WebCore/editing/cocoa/DataDetection.mm:662
> +    spanElement->removeAttribute(x_apple_data_detectorsAttr);

We should probably also remove color and background-color properties if they match the one we add by default.

> Source/WebCore/editing/markup.cpp:639
> +        if (is<HTMLAnchorElement>(node) && equalIgnoringASCIICase(downcast<HTMLAnchorElement>(node).attributeWithoutSynchronization(x_apple_data_detectorsAttr), "true"))
> +            m_containsDataDetectedLink = true;

Instead of this, we should just modify StyledMarkupAccumulator::appendStartTag to emit a span instead of <a.
Namely, look at what we do in StyledMarkupAccumulator::appendStartTag for HTMLSlotElement.
We just need to do something similar and then after the line where we merge inline style:
        if (is<StyledElement>(element) && downcast<StyledElement>(element).inlineStyle())
            newInlineStyle->overrideWithStyle(*downcast<StyledElement>(element).inlineStyle());
we need some code to delete the those inline styles inserted by WebKit for data detecter anchor elements.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyRTF.mm:156
> +            ASSERT_NOT_REACHED();

Please use FAIL() instead.
Comment 11 Darin Adler 2021-01-13 10:54:38 PST
Comment on attachment 417494 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417494&action=review

>> Source/WebCore/editing/cocoa/DataDetection.mm:650
>> +        convertDataDetectedLinkToSpan(*child);
> 
> Why are we recursing here? We've already found all anchor elements in replaceDataDetectedLinksInContainer.

An error like this makes me think we need to stress test with an absurdly deep hierarchy so we can quickly detect if we accidentally make algorithms have unacceptable complexity.
Comment 12 Aditya Keerthi 2021-01-13 10:54:46 PST
Created attachment 417548 [details]
Patch
Comment 13 Aditya Keerthi 2021-01-13 10:59:04 PST
(In reply to Ryosuke Niwa from comment #10)
> Comment on attachment 417494 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=417494&action=review
> 
> > Source/WebCore/editing/cocoa/DataDetection.mm:650
> > +    for (auto* child = ElementTraversal::firstChild(element); child; child = ElementTraversal::nextSibling(*child))
> > +        convertDataDetectedLinkToSpan(*child);
> 
> Why are we recursing here? We've already found all anchor elements in
> replaceDataDetectedLinksInContainer.

Right, this was unnecessary. Note that this code has been removed entirely from the latest patch, due to the new approach to <span> conversion.
 
> > Source/WebCore/editing/cocoa/DataDetection.mm:662
> > +    spanElement->removeAttribute(x_apple_data_detectorsAttr);
> 
> We should probably also remove color and background-color properties if they
> match the one we add by default.

We don't add a "background-color" property, but we do add "color: currentcolor", which is already removed when creating the inline style.
 
> > Source/WebCore/editing/markup.cpp:639
> > +        if (is<HTMLAnchorElement>(node) && equalIgnoringASCIICase(downcast<HTMLAnchorElement>(node).attributeWithoutSynchronization(x_apple_data_detectorsAttr), "true"))
> > +            m_containsDataDetectedLink = true;
> 
> Instead of this, we should just modify
> StyledMarkupAccumulator::appendStartTag to emit a span instead of <a.
> Namely, look at what we do in StyledMarkupAccumulator::appendStartTag for
> HTMLSlotElement.
> We just need to do something similar and then after the line where we merge
> inline style:
>         if (is<StyledElement>(element) &&
> downcast<StyledElement>(element).inlineStyle())
>            
> newInlineStyle->overrideWithStyle(*downcast<StyledElement>(element).
> inlineStyle());
> we need some code to delete the those inline styles inserted by WebKit for
> data detecter anchor elements.

Updated the patch to modify appendStartTag/appendEndTag rather than manipulating the nodes in a DocumentFragment.

Note that the only style removed explicitly is CSSPropertyTextDecorationColor, which we add when creating the link.
 
> > Tools/TestWebKitAPI/Tests/WebKitCocoa/CopyRTF.mm:156
> > +            ASSERT_NOT_REACHED();
> 
> Please use FAIL() instead.

Done.
Comment 14 Ryosuke Niwa 2021-01-13 12:58:30 PST
Comment on attachment 417548 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417548&action=review

> Source/WebCore/editing/cocoa/HTMLConverter.mm:1966
> -    if (range.length > 0 && element.hasTagName(aTag)) {
> +    if (range.length > 0 && element.hasTagName(aTag) && ![@"true" isEqualToString:element.getAttribute(x_apple_data_detectorsAttr)]) {

Can we just call DataDetection::isDataDetectorElement(element)?

> Source/WebCore/editing/markup.cpp:532
> +#if ENABLE(DATA_DETECTION)
> +    const bool isDataDetectorElement = DataDetection::isDataDetectorElement(element);
> +#endif

Rather than checking this condition twice, why don't we make shouldReplaceElementWithSpan return an enum class.
e.g.
enum class SpanReplacementType {
    None,
    Slot,
    DataDetector
};
auto replacementType = spanReplacementForElement(element)

> Source/WebCore/editing/markup.cpp:554
> +            if (isDataDetectorElement) {
> +                if (attributeName == x_apple_data_detectorsAttr)

Can we extract this as a helper function in DataDetector like this:
DataDetector::isDataDetectorAttribute(attribute.name())
attribute.name() is super cheap (just loading from offset) so there is no need to avoid calling that twice here really.

> Source/WebCore/editing/markup.cpp:586
> +        if (isDataDetectorElement && newInlineStyle->style())
> +            newInlineStyle->style()->removeProperty(CSSPropertyTextDecorationColor);

How about color: currentcolor?
Comment 15 Ryosuke Niwa 2021-01-13 12:59:18 PST
(In reply to Aditya Keerthi from comment #13)
> (In reply to Ryosuke Niwa from comment #10)
> > Comment on attachment 417494 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=417494&action=review
> >
> > > Source/WebCore/editing/cocoa/DataDetection.mm:662
> > > +    spanElement->removeAttribute(x_apple_data_detectorsAttr);
> > 
> > We should probably also remove color and background-color properties if they
> > match the one we add by default.
> 
> We don't add a "background-color" property, but we do add "color:
> currentcolor", which is already removed when creating the inline style.

Ah, missed this comment. nvm.
Comment 16 Aditya Keerthi 2021-01-13 13:47:44 PST
Created attachment 417559 [details]
Patch
Comment 17 Aditya Keerthi 2021-01-13 13:48:56 PST
(In reply to Ryosuke Niwa from comment #14)
> Comment on attachment 417548 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=417548&action=review
> 
> > Source/WebCore/editing/cocoa/HTMLConverter.mm:1966
> > -    if (range.length > 0 && element.hasTagName(aTag)) {
> > +    if (range.length > 0 && element.hasTagName(aTag) && ![@"true" isEqualToString:element.getAttribute(x_apple_data_detectorsAttr)]) {
> 
> Can we just call DataDetection::isDataDetectorElement(element)?

Done. I also moved the link creation logic into its own method.
 
> > Source/WebCore/editing/markup.cpp:532
> > +#if ENABLE(DATA_DETECTION)
> > +    const bool isDataDetectorElement = DataDetection::isDataDetectorElement(element);
> > +#endif
> 
> Rather than checking this condition twice, why don't we make
> shouldReplaceElementWithSpan return an enum class.
> e.g.
> enum class SpanReplacementType {
>     None,
>     Slot,
>     DataDetector
> };
> auto replacementType = spanReplacementForElement(element)

Updated to use an enum.
 
> > Source/WebCore/editing/markup.cpp:554
> > +            if (isDataDetectorElement) {
> > +                if (attributeName == x_apple_data_detectorsAttr)
> 
> Can we extract this as a helper function in DataDetector like this:
> DataDetector::isDataDetectorAttribute(attribute.name())
> attribute.name() is super cheap (just loading from offset) so there is no
> need to avoid calling that twice here really.

Done.
Comment 18 EWS 2021-01-14 18:00:30 PST
Committed r271507: <https://trac.webkit.org/changeset/271507>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 417559 [details].