Bug 211524 - Preserve character set information when writing to the pasteboard when copying rich text
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Wenson Hsieh
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-05-06 12:35 PDT by Wenson Hsieh
Modified: 2022-08-02 21:24 PDT
CC List: 10 users

See Also:


Attachments
Patch (14.76 KB, patch)
2020-05-07 14:39 PDT, Wenson Hsieh
no flags
Patch (17.63 KB, patch)
2020-05-07 15:29 PDT, Wenson Hsieh
no flags
Address feedback (19.68 KB, patch)
2020-05-07 17:33 PDT, Wenson Hsieh
darin: review+
Patch for landing (19.65 KB, patch)
2020-05-08 10:13 PDT, Wenson Hsieh
no flags

Description Wenson Hsieh 2020-05-06 12:35:31 PDT
See: the discussion in https://bugs.webkit.org/show_bug.cgi?id=211498.
Comment 1 Wenson Hsieh 2020-05-07 14:39:56 PDT Comment hidden (obsolete)
Comment 2 Wenson Hsieh 2020-05-07 15:29:16 PDT
Created attachment 398804 [details]
Patch
Comment 3 Darin Adler 2020-05-07 15:35:15 PDT
Comment on attachment 398804 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398804&action=review

> Source/WebCore/editing/markup.cpp:921
> +#if PLATFORM(COCOA)

I think this kind of issue might exist on other platforms as well. It would be nice to call people’s attention to this in case they want to take advantage of it.

> Source/WebCore/editing/markup.cpp:926
> +    if (!accumulatedMarkup.isAllASCII()) {
> +        // On Cocoa platforms, this markup is eventually persisted to the pasteboard and read back as UTF-8 data,
> +        // so this meta tag is needed for clients that read this data in the future from the pasteboard and load it.
> +        return makeString("<meta charset=\"UTF-8\">", WTFMove(accumulatedMarkup));
> +    }

Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.
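To illustrate the trade-off in Darin's suggestion, here is a minimal C++ sketch that uses `std::string` in place of WTF's `String` and `StringBuilder`. `withCharsetByCopy` and `MarkupAccumulatorSketch` are hypothetical names, not WebKit code; the sketch only assumes, as the review comment describes, that markup preceding the body is kept in reverse order and concatenated once when the results are taken.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// True if every byte is 7-bit ASCII (UTF-8 lead and continuation bytes are >= 0x80).
bool isAllASCII(const std::string& s)
{
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}

// Approach 1 (like the first patch): the markup is already a finished string,
// so prepending the tag builds a second, full-length copy of it.
std::string withCharsetByCopy(std::string accumulatedMarkup)
{
    if (isAllASCII(accumulatedMarkup))
        return accumulatedMarkup;
    return "<meta charset=\"UTF-8\">" + accumulatedMarkup;
}

// Approach 2 (like the review suggestion): a hypothetical accumulator that keeps
// preceding markup in reverse order and concatenates everything only once.
class MarkupAccumulatorSketch {
public:
    void appendMarkup(const std::string& markup) { m_markup += markup; }

    // Later calls end up earlier in the output, mirroring m_reversedPrecedingMarkup.
    void prependMarkup(std::string markup) { m_reversedPrecedingMarkup.push_back(std::move(markup)); }

    bool isAllASCII() const { return ::isAllASCII(m_markup); }

    // Builds the final string in a single pass: preceding markup, then the body.
    std::string takeResults()
    {
        std::string result;
        for (auto it = m_reversedPrecedingMarkup.rbegin(); it != m_reversedPrecedingMarkup.rend(); ++it)
            result += *it;
        result += m_markup;
        m_reversedPrecedingMarkup.clear();
        m_markup.clear();
        return result;
    }

private:
    std::string m_markup;
    std::vector<std::string> m_reversedPrecedingMarkup;
};
```

With the second approach, the caller checks `isAllASCII()` and calls `prependMarkup()` before `takeResults()`, so the final string is assembled once instead of being assembled and then copied again.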
Comment 4 Wenson Hsieh 2020-05-07 16:09:29 PDT
Comment on attachment 398804 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398804&action=review

>> Source/WebCore/editing/markup.cpp:926
>> +    }
> 
> Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.

Sounds good to me! I’ll update the patch to do this.
Comment 5 Wenson Hsieh 2020-05-07 16:37:30 PDT
(In reply to Wenson Hsieh from comment #4)
> >> Source/WebCore/editing/markup.cpp:926
> >> +    }
> > 
> > Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.
> 
> Sounds good to me! I’ll update the patch to do this.

So adding an `isAllASCII()` method to `StyledMarkupAccumulator` would require us to also add an `isAllASCII()` method to StringBuilder — which, I think, seems fine? I imagine it would just be like WTF::String’s. Something like:

bool isAllASCII() const { return !m_buffer || m_buffer->isAllASCII(); }
Comment 6 Wenson Hsieh 2020-05-07 17:12:42 PDT
(In reply to Wenson Hsieh from comment #5)
> So adding an `isAllASCII()` method to `StyledMarkupAccumulator` would
> require us to also add an `isAllASCII()` method to StringBuilder — which, I
> think, seems fine? I imagine it would just be like WTF::String’s. Something
> like:
> 
> bool isAllASCII() const { return !m_buffer || m_buffer->isAllASCII(); }

…upon further testing, this isn’t correct, because a StringBuilder can be resized (but keep the same m_buffer) :/
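The failure mode described here, a builder whose underlying buffer holds more characters than the builder's current length, can be modeled with a hypothetical `TinyBuilder`. This is a sketch under that assumption, not `WTF::StringBuilder`: it only shows why asking the whole buffer can disagree with checking the builder's live characters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical builder (not WTF::StringBuilder): m_buffer is allocated with spare
// capacity, and only the first m_length code units belong to the string.
class TinyBuilder {
public:
    void append(char16_t c)
    {
        if (m_length == m_buffer.size())
            m_buffer.resize(m_buffer.empty() ? 8 : m_buffer.size() * 2, u'\0');
        m_buffer[m_length++] = c;
    }

    // Shrinks the logical length but keeps the same buffer and its stale contents.
    void shrink(std::size_t newLength)
    {
        if (newLength < m_length)
            m_length = newLength;
    }

    // Incorrect: inspects the whole buffer, including stale units past m_length.
    bool isAllASCIIWholeBuffer() const
    {
        for (char16_t c : m_buffer) {
            if (c > 0x7F)
                return false;
        }
        return true;
    }

    // Correct: inspects only the live range [0, m_length).
    bool isAllASCII() const
    {
        for (std::size_t i = 0; i < m_length; ++i) {
            if (m_buffer[i] > 0x7F)
                return false;
        }
        return true;
    }

private:
    std::vector<char16_t> m_buffer;
    std::size_t m_length { 0 };
};
```

After appending `a` and `é` and then shrinking back to one character, the live string is pure ASCII, but the whole-buffer check still sees the stale `é`.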
Comment 7 Wenson Hsieh 2020-05-07 17:33:47 PDT
Created attachment 398819 [details]
Address feedback
Comment 8 Darin Adler 2020-05-08 09:04:27 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

> Source/WebCore/editing/MarkupAccumulator.h:72
> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }

We can follow up and make a much more efficient version of this. Fine to land like this I suppose.
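One possible shape for a more efficient check is to scan the characters in place instead of first converting the builder's contents to a `String`. The sketch below assumes the accumulator could expose a pointer and length for its live characters; `charactersAreAllASCIISketch` is a hypothetical name. It ORs all code units together and tests the non-ASCII bits once at the end, which keeps the loop free of early-exit branches at the cost of always scanning the full range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Checks a span of code units in place, without building an intermediate string.
// Any unit with a bit set above 0x7F makes the accumulated value non-ASCII.
template<typename CharacterType>
bool charactersAreAllASCIISketch(const CharacterType* characters, std::size_t length)
{
    std::uint32_t accumulated = 0;
    for (std::size_t i = 0; i < length; ++i)
        accumulated |= static_cast<std::uint32_t>(characters[i]);
    return (accumulated & ~std::uint32_t { 0x7F }) == 0;
}
```

The template works for 8-bit and 16-bit buffers alike; a signed `char` holding a UTF-8 byte of 0x80 or above converts to a value with high bits set, so it is reported as non-ASCII.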

> Source/WebCore/editing/markup.cpp:248
> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");

Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.
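The reasoning behind `_s` can be illustrated with a standard-library analogy, since WTF's `ASCIILiteral` and `String` are not reproduced here: a view of a string literal refers to the literal's static storage, while constructing an owning string copies the characters into a new buffer. `charsetMetaTag`, `metaTagView`, and `metaTagCopy` are hypothetical names for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The literal lives in static storage for the whole program.
constexpr const char* charsetMetaTag = "<meta charset=\"UTF-8\">";

// Refers to the literal's characters; nothing is copied.
std::string_view metaTagView() { return charsetMetaTag; }

// Allocates (or uses inline storage) and copies every character of the literal.
std::string metaTagCopy() { return charsetMetaTag; }
```

Both return the same 22 characters, but only the view keeps pointing at the original literal.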
Comment 9 Darin Adler 2020-05-08 09:35:03 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>> Source/WebCore/editing/MarkupAccumulator.h:72
>> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }
> 
> We can follow up and make a much more efficient version of this. Fine to land like this I suppose.

I’m happy to do this optimization after this lands.
Comment 10 Wenson Hsieh 2020-05-08 10:06:35 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>>> Source/WebCore/editing/MarkupAccumulator.h:72
>>> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }
>> 
>> We can follow up and make a much more efficient version of this. Fine to land like this I suppose.
> 
> I’m happy to do this optimization after this lands.

\o/

>> Source/WebCore/editing/markup.cpp:248
>> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");
> 
> Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.

Done!
Comment 11 Wenson Hsieh 2020-05-08 10:13:16 PDT
Created attachment 398871 [details]
Patch for landing
Comment 12 EWS 2020-05-08 10:35:51 PDT
Committed r261395: <https://trac.webkit.org/changeset/261395>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 398871 [details].
Comment 13 Radar WebKit Bug Importer 2020-05-08 10:41:59 PDT
<rdar://problem/63027006>
Comment 14 Ryosuke Niwa 2022-08-02 21:24:31 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>>> Source/WebCore/editing/markup.cpp:248
>>> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");
>> 
>> Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.
> 
> Done!

This patch broke WebKit's ability to save XHTML documents.
This element will cause a parsing error in an XHTML document, since `<meta charset="UTF-8">` is not self-closed.
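One possible way to avoid the XHTML problem, sketched here as an assumption and not as how WebKit resolved it, is to emit the self-closing form of the tag whenever the markup is serialized as XML. `charsetMetaTagMarkup` and its `serializeAsXML` parameter are hypothetical.

```cpp
#include <cassert>
#include <string>

// HTML allows the void <meta> element without a closing slash, but XML (and
// therefore XHTML) requires every element to be closed, so the self-closing
// form is used when serializing as XML.
std::string charsetMetaTagMarkup(bool serializeAsXML)
{
    return serializeAsXML ? "<meta charset=\"UTF-8\" />" : "<meta charset=\"UTF-8\">";
}
```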