RESOLVED FIXED
Bug 211524
Preserve character set information when writing to the pasteboard when copying rich text
https://bugs.webkit.org/show_bug.cgi?id=211524
Wenson Hsieh
Reported 2020-05-06 12:35:31 PDT
Attachments
Patch (14.76 KB, patch)
2020-05-07 14:39 PDT, Wenson Hsieh
no flags
Patch (17.63 KB, patch)
2020-05-07 15:29 PDT, Wenson Hsieh
no flags
Address feedback (19.68 KB, patch)
2020-05-07 17:33 PDT, Wenson Hsieh
darin: review+
Patch for landing (19.65 KB, patch)
2020-05-08 10:13 PDT, Wenson Hsieh
no flags
Wenson Hsieh
Comment 1 2020-05-07 14:39:56 PDT Comment hidden (obsolete)
Wenson Hsieh
Comment 2 2020-05-07 15:29:16 PDT
Darin Adler
Comment 3 2020-05-07 15:35:15 PDT
Comment on attachment 398804 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398804&action=review

> Source/WebCore/editing/markup.cpp:921
> +#if PLATFORM(COCOA)

I think this kind of issue might exist on other platforms as well. Would be nice to call people’s attention to this in case they want to take advantage of it.

> Source/WebCore/editing/markup.cpp:926
> +    if (!accumulatedMarkup.isAllASCII()) {
> +        // On Cocoa platforms, this markup is eventually persisted to the pasteboard and read back as UTF-8 data,
> +        // so this meta tag is needed for clients that read this data in the future from the pasteboard and load it.
> +        return makeString("<meta charset=\"UTF-8\">", WTFMove(accumulatedMarkup));
> +    }

Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.
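The suggestion above (queue the tag as preceding markup, then assemble the result once) can be sketched in standard C++. This is a minimal illustration only: `SketchAccumulator` and its members are hypothetical stand-ins built on `std::string`, not the real `StyledMarkupAccumulator`, whose internals are not shown in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an accumulator that keeps "preceding" pieces
// in reverse order and joins everything exactly once in takeResults().
class SketchAccumulator {
public:
    void append(const std::string& piece) { m_markup += piece; }

    // True when every byte accumulated so far is 7-bit ASCII.
    bool isAllASCII() const
    {
        auto ascii = [](const std::string& text) {
            return std::all_of(text.begin(), text.end(),
                [](unsigned char c) { return c < 0x80; });
        };
        return ascii(m_markup)
            && std::all_of(m_reversedPrecedingMarkup.begin(), m_reversedPrecedingMarkup.end(), ascii);
    }

    // Queue the charset tag only when the markup contains non-ASCII bytes.
    void prependMetaCharsetUTF8IfNeeded()
    {
        if (!isAllASCII())
            m_reversedPrecedingMarkup.push_back("<meta charset=\"UTF-8\">");
    }

    // Build the final string in one pass: preceding pieces (last queued first), then the body.
    std::string takeResults()
    {
        std::size_t total = m_markup.size();
        for (const auto& piece : m_reversedPrecedingMarkup)
            total += piece.size();

        std::string result;
        result.reserve(total);
        for (auto it = m_reversedPrecedingMarkup.rbegin(); it != m_reversedPrecedingMarkup.rend(); ++it)
            result += *it;
        result += m_markup;
        return result;
    }

private:
    std::string m_markup;
    std::vector<std::string> m_reversedPrecedingMarkup;
};
```

Compared with `makeString(tag, WTFMove(accumulatedMarkup))`, the tag here joins the pieces that are concatenated anyway, so the full markup is not copied a second time just to put the tag in front.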
Wenson Hsieh
Comment 4 2020-05-07 16:09:29 PDT
Comment on attachment 398804 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398804&action=review

>> Source/WebCore/editing/markup.cpp:926
>> +    }
>
> Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.

Sounds good to me! I’ll update the patch to do this.
Wenson Hsieh
Comment 5 2020-05-07 16:37:30 PDT
(In reply to Wenson Hsieh from comment #4)
> Comment on attachment 398804 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=398804&action=review
>
> >> Source/WebCore/editing/markup.cpp:926
> >> +    }
> >
> > Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.
>
> Sounds good to me! I’ll update the patch to do this.

So adding an `isAllASCII()` method to `StyledMarkupAccumulator` would require us to also add an `isAllASCII()` method to StringBuilder, which, I think, seems fine? I imagine it would just be like WTF::String’s. Something like:

bool isAllASCII() const { return !m_buffer || m_buffer->isAllASCII(); }
Wenson Hsieh
Comment 6 2020-05-07 17:12:42 PDT
(In reply to Wenson Hsieh from comment #5)
> (In reply to Wenson Hsieh from comment #4)
> > Comment on attachment 398804 [details]
> > Patch
> >
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=398804&action=review
> >
> > >> Source/WebCore/editing/markup.cpp:926
> > >> +    }
> > >
> > > Could avoid making a second copy of the entire string by adding an isAllASCII function to StyledMarkupAccumulator and adding another function you can use to add this to m_reversedPrecedingMarkup before calling takeResults. Less economical in code complexity, but more efficient.
> >
> > Sounds good to me! I’ll update the patch to do this.
>
> So adding an `isAllASCII()` method to `StyledMarkupAccumulator` would
> require us to also add an `isAllASCII()` method to StringBuilder, which, I
> think, seems fine? I imagine it would just be like WTF::String’s. Something
> like:
>
> bool isAllASCII() const { return !m_buffer || m_buffer->isAllASCII(); }

…upon further testing, this isn’t correct, because a StringBuilder can be resized (but keep the same m_buffer) :/
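One way to picture the failure mode described above is a builder whose backing buffer outlives a shrink. The sketch below is an assumption about the general shape of the problem, not WTF::StringBuilder itself: `SketchBuilder`, `bufferIsAllASCII()`, and the shrink-only `resize()` are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical builder: the logical length can shrink while the backing
// buffer keeps its old contents.
class SketchBuilder {
public:
    void append(const std::string& text)
    {
        m_buffer.resize(m_length); // drop any stale tail before appending
        m_buffer += text;
        m_length = m_buffer.size();
    }

    // Shrink-only resize that deliberately leaves m_buffer untouched.
    void resize(std::size_t newLength) { m_length = std::min(newLength, m_length); }

    // Naive check: scans the whole backing buffer, including stale characters.
    bool bufferIsAllASCII() const { return isASCIIRange(m_buffer.size()); }

    // Length-aware check: scans only the characters the builder logically holds.
    bool isAllASCII() const { return isASCIIRange(m_length); }

    std::string toString() const { return m_buffer.substr(0, m_length); }

private:
    bool isASCIIRange(std::size_t count) const
    {
        return std::all_of(m_buffer.begin(), m_buffer.begin() + count,
            [](unsigned char c) { return c < 0x80; });
    }

    std::string m_buffer;
    std::size_t m_length { 0 };
};
```

After appending `"abc\xC3\xA9"` and calling `resize(3)`, the builder logically holds only `abc`, yet the naive buffer-wide check still sees the non-ASCII bytes left behind in the buffer.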
Wenson Hsieh
Comment 7 2020-05-07 17:33:47 PDT
Created attachment 398819 [details] Address feedback
Darin Adler
Comment 8 2020-05-08 09:04:27 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

> Source/WebCore/editing/MarkupAccumulator.h:72
> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }

We can follow up and make a much more efficient version of this. Fine to land like this I suppose.

> Source/WebCore/editing/markup.cpp:248
> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");

Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.
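The point about `_s` can be illustrated with a user-defined literal in standard C++. This is only a sketch of the idea that a literal wrapper can carry a pointer and a length instead of copying characters; `SketchLiteral`, this `operator""_s`, and `viewOf` are hypothetical and do not reproduce WTF's ASCIILiteral or String.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical wrapper that remembers where a string literal lives
// and how long it is, without owning or copying the characters.
struct SketchLiteral {
    const char* characters;
    std::size_t length;
};

// User-defined literal suffix: "text"_s yields a SketchLiteral at compile time.
constexpr SketchLiteral operator""_s(const char* characters, std::size_t length)
{
    return { characters, length };
}

// A consumer can refer to the literal's storage directly; no allocation or copy happens.
constexpr std::string_view viewOf(SketchLiteral literal)
{
    return { literal.characters, literal.length };
}
```

Because string literals have static storage duration, a view built this way stays valid for the life of the program, which is what makes skipping the copy safe.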
Darin Adler
Comment 9 2020-05-08 09:35:03 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>> Source/WebCore/editing/MarkupAccumulator.h:72
>> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }
>
> We can follow up and make a much more efficient version of this. Fine to land like this I suppose.

I’m happy to do this optimization after this lands.
Wenson Hsieh
Comment 10 2020-05-08 10:06:35 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>>> Source/WebCore/editing/MarkupAccumulator.h:72
>>> +    bool isAllASCII() const { return m_markup.toStringPreserveCapacity().isAllASCII(); }
>>
>> We can follow up and make a much more efficient version of this. Fine to land like this I suppose.
>
> I’m happy to do this optimization after this lands.

\o/

>> Source/WebCore/editing/markup.cpp:248
>> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");
>
> Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.

Done!
Wenson Hsieh
Comment 11 2020-05-08 10:13:16 PDT
Created attachment 398871 [details] Patch for landing
EWS
Comment 12 2020-05-08 10:35:51 PDT
Committed r261395: <https://trac.webkit.org/changeset/261395> All reviewed patches have been landed. Closing bug and clearing flags on attachment 398871 [details].
Radar WebKit Bug Importer
Comment 13 2020-05-08 10:41:59 PDT
Ryosuke Niwa
Comment 14 2022-08-02 21:24:31 PDT
Comment on attachment 398819 [details]
Address feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=398819&action=review

>>> Source/WebCore/editing/markup.cpp:248
>>> +        m_reversedPrecedingMarkup.append("<meta charset=\"UTF-8\">");
>>
>> Should add a _s here, I think. It’s more efficient to create a String from an ASCIILiteral, since it doesn’t copy the characters.
>
> Done!

This patch broke WebKit's ability to save XHTML documents. The unclosed <meta> element causes a parsing error in an XHTML document.
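The regression comes down to syntax: `<meta charset="UTF-8">` has no end tag, which HTML allows for void elements but XML does not. One possible guard, shown purely as an assumption and not as the fix WebKit adopted, is to choose the prefix from the serialization syntax. `SketchSyntax` and `charsetPrefix` are hypothetical names.

```cpp
#include <string>

// Hypothetical flag for how the serialized markup will be parsed later.
enum class SketchSyntax { HTML, XML };

// Returns the text to place in front of the markup, if any.
std::string charsetPrefix(SketchSyntax syntax, bool markupIsAllASCII)
{
    // Pure ASCII markup decodes the same way with or without a charset hint.
    if (markupIsAllASCII)
        return {};

    // An unclosed <meta> start tag is not well-formed XML, so emit nothing for XML output.
    if (syntax == SketchSyntax::XML)
        return {};

    return "<meta charset=\"UTF-8\">";
}
```

Whether skipping the tag is acceptable for XHTML output depends on how the saved document declares its encoding elsewhere, which this thread does not cover.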