RESOLVED FIXED Bug 213318
Text manipulation does not observe manipulated text after update
https://bugs.webkit.org/show_bug.cgi?id=213318
Summary Text manipulation does not observe manipulated text after update
Sihui Liu
Reported 2020-06-17 15:13:30 PDT
...
Attachments
Patch (9.93 KB, patch)
2020-06-17 15:18 PDT, Sihui Liu
no flags
Patch (9.97 KB, patch)
2020-06-17 15:20 PDT, Sihui Liu
no flags
Patch (9.85 KB, patch)
2020-06-26 12:46 PDT, Sihui Liu
no flags
Patch (12.49 KB, patch)
2020-06-26 15:21 PDT, Sihui Liu
no flags
Patch (12.02 KB, patch)
2020-06-26 16:25 PDT, Sihui Liu
no flags
Patch (12.74 KB, patch)
2020-06-29 09:58 PDT, Sihui Liu
no flags
Patch for landing (12.72 KB, patch)
2020-07-13 10:01 PDT, Sihui Liu
no flags
Sihui Liu
Comment 1 2020-06-17 15:14:42 PDT
Sihui Liu
Comment 2 2020-06-17 15:18:39 PDT
Sihui Liu
Comment 3 2020-06-17 15:20:36 PDT
Sihui Liu
Comment 4 2020-06-26 12:46:54 PDT
Darin Adler
Comment 5 2020-06-26 12:49:32 PDT
Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review > Source/WebCore/dom/CharacterData.cpp:70 > + if (is<Text>(*this) && !m_data.contains(oldData)) { This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check.
Wenson Hsieh
Comment 6 2020-06-26 13:50:41 PDT
Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >> Source/WebCore/dom/CharacterData.cpp:70 >> + if (is<Text>(*this) && !m_data.contains(oldData)) { > > This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. > > Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists?
Darin Adler
Comment 7 2020-06-26 13:51:30 PDT
Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >>> Source/WebCore/dom/CharacterData.cpp:70 >>> + if (is<Text>(*this) && !m_data.contains(oldData)) { >> >> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. >> >> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. > > I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. > > That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? Can contains be a coincidence?
Wenson Hsieh
Comment 8 2020-06-26 13:59:09 PDT
Comment on attachment 402887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402887&action=review >>>> Source/WebCore/dom/CharacterData.cpp:70 >>>> + if (is<Text>(*this) && !m_data.contains(oldData)) { >>> >>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. >>> >>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. >> >> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. >> >> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? > > Can contains be a coincidence? Indeed — it could very well be a coincidence. For instance, maybe the original text was just a single space or symbol... My understanding is that this is just a heuristic that tends to work well in practice. @Sihui — were there any websites where this `contains` check is necessary for compat? If not, perhaps we could simplify this a bit by calling didUpdateContentForText if the text changed at all.
Sihui Liu
Comment 9 2020-06-26 14:08:18 PDT
(In reply to Wenson Hsieh from comment #8) > Comment on attachment 402887 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402887&action=review > > >>>> Source/WebCore/dom/CharacterData.cpp:70 > >>>> + if (is<Text>(*this) && !m_data.contains(oldData)) { > >>> > >>> This contains check seems peculiar. The new data might happen to contain the old data by coincidence; not sure if it’s a meaningful check. > >>> > >>> Also, the is<Text> check seems peculiar. Normally this would be done by making the setData function virtual rather than adding a derived class check. > >> > >> I believe `!m_data.contains(oldData)` is just a translation-specific heuristic in this case, since the translation framework does not handle text that has already been partially translated to the target language. For instance, if we have a text node like "one two" that is translated to "一 二", and then the page's script later sets this text to "three four", we'd want to re-mark it for translation so that it can be translated to "三 四". However, if the web page appends new text in the source language — for example, "一 二 three four" — it is better to avoid re-translating this text. > >> > >> That said, it does sound expensive to check for `contains` every time here; maybe we should just omit the `contains` check, or only do it if the text manipulation controller exists? > > > > Can contains be a coincidence? > > Indeed — it could very well be a coincidence. For instance, maybe the > original text was just a single space or symbol... > > My understanding is that this is just a heuristic that tends to work well in > practice. @Sihui — were there any websites where this `contains` check is > necessary for compat? If not, perhaps we could simplify this a bit by > calling didUpdateContentForText if the text changed at all. This check is not necessary for the radar. I will remove it.
Sihui Liu
Comment 10 2020-06-26 14:52:12 PDT
(In reply to Darin Adler from comment #5) > Comment on attachment 402887 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402887&action=review > > > Source/WebCore/dom/CharacterData.cpp:70 > > + if (is<Text>(*this) && !m_data.contains(oldData)) { > > This contains check seems peculiar. The new data might happen to contain the > old data by coincidence; not sure if it’s a meaningful check. > > Also, the is<Text> check seems peculiar. Normally this would be done by > making the setData function virtual rather than adding a derived class check. I found setData is called in multiple places: CharacterData::setNodeValue, replaceChildrenWithFragment, Text::replaceWholeText, etc, so I just added a general check. I can add didUpdateContentForText to each of them if it's preferred.
Sihui Liu
Comment 11 2020-06-26 15:21:46 PDT
Darin Adler
Comment 12 2020-06-26 15:38:37 PDT
Comment on attachment 402912 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402912&action=review > Source/WebCore/dom/CharacterData.cpp:190 > + if (is<Text>(*this)) { Can we make the setNodeValue function be virtual and overridden in the Text class, instead of doing if here? That’s the pattern for C++ programming. Or is that poorer for performance? And/or why isn’t this in the setData function? Are there calls to setData that must not do this?
Sihui Liu
Comment 13 2020-06-26 16:25:14 PDT
(In reply to Darin Adler from comment #12) > Comment on attachment 402912 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402912&action=review > > > Source/WebCore/dom/CharacterData.cpp:190 > > + if (is<Text>(*this)) { > > Can we make the setNodeValue function be virtual and overridden in the Text > class, instead of doing if here? That’s the pattern for C++ programming. Or > is that poorer for performance? > Sure, will update the patch by overriding setData in Text. > And/or why isn’t this in the setData function? Are there calls to setData > that must not do this? No, I just misunderstood your previous comment.
Sihui Liu
Comment 14 2020-06-26 16:25:54 PDT
Darin Adler
Comment 15 2020-06-28 12:50:48 PDT
Comment on attachment 402922 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=402922&action=review > Source/WebCore/dom/CharacterData.h:35 > + WEBCORE_EXPORT virtual void setData(const String&); Looking even more closely, it seems like the real bottleneck is CharacterData::setDataAndUpdate, called in various cases where setData is not called. So maybe it would be even better to put the new code there.
Sihui Liu
Comment 16 2020-06-29 09:57:08 PDT
(In reply to Darin Adler from comment #15) > Comment on attachment 402922 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=402922&action=review > > > Source/WebCore/dom/CharacterData.h:35 > > + WEBCORE_EXPORT virtual void setData(const String&); > > Looking even more closely, it seems like the real bottleneck is > CharacterData::setDataAndUpdate, called in various cases where setData is > not called. > > So maybe it would be even better to put the new code there. Sure, that should work too.
Sihui Liu
Comment 17 2020-06-29 09:58:52 PDT
Geoffrey Garen
Comment 18 2020-07-10 15:52:00 PDT
Comment on attachment 403077 [details] Patch r=me Looks like all comments have been addressed and incorporated.
Sihui Liu
Comment 19 2020-07-13 10:01:48 PDT
Created attachment 404153 [details] Patch for landing
EWS
Comment 20 2020-07-13 10:36:26 PDT
Committed r264305: <https://trac.webkit.org/changeset/264305> All reviewed patches have been landed. Closing bug and clearing flags on attachment 404153 [details].
Note You need to log in before you can comment on or make changes to this bug.