Bug 214878 - Text manipulation: leading and trailing spaces should be ignored when comparing content
Summary: Text manipulation: leading and trailing spaces should be ignored when compari...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-07-28 09:13 PDT by Sihui Liu
Modified: 2020-08-06 18:24 PDT (History)
5 users (show)

See Also:


Attachments
Patch (11.86 KB, patch)
2020-07-28 09:32 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (6.22 KB, patch)
2020-08-04 14:12 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (6.42 KB, patch)
2020-08-06 09:19 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch for landing (6.61 KB, patch)
2020-08-06 17:56 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sihui Liu 2020-07-28 09:13:18 PDT
...
Comment 1 Sihui Liu 2020-07-28 09:13:54 PDT
<rdar://problem/63735024>
Comment 2 Sihui Liu 2020-07-28 09:32:49 PDT
Created attachment 405365 [details]
Patch
Comment 3 Ryosuke Niwa 2020-07-30 16:08:44 PDT
Comment on attachment 405365 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=405365&action=review

> Source/WebCore/editing/TextIterator.cpp:639
> +        if ((m_behavior & TextIteratorEmitsCollapsedSpacesAtStart) && m_textBox == firstTextBox && textBoxStart == runStart && runStart && !m_emittedSpaceBeforeFirstTextBox) {

Can we instead do this in TextIterator::handleTextNode() like the way we handle
white-space: pre / pre-wrap content (i.e. inside renderer.style().collapseWhiteSpace())?
Comment 4 Sihui Liu 2020-07-30 22:11:07 PDT
(In reply to Ryosuke Niwa from comment #3)
> Comment on attachment 405365 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=405365&action=review
> 
> > Source/WebCore/editing/TextIterator.cpp:639
> > +        if ((m_behavior & TextIteratorEmitsCollapsedSpacesAtStart) && m_textBox == firstTextBox && textBoxStart == runStart && runStart && !m_emittedSpaceBeforeFirstTextBox) {
> 
> Can we instead do this in TextIterator::handleTextNode() like the way we
> handle
> white-space: pre / pre-wrap content (i.e. inside
> renderer.style().collapseWhiteSpace())?

Do you mean not doing white space collapsing with TextIteratorEmitsCollapsedSpacesAtStart or just move this code for handling first text box to TextIterator::handleTextNode()?
Comment 5 Ryosuke Niwa 2020-07-31 00:04:34 PDT
(In reply to Sihui Liu from comment #4)
> (In reply to Ryosuke Niwa from comment #3)
> > Comment on attachment 405365 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=405365&action=review
> > 
> > > Source/WebCore/editing/TextIterator.cpp:639
> > > +        if ((m_behavior & TextIteratorEmitsCollapsedSpacesAtStart) && m_textBox == firstTextBox && textBoxStart == runStart && runStart && !m_emittedSpaceBeforeFirstTextBox) {
> > 
> > Can we instead do this in TextIterator::handleTextNode() like the way we
> > handle
> > white-space: pre / pre-wrap content (i.e. inside
> > renderer.style().collapseWhiteSpace())?
> 
> Do you mean not doing white space collapsing with
> TextIteratorEmitsCollapsedSpacesAtStart or just move this code for handling
> first text box to TextIterator::handleTextNode()?

The latter. On my second thought, why can't we just ignore the leading whitespace in TextManipulationController??
Comment 6 Sihui Liu 2020-07-31 09:38:22 PDT
(In reply to Ryosuke Niwa from comment #5)
> (In reply to Sihui Liu from comment #4)
> > (In reply to Ryosuke Niwa from comment #3)
> > > Comment on attachment 405365 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=405365&action=review
> > > 
> > > > Source/WebCore/editing/TextIterator.cpp:639
> > > > +        if ((m_behavior & TextIteratorEmitsCollapsedSpacesAtStart) && m_textBox == firstTextBox && textBoxStart == runStart && runStart && !m_emittedSpaceBeforeFirstTextBox) {
> > > 
> > > Can we instead do this in TextIterator::handleTextNode() like the way we
> > > handle
> > > white-space: pre / pre-wrap content (i.e. inside
> > > renderer.style().collapseWhiteSpace())?
> > 
> > Do you mean not doing white space collapsing with
> > TextIteratorEmitsCollapsedSpacesAtStart or just move this code for handling
> > first text box to TextIterator::handleTextNode()?
> 
> The latter. On my second thought, why can't we just ignore the leading
> whitespace in TextManipulationController??

You mean when checking the content of in replace()? That's another solution.
I think the issue here is we don't get consistent result from TextIterator when text node's content is unchanged, so it's either we change our way to compare(like comparing after removing leading and trailing spaces), or make TextIterator to return more "stable" result for same content. 

(I didn't do the first one as TextIterator may also insert/remove spaces in the middle of node content based on layout, so maybe still need to adjust TextIterator's behavior later)
Comment 7 Ryosuke Niwa 2020-07-31 11:49:14 PDT
(In reply to Sihui Liu from comment #6)
> (In reply to Ryosuke Niwa from comment #5)
> > (In reply to Sihui Liu from comment #4)
> > > (In reply to Ryosuke Niwa from comment #3)
> > > > Comment on attachment 405365 [details]
> > > > Patch
> > > > 
> > > > View in context:
> > > > https://bugs.webkit.org/attachment.cgi?id=405365&action=review
> > > > 
> > > > > Source/WebCore/editing/TextIterator.cpp:639
> > > > > +        if ((m_behavior & TextIteratorEmitsCollapsedSpacesAtStart) && m_textBox == firstTextBox && textBoxStart == runStart && runStart && !m_emittedSpaceBeforeFirstTextBox) {
> > > > 
> > > > Can we instead do this in TextIterator::handleTextNode() like the way we
> > > > handle
> > > > white-space: pre / pre-wrap content (i.e. inside
> > > > renderer.style().collapseWhiteSpace())?
> > > 
> > > Do you mean not doing white space collapsing with
> > > TextIteratorEmitsCollapsedSpacesAtStart or just move this code for handling
> > > first text box to TextIterator::handleTextNode()?
> > 
> > The latter. On my second thought, why can't we just ignore the leading
> > whitespace in TextManipulationController??
> 
> You mean when checking the content of in replace()? That's another solution.

Yes.

> I think the issue here is we don't get consistent result from TextIterator
> when text node's content is unchanged, so it's either we change our way to
> compare(like comparing after removing leading and trailing spaces), or make
> TextIterator to return more "stable" result for same content.

TextIterator is a very complicated class so I'm a bit concerned of adding yet another mode that does a different thing. I think given what we know, ignoring the leading whitespace at the beginning of a paragraph and the trailing whitespace at the end of a paragraph is a better approach. The code change is where it needs to be (TextManipulationController) so it's easier to explain why / understand why we need that code. Whereas if we added this code in TextIterator, someone else might start using it or modify its behavior for whatever reason without understanding why it was added.

> (I didn't do the first one as TextIterator may also insert/remove spaces in
> the middle of node content based on layout, so maybe still need to adjust
> TextIterator's behavior later)

Well, if that were to happen, the content DID change. If CSS styles are modified enough to start treating whitespace differently, etc... then we need to consider that as a content change. e.g. this is akin to how CSS display: none is set on some text nodes or content: "hello" is added to show more CSS generated content, etc...
Comment 8 Sihui Liu 2020-08-04 14:12:34 PDT
Created attachment 405941 [details]
Patch
Comment 9 Ryosuke Niwa 2020-08-05 11:22:05 PDT
Comment on attachment 405941 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=405941&action=review

> Source/WebCore/ChangeLog:11
> +        collapsed space. 
> +        When TextManipulationController starts observing paragraphs, it iterates the whole document and the range of 

Can we either add blank lines so that they're separate paragraphs or just continue the next sentence on the same line?
It looks weird & hard to read otherwise.

> Source/WebCore/editing/TextManipulationController.cpp:286
> +    return content.stripWhiteSpace() == originalContent.stripWhiteSpace();

We should only do this on the very first or the very last token.
Also, we should call stripWhiteSpace only when the regular equality fails.
Comment 10 Sihui Liu 2020-08-06 09:19:03 PDT
Created attachment 406086 [details]
Patch
Comment 11 Ryosuke Niwa 2020-08-06 14:57:36 PDT
Comment on attachment 406086 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406086&action=review

> Source/WebCore/editing/TextManipulationController.cpp:802
> +            bool isContentUnchanged = (currentToken.content == token.content) || ((currentTokenIndex == 1 || currentTokenIndex == item.tokens.size()) && isContentEquivalent(currentToken.content, token.content));
> +            if (!content.isReplacedContent && !isContentUnchanged)

This is a really long line.
Can we split it like this:
bool isContentUnchanged = currentToken.content == token.content;
if (!UNLIKELY(isContentUnchanged)) {
    bool isFirstOrLastToken = currentTokenIndex == 1 || currentTokenIndex == item.tokens.size();
    isContentUnchanged = isFirstOrLastToken && isContentEquivalent(currentToken.content, token.content);
}

> Tools/ChangeLog:8
> +

Say that we're adding a regression test.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/TextManipulation.mm:3050
> +        "  <style>"

Since you're not inserting newlines, they'd all be in a single line. We might as well omit indentation.
Comment 12 Ryosuke Niwa 2020-08-06 14:58:16 PDT
Comment on attachment 406086 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406086&action=review

> Source/WebCore/editing/TextManipulationController.cpp:284
> +static bool isContentEquivalent(const String& content, const String& originalContent)

Can we rename this to something like areEqualIgnoringLeadingAndTrailingWhitespaces?
Comment 13 Sihui Liu 2020-08-06 15:46:38 PDT
(In reply to Ryosuke Niwa from comment #12)
> Comment on attachment 406086 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=406086&action=review
> 
> > Source/WebCore/editing/TextManipulationController.cpp:284
> > +static bool isContentEquivalent(const String& content, const String& originalContent)
> 
> Can we rename this to something like
> areEqualIgnoringLeadingAndTrailingWhitespaces?

Yes. Thanks for review!
Comment 14 Sihui Liu 2020-08-06 17:56:12 PDT
Created attachment 406140 [details]
Patch for landing
Comment 15 EWS 2020-08-06 18:24:10 PDT
Committed r265361: <https://trac.webkit.org/changeset/265361>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 406140 [details].