Bug 28306 - double-clicking a word inside <b> beside newline select two words
: double-clicking a word inside <b> beside newline select two words
Status: RESOLVED FIXED
: WebKit
HTML Editing
: 528+ (Nightly build)
: All Mac OS X 10.5
: P2 Normal
Assigned To:
:
:
: 27204
:
  Show dependency treegraph
 
Reported: 2009-08-14 09:18 PST by
Modified: 2009-12-17 00:53 PST (History)


Attachments
html to reproduce. (58 bytes, text/html)
2009-08-14 09:19 PST, MORITA Hajime
no flags Details
a small fix to use string length from RenderText instead of inline box (5.55 KB, patch)
2009-08-14 09:37 PST, MORITA Hajime
eric: review-
Review Patch | Details | Formatted Diff | Diff
patch update 1 (9.34 KB, patch)
2009-08-14 19:49 PST, MORITA Hajime
no flags Review Patch | Details | Formatted Diff | Diff
patch update 2 (9.16 KB, patch)
2009-08-18 09:57 PST, MORITA Hajime
no flags Review Patch | Details | Formatted Diff | Diff
patch revised (6.70 KB, patch)
2009-11-26 07:45 PST, MORITA Hajime
no flags Review Patch | Details | Formatted Diff | Diff
patch update 4 (23.87 KB, patch)
2009-12-12 19:20 PST, MORITA Hajime
no flags Review Patch | Details | Formatted Diff | Diff
patch update 5 (23.87 KB, patch)
2009-12-15 06:00 PST, MORITA Hajime
no flags Review Patch | Details | Formatted Diff | Diff
remove tab from ChagneLog (23.87 KB, patch)
2009-12-16 18:20 PST, MORITA Hajime
eric: commit‑queue-
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-08-14 09:18:15 PST
Overview:
Double-clicking a word to select that word 
unexpectedly results two-word selected.

Steps to Reproduce:
1. open attached file
2. double click "selectme"

Actual Result:
"foo selectme" is selected(hilightened)

Expected Result:
"selectme" is selected.

Build Date & Platform: 
trunk r47196, Mac OS 10.5.7

Additional Information:
this bug is from chromium
http://code.google.com/p/chromium/issues/detail?id=6396
(and privately confirmed at Safari and WebKit trunk)
------- Comment #1 From 2009-08-14 09:19:08 PST -------
Created an attachment (id=34845) [details]
html to reproduce.

html to reproduce.
------- Comment #2 From 2009-08-14 09:37:20 PST -------
Created an attachment (id=34847) [details]
a small fix to use string length from RenderText instead of inline box
------- Comment #3 From 2009-08-14 13:20:01 PST -------
(From update of attachment 34847 [details])
This does not look like the right fix to me. I don't have time to study the issue at this time, and I could be wrong.
------- Comment #4 From 2009-08-14 15:20:09 PST -------
(From update of attachment 34847 [details])
Needs ChangeLog.
------- Comment #5 From 2009-08-14 19:49:41 PST -------
Created an attachment (id=34883) [details]
patch update 1
------- Comment #6 From 2009-08-14 19:52:42 PST -------
Thank you for reviewing.
I updated the patch
- including some more tests (for confidence), and
- appending ChangeLog
------- Comment #7 From 2009-08-17 17:29:26 PST -------
(From update of attachment 34883 [details])
I assume you meant to mark this as a patch. :)  Note that tools like "bugzilla-tool post-diff" or "git-send-bugzilla" will do all the patch/review marking automatically for you. :)
------- Comment #8 From 2009-08-17 17:31:12 PST -------
(From update of attachment 34883 [details])
This change contains a .DS_Store file.  We don't check those in.

is "MORITA" how you would normally capitalize your name?   I notice that your gmail account has one more r than your name, is either a typo?

There are tabs in the ChangeLog, that will cause the commit to fail.
------- Comment #9 From 2009-08-18 09:57:55 PST -------
Created an attachment (id=35046) [details]
patch update 2
------- Comment #10 From 2009-08-18 10:06:10 PST -------
Wow, bugzilla-tool does work! Thank you for your advice, Eric.

this patch
- remove accidentally checked-in .DS_Store files, and
- untabify my ChangeLog entries.

BTW, capitalizing family name is my style, as some other Japanese people does.
- http://www.sljfaq.org/afaq/japanesenames.html
And I choose "morrita" for gmail address just because 
"morita" was already taken;-)
Thanks for the thought, anyway.
------- Comment #11 From 2009-08-18 16:52:21 PST -------
An unsolicited comment... It seems that editing/inserting/editable-html-element.html is failing with your patch. If the new behavior is better, you may need to fix the result of the test. Otherwise, it would help bugs of your patch.
------- Comment #12 From 2009-08-19 19:36:03 PST -------
Hmm I'm sorry. I'm not seeing the failure now. Maybe I did something wrong... Please just ignore my comment.
------- Comment #13 From 2009-10-13 13:00:39 PST -------
What's the status of this patch?  I don't know this code, but the patch is correctly formated, looks small, and purpose to change behavior that is definitely buggy.
------- Comment #14 From 2009-10-13 13:13:04 PST -------
(From update of attachment 35046 [details])
Thanks for tackling this.

I believe this code change is not quite right. It may make the word break work correctly in this case, but it is probably not right in all cases for the backwards text iterator.

The backwards text iterator does need to detect that there's a word break here, but it is not good for it to do this by just emitting all the trailing whitespace that happens to be in the string. I suspect there are also cases where there could be a line break but no actual whitespace in the string, depending on what element comes after the string. One example I can think of would be a subsequent text node that is entirely whitespace.

We should consider carefully why the forward text iterator has whitespace there and figure out how to do the equivalent for backward. The forward iterator definitely does *not* use the trailing whitespace in the DOM string to do this.

> +    // use str.length() instead of caretMaxOffset()
> +    // because caretMaxOffset() is based on InlineBox and not considering trailing whitespaces,
> +    // which should be handled during word-boundary detection.

There should be a capital "U" on use, and this should be broken into lines of similar length.

> +    if (m_positionEndOffset == 0)
> +      return false;

WebKit style is to use !m_positionEndOffset rather than "== 0" and the second line should be indented four spaces, not two.

review- for now. It may turn out that with more analysis we conclude that this is the best fix, but I suspect that it's not quite right.
------- Comment #15 From 2009-11-26 07:45:06 PST -------
Created an attachment (id=43924) [details]
patch revised
------- Comment #16 From 2009-11-26 08:03:41 PST -------
Thank you for reviewing. 

I've investingated TextIterator, 
And now I feels that it's hard to make SimplifiedBackwardsTextIterator to follow TextIterator way
because:

- text boundary detection need "real" whitespace inside RenderText object, 
  but TextIterator::emitCharacter() and SimplifiedBackwardsTextIterator::emitCharacter()
  use fake character storage on TextIterator's instance, 
  which break boundary-to-offset computation, and
- Although TextIterator traverse InlineBox to precisely collect text on the element,
  Doing such on SimplifiedBackwardsTextIterator would make it no longer "simplified".

So I just lookup trailing whitespaces for each text node on
SimplifiedBackwardsTextIterator::handleTextNode(). 
It should be safe because now we inspect the string content to verify trailing whitespaces.
(My old patch didn't see the string content. It was bad thing.)
------- Comment #17 From 2009-11-30 12:45:30 PST -------
style-queue ran check-webkit-style on attachment 43924 [details] without any errors.
------- Comment #18 From 2009-12-07 10:12:01 PST -------
(From update of attachment 43924 [details])
> +	Bug 28306: double-clicking a word inside <b> beside newline select two words

Tab in change log here.

> +static int collapsedSpaceLength(Node* textNode, int textEnd)
> +{
> +    RenderText* renderer = toRenderText(textNode->renderer());
> +    String str = renderer->text();
> +    int strLength = static_cast<int>(str.length());
> +    for (int i = textEnd; i < strLength; ++i)
> +        if (!isCollapsibleWhitespace(str[i]))
> +            return i - textEnd;
> +    return strLength - textEnd;
> +}

The implementation of this function is wrong. The fact that whitespace is collapsible is not enough. You need to take style into account. The right way to do that is to call RenderStyle::isCollapsibleWhitespace, which gives the correct answer. It is almost never correct to call the global isCollapsibleWhitespace function.

Also, the two line for body needs to have braces around it.

It is also unnecessarily inefficient to put the text into a string object. I would instead do this:

    const UChar* characters = renderer->text()->characters();

And then index directly into the characters array.

Also, for local variables we try to use words rather than abbreviations. The "i" is OK, but "str" is not so great. I also think "length" is fine, since the length is the length of the text.

To test the different whitespace modes, we need test cases that cover editable text with the CSS white-space values, normal, pre, pre-wrap, pre-line, and nowrap, covering both spaces and newline characters, and ideally tabs too, although they should basically work the same as spaces.

> +    // We expand range to include trailing (collapsed) whitespaces, 
> +    // which is required for word-boundary detection.
> +    if (m_node != m_endNode)
> +        m_positionEndOffset += collapsedSpaceLength(m_node, m_positionEndOffset);

I'm not entirely sure this change is OK. It's kind of strange to hardcode this behavior down at this low level. For example, there can be leading whitespace too. Why include trailing whitespace but not leading? But I'm willing to accept that we need the change if there are enough test cases, including cases where the newline is just before the word, for example. The testing right now is way too light.

review- because of the multiple issues above
------- Comment #19 From 2009-12-12 19:20:24 PST -------
Created an attachment (id=44748) [details]
patch update 4
------- Comment #20 From 2009-12-12 19:22:20 PST -------
style-queue ran check-webkit-style on attachment 44748 [details] without any errors.
------- Comment #21 From 2009-12-12 19:31:25 PST -------
(In reply to comment #18)

Thank you for your review.
I revised the patch again.

> Tab in change log here.
Fixed.

> The implementation of this function is wrong. The fact that whitespace is
> collapsible is not enough. You need to take style into account. The right way
> to do that is to call RenderStyle::isCollapsibleWhitespace, which gives the
> correct answer. It is almost never correct to call the global
> isCollapsibleWhitespace function.
>
> Also, the two line for body needs to have braces around it.
Fixed. 

And I Added some test cases that fails with last patch.
Using RenderStyle::isCollapsibleWhitespace() resolves the problem. 
Thank you to point it out!

> It is also unnecessarily inefficient to put the text into a string object. I
> would instead do this:
>
>     const UChar* characters = renderer->text()->characters();
> 
> And then index directly into the characters array.
Fixed.

> Also, for local variables we try to use words rather than abbreviations. The
> "i" is OK, but "str" is not so great. I also think "length" is fine, since the
> length is the length of the text.
Fixed.

> To test the different whitespace modes, we need test cases that cover editable
> text with the CSS white-space values, normal, pre, pre-wrap, pre-line, and
> nowrap, covering both spaces and newline characters, and ideally tabs too,
> although they should basically work the same as spaces.
Added test cases for differenct white-space style values and some other edge conditions.

> I'm not entirely sure this change is OK. It's kind of strange to hardcode this
> behavior down at this low level. (snip)
Agreed, and revised the patch trying to clarify the intention.
------- Comment #22 From 2009-12-14 10:24:36 PST -------
(From update of attachment 44748 [details])
> +static int collapsedSpaceLength(Node* textNode, int textEnd)
> +{
> +    RenderText* renderer = toRenderText(textNode->renderer());
> +
> +    const UChar* characters = renderer->text()->characters();
> +    int length = static_cast<int>(renderer->text()->length());

Do we really need a typecast here? Does some compiler generate a warning without it? I'd prefer to see that omitted.

> +    for (int i = textEnd; i < length; ++i) {
> +        if (!renderer->style()->isCollapsibleWhiteSpace(characters[i]))
> +            return i - textEnd;
> +    }
> +
> +    return length - textEnd;
> +}
> +
> +// For the purpose of word boundary detection,
> +// we should iterate all visible text and trailing (collapsed) whitespaces. 
> +static int iterableMaxOffset(Node* node)
> +{
> +    int offset = caretMaxOffset(node);
> +
> +    if (node->renderer() && node->renderer()->isText())
> +        offset += collapsedSpaceLength(node, offset);
> +
> +    return offset;
> +}

The factoring here is a bit strange. The caller gets the renderer and then checks that it's a RenderText. But then it calls the function passing the Node*. Since it has done the type checking, it should pass a RenderText* in rather than a Node* the type checking and type casting should be in the same function.

> -        m_offset = m_node ? caretMaxOffset(m_node) : 0;
> +        m_offset = m_node ? iterableMaxOffset(m_node) : 0;

I am not a huge fan of the name "iterable" here. It would be useful to write out a couple sentences explaining what the sentence does, and then maybe someone could extract some key words to come up with a good name for the function.

r=me, but maybe consider my comments and revise again? Would be OK to land exactly as-is.
------- Comment #23 From 2009-12-15 06:00:42 PST -------
Created an attachment (id=44874) [details]
patch update 5
------- Comment #24 From 2009-12-15 06:03:08 PST -------
style-queue ran check-webkit-style on attachment 44874 [details] without any errors.
------- Comment #25 From 2009-12-15 06:11:28 PST -------
Thank you for reviewing quickly!
I've posted a revised patch.

> 
> Do we really need a typecast here? Does some compiler generate a warning
> without it? I'd prefer to see that omitted.
Fixed; It was redundant and safely removed. Thank you for pointing it out.

> The factoring here is a bit strange. The caller gets the renderer and then
> checks that it's a RenderText. But then it calls the function passing the
> Node*. Since it has done the type checking, it should pass a RenderText* in
> rather than a Node* the type checking and type casting should be in the same
> function.
Fixed. 

> I am not a huge fan of the name "iterable" here. It would be useful to write
> out a couple sentences explaining what the sentence does, and then maybe
> someone could extract some key words to come up with a good name for the
> function.
Agreed, and tried a little more verbose name.
------- Comment #26 From 2009-12-16 18:11:27 PST -------
(From update of attachment 44874 [details])
Rejecting patch 44874 from commit-queue.

Failed to run "['git', 'svn', 'dcommit']" exit_code: 1
Last 500 characters of output:
request failed on '/repository/webkit/trunk': Commit blocked by pre-commit hook (exit code 1) with output:
svnlook: Can't write to stream: Broken pipe

    The following files contain tab characters:

        trunk/LayoutTests/ChangeLog
        trunk/LayoutTests/editing/selection/doubleclick-beside-cr-span.html

    Please use spaces instead to indent.
    If you must commit a file with tabs, use svn propset to set the "allow-tabs" property.
 at /usr/local/git/libexec/git-core/git-svn line 558


Full output: http://webkit-commit-queue.appspot.com/results/127701
------- Comment #27 From 2009-12-16 18:20:22 PST -------
Created an attachment (id=45030) [details]
remove tab from ChagneLog
------- Comment #28 From 2009-12-16 18:21:45 PST -------
style-queue ran check-webkit-style on attachment 45030 [details] without any errors.
------- Comment #29 From 2009-12-16 18:24:24 PST -------
(In reply to comment #26)
> 
>         trunk/LayoutTests/ChangeLog
Remove a tab character.
>         trunk/LayoutTests/editing/selection/doubleclick-beside-cr-span.html
Tab characters in this file cannot be removed.
Because the testcase examines that these tabs can be recognized as word boundaries.
------- Comment #30 From 2009-12-16 18:40:31 PST -------
Because of bug 27204 svn-apply does not understand how to apply SVN properties.  Thus the commit-queue does not understand how to apply svn properties.  Thus this patch cannot be commit-queue'd.  It will need to be landed manually, or bug 27204 will have to be fixed first. :(
------- Comment #31 From 2009-12-16 21:39:37 PST -------
I'll try to check it in manually.

> @@ -2863,7 +2879,6 @@
>          Unskip a bunch of passing http/navigation tests.
>  
>          * platform/qt/Skipped:
> -
>  2009-11-22  Chris Fleizach  <cfleizach@apple.com>
>  
>          Reviewed by Oliver Hunt.

I'll remove this chunk as this looks a mistake :)
------- Comment #32 From 2009-12-16 22:30:49 PST -------
Committed r52235: <http://trac.webkit.org/changeset/52235>
------- Comment #33 From 2009-12-17 00:53:29 PST -------
(From update of attachment 45030 [details])
Dropping review?