Bug 28306 - double-clicking a word inside <b> beside newline select two words
Summary: double-clicking a word inside <b> beside newline select two words
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: All OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 27204
Blocks:
  Show dependency treegraph
 
Reported: 2009-08-14 09:18 PDT by MORITA Hajime
Modified: 2009-12-17 00:53 PST (History)
7 users (show)

See Also:


Attachments
html to reproduce. (58 bytes, text/html)
2009-08-14 09:19 PDT, MORITA Hajime
no flags Details
a small fix to use string length from RenderText instead of inline box (5.55 KB, patch)
2009-08-14 09:37 PDT, MORITA Hajime
eric: review-
Details | Formatted Diff | Diff
patch update 1 (9.34 KB, patch)
2009-08-14 19:49 PDT, MORITA Hajime
no flags Details | Formatted Diff | Diff
patch update 2 (9.16 KB, patch)
2009-08-18 09:57 PDT, MORITA Hajime
no flags Details | Formatted Diff | Diff
patch revised (6.70 KB, patch)
2009-11-26 07:45 PST, MORITA Hajime
no flags Details | Formatted Diff | Diff
patch update 4 (23.87 KB, patch)
2009-12-12 19:20 PST, MORITA Hajime
no flags Details | Formatted Diff | Diff
patch update 5 (23.87 KB, patch)
2009-12-15 06:00 PST, MORITA Hajime
no flags Details | Formatted Diff | Diff
remove tab from ChagneLog (23.87 KB, patch)
2009-12-16 18:20 PST, MORITA Hajime
eric: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description MORITA Hajime 2009-08-14 09:18:15 PDT
Overview:
Double-clicking a word to select that word 
unexpectedly results two-word selected.

Steps to Reproduce:
1. open attached file
2. double click "selectme"

Actual Result:
"foo selectme" is selected(hilightened)

Expected Result:
"selectme" is selected.

Build Date & Platform: 
trunk r47196, Mac OS 10.5.7

Additional Information:
this bug is from chromium
http://code.google.com/p/chromium/issues/detail?id=6396
(and privately confirmed at Safari and WebKit trunk)
Comment 1 MORITA Hajime 2009-08-14 09:19:08 PDT
Created attachment 34845 [details]
html to reproduce.

html to reproduce.
Comment 2 MORITA Hajime 2009-08-14 09:37:20 PDT
Created attachment 34847 [details]
a small fix to use string length from RenderText instead of inline box
Comment 3 Darin Adler 2009-08-14 13:20:01 PDT
Comment on attachment 34847 [details]
a small fix to use string length from RenderText instead of inline box

This does not look like the right fix to me. I don't have time to study the issue at this time, and I could be wrong.
Comment 4 Eric Seidel (no email) 2009-08-14 15:20:09 PDT
Comment on attachment 34847 [details]
a small fix to use string length from RenderText instead of inline box

Needs ChangeLog.
Comment 5 MORITA Hajime 2009-08-14 19:49:41 PDT
Created attachment 34883 [details]
patch update 1
Comment 6 MORITA Hajime 2009-08-14 19:52:42 PDT
Thank you for reviewing.
I updated the patch
- including some more tests (for confidence), and
- appending ChangeLog
Comment 7 Eric Seidel (no email) 2009-08-17 17:29:26 PDT
Comment on attachment 34883 [details]
patch update 1

I assume you meant to mark this as a patch. :)  Note that tools like "bugzilla-tool post-diff" or "git-send-bugzilla" will do all the patch/review marking automatically for you. :)
Comment 8 Eric Seidel (no email) 2009-08-17 17:31:12 PDT
Comment on attachment 34883 [details]
patch update 1

This change contains a .DS_Store file.  We don't check those in.

is "MORITA" how you would normally capitalize your name?   I notice that your gmail account has one more r than your name, is either a typo?

There are tabs in the ChangeLog, that will cause the commit to fail.
Comment 9 MORITA Hajime 2009-08-18 09:57:55 PDT
Created attachment 35046 [details]
patch update 2
Comment 10 MORITA Hajime 2009-08-18 10:06:10 PDT
Wow, bugzilla-tool does work! Thank you for your advice, Eric.

this patch
- remove accidentally checked-in .DS_Store files, and
- untabify my ChangeLog entries.

BTW, capitalizing family name is my style, as some other Japanese people does.
- http://www.sljfaq.org/afaq/japanesenames.html
And I choose "morrita" for gmail address just because 
"morita" was already taken;-)
Thanks for the thought, anyway.
Comment 11 Shinichiro Hamaji 2009-08-18 16:52:21 PDT
An unsolicited comment... It seems that editing/inserting/editable-html-element.html is failing with your patch. If the new behavior is better, you may need to fix the result of the test. Otherwise, it would help bugs of your patch.
Comment 12 Shinichiro Hamaji 2009-08-19 19:36:03 PDT
Hmm I'm sorry. I'm not seeing the failure now. Maybe I did something wrong... Please just ignore my comment.
Comment 13 Adam Barth 2009-10-13 13:00:39 PDT
What's the status of this patch?  I don't know this code, but the patch is correctly formated, looks small, and purpose to change behavior that is definitely buggy.
Comment 14 Darin Adler 2009-10-13 13:13:04 PDT
Comment on attachment 35046 [details]
patch update 2

Thanks for tackling this.

I believe this code change is not quite right. It may make the word break work correctly in this case, but it is probably not right in all cases for the backwards text iterator.

The backwards text iterator does need to detect that there's a word break here, but it is not good for it to do this by just emitting all the trailing whitespace that happens to be in the string. I suspect there are also cases where there could be a line break but no actual whitespace in the string, depending on what element comes after the string. One example I can think of would be a subsequent text node that is entirely whitespace.

We should consider carefully why the forward text iterator has whitespace there and figure out how to do the equivalent for backward. The forward iterator definitely does *not* use the trailing whitespace in the DOM string to do this.

> +    // use str.length() instead of caretMaxOffset()
> +    // because caretMaxOffset() is based on InlineBox and not considering trailing whitespaces,
> +    // which should be handled during word-boundary detection.

There should be a capital "U" on use, and this should be broken into lines of similar length.

> +    if (m_positionEndOffset == 0)
> +      return false;

WebKit style is to use !m_positionEndOffset rather than "== 0" and the second line should be indented four spaces, not two.

review- for now. It may turn out that with more analysis we conclude that this is the best fix, but I suspect that it's not quite right.
Comment 15 MORITA Hajime 2009-11-26 07:45:06 PST
Created attachment 43924 [details]
patch revised
Comment 16 MORITA Hajime 2009-11-26 08:03:41 PST
Thank you for reviewing. 

I've investingated TextIterator, 
And now I feels that it's hard to make SimplifiedBackwardsTextIterator to follow TextIterator way
because:

- text boundary detection need "real" whitespace inside RenderText object, 
  but TextIterator::emitCharacter() and SimplifiedBackwardsTextIterator::emitCharacter()
  use fake character storage on TextIterator's instance, 
  which break boundary-to-offset computation, and
- Although TextIterator traverse InlineBox to precisely collect text on the element,
  Doing such on SimplifiedBackwardsTextIterator would make it no longer "simplified".

So I just lookup trailing whitespaces for each text node on
SimplifiedBackwardsTextIterator::handleTextNode(). 
It should be safe because now we inspect the string content to verify trailing whitespaces.
(My old patch didn't see the string content. It was bad thing.)
Comment 17 Adam Barth 2009-11-30 12:45:30 PST
style-queue ran check-webkit-style on attachment 43924 [details] without any errors.
Comment 18 Darin Adler 2009-12-07 10:12:01 PST
Comment on attachment 43924 [details]
patch revised

> +	Bug 28306: double-clicking a word inside <b> beside newline select two words

Tab in change log here.

> +static int collapsedSpaceLength(Node* textNode, int textEnd)
> +{
> +    RenderText* renderer = toRenderText(textNode->renderer());
> +    String str = renderer->text();
> +    int strLength = static_cast<int>(str.length());
> +    for (int i = textEnd; i < strLength; ++i)
> +        if (!isCollapsibleWhitespace(str[i]))
> +            return i - textEnd;
> +    return strLength - textEnd;
> +}

The implementation of this function is wrong. The fact that whitespace is collapsible is not enough. You need to take style into account. The right way to do that is to call RenderStyle::isCollapsibleWhitespace, which gives the correct answer. It is almost never correct to call the global isCollapsibleWhitespace function.

Also, the two line for body needs to have braces around it.

It is also unnecessarily inefficient to put the text into a string object. I would instead do this:

    const UChar* characters = renderer->text()->characters();

And then index directly into the characters array.

Also, for local variables we try to use words rather than abbreviations. The "i" is OK, but "str" is not so great. I also think "length" is fine, since the length is the length of the text.

To test the different whitespace modes, we need test cases that cover editable text with the CSS white-space values, normal, pre, pre-wrap, pre-line, and nowrap, covering both spaces and newline characters, and ideally tabs too, although they should basically work the same as spaces.

> +    // We expand range to include trailing (collapsed) whitespaces, 
> +    // which is required for word-boundary detection.
> +    if (m_node != m_endNode)
> +        m_positionEndOffset += collapsedSpaceLength(m_node, m_positionEndOffset);

I'm not entirely sure this change is OK. It's kind of strange to hardcode this behavior down at this low level. For example, there can be leading whitespace too. Why include trailing whitespace but not leading? But I'm willing to accept that we need the change if there are enough test cases, including cases where the newline is just before the word, for example. The testing right now is way too light.

review- because of the multiple issues above
Comment 19 MORITA Hajime 2009-12-12 19:20:24 PST
Created attachment 44748 [details]
patch update 4
Comment 20 WebKit Review Bot 2009-12-12 19:22:20 PST
style-queue ran check-webkit-style on attachment 44748 [details] without any errors.
Comment 21 MORITA Hajime 2009-12-12 19:31:25 PST
(In reply to comment #18)

Thank you for your review.
I revised the patch again.

> Tab in change log here.
Fixed.

> The implementation of this function is wrong. The fact that whitespace is
> collapsible is not enough. You need to take style into account. The right way
> to do that is to call RenderStyle::isCollapsibleWhitespace, which gives the
> correct answer. It is almost never correct to call the global
> isCollapsibleWhitespace function.
>
> Also, the two line for body needs to have braces around it.
Fixed. 

And I Added some test cases that fails with last patch.
Using RenderStyle::isCollapsibleWhitespace() resolves the problem. 
Thank you to point it out!

> It is also unnecessarily inefficient to put the text into a string object. I
> would instead do this:
>
>     const UChar* characters = renderer->text()->characters();
> 
> And then index directly into the characters array.
Fixed.

> Also, for local variables we try to use words rather than abbreviations. The
> "i" is OK, but "str" is not so great. I also think "length" is fine, since the
> length is the length of the text.
Fixed.

> To test the different whitespace modes, we need test cases that cover editable
> text with the CSS white-space values, normal, pre, pre-wrap, pre-line, and
> nowrap, covering both spaces and newline characters, and ideally tabs too,
> although they should basically work the same as spaces.
Added test cases for differenct white-space style values and some other edge conditions.

> I'm not entirely sure this change is OK. It's kind of strange to hardcode this
> behavior down at this low level. (snip)
Agreed, and revised the patch trying to clarify the intention.
Comment 22 Darin Adler 2009-12-14 10:24:36 PST
Comment on attachment 44748 [details]
patch update 4

> +static int collapsedSpaceLength(Node* textNode, int textEnd)
> +{
> +    RenderText* renderer = toRenderText(textNode->renderer());
> +
> +    const UChar* characters = renderer->text()->characters();
> +    int length = static_cast<int>(renderer->text()->length());

Do we really need a typecast here? Does some compiler generate a warning without it? I'd prefer to see that omitted.

> +    for (int i = textEnd; i < length; ++i) {
> +        if (!renderer->style()->isCollapsibleWhiteSpace(characters[i]))
> +            return i - textEnd;
> +    }
> +
> +    return length - textEnd;
> +}
> +
> +// For the purpose of word boundary detection,
> +// we should iterate all visible text and trailing (collapsed) whitespaces. 
> +static int iterableMaxOffset(Node* node)
> +{
> +    int offset = caretMaxOffset(node);
> +
> +    if (node->renderer() && node->renderer()->isText())
> +        offset += collapsedSpaceLength(node, offset);
> +
> +    return offset;
> +}

The factoring here is a bit strange. The caller gets the renderer and then checks that it's a RenderText. But then it calls the function passing the Node*. Since it has done the type checking, it should pass a RenderText* in rather than a Node* the type checking and type casting should be in the same function.

> -        m_offset = m_node ? caretMaxOffset(m_node) : 0;
> +        m_offset = m_node ? iterableMaxOffset(m_node) : 0;

I am not a huge fan of the name "iterable" here. It would be useful to write out a couple sentences explaining what the sentence does, and then maybe someone could extract some key words to come up with a good name for the function.

r=me, but maybe consider my comments and revise again? Would be OK to land exactly as-is.
Comment 23 MORITA Hajime 2009-12-15 06:00:42 PST
Created attachment 44874 [details]
patch update 5
Comment 24 WebKit Review Bot 2009-12-15 06:03:08 PST
style-queue ran check-webkit-style on attachment 44874 [details] without any errors.
Comment 25 MORITA Hajime 2009-12-15 06:11:28 PST
Thank you for reviewing quickly!
I've posted a revised patch.

> 
> Do we really need a typecast here? Does some compiler generate a warning
> without it? I'd prefer to see that omitted.
Fixed; It was redundant and safely removed. Thank you for pointing it out.

> The factoring here is a bit strange. The caller gets the renderer and then
> checks that it's a RenderText. But then it calls the function passing the
> Node*. Since it has done the type checking, it should pass a RenderText* in
> rather than a Node* the type checking and type casting should be in the same
> function.
Fixed. 

> I am not a huge fan of the name "iterable" here. It would be useful to write
> out a couple sentences explaining what the sentence does, and then maybe
> someone could extract some key words to come up with a good name for the
> function.
Agreed, and tried a little more verbose name.
Comment 26 WebKit Commit Bot 2009-12-16 18:11:27 PST
Comment on attachment 44874 [details]
patch update 5

Rejecting patch 44874 from commit-queue.

Failed to run "['git', 'svn', 'dcommit']" exit_code: 1
Last 500 characters of output:
request failed on '/repository/webkit/trunk': Commit blocked by pre-commit hook (exit code 1) with output:
svnlook: Can't write to stream: Broken pipe

    The following files contain tab characters:

        trunk/LayoutTests/ChangeLog
        trunk/LayoutTests/editing/selection/doubleclick-beside-cr-span.html

    Please use spaces instead to indent.
    If you must commit a file with tabs, use svn propset to set the "allow-tabs" property.
 at /usr/local/git/libexec/git-core/git-svn line 558


Full output: http://webkit-commit-queue.appspot.com/results/127701
Comment 27 MORITA Hajime 2009-12-16 18:20:22 PST
Created attachment 45030 [details]
remove tab from ChagneLog
Comment 28 WebKit Review Bot 2009-12-16 18:21:45 PST
style-queue ran check-webkit-style on attachment 45030 [details] without any errors.
Comment 29 MORITA Hajime 2009-12-16 18:24:24 PST
(In reply to comment #26)
> 
>         trunk/LayoutTests/ChangeLog
Remove a tab character.
>         trunk/LayoutTests/editing/selection/doubleclick-beside-cr-span.html
Tab characters in this file cannot be removed.
Because the testcase examines that these tabs can be recognized as word boundaries.
Comment 30 Eric Seidel (no email) 2009-12-16 18:40:31 PST
Because of bug 27204 svn-apply does not understand how to apply SVN properties.  Thus the commit-queue does not understand how to apply svn properties.  Thus this patch cannot be commit-queue'd.  It will need to be landed manually, or bug 27204 will have to be fixed first. :(
Comment 31 Shinichiro Hamaji 2009-12-16 21:39:37 PST
I'll try to check it in manually.

> @@ -2863,7 +2879,6 @@
>          Unskip a bunch of passing http/navigation tests.
>  
>          * platform/qt/Skipped:
> -
>  2009-11-22  Chris Fleizach  <cfleizach@apple.com>
>  
>          Reviewed by Oliver Hunt.

I'll remove this chunk as this looks a mistake :)
Comment 32 Shinichiro Hamaji 2009-12-16 22:30:49 PST
Committed r52235: <http://trac.webkit.org/changeset/52235>
Comment 33 Shinichiro Hamaji 2009-12-17 00:53:29 PST
Comment on attachment 45030 [details]
remove tab from ChagneLog

Dropping review?