Bug 27632 - Expose text segmentation through JavaScript
: Expose text segmentation through JavaScript
Status: RESOLVED FIXED
: WebKit
New Bugs
: 528+ (Nightly build)
: All All
: P2 Enhancement
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2009-07-23 17:16 PST by
Modified: 2009-09-10 16:01 PST (History)


Attachments
patch w/ layout test (17.54 KB, patch)
2009-08-24 18:33 PST, Xiaomei Ji
eric: review-
Review Patch | Details | Formatted Diff | Diff
patch w/ layout test (17.46 KB, patch)
2009-08-25 16:13 PST, Xiaomei Ji
eric: review-
Review Patch | Details | Formatted Diff | Diff
patch w/ layout test (17.47 KB, patch)
2009-08-26 11:28 PST, Xiaomei Ji
no flags Review Patch | Details | Formatted Diff | Diff
patch w/ layout test (17.48 KB, patch)
2009-08-26 11:32 PST, Xiaomei Ji
eric: review-
eric: commit‑queue-
Review Patch | Details | Formatted Diff | Diff
patch w/ layout test (18.35 KB, patch)
2009-09-10 14:16 PST, Xiaomei Ji
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-07-23 17:16:29 PST
specific use case is dictionary, which shows the word definition in its source language and accept-languages when user mouse over or click the word while browsing a webpage.

But such feature should be useful and benefit for other clients in other use cases.

After document.caretRangeFromPoint(), we are able to find out what is the character offset within an element under the mouse.

To further identify the word (especially for those languages that do not use space as word breaker), we will need webkit expose text segmentation through JavaScript.
Maybe by document.wordRangeFromPoint(int x, int y), which could extend document.caretRangeFromPoint() by create word range using startOfWord() and endOfWord() on the visible position computed in caretRanageFromPoint().
------- Comment #1 From 2009-07-24 10:31:02 PST -------
Webkit already has TextBreakingIterator in platform/text (implemented by various ports). So, we can leverage it to expose this to web apps.
------- Comment #2 From 2009-07-24 10:46:09 PST -------
Note that double-clicking for word selection doesn't currently use TextBreakingIterator internally. The results should be pretty similar in most cases though.

> document.wordRangeFromPoint(int x, int y)

Document is not really appropriate to host this, because documents are not necessarily displayed on screen. Extending a DOMRange to word boundaries would be more in line with other APIs.

Another option would be to expose DOMSelection objects that are not tied to selection as visible to the user - this interface has a modify() method taking a granularity.

Finally, it should be possible to make this work reasonably well with current WebKit by using DOMSelection.modify, and then restoring the original selection.
------- Comment #3 From 2009-07-24 10:56:00 PST -------
Hi Alexey,

Thanks for your comments!
I have one question.

> Document is not really appropriate to host this, because documents are not
> necessarily displayed on screen. Extending a DOMRange to word boundaries would
> be more in line with other APIs.

For example, if the API is range.extend('word'), how should we handle if the range is not an empty one (its start offset != end offset).

Thanks,
Xiaomei
------- Comment #4 From 2009-07-24 11:26:10 PST -------
We could extend to word boundary on both sides (selecting several words, if the original range crossed them all), or we could just raise an exception. The former option seems more reasonable to me.

I was thinking of something more like document.extendRange(range, "word").

Anyway, this new API is something that should be discussed with WebApps working group, <http://www.w3.org/2008/webapps/>.
------- Comment #5 From 2009-08-20 23:02:29 PST -------
Bugzilla has experienced amnesia today, here is a comment that was lost:

--- Comment #5 from Xiaomei Ji <xji@chromium.org>  2009-08-20 11:30:01 PDT ---
Following spec has been proposed to both webkit-dev and public-webapps without
any further feedback.

If no object, I will go ahead to implement it in webkit.

*Syntax*: expands the range to the 'unit' boundary.
interface Range {
  void  expand(in DOMString uint)
}

*Parameters*:
unit: String that specifies the units to move in the range, using one of the
following values:
word -- expand the range to include completed words. A word is the smallest
semantic form in one language. In languages use space to break word, such as
English, a word is a collection of characters terminated by a space or
punctuation.
sentence -- expand the range to include completed sentences. A sentence is a
collection of words terminated by punctuation.
block -- expand the range to include completed paragraphs.
document -- expand the range to include the whole document.

*Use case*:
To identify a semantic unit (such as a word) when user mouse over or mouse
click on a page. A specific use case is dictionary, which shows the
worddefinition when user mouse over or mouse click in a webpage.

*Example*:
This example returns the range containing the word user moused over or mouse
clicked (not double-clicked).

var range = document.caretRangeFromPoint(event.clientX, event.clientY);
range.expand('word');

Please refer to
http://dev.w3.org/csswg/cssom-view/#dom-documentview-caretrangefrompoint for
document.caretRangeFromPoint().

*Reference*: Microsoft's  spec: 
http://msdn.microsoft.com/en-us/library/ms536421(VS.85).aspx
------- Comment #6 From 2009-08-24 18:33:51 PST -------
Created an attachment (id=38519) [details]
patch w/ layout test
------- Comment #7 From 2009-08-25 10:41:30 PST -------
(From update of attachment 38519 [details])
 else {
 1819         return;
 1820     }
no {}

No "return;" needed.

1     Position s = rangeCompliantEquivalent(start);
 1822     Position e = rangeCompliantEquivalent(end);
 1823     setStart(s.node(), s.deprecatedEditingOffset(), ec);
 1824     setEnd(e.node(), e.deprecatedEditingOffset(), ec);

That should be:

setStart(s.containerNode(), s.computeOffsetInContainerNode());
setEnd(e.containerNode(), e.computeOffsetInContainerNode());

no rangeCompliantEquivalent calls are needed if you use containerNode and computeOffsetInContainerNode

deprecatedEditingOffset() is basically always the wrong call to use. In general any call with "deprecated" in its name can be assumed to be. :)  The confusion here is that soooo much code still uses deprecatedEditingOffset() :(
------- Comment #8 From 2009-08-25 16:13:49 PST -------
Created an attachment (id=38574) [details]
patch w/ layout test
------- Comment #9 From 2009-08-25 16:49:26 PST -------
(From update of attachment 38574 [details])
Ok, in general this looks fine.

+    VisiblePosition start = VisiblePosition(this->startPosition());
+    VisiblePosition end = VisiblePosition(this->endPosition());

Should just be:
+    VisiblePosition start(startPosition());
+    VisiblePosition end(endPosition());

I don't think this comment is really needed, but it's OK to leave too. :)
+    // Please refer to https://bugs.webkit.org/show_bug.cgi?id=27632 comment #5 for the spec.

This comment doesn't add anything:
+        // Range expand.

If you were a committer, I would just r+ and you could fix those nits on landing.  As is, it would be best if you could post one final patch. :)

Thanks!  Sorry I missed these on the first round.
------- Comment #10 From 2009-08-26 11:28:59 PST -------
Created an attachment (id=38624) [details]
patch w/ layout test
------- Comment #11 From 2009-08-26 11:32:18 PST -------
Created an attachment (id=38625) [details]
patch w/ layout test

Eric, Thanks for the review!
------- Comment #12 From 2009-09-02 14:16:09 PST -------
(From update of attachment 38625 [details])
For the future, it's much better when the test cases include their expected result.

For example:
shouldBe("range.toString()", "Here");
is much clearer than:
+    log("word5: " + range.toString());

Because implementers don't have to compare with a golden file to know if they've passed/failed the test.


Even better is to use functions to make the test expectations even more readable, like:

shouldBe("expandRange(mydiv, 2, mydiv, 3, 'word')", "Here");

Where "expandRange" is some locally defined function which does the right thing.

shouldBe will print the function call text and if it passed or failed.  

Use of the "shouldBe" asserts require the test to be a script test/DOM only test, which is documented here:
http://trac.webkit.org/wiki/Writing%20Layout%20Tests%20for%20DumpRenderTree

I'll r+ the patch for now.  If you'd like to update the test more and upload a new patch that's great.  If you're pressed on time for other things that's OK too, and someone can mark this commit-queue+ tomorrow after you've had a chance to see my comments.
------- Comment #13 From 2009-09-02 14:17:05 PST -------
Well, technically speaking the "shouldBe" stuff doesn't require that your test be js-only, its just that make-js-test-wrappers includes the right LayoutTests/fast/resources/js-test-pre.js for you.  However you can just include those scripts manually to get the fancy functions.
------- Comment #14 From 2009-09-06 21:56:02 PST -------
(From update of attachment 38625 [details])
Tomorrow was a while ago.
------- Comment #15 From 2009-09-06 21:59:18 PST -------
(From update of attachment 38625 [details])
Rejecting patch 38625 from commit-queue.  This patch will require manual commit.

WebKitTools/Scripts/build-webkit failed with exit code 1
------- Comment #16 From 2009-09-07 00:14:47 PST -------
Unfortunately we don't currently save build logs on the commit bot, so I can't tell you the exact compile failure, but it should be pretty obvious if you try compiling on a Mac with tip of tree.

bugzilla-tool apply-patches 27632
or
bugzilla-tool land-patches 27632

would be an easy way to see the error.
------- Comment #17 From 2009-09-08 12:43:30 PST -------
(From update of attachment 38625 [details])
r- because this fails to build, per above.
------- Comment #18 From 2009-09-10 14:16:43 PST -------
Created an attachment (id=39380) [details]
patch w/ layout test

I have no idea how I checked my local build was successful last time. Sorry for the inconvenience!
Uploaded the patch again. And changed the test to have expected results in test file (but I did not try hard enough to make shouldBe work, instead, I used my own log printing.)
------- Comment #19 From 2009-09-10 14:22:14 PST -------
(From update of attachment 39380 [details])
LGTM.

Technically I think that the webkit style for JS it to put spaces around all operators, like "i=0" should be "i = 0".  But it's totally OK as is.
------- Comment #20 From 2009-09-10 14:42:37 PST -------
(From update of attachment 39380 [details])
Rejecting patch 39380 from commit-queue.

This patch will require manual commit. ['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--quiet', '--exit-after-n-failures=1'] failed with exit code 1
------- Comment #21 From 2009-09-10 14:50:05 PST -------
(From update of attachment 39380 [details])
storage/database-lock-after-reload.html -> failed

I need to file a bug about that test, this is not the first commit to fail from it being flakey.
------- Comment #22 From 2009-09-10 16:01:05 PST -------
(From update of attachment 39380 [details])
Clearing flags on attachment: 39380

Committed r48271: <http://trac.webkit.org/changeset/48271>
------- Comment #23 From 2009-09-10 16:01:12 PST -------
All reviewed patches have been landed.  Closing bug.