Bug 27632 - Expose text segmentation through JavaScript
Summary: Expose text segmentation through JavaScript
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Enhancement
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-23 17:16 PDT by Xiaomei Ji
Modified: 2009-09-10 16:01 PDT (History)
9 users (show)

See Also:


Attachments
patch w/ layout test (17.54 KB, patch)
2009-08-24 18:33 PDT, Xiaomei Ji
eric: review-
Details | Formatted Diff | Diff
patch w/ layout test (17.46 KB, patch)
2009-08-25 16:13 PDT, Xiaomei Ji
eric: review-
Details | Formatted Diff | Diff
patch w/ layout test (17.47 KB, patch)
2009-08-26 11:28 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff
patch w/ layout test (17.48 KB, patch)
2009-08-26 11:32 PDT, Xiaomei Ji
eric: review-
eric: commit-queue-
Details | Formatted Diff | Diff
patch w/ layout test (18.35 KB, patch)
2009-09-10 14:16 PDT, Xiaomei Ji
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xiaomei Ji 2009-07-23 17:16:29 PDT
specific use case is dictionary, which shows the word definition in its source language and accept-languages when user mouse over or click the word while browsing a webpage.

But such feature should be useful and benefit for other clients in other use cases.

After document.caretRangeFromPoint(), we are able to find out what is the character offset within an element under the mouse.
 
To further identify the word (especially for those languages that do not use space as word breaker), we will need webkit expose text segmentation through JavaScript.
Maybe by document.wordRangeFromPoint(int x, int y), which could extend document.caretRangeFromPoint() by create word range using startOfWord() and endOfWord() on the visible position computed in caretRanageFromPoint().
Comment 1 Jungshik Shin 2009-07-24 10:31:02 PDT
Webkit already has TextBreakingIterator in platform/text (implemented by various ports). So, we can leverage it to expose this to web apps.
Comment 2 Alexey Proskuryakov 2009-07-24 10:46:09 PDT
Note that double-clicking for word selection doesn't currently use TextBreakingIterator internally. The results should be pretty similar in most cases though.

> document.wordRangeFromPoint(int x, int y)

Document is not really appropriate to host this, because documents are not necessarily displayed on screen. Extending a DOMRange to word boundaries would be more in line with other APIs.

Another option would be to expose DOMSelection objects that are not tied to selection as visible to the user - this interface has a modify() method taking a granularity.

Finally, it should be possible to make this work reasonably well with current WebKit by using DOMSelection.modify, and then restoring the original selection.
Comment 3 Xiaomei Ji 2009-07-24 10:56:00 PDT
Hi Alexey,

Thanks for your comments!
I have one question.
 
> Document is not really appropriate to host this, because documents are not
> necessarily displayed on screen. Extending a DOMRange to word boundaries would
> be more in line with other APIs.

For example, if the API is range.extend('word'), how should we handle if the range is not an empty one (its start offset != end offset).

Thanks,
Xiaomei
Comment 4 Alexey Proskuryakov 2009-07-24 11:26:10 PDT
We could extend to word boundary on both sides (selecting several words, if the original range crossed them all), or we could just raise an exception. The former option seems more reasonable to me.

I was thinking of something more like document.extendRange(range, "word").

Anyway, this new API is something that should be discussed with WebApps working group, <http://www.w3.org/2008/webapps/>.
Comment 5 Alexey Proskuryakov 2009-08-20 23:02:29 PDT
Bugzilla has experienced amnesia today, here is a comment that was lost:

--- Comment #5 from Xiaomei Ji <xji@chromium.org>  2009-08-20 11:30:01 PDT ---
Following spec has been proposed to both webkit-dev and public-webapps without
any further feedback.

If no object, I will go ahead to implement it in webkit.

*Syntax*: expands the range to the 'unit' boundary.
interface Range {
  void  expand(in DOMString uint)
}

*Parameters*:
unit: String that specifies the units to move in the range, using one of the
following values:
word -- expand the range to include completed words. A word is the smallest
semantic form in one language. In languages use space to break word, such as
English, a word is a collection of characters terminated by a space or
punctuation.
sentence -- expand the range to include completed sentences. A sentence is a
collection of words terminated by punctuation.
block -- expand the range to include completed paragraphs.
document -- expand the range to include the whole document.

*Use case*:
To identify a semantic unit (such as a word) when user mouse over or mouse
click on a page. A specific use case is dictionary, which shows the
worddefinition when user mouse over or mouse click in a webpage.

*Example*:
This example returns the range containing the word user moused over or mouse
clicked (not double-clicked).

var range = document.caretRangeFromPoint(event.clientX, event.clientY);
range.expand('word');

Please refer to
http://dev.w3.org/csswg/cssom-view/#dom-documentview-caretrangefrompoint for
document.caretRangeFromPoint().

*Reference*: Microsoft's  spec: 
http://msdn.microsoft.com/en-us/library/ms536421(VS.85).aspx
Comment 6 Xiaomei Ji 2009-08-24 18:33:51 PDT
Created attachment 38519 [details]
patch w/ layout test
Comment 7 Eric Seidel (no email) 2009-08-25 10:41:30 PDT
Comment on attachment 38519 [details]
patch w/ layout test

 else {
 1819         return;
 1820     }
no {}

No "return;" needed.

1     Position s = rangeCompliantEquivalent(start);
 1822     Position e = rangeCompliantEquivalent(end);
 1823     setStart(s.node(), s.deprecatedEditingOffset(), ec);
 1824     setEnd(e.node(), e.deprecatedEditingOffset(), ec);

That should be:

setStart(s.containerNode(), s.computeOffsetInContainerNode());
setEnd(e.containerNode(), e.computeOffsetInContainerNode());

no rangeCompliantEquivalent calls are needed if you use containerNode and computeOffsetInContainerNode

deprecatedEditingOffset() is basically always the wrong call to use. In general any call with "deprecated" in its name can be assumed to be. :)  The confusion here is that soooo much code still uses deprecatedEditingOffset() :(
Comment 8 Xiaomei Ji 2009-08-25 16:13:49 PDT
Created attachment 38574 [details]
patch w/ layout test
Comment 9 Eric Seidel (no email) 2009-08-25 16:49:26 PDT
Comment on attachment 38574 [details]
patch w/ layout test

Ok, in general this looks fine.

+    VisiblePosition start = VisiblePosition(this->startPosition());
+    VisiblePosition end = VisiblePosition(this->endPosition());

Should just be:
+    VisiblePosition start(startPosition());
+    VisiblePosition end(endPosition());

I don't think this comment is really needed, but it's OK to leave too. :)
+    // Please refer to https://bugs.webkit.org/show_bug.cgi?id=27632 comment #5 for the spec.

This comment doesn't add anything:
+        // Range expand.

If you were a committer, I would just r+ and you could fix those nits on landing.  As is, it would be best if you could post one final patch. :)

Thanks!  Sorry I missed these on the first round.
Comment 10 Xiaomei Ji 2009-08-26 11:28:59 PDT
Created attachment 38624 [details]
patch w/ layout test
Comment 11 Xiaomei Ji 2009-08-26 11:32:18 PDT
Created attachment 38625 [details]
patch w/ layout test

Eric, Thanks for the review!
Comment 12 Eric Seidel (no email) 2009-09-02 14:16:09 PDT
Comment on attachment 38625 [details]
patch w/ layout test

For the future, it's much better when the test cases include their expected result.

For example:
shouldBe("range.toString()", "Here");
is much clearer than:
+    log("word5: " + range.toString());

Because implementers don't have to compare with a golden file to know if they've passed/failed the test.


Even better is to use functions to make the test expectations even more readable, like:

shouldBe("expandRange(mydiv, 2, mydiv, 3, 'word')", "Here");

Where "expandRange" is some locally defined function which does the right thing.

shouldBe will print the function call text and if it passed or failed.  

Use of the "shouldBe" asserts require the test to be a script test/DOM only test, which is documented here:
http://trac.webkit.org/wiki/Writing%20Layout%20Tests%20for%20DumpRenderTree

I'll r+ the patch for now.  If you'd like to update the test more and upload a new patch that's great.  If you're pressed on time for other things that's OK too, and someone can mark this commit-queue+ tomorrow after you've had a chance to see my comments.
Comment 13 Eric Seidel (no email) 2009-09-02 14:17:05 PDT
Well, technically speaking the "shouldBe" stuff doesn't require that your test be js-only, its just that make-js-test-wrappers includes the right LayoutTests/fast/resources/js-test-pre.js for you.  However you can just include those scripts manually to get the fancy functions.
Comment 14 Adam Barth 2009-09-06 21:56:02 PDT
Comment on attachment 38625 [details]
patch w/ layout test

Tomorrow was a while ago.
Comment 15 Eric Seidel (no email) 2009-09-06 21:59:18 PDT
Comment on attachment 38625 [details]
patch w/ layout test

Rejecting patch 38625 from commit-queue.  This patch will require manual commit.

WebKitTools/Scripts/build-webkit failed with exit code 1
Comment 16 Eric Seidel (no email) 2009-09-07 00:14:47 PDT
Unfortunately we don't currently save build logs on the commit bot, so I can't tell you the exact compile failure, but it should be pretty obvious if you try compiling on a Mac with tip of tree.

bugzilla-tool apply-patches 27632
or
bugzilla-tool land-patches 27632

would be an easy way to see the error.
Comment 17 Eric Seidel (no email) 2009-09-08 12:43:30 PDT
Comment on attachment 38625 [details]
patch w/ layout test

r- because this fails to build, per above.
Comment 18 Xiaomei Ji 2009-09-10 14:16:43 PDT
Created attachment 39380 [details]
patch w/ layout test

I have no idea how I checked my local build was successful last time. Sorry for the inconvenience!
Uploaded the patch again. And changed the test to have expected results in test file (but I did not try hard enough to make shouldBe work, instead, I used my own log printing.)
Comment 19 Eric Seidel (no email) 2009-09-10 14:22:14 PDT
Comment on attachment 39380 [details]
patch w/ layout test

LGTM.

Technically I think that the webkit style for JS it to put spaces around all operators, like "i=0" should be "i = 0".  But it's totally OK as is.
Comment 20 WebKit Commit Bot 2009-09-10 14:42:37 PDT
Comment on attachment 39380 [details]
patch w/ layout test

Rejecting patch 39380 from commit-queue.

This patch will require manual commit. ['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--quiet', '--exit-after-n-failures=1'] failed with exit code 1
Comment 21 Eric Seidel (no email) 2009-09-10 14:50:05 PDT
Comment on attachment 39380 [details]
patch w/ layout test

storage/database-lock-after-reload.html -> failed

I need to file a bug about that test, this is not the first commit to fail from it being flakey.
Comment 22 WebKit Commit Bot 2009-09-10 16:01:05 PDT
Comment on attachment 39380 [details]
patch w/ layout test

Clearing flags on attachment: 39380

Committed r48271: <http://trac.webkit.org/changeset/48271>
Comment 23 WebKit Commit Bot 2009-09-10 16:01:12 PDT
All reviewed patches have been landed.  Closing bug.