27632 – Expose text segmentation through JavaScript

RESOLVED FIXED 27632

Expose text segmentation through JavaScript

https://bugs.webkit.org/show_bug.cgi?id=27632

Summary Expose text segmentation through JavaScript

Xiaomei Ji

Reported 2009-07-23 17:16:29 PDT

specific use case is dictionary, which shows the word definition in its source language and accept-languages when user mouse over or click the word while browsing a webpage. But such feature should be useful and benefit for other clients in other use cases. After document.caretRangeFromPoint(), we are able to find out what is the character offset within an element under the mouse. To further identify the word (especially for those languages that do not use space as word breaker), we will need webkit expose text segmentation through JavaScript. Maybe by document.wordRangeFromPoint(int x, int y), which could extend document.caretRangeFromPoint() by create word range using startOfWord() and endOfWord() on the visible position computed in caretRanageFromPoint().

Attachments
patch w/ layout test (17.54 KB, patch) 2009-08-24 18:33 PDT, Xiaomei Ji	eric: review-	Details Formatted Diff Diff
patch w/ layout test (17.46 KB, patch) 2009-08-25 16:13 PDT, Xiaomei Ji	eric: review-	Details Formatted Diff Diff
patch w/ layout test (17.47 KB, patch) 2009-08-26 11:28 PDT, Xiaomei Ji	no flags	Details Formatted Diff Diff
patch w/ layout test (17.48 KB, patch) 2009-08-26 11:32 PDT, Xiaomei Ji	eric: review- eric: commit-queue-	Details Formatted Diff Diff
patch w/ layout test (18.35 KB, patch) 2009-09-10 14:16 PDT, Xiaomei Ji	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Jungshik Shin

Comment 1 2009-07-24 10:31:02 PDT

Webkit already has TextBreakingIterator in platform/text (implemented by various ports). So, we can leverage it to expose this to web apps.

Alexey Proskuryakov

Comment 2 2009-07-24 10:46:09 PDT

Note that double-clicking for word selection doesn't currently use TextBreakingIterator internally. The results should be pretty similar in most cases though. > document.wordRangeFromPoint(int x, int y) Document is not really appropriate to host this, because documents are not necessarily displayed on screen. Extending a DOMRange to word boundaries would be more in line with other APIs. Another option would be to expose DOMSelection objects that are not tied to selection as visible to the user - this interface has a modify() method taking a granularity. Finally, it should be possible to make this work reasonably well with current WebKit by using DOMSelection.modify, and then restoring the original selection.

Xiaomei Ji

Comment 3 2009-07-24 10:56:00 PDT

Hi Alexey, Thanks for your comments! I have one question. > Document is not really appropriate to host this, because documents are not > necessarily displayed on screen. Extending a DOMRange to word boundaries would > be more in line with other APIs. For example, if the API is range.extend('word'), how should we handle if the range is not an empty one (its start offset != end offset). Thanks, Xiaomei

Alexey Proskuryakov

Comment 4 2009-07-24 11:26:10 PDT

We could extend to word boundary on both sides (selecting several words, if the original range crossed them all), or we could just raise an exception. The former option seems more reasonable to me. I was thinking of something more like document.extendRange(range, "word"). Anyway, this new API is something that should be discussed with WebApps working group, <http://www.w3.org/2008/webapps/>.

Alexey Proskuryakov

Comment 5 2009-08-20 23:02:29 PDT

Bugzilla has experienced amnesia today, here is a comment that was lost: --- Comment #5 from Xiaomei Ji <xji@chromium.org> 2009-08-20 11:30:01 PDT --- Following spec has been proposed to both webkit-dev and public-webapps without any further feedback. If no object, I will go ahead to implement it in webkit. *Syntax*: expands the range to the 'unit' boundary. interface Range { void expand(in DOMString uint) } *Parameters*: unit: String that specifies the units to move in the range, using one of the following values: word -- expand the range to include completed words. A word is the smallest semantic form in one language. In languages use space to break word, such as English, a word is a collection of characters terminated by a space or punctuation. sentence -- expand the range to include completed sentences. A sentence is a collection of words terminated by punctuation. block -- expand the range to include completed paragraphs. document -- expand the range to include the whole document. *Use case*: To identify a semantic unit (such as a word) when user mouse over or mouse click on a page. A specific use case is dictionary, which shows the worddefinition when user mouse over or mouse click in a webpage. *Example*: This example returns the range containing the word user moused over or mouse clicked (not double-clicked). var range = document.caretRangeFromPoint(event.clientX, event.clientY); range.expand('word'); Please refer to http://dev.w3.org/csswg/cssom-view/#dom-documentview-caretrangefrompoint for document.caretRangeFromPoint(). *Reference*: Microsoft's spec: http://msdn.microsoft.com/en-us/library/ms536421(VS.85).aspx

Xiaomei Ji

Comment 6 2009-08-24 18:33:51 PDT

Created attachment 38519 [details] patch w/ layout test

Eric Seidel (no email)

Comment 7 2009-08-25 10:41:30 PDT

Comment on attachment 38519 [details] patch w/ layout test else { 1819 return; 1820 } no {} No "return;" needed. 1 Position s = rangeCompliantEquivalent(start); 1822 Position e = rangeCompliantEquivalent(end); 1823 setStart(s.node(), s.deprecatedEditingOffset(), ec); 1824 setEnd(e.node(), e.deprecatedEditingOffset(), ec); That should be: setStart(s.containerNode(), s.computeOffsetInContainerNode()); setEnd(e.containerNode(), e.computeOffsetInContainerNode()); no rangeCompliantEquivalent calls are needed if you use containerNode and computeOffsetInContainerNode deprecatedEditingOffset() is basically always the wrong call to use. In general any call with "deprecated" in its name can be assumed to be. :) The confusion here is that soooo much code still uses deprecatedEditingOffset() :(

Xiaomei Ji

Comment 8 2009-08-25 16:13:49 PDT

Created attachment 38574 [details] patch w/ layout test

Eric Seidel (no email)

Comment 9 2009-08-25 16:49:26 PDT

Comment on attachment 38574 [details] patch w/ layout test Ok, in general this looks fine. + VisiblePosition start = VisiblePosition(this->startPosition()); + VisiblePosition end = VisiblePosition(this->endPosition()); Should just be: + VisiblePosition start(startPosition()); + VisiblePosition end(endPosition()); I don't think this comment is really needed, but it's OK to leave too. :) + // Please refer to https://bugs.webkit.org/show_bug.cgi?id=27632 comment #5 for the spec. This comment doesn't add anything: + // Range expand. If you were a committer, I would just r+ and you could fix those nits on landing. As is, it would be best if you could post one final patch. :) Thanks! Sorry I missed these on the first round.

Xiaomei Ji

Comment 10 2009-08-26 11:28:59 PDT

Created attachment 38624 [details] patch w/ layout test

Xiaomei Ji

Comment 11 2009-08-26 11:32:18 PDT

Created attachment 38625 [details] patch w/ layout test Eric, Thanks for the review!

Eric Seidel (no email)

Comment 12 2009-09-02 14:16:09 PDT

Comment on attachment 38625 [details] patch w/ layout test For the future, it's much better when the test cases include their expected result. For example: shouldBe("range.toString()", "Here"); is much clearer than: + log("word5: " + range.toString()); Because implementers don't have to compare with a golden file to know if they've passed/failed the test. Even better is to use functions to make the test expectations even more readable, like: shouldBe("expandRange(mydiv, 2, mydiv, 3, 'word')", "Here"); Where "expandRange" is some locally defined function which does the right thing. shouldBe will print the function call text and if it passed or failed. Use of the "shouldBe" asserts require the test to be a script test/DOM only test, which is documented here: http://trac.webkit.org/wiki/Writing%20Layout%20Tests%20for%20DumpRenderTree I'll r+ the patch for now. If you'd like to update the test more and upload a new patch that's great. If you're pressed on time for other things that's OK too, and someone can mark this commit-queue+ tomorrow after you've had a chance to see my comments.

Eric Seidel (no email)

Comment 13 2009-09-02 14:17:05 PDT

Well, technically speaking the "shouldBe" stuff doesn't require that your test be js-only, its just that make-js-test-wrappers includes the right LayoutTests/fast/resources/js-test-pre.js for you. However you can just include those scripts manually to get the fancy functions.

Adam Barth

Comment 14 2009-09-06 21:56:02 PDT

Comment on attachment 38625 [details] patch w/ layout test Tomorrow was a while ago.

Eric Seidel (no email)

Comment 15 2009-09-06 21:59:18 PDT

Comment on attachment 38625 [details] patch w/ layout test Rejecting patch 38625 from commit-queue. This patch will require manual commit. WebKitTools/Scripts/build-webkit failed with exit code 1

Eric Seidel (no email)

Comment 16 2009-09-07 00:14:47 PDT

Unfortunately we don't currently save build logs on the commit bot, so I can't tell you the exact compile failure, but it should be pretty obvious if you try compiling on a Mac with tip of tree. bugzilla-tool apply-patches 27632 or bugzilla-tool land-patches 27632 would be an easy way to see the error.

Eric Seidel (no email)

Comment 17 2009-09-08 12:43:30 PDT

Comment on attachment 38625 [details] patch w/ layout test r- because this fails to build, per above.

Xiaomei Ji

Comment 18 2009-09-10 14:16:43 PDT

Created attachment 39380 [details] patch w/ layout test I have no idea how I checked my local build was successful last time. Sorry for the inconvenience! Uploaded the patch again. And changed the test to have expected results in test file (but I did not try hard enough to make shouldBe work, instead, I used my own log printing.)

Eric Seidel (no email)

Comment 19 2009-09-10 14:22:14 PDT

Comment on attachment 39380 [details] patch w/ layout test LGTM. Technically I think that the webkit style for JS it to put spaces around all operators, like "i=0" should be "i = 0". But it's totally OK as is.

WebKit Commit Bot

Comment 20 2009-09-10 14:42:37 PDT

Comment on attachment 39380 [details] patch w/ layout test Rejecting patch 39380 from commit-queue. This patch will require manual commit. ['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--quiet', '--exit-after-n-failures=1'] failed with exit code 1

Eric Seidel (no email)

Comment 21 2009-09-10 14:50:05 PDT

Comment on attachment 39380 [details] patch w/ layout test storage/database-lock-after-reload.html -> failed I need to file a bug about that test, this is not the first commit to fail from it being flakey.

WebKit Commit Bot

Comment 22 2009-09-10 16:01:05 PDT

Comment on attachment 39380 [details] patch w/ layout test Clearing flags on attachment: 39380 Committed r48271: <http://trac.webkit.org/changeset/48271>

WebKit Commit Bot

Comment 23 2009-09-10 16:01:12 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Enhancement

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component New Bugs

Assignee

Nobody

Reported

2009-07-23 17:16 PDT

Modified

2009-09-10 16:01 PDT History

CC List

9 users Show

URL

Keywords

Depends on

Blocks