Bug 78361 - Selectively retrieve text content around a given position
Summary: Selectively retrieve text content around a given position
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Leandro Graciá Gil
URL:
Keywords:
Depends on:
Blocks: 82460
  Show dependency treegraph
 
Reported: 2012-02-10 09:09 PST by Leandro Graciá Gil
Modified: 2012-03-28 06:31 PDT (History)
5 users (show)

See Also:


Attachments
Patch (14.55 KB, patch)
2012-02-10 09:18 PST, Leandro Graciá Gil
no flags Details | Formatted Diff | Diff
Patch (14.29 KB, patch)
2012-03-13 07:31 PDT, Leandro Graciá Gil
no flags Details | Formatted Diff | Diff
Patch (14.07 KB, patch)
2012-03-13 08:05 PDT, Leandro Graciá Gil
no flags Details | Formatted Diff | Diff
Patch (14.76 KB, patch)
2012-03-23 12:40 PDT, Leandro Graciá Gil
no flags Details | Formatted Diff | Diff
Patch (15.90 KB, patch)
2012-03-23 15:39 PDT, Leandro Graciá Gil
rniwa: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Leandro Graciá Gil 2012-02-10 09:09:35 PST
Hi,

We are interested in retrieving a string containing the text content around a given VisiblePosition in a way that lets us extract a text range from it later. For example, we tap in a position inside a paragraph of text, retrieve about 50 characters around the tapped position into a string, operate it and then recover a text Range based on offsets of the extracted string. We also want to perform this selectively: we don't want to enter any form controls during the text content retrieval.

To achieve this we've developed a class called for now 'DOMTextContentWalker' (see the patch for details). We would like your opinion about including this in WebCore so that other ports might be able to make use of it, and if so were should it be included. A first proposal could be WebCore/dom given the kind of operation it performs, although it's not part of the DOM API. We're also open to place this in a WebKit API in case this doesn't seem useful for other ports. Any suggestions and opinions are welcome.

As usual, we will provide a layout test or this feature once it's clear what should be done about it.

Thanks,
Leandro
Comment 1 Leandro Graciá Gil 2012-02-10 09:18:15 PST
Created attachment 126522 [details]
Patch
Comment 2 Alexey Proskuryakov 2012-02-10 10:14:12 PST
Extracting text content only doesn't appear useful for Mac port at least - we would need an attributed string. Enrica, do you agree?
Comment 3 Enrica Casucci 2012-02-10 10:37:30 PST
(In reply to comment #2)
> Extracting text content only doesn't appear useful for Mac port at least - we would need an attributed string. Enrica, do you agree?

I can imagine possible uses of a feature like this and I think it could be useful to have it in WebCore.
We can decide to expose it as API based on attributed string in the Mac port.
Comment 4 Ryosuke Niwa 2012-02-10 12:12:43 PST
Comment on attachment 126522 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=126522&action=review

> Source/WebCore/dom/DOMTextContentWalker.cpp:13
> + * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS'' AND ANY

Wrong copyright statement. Please use the "correct" one.

> Source/WebCore/dom/DOMTextContentWalker.cpp:32
> +#include "TextIterator.h"
> +#include "VisiblePosition.h"
> +#include "VisibleSelection.h"
> +#include "visible_units.h"

This class heavily depends on editing. I don't think it should be in DOM. It belongs in editing.

> Source/WebCore/dom/DOMTextContentWalker.cpp:36
> +static PassRefPtr<Range> getRange(const Position& start, const Position& end)

In WebKit, we only use "get"~ if the value is returned through pass-by-reference or pass-by-pointer in the argument list.
Also, "getRange" is an extremely generic name.

> Source/WebCore/dom/DOMTextContentWalker.cpp:38
> +    return VisibleSelection(start.parentAnchoredEquivalent(), end.parentAnchoredEquivalent(), DOWNSTREAM).firstRange();

Why not Range::create(start.parentAnchoredEquivalent(), end.parentAnchoredEquivalent()) ?

> Source/WebCore/dom/DOMTextContentWalker.cpp:45
> +    CharacterIterator forwardChar(makeRange(position, endOfDocument(position)).get(), TextIteratorStopsOnFormControls);

This isn't "char", it's an iterator.

> Source/WebCore/dom/DOMTextContentWalker.h:13
> + * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS'' AND ANY

Ditto about "Apple Inc."

> Source/WebCore/dom/DOMTextContentWalker.h:36
> +// Explore the DOM tree to find the text contents up to a limit
> +// around a position in a given text node.

We don't normally add a comment like this to explain what a class does. Instead, we give a descriptive name to the class such that it's self-evident.

> Source/WebCore/dom/DOMTextContentWalker.h:37
> +class DOMTextContentWalker {

This isn't really a "walker" class. It obtains text around a given visible position. It's certainly not "DOM" because the prefix "DOM" indicates that it's exposed to DOM, which isn't case at all here.
How about SurroundingText?

> Source/WebCore/dom/DOMTextContentWalker.h:45
> +    // Convert start/end positions in the content text string into a text range.

This comment repeats the code, please remove and perhaps rename arguments to startOffsetInContent and endOffsetInContent.

> Source/WebCore/editing/TextIterator.cpp:243
> +static bool checkFormControlElement(Node* startNode)

What are you "checking" here? It should renamed to enclosedByFormControlElement.

> Source/WebCore/editing/TextIterator.cpp:247
> +        if (node->isElementNode() && static_cast<Element*>(node)->isFormControlElement())

Don't we have toElement?

> Source/WebCore/editing/TextIterator.cpp:399
> +        if (!m_shouldStop && m_stopsOnFormControls && checkFormControlElement(m_node))

r- because TextIterator is in extremely hot path and we shouldn't be walking up the tree on every node like this.
Comment 5 Leandro Graciá Gil 2012-02-15 04:15:13 PST
Comment on attachment 126522 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=126522&action=review

>> Source/WebCore/editing/TextIterator.cpp:399
>> +        if (!m_shouldStop && m_stopsOnFormControls && checkFormControlElement(m_node))
> 
> r- because TextIterator is in extremely hot path and we shouldn't be walking up the tree on every node like this.

That should only happen when the new TextIteratorStopsOnFormControls behavior is set. It shouldn't affect other uses of TextIterator.
Comment 6 Leandro Graciá Gil 2012-03-13 07:31:44 PDT
Created attachment 131607 [details]
Patch
Comment 7 Leandro Graciá Gil 2012-03-13 07:32:36 PDT
Comment on attachment 126522 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=126522&action=review

>> Source/WebCore/dom/DOMTextContentWalker.cpp:13
>> + * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS'' AND ANY
> 
> Wrong copyright statement. Please use the "correct" one.

I thought this was the correct one. Could you please point me to it?

>> Source/WebCore/dom/DOMTextContentWalker.cpp:32
>> +#include "visible_units.h"
> 
> This class heavily depends on editing. I don't think it should be in DOM. It belongs in editing.

Fixed. Now located in editing/SurroundingText as suggested below.

>> Source/WebCore/dom/DOMTextContentWalker.cpp:36
>> +static PassRefPtr<Range> getRange(const Position& start, const Position& end)
> 
> In WebKit, we only use "get"~ if the value is returned through pass-by-reference or pass-by-pointer in the argument list.
> Also, "getRange" is an extremely generic name.

Function removed.

>> Source/WebCore/dom/DOMTextContentWalker.cpp:38
>> +    return VisibleSelection(start.parentAnchoredEquivalent(), end.parentAnchoredEquivalent(), DOWNSTREAM).firstRange();
> 
> Why not Range::create(start.parentAnchoredEquivalent(), end.parentAnchoredEquivalent()) ?

Done as suggested.

>> Source/WebCore/dom/DOMTextContentWalker.cpp:45
>> +    CharacterIterator forwardChar(makeRange(position, endOfDocument(position)).get(), TextIteratorStopsOnFormControls);
> 
> This isn't "char", it's an iterator.

Fixed.

>> Source/WebCore/dom/DOMTextContentWalker.h:13
>> + * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS'' AND ANY
> 
> Ditto about "Apple Inc."

Will fix with the correct copyright statement.

>> Source/WebCore/dom/DOMTextContentWalker.h:36
>> +// around a position in a given text node.
> 
> We don't normally add a comment like this to explain what a class does. Instead, we give a descriptive name to the class such that it's self-evident.

Fixed.

>> Source/WebCore/dom/DOMTextContentWalker.h:37
>> +class DOMTextContentWalker {
> 
> This isn't really a "walker" class. It obtains text around a given visible position. It's certainly not "DOM" because the prefix "DOM" indicates that it's exposed to DOM, which isn't case at all here.
> How about SurroundingText?

Fixed.

>> Source/WebCore/dom/DOMTextContentWalker.h:45
>> +    // Convert start/end positions in the content text string into a text range.
> 
> This comment repeats the code, please remove and perhaps rename arguments to startOffsetInContent and endOffsetInContent.

Fixed.

>> Source/WebCore/editing/TextIterator.cpp:243
>> +static bool checkFormControlElement(Node* startNode)
> 
> What are you "checking" here? It should renamed to enclosedByFormControlElement.

Fixed.

>> Source/WebCore/editing/TextIterator.cpp:247
>> +        if (node->isElementNode() && static_cast<Element*>(node)->isFormControlElement())
> 
> Don't we have toElement?

Fixed.
Comment 8 Build Bot 2012-03-13 07:48:34 PDT
Comment on attachment 131607 [details]
Patch

Attachment 131607 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/11948518
Comment 9 Leandro Graciá Gil 2012-03-13 08:05:56 PDT
Created attachment 131616 [details]
Patch
Comment 10 Ryosuke Niwa 2012-03-15 11:23:04 PDT
Comment on attachment 131616 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=131616&action=review

> Source/WebCore/ChangeLog:10
> +        Reviewed by NOBODY (OOPS!).

This line needs to appear before the description.

> Source/WebCore/editing/SurroundingText.cpp:22
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1.  Redistributions of source code must retain the above copyright
> + *     notice, this list of conditions and the following disclaimer.
> + * 2.  Redistributions in binary form must reproduce the above copyright
> + *     notice, this list of conditions and the following disclaimer in the
> + *     documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS'' AND ANY
> + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> + * DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS BE LIABLE FOR ANY
> + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> + * ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Please use the correct license. e.g. EditingStyle.h

> Source/WebCore/editing/SurroundingText.cpp:37
> +SurroundingText::SurroundingText(const VisiblePosition& position, unsigned maxLength)

It's probably better to call this one visiblePosition.

> Source/WebCore/editing/SurroundingText.cpp:44
> +    // No forward contents, started inside form control.

I don't think this comment is useful (it only says what, not why; also the code is self-evident as is). Please remove.

> Source/WebCore/editing/SurroundingText.cpp:45
> +    Position deepPosition = position.deepEquivalent();

And this one position.

> Source/WebCore/editing/SurroundingText.cpp:47
> +    if (!Range::create(document, deepPosition, forwardIterator.range()->startPosition())->text().length())

You can't pass a position to Range unless it's range compliant. r- because of this.
So you need to do deepPosition.parentAnchoredEquivalent() here.
You probably want to do this when you declare deepPosition.

> Source/WebCore/editing/SurroundingText.cpp:57
> +PassRefPtr<Range> SurroundingText::contentOffsetsToRange(unsigned startOffsetInContent, unsigned endOffsetInContent)

It's probably better to call this function rangeFromContentOffsets.

> Source/WebCore/editing/TextIterator.cpp:245
> +static bool enclosedByFormControlElement(Node* startNode)
> +{
> +    Node* node = startNode;

You could just name startNode node, then you wouldn't have to declare a local variable.

> Source/WebCore/editing/TextIterator.cpp:395
> +        if (!m_shouldStop && m_stopsOnFormControls && enclosedByFormControlElement(m_node))
> +            m_shouldStop = true;

Why do we want to proceed if we're already inside a form control element? Shouldn't be just bail out here?
If you do that, we wouldn't need m_shouldStop. We can just m_node = 0 and everything else should just work.
Comment 11 Leandro Graciá Gil 2012-03-23 12:38:27 PDT
Comment on attachment 131616 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=131616&action=review

>> Source/WebCore/ChangeLog:10
>> +        Reviewed by NOBODY (OOPS!).
> 
> This line needs to appear before the description.

Fixed.

>> Source/WebCore/editing/SurroundingText.cpp:22
>> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> 
> Please use the correct license. e.g. EditingStyle.h

Fixed. Thanks for pointing to a good example.

>> Source/WebCore/editing/SurroundingText.cpp:37
>> +SurroundingText::SurroundingText(const VisiblePosition& position, unsigned maxLength)
> 
> It's probably better to call this one visiblePosition.

Fixed.

>> Source/WebCore/editing/SurroundingText.cpp:44
>> +    // No forward contents, started inside form control.
> 
> I don't think this comment is useful (it only says what, not why; also the code is self-evident as is). Please remove.

Fixed.

>> Source/WebCore/editing/SurroundingText.cpp:45
>> +    Position deepPosition = position.deepEquivalent();
> 
> And this one position.

Fixed.

>> Source/WebCore/editing/SurroundingText.cpp:47
>> +    if (!Range::create(document, deepPosition, forwardIterator.range()->startPosition())->text().length())
> 
> You can't pass a position to Range unless it's range compliant. r- because of this.
> So you need to do deepPosition.parentAnchoredEquivalent() here.
> You probably want to do this when you declare deepPosition.

Fixed. Thanks for finding this.

>> Source/WebCore/editing/SurroundingText.cpp:57
>> +PassRefPtr<Range> SurroundingText::contentOffsetsToRange(unsigned startOffsetInContent, unsigned endOffsetInContent)
> 
> It's probably better to call this function rangeFromContentOffsets.

Fixed.

>> Source/WebCore/editing/TextIterator.cpp:245
>> +    Node* node = startNode;
> 
> You could just name startNode node, then you wouldn't have to declare a local variable.

Fixed.

>> Source/WebCore/editing/TextIterator.cpp:395
>> +            m_shouldStop = true;
> 
> Why do we want to proceed if we're already inside a form control element? Shouldn't be just bail out here?
> If you do that, we wouldn't need m_shouldStop. We can just m_node = 0 and everything else should just work.

Fixed.
Comment 12 Leandro Graciá Gil 2012-03-23 12:39:42 PDT
Comment on attachment 131616 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=131616&action=review

>>> Source/WebCore/editing/TextIterator.cpp:395
>>> +            m_shouldStop = true;
>> 
>> Why do we want to proceed if we're already inside a form control element? Shouldn't be just bail out here?
>> If you do that, we wouldn't need m_shouldStop. We can just m_node = 0 and everything else should just work.
> 
> Fixed.

Correcting my last comment: as discussed in #webkit, we should keep m_shouldStop in order to preserve the range.
Comment 13 Leandro Graciá Gil 2012-03-23 12:40:05 PDT
Created attachment 133537 [details]
Patch
Comment 14 Ryosuke Niwa 2012-03-23 12:57:21 PDT
Comment on attachment 133537 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133537&action=review

> Source/WebCore/editing/TextIterator.cpp:243
> +static bool enclosedByFormControlElement(Node* node)

Nit: enclosingFormControlElement. You may consider moving this to FromControlElement.h/cpp after changing the return type to Node*.
Comment 15 Leandro Graciá Gil 2012-03-23 15:39:09 PDT
Comment on attachment 133537 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133537&action=review

>> Source/WebCore/editing/TextIterator.cpp:243
>> +static bool enclosedByFormControlElement(Node* node)
> 
> Nit: enclosingFormControlElement. You may consider moving this to FromControlElement.h/cpp after changing the return type to Node*.

Done.
Comment 16 Leandro Graciá Gil 2012-03-23 15:39:34 PDT
Created attachment 133573 [details]
Patch
Comment 17 Ryosuke Niwa 2012-03-27 14:08:04 PDT
Comment on attachment 133573 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133573&action=review

> Source/WebCore/editing/TextIterator.cpp:1170
> +    // Prevent changing the iterator position if a form control element was found and advance should stop on it.

This comment repeats what the code already says. Please remove.

> Source/WebCore/html/HTMLFormControlElement.cpp:468
> +    while (node) {

My preference is to use for-loop here.
Comment 18 Leandro Graciá Gil 2012-03-28 05:54:08 PDT
Comment on attachment 133573 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133573&action=review

Thanks for the review. Since the suggested changes are minor nits I'll be applying them directly to the committed patch.

>> Source/WebCore/editing/TextIterator.cpp:1170
>> +    // Prevent changing the iterator position if a form control element was found and advance should stop on it.
> 
> This comment repeats what the code already says. Please remove.

Removed.

>> Source/WebCore/html/HTMLFormControlElement.cpp:468
>> +    while (node) {
> 
> My preference is to use for-loop here.

Fixed.
Comment 19 Leandro Graciá Gil 2012-03-28 06:03:14 PDT
Committed r112389: <http://trac.webkit.org/changeset/112389>