Bug 5610 - -[WebCoreBridge convertNSRangeToDOMRange] often returns wrong or "unusable" results
Summary: -[WebCoreBridge convertNSRangeToDOMRange] often returns wrong or "unusable" r...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-03 02:07 PST by Evan Gross
Modified: 2016-08-03 10:53 PDT (History)
5 users (show)

See Also:


Attachments
Modified Blot app to illustrate the results of convertNSRangeToDOMRange and what goes wrong when it fails (124.13 KB, application/octet-stream)
2005-12-23 23:17 PST, Evan Gross
no flags Details
some automated test cases (4.40 KB, application/zip)
2006-01-02 00:42 PST, Alexey Proskuryakov
no flags Details
proposed patch (14.99 KB, patch)
2006-01-03 14:35 PST, Alexey Proskuryakov
justin.garcia: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Evan Gross 2005-11-03 02:07:25 PST
It's looking like one of the main reasons that bug 4680, bug 5401, bug 5445, bug 4682 and perhaps 
other clients (really not many besides these NSTextInput methods. There may well be other places that 
other AppKit APIs that have only an NSRange at their disposal that need to use 
convertNSRangeToDOMRange. Spelling Panel (perhaps), AX APIs, AppleScripting perhaps?

Currently, after a bit of range checking, TextIterator::rangeFromLocationAndLength() is used to figure 
out the resulting DOM range. We've tried to fix it up a bit in Bug 4682 (attachment (id=4424)), but think 
that this isn't the right approach.

What this method (function it calls, whatever) REALLY needs to do is return the EXACT DOMRange you'd 
get from -[WebCoreBridge selectedDOMRange] just as though the particular NSRange you want to 
convert were actually the document's selected range.

The above appears to be necessary to get predictable results from a number of methods:

KWQKHTMLPart::attributedString() - if the values you supply for its arguments aren't quite right, you 
get, well, incorrect results. It appears that it may be built on the assumption that they've come from 
some sort of "selected range". Not totally sure, as this is just a monster function that I have no time/
intention of really acquiring a deep knowledge of.

-[WebCoreBridge stringForRange:] is given a DOMRange that may have been converted from an 
NSRange. Seems to be the same story as above.

-[WebCoreBridge selectNSRange:] also converts an NSRange to a DOMRange. No idea if there are issues 
with setting the selection to the current DOMRange returned from convertNSRangeToDOMRange. I think 
there is only one, rarely used client of this particular method.

SO, TO FIX THIS - we probably need to use VisiblePosition, SelectionController, and (unfortunately due 
to current performance) CharacterIterator to create an exact replica of the range you'd get if the given 
selection consisted of the given (arbitrary, as it's NOT actually the selected range in the document!) 
NSRange.

There are places where something like this is done, but it's not quite the same. The code that deals with 
finding/marking spellings is one, there are a couple of text search/find methods (KWQKHTMLPart.mm), 
perhaps somewhere in the accessibility code.

I've tried to come up with something, but just don't know my way around this stuff well enough to get it 
right. Again, a plea for help...

It's simple to test this stuff just with console logging. It's hard with layout tests because there are so 
many possible variations of text and ranges of text with it. What I have been doing is simply playing 
with making selections in Blot with this code pasted into -[WebHTMLView _selectionChanged], then 
analyzing the output.

Selections enclosing line endings, at/before line breaks, and many others will give incorrect results. 
Easy to see - make the selection in Blot, take a look at what was spewed to the Console.

Copy/Paste this in:

- (void)_selectionChanged
{
    [self _updateSelectionForInputManager];
    [self _updateFontPanel];
    _private->startNewKillRingSequence = YES;

#if 1
    WebBridge *bridge = [self _bridge];
    DOMRange *selectedDOMRange = [bridge selectedDOMRange];
    DOMRange *range = [bridge convertNSRangeToDOMRange:[bridge selectedNSRange]];
    NSLog(@"++++_selectionChanged - selectedNSRange:%@\n1. selectedDOMRange:\n%@ and 
backtoNS:%@\n(stringForRange:%@ len:%d)\n2. convertNSRangeToDOMRange:\n%@ and backtoNS:%@
\n(stringForRange:%@ len:%d)\n", NSStringFromRange([bridge selectedNSRange]), selectedDOMRange, 
NSStringFromRange([bridge convertDOMRangeToNSRange:selectedDOMRange]), [bridge 
stringForRange:selectedDOMRange], [[bridge stringForRange:selectedDOMRange] length], range, 
NSStringFromRange([bridge convertDOMRangeToNSRange:range]), [bridge stringForRange:range], 
[[bridge stringForRange:range] length]);

    NSAttributedString *attributedSubString = [bridge attributedStringFrom:[selectedDOMRange 
startContainer] startOffset:[selectedDOMRange startOffset] to:[selectedDOMRange endContainer] 
endOffset:[selectedDOMRange endOffset]];
    NSAttributedString *attributedSubString2 = [bridge attributedStringFrom:[range startContainer] 
startOffset:[range startOffset] to:[range endContainer] endOffset:[range endOffset]];
    NSLog(@"attributedSubString 1:%@ len:%d\nattributedSubString 2:%@len:%d", attributedSubString, 
[attributedSubString length], attributedSubString2, [attributedSubString2 length]);
    
    if ([range startContainer] != [selectedDOMRange startContainer] || [range startOffset] != 
[selectedDOMRange startOffset] || [range endContainer] != [selectedDOMRange endContainer] || [range 
endOffset] != [selectedDOMRange endOffset])
        NSLog(@"ranges DIFFERENT!");
    
    if (![attributedSubString isEqualToAttributedString:attributedSubString2])
        NSLog(@"attributed strings DIFFERENT!");
#endif
}

Convince yourself (if needed) that something's very wrong. Re-do the logging however you see fit.
Comment 1 Evan Gross 2005-11-03 02:09:40 PST
Whoops, the first sentence should begin:

It's looking like *this bug is* one of the main reasons that bug 4680, bug 5401, bug 5445, bug 4682 and 
perhaps other clients...
Comment 2 Evan Gross 2005-11-03 02:16:40 PST
I've taken a crack, but realize I really have no clue how to properly use some of these classes, let alone 
what to do in order to make sure the result is a range that would *exactly* match the one returned if 
the given range of characters constituted the selection. It doesn't work properly at the moment in a few 
cases, like carets at the end of a word before a line break. Maybe someone that knows this stuff can 
spot (how much is?) what's wrong. Or perhaps a better approach!

- (SharedPtr<DOM::RangeImpl>)convertToDOMRange:(NSRange)nsrange
{
    SharedPtr<RangeImpl> result(_part->xmlDocImpl()->createRange());
    
    CharacterIterator it(rangeOfContents(_part->xmlDocImpl()).get());
    int exception = 0;

    if (!it.atEnd())
        it.advance(nsrange.location);
        
    result->setStart(it.range()->startContainer(exception), it.range()->startOffset(exception), exception);
    
    it.string(nsrange.length);
    
    result->setEnd(it.range()->startContainer(exception), it.range()->startOffset(exception), exception);

    return SelectionController(result.get(), DOWNSTREAM, VP_UPSTREAM_IF_POSSIBLE).toRange();
}

- (DOMRange *)convertNSRangeToDOMRange:(NSRange)nsrange
{
    if (nsrange.location > INT_MAX)
        return 0;
    if (nsrange.length > INT_MAX || nsrange.location + nsrange.length > INT_MAX)
        nsrange.length = INT_MAX - nsrange.location;

    SharedPtr<RangeImpl> result([self convertToDOMRange:nsrange]);
    return [DOMRange _rangeWithImpl:result.get()];
}

(I think the current code leaks a RangeImpl - am I correct? And another one in 
smartDeleteRangeForProposedRange (RangeImpl * created near the end?)
Comment 3 Evan Gross 2005-12-23 23:17:41 PST
Created attachment 5256 [details]
Modified Blot app to illustrate the results of convertNSRangeToDOMRange and what goes wrong when it fails

Run this version of Blot with TOT. Open the test document in the archive.
Triple-click the first line just to get started.

View in the inspector window (on-the-fly as you change the selection) the
results of convertNSRangeToDOMRange (and its inverse), and the attributed
string obtained three different ways. When the DOMRange's don't match up, they
are drawn in red. When the attributed strings don't match up with the results
of [bridge selectedAttributedString], the background is drawn in yellow.

Just play for 30 seconds, you'll see there are real problems here...
Comment 4 Alexey Proskuryakov 2005-12-24 02:59:26 PST
At the moment, I see two separate problems with range conversion:
1) DIV-based line feeds cause a lot of problems.
2) DOM ranges for empty selections are different (for an empty document, words with style changes etc).
Anything I'm missing?

Worst of all, I still don't know what is correct conceptually (whether DOM or render trees should be used 
for such conversions).
Comment 5 Evan Gross 2005-12-24 04:25:37 PST
(In reply to comment #4)
> Worst of all, I still don't know what is correct conceptually (whether DOM or render trees should be used 
> for such conversions).

It seems to me that all the machinery that deals with editing/text/strings and DOMRange's expect as input (or simply work best with) the selected range. I see no wrong results from selectedAttributedString...

Might be time to take a different approach as I mentioned before. We need a DOMRange that exactly matches the one you'd get if you positioned the insertion point at range.location and then extended the "selection" by range.length characters. 

We need the selected DOMRange, as though the user placed the insertion point and expanded it by range.length characters - BUT without actually selecting anything in the document.

I tried, made some progress, then just gave up because I just didn't really "get" the right way to do it.

Thoughts? Any closer to confirming this?
Comment 6 Evan Gross 2005-12-24 04:28:01 PST
Sorry, baked line wrapping from my Treo 650...
Comment 7 Alexey Proskuryakov 2005-12-26 14:10:51 PST
I think I finally know what to fix here :)

TextIterator often appends text generated from non-text nodes to other nodes that it finds suitable. This 
is OK for what it does internally, but it
1) breaks KWQKHTMLPart::attributedString in a lot of cases;
2) makes writing a correct attributedString accessor based on TextIterator machinery hard, to say the 
least.

I'm not quite sure if Selection/VisiblePosition code can or should have anything to do with this. Comments 
are welcome.
Comment 8 Alexey Proskuryakov 2006-01-02 00:42:31 PST
Created attachment 5416 [details]
some automated test cases
Comment 9 Alexey Proskuryakov 2006-01-03 14:35:50 PST
Created attachment 5459 [details]
proposed patch

The issue here is that TextIterator ranges aren't expected to work with
TextInput at all - plain and attributed strings can rightfully have different
lengths (for example, this happens for strings containing inline images). So,
even merging TextIterator with VisiblePosition is not a solution here (although
it still may be a possible future direction for other reasons).

To really make TextInput work, TextIterator should probably get a mode for
generating attributed strings (and thus supersede
KWQKHTMLPart::attributedString()).

However, there are some issues with TextIterator::rangeFromLocationAndLength()
which are common for plain and attributed strings, so fixing these should
positively affect both.
Comment 10 Darin Adler 2006-01-04 13:06:30 PST
Comment on attachment 5459 [details]
proposed patch

Looks like there's no call to setEnd at all in the case where startOffset ==
endOffset. Is that OK?
Comment 11 Alexey Proskuryakov 2006-01-04 15:21:42 PST
(In reply to comment #10)
In this case, setEnd should have been called on the previous iteration (when rangeEnd was equal to 
docTextPosition+len).
Comment 12 Justin Garcia 2006-01-04 18:29:41 PST
I have a question about your changes to ::handleNonTextNode().  The ranges returned for emitted characters 
seem wrong to me with or without your patch.  Here's an example:

<div id="parent">
123
<div id="child">456</div>
789
</div>

This is rendered as:

123
456
789

Before your changes, TextIterator returns the following ranges and lengths when iterating over the above snippit:

textnode "123" offset 0 to 3 (length 3)
div id="parent" offset 0 to 1 (length 1)
textnode "456" offset 0 to 3 (length 3)
div id="parent" offset 2 to 2 (length 1)
textnode "789" offset 0 to 3 (length 3)

After your changes:

textnode "123" offset 0 to 3 (length 3)
div id="child" offset 0 to 0 (length 1)
textnode "456" offset 0 to 3 (length 3)
div id="parent" offset 2 to 2 (length 1)
textnode "789" offset 0 to 3 (length 3)

Of course, the range associated with the second emitted '\n' didn't change because you didn't touch exitNode().  I 
think that the ranges associated with the first emitted '\n are wrong.  I think the range should be:

start: (textnode "123" offset 3)
end: (textnode "456" offset 0)

This would also be acceptable:

start: (div id="parent" offset 1)
end: (div id="child" offset 0)

Find on Page is implemented with TextIterators.  When someone searches for a \n, the range returned by the 
TextIterator should be able to be used to select something appropriate for a \n.  Why do you want the ranges for 
emitted characters to be collapsed ranges?
Comment 13 Alexey Proskuryakov 2006-01-05 01:59:19 PST
(In reply to comment #12)
This behavior is due to a difference between TextIterator::range() and the range returned from 
rangeFromLocationAndLength(). An iterator always has the same node as it start/end, and it's the job of 
rangeFromLocationAndLength() to make a usable range out of a sequence of single-node iterator 
positions.

In your example, this indeed happens correctly with the patch applied, and the range returned for the 
first \n is
start: (textnode "123" offset 3)
end: (textnode "456" offset 0)
(before testing, I have removed all the newlines from the source, because they were skewing the 
results).

CharacterIterator and findPlainText() should also honor collapsed ranges in a similar way. At the 
moment, searching for strings with line breaks is unimplemented (there's an explicit check and a 
FIXME).
Comment 14 Justin Garcia 2006-01-05 11:04:19 PST
(In reply to comment #13)
> (In reply to comment #12)
> This behavior is due to a difference between TextIterator::range() and the range returned from 
> rangeFromLocationAndLength(). An iterator always has the same node as it start/end, and it's the job of 
> rangeFromLocationAndLength() to make a usable range out of a sequence of single-node iterator 
> positions.

You're right, TextIterator::range() returns ranges that have the same node as the start and the end.  But 
should this be so?  Does it make more sense to change TextIterator to return ranges that make sense, 
instead of leaving that up to the clients of TextIterator?
Comment 15 Alexey Proskuryakov 2006-01-05 12:02:13 PST
(In reply to comment #14)
Having the iterator only iterate over whole nodes makes it simpler conceptually, so I think this is a nice 
feature to keep... Also, some other clients rely on it (KWQAccObject has an assertion).

Perhaps I will change my opinion as I get more experience with TextIterator, but at the moment I like the 
existing logic.
Comment 16 Justin Garcia 2006-01-05 15:16:34 PST
Yes, KWQAccObject does assert that the start/end containers of the range returned by the TextIterator 
are the same.  But, the question is, in what way does it rely on this fact?  The length == 0 case doesn't 
matter, because we're just changing ranges returned for emitted characters, which have the length == 
1.

In the length == 1 case, the startContainer is passed to AXAttributedStringAppendText.  This method 
uses the startContainer to gather style information which it then uses to create the attributed string.  
So, if we wanted to change the ranges returned by TextIterator::range() in the way I described, we 
would just remove the assert and decide which node's style information to use for emitted characters 
when building an attributed string.

Regardless of whether we use your approach or mine in this patch, it should include changes to other 
clients of TextIterator like findPlainText, or at least a defense of why such changes are unnecessary.
Comment 17 Justin Garcia 2006-01-13 14:52:00 PST
Comment on attachment 5459 [details]
proposed patch

r- for now, until the comments are addressed.
Comment 18 BJ Burg 2016-08-03 10:53:35 PDT
I think we can close this bug, most of this code does not exist any more.