Summary: | -[WebCoreBridge convertNSRangeToDOMRange] often returns wrong or "unusable" results | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Evan Gross <evan> | ||||||||
Component: | HTML Editing | Assignee: | Alexey Proskuryakov <ap> | ||||||||
Status: | NEW --- | ||||||||||
Severity: | Normal | CC: | ap, bburg, ian, justin.garcia, rniwa | ||||||||
Priority: | P2 | ||||||||||
Version: | 420+ | ||||||||||
Hardware: | Mac | ||||||||||
OS: | OS X 10.4 | ||||||||||
Attachments: |
|
Description
Evan Gross
2005-11-03 02:07:25 PST
Whoops, the first sentence should begin: It's looking like *this bug is* one of the main reasons that bug 4680, bug 5401, bug 5445, bug 4682 and perhaps other clients... I've taken a crack, but realize I really have no clue how to properly use some of these classes, let alone what to do in order to make sure the result is a range that would *exactly* match the one returned if the given range of characters constituted the selection. It doesn't work properly at the moment in a few cases, like carets at the end of a word before a line break. Maybe someone that knows this stuff can spot (how much is?) what's wrong. Or perhaps a better approach! - (SharedPtr<DOM::RangeImpl>)convertToDOMRange:(NSRange)nsrange { SharedPtr<RangeImpl> result(_part->xmlDocImpl()->createRange()); CharacterIterator it(rangeOfContents(_part->xmlDocImpl()).get()); int exception = 0; if (!it.atEnd()) it.advance(nsrange.location); result->setStart(it.range()->startContainer(exception), it.range()->startOffset(exception), exception); it.string(nsrange.length); result->setEnd(it.range()->startContainer(exception), it.range()->startOffset(exception), exception); return SelectionController(result.get(), DOWNSTREAM, VP_UPSTREAM_IF_POSSIBLE).toRange(); } - (DOMRange *)convertNSRangeToDOMRange:(NSRange)nsrange { if (nsrange.location > INT_MAX) return 0; if (nsrange.length > INT_MAX || nsrange.location + nsrange.length > INT_MAX) nsrange.length = INT_MAX - nsrange.location; SharedPtr<RangeImpl> result([self convertToDOMRange:nsrange]); return [DOMRange _rangeWithImpl:result.get()]; } (I think the current code leaks a RangeImpl - am I correct? And another one in smartDeleteRangeForProposedRange (RangeImpl * created near the end?) Created attachment 5256 [details]
Modified Blot app to illustrate the results of convertNSRangeToDOMRange and what goes wrong when it fails
Run this version of Blot with TOT. Open the test document in the archive.
Triple-click the first line just to get started.
View in the inspector window (on-the-fly as you change the selection) the
results of convertNSRangeToDOMRange (and its inverse), and the attributed
string obtained three different ways. When the DOMRange's don't match up, they
are drawn in red. When the attributed strings don't match up with the results
of [bridge selectedAttributedString], the background is drawn in yellow.
Just play for 30 seconds, you'll see there are real problems here...
At the moment, I see two separate problems with range conversion: 1) DIV-based line feeds cause a lot of problems. 2) DOM ranges for empty selections are different (for an empty document, words with style changes etc). Anything I'm missing? Worst of all, I still don't know what is correct conceptually (whether DOM or render trees should be used for such conversions). (In reply to comment #4) > Worst of all, I still don't know what is correct conceptually (whether DOM or render trees should be used > for such conversions). It seems to me that all the machinery that deals with editing/text/strings and DOMRange's expect as input (or simply work best with) the selected range. I see no wrong results from selectedAttributedString... Might be time to take a different approach as I mentioned before. We need a DOMRange that exactly matches the one you'd get if you positioned the insertion point at range.location and then extended the "selection" by range.length characters. We need the selected DOMRange, as though the user placed the insertion point and expanded it by range.length characters - BUT without actually selecting anything in the document. I tried, made some progress, then just gave up because I just didn't really "get" the right way to do it. Thoughts? Any closer to confirming this? Sorry, baked line wrapping from my Treo 650... I think I finally know what to fix here :) TextIterator often appends text generated from non-text nodes to other nodes that it finds suitable. This is OK for what it does internally, but it 1) breaks KWQKHTMLPart::attributedString in a lot of cases; 2) makes writing a correct attributedString accessor based on TextIterator machinery hard, to say the least. I'm not quite sure if Selection/VisiblePosition code can or should have anything to do with this. Comments are welcome. Created attachment 5416 [details]
some automated test cases
Created attachment 5459 [details]
proposed patch
The issue here is that TextIterator ranges aren't expected to work with
TextInput at all - plain and attributed strings can rightfully have different
lengths (for example, this happens for strings containing inline images). So,
even merging TextIterator with VisiblePosition is not a solution here (although
it still may be a possible future direction for other reasons).
To really make TextInput work, TextIterator should probably get a mode for
generating attributed strings (and thus supersede
KWQKHTMLPart::attributedString()).
However, there are some issues with TextIterator::rangeFromLocationAndLength()
which are common for plain and attributed strings, so fixing these should
positively affect both.
Comment on attachment 5459 [details]
proposed patch
Looks like there's no call to setEnd at all in the case where startOffset ==
endOffset. Is that OK?
(In reply to comment #10) In this case, setEnd should have been called on the previous iteration (when rangeEnd was equal to docTextPosition+len). I have a question about your changes to ::handleNonTextNode(). The ranges returned for emitted characters seem wrong to me with or without your patch. Here's an example: <div id="parent"> 123 <div id="child">456</div> 789 </div> This is rendered as: 123 456 789 Before your changes, TextIterator returns the following ranges and lengths when iterating over the above snippit: textnode "123" offset 0 to 3 (length 3) div id="parent" offset 0 to 1 (length 1) textnode "456" offset 0 to 3 (length 3) div id="parent" offset 2 to 2 (length 1) textnode "789" offset 0 to 3 (length 3) After your changes: textnode "123" offset 0 to 3 (length 3) div id="child" offset 0 to 0 (length 1) textnode "456" offset 0 to 3 (length 3) div id="parent" offset 2 to 2 (length 1) textnode "789" offset 0 to 3 (length 3) Of course, the range associated with the second emitted '\n' didn't change because you didn't touch exitNode(). I think that the ranges associated with the first emitted '\n are wrong. I think the range should be: start: (textnode "123" offset 3) end: (textnode "456" offset 0) This would also be acceptable: start: (div id="parent" offset 1) end: (div id="child" offset 0) Find on Page is implemented with TextIterators. When someone searches for a \n, the range returned by the TextIterator should be able to be used to select something appropriate for a \n. Why do you want the ranges for emitted characters to be collapsed ranges? (In reply to comment #12) This behavior is due to a difference between TextIterator::range() and the range returned from rangeFromLocationAndLength(). An iterator always has the same node as it start/end, and it's the job of rangeFromLocationAndLength() to make a usable range out of a sequence of single-node iterator positions. In your example, this indeed happens correctly with the patch applied, and the range returned for the first \n is start: (textnode "123" offset 3) end: (textnode "456" offset 0) (before testing, I have removed all the newlines from the source, because they were skewing the results). CharacterIterator and findPlainText() should also honor collapsed ranges in a similar way. At the moment, searching for strings with line breaks is unimplemented (there's an explicit check and a FIXME). (In reply to comment #13) > (In reply to comment #12) > This behavior is due to a difference between TextIterator::range() and the range returned from > rangeFromLocationAndLength(). An iterator always has the same node as it start/end, and it's the job of > rangeFromLocationAndLength() to make a usable range out of a sequence of single-node iterator > positions. You're right, TextIterator::range() returns ranges that have the same node as the start and the end. But should this be so? Does it make more sense to change TextIterator to return ranges that make sense, instead of leaving that up to the clients of TextIterator? (In reply to comment #14) Having the iterator only iterate over whole nodes makes it simpler conceptually, so I think this is a nice feature to keep... Also, some other clients rely on it (KWQAccObject has an assertion). Perhaps I will change my opinion as I get more experience with TextIterator, but at the moment I like the existing logic. Yes, KWQAccObject does assert that the start/end containers of the range returned by the TextIterator are the same. But, the question is, in what way does it rely on this fact? The length == 0 case doesn't matter, because we're just changing ranges returned for emitted characters, which have the length == 1. In the length == 1 case, the startContainer is passed to AXAttributedStringAppendText. This method uses the startContainer to gather style information which it then uses to create the attributed string. So, if we wanted to change the ranges returned by TextIterator::range() in the way I described, we would just remove the assert and decide which node's style information to use for emitted characters when building an attributed string. Regardless of whether we use your approach or mine in this patch, it should include changes to other clients of TextIterator like findPlainText, or at least a defense of why such changes are unnecessary. Comment on attachment 5459 [details]
proposed patch
r- for now, until the comments are addressed.
I think we can close this bug, most of this code does not exist any more. |