Bug 4680 - WebHTMLView (WebNSTextInputSupport) - attributedSubstringFromRange "not yet implemented"
Summary: WebHTMLView (WebNSTextInputSupport) - attributedSubstringFromRange "not yet i...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Dave Hyatt
URL:
Keywords:
: 4708 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-08-26 23:05 PDT by Evan Gross
Modified: 2005-12-16 12:45 PST (History)
1 user (show)

See Also:


Attachments
proposed patch (795 bytes, patch)
2005-09-12 01:10 PDT, Evan Gross
no flags Details | Formatted Diff | Diff
Updated patch (1016 bytes, patch)
2005-10-04 02:12 PDT, Evan Gross
no flags Details | Formatted Diff | Diff
Updated patch (3.28 KB, patch)
2005-10-05 02:32 PDT, Evan Gross
darin: review-
Details | Formatted Diff | Diff
Proposed patch (7.99 KB, patch)
2005-10-23 05:53 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff
Proposed patch - safer? (1.09 KB, patch)
2005-10-30 01:17 PST, Evan Gross
no flags Details | Formatted Diff | Diff
More rigorous layout test for attributedSubstringFromRange to show what doesn't work (26.82 KB, patch)
2005-11-30 22:41 PST, Evan Gross
no flags Details | Formatted Diff | Diff
The actual results from this test, using the proposed patch (20.25 KB, text/plain)
2005-12-03 18:59 PST, Evan Gross
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Evan Gross 2005-08-26 23:05:35 PDT
This method needs to be implemented in order for certain TSM DocAccess events to work properly.

Specifically, kEventClassTSMDocumentAccess/kEventTSMDocumentAccessGetFont fails horribly.

Without knowing a whole lot about how TSM hooks into Cocoa's NSTextInput protocol, this simple bit 
of code *seems* to have mostly solved the problem my input method was having:

- (NSAttributedString *)attributedSubstringFromRange:(NSRange)theRange
{
    return [self _attributeStringFromDOMRange:[[self _bridge] convertToObjCDOMRange:theRange]];
}

Ok, no range checking is being done, and probably needs to. Is it really this simple, or am I missing 
something?

I kind of doubt many input methods are using this at the moment, but it's blocking future development 
of my product (and along with it it's usefulness in Tiger Mail). There are other TSM problems as well I 
need to write up (they are all in radar), most seem fairly easy to fix. But then again, the glue between 
TSM and NSTextInput is private, and I don't really know how it all works.
Comment 1 Alexey Proskuryakov 2005-09-04 08:46:44 PDT
This method also needs to be implemented for Character Palette's "Insert with Font" to work. I haven't 
checked if the proposed fix helps, though.
Comment 2 Evan Gross 2005-09-04 17:55:15 PDT
(In reply to comment #1)
> This method also needs to be implemented for Character Palette's "Insert with Font" to work. I haven't 
> checked if the proposed fix helps, though.

Actually, this is needed for the CP's "Show Character Selected in Application" feature to work. The fix I 
propose DOES do the trick here.

I don't think it helps at all with "Insert with Font", though. Probably WebKit needs to support the Glyph 
Access protocol (maybe it says it does, but actually doesn't right now - haven't taken a look). No idea 
how these TSM calls (glyph access) wind their way down into AppKit's TextInput management, though. 
Figuring out some stuff as I go along here, but it's a totally undocumented area.

Should have patch files for this and other related bugs in the next couple of days...
Comment 3 Alexey Proskuryakov 2005-09-04 21:17:11 PDT
(In reply to comment #2)
> I don't think it helps at all with "Insert with Font", though.

I've just seen this method being called after clicking "Insert with Font"... Though it does appear illogical, I 
agree :)
Comment 4 Evan Gross 2005-09-05 02:35:36 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > I don't think it helps at all with "Insert with Font", though.
> 
> I've just seen this method being called after clicking "Insert with Font"... Though it does appear illogical, 
I 
> agree :)

Sounds plausible, but this fix alone doesn't get "Insert with Font" working. Something else is obviously 
needed as well.
Comment 5 Evan Gross 2005-09-12 01:10:31 PDT
Created attachment 3871 [details]
proposed patch

Not sure if we need to check theRange, or test for a nil result from
_attributeStringFromDOMRange (as is done the other places it's called).
Comment 6 Alexey Proskuryakov 2005-09-25 11:49:47 PDT
This patch breaks the build for me (gcc 3.3, Development style): _attributeStringFromDOMRange needs a 
forward declaration.

Other than that, it worked perfectly in my tests, enabling a lot of features in Kotoeri, Character Palette and 
my own input method. Unfortunately, I also do not know the reason for the nil checks in other places 
where _attributeStringFromDOMRange is called.

BTW, you may want to set a review? flag (by clicking an "Edit" link for the attachment) when you think a 
patch is ready to be reviewed.
Comment 7 Darin Adler 2005-09-25 22:08:57 PDT
Comment on attachment 3871 [details]
proposed patch

Looks fine, although I don't think we need to put the attributedString into a
local variable before returning it. (I'd probably just do this as a one-liner.)
r=me.
Comment 8 Alexey Proskuryakov 2005-09-28 10:39:10 PDT
*** Bug 4708 has been marked as a duplicate of this bug. ***
Comment 9 Evan Gross 2005-10-01 01:01:53 PDT
(In reply to comment #7)
> Looks fine, although I don't think we need to put the attributedString into a
> local variable before returning it. (I'd probably just do this as a one-liner.)

(In reply to comment #7)
> Looks fine, although I don't think we need to put the attributedString into a
> local variable before returning it. (I'd probably just do this as a one-liner.)

Oh boy. More testing has revealed a number of problems. Not quite sure even where to start. No idea 
what exactly is broken. I suspect a few things might be...

I was wondering whether _attributeStringFromDOMRange could fail. Indeed it can, but not in the 
manner you'd expect - i.e. returning nil. It fails by returning a bogus or empty NSAttributedString. Easy 
to reproduce, though it may be instructive to stick an NSLog statement in there to print the DOMRange 
we obtained and the NSAttributedString result to the Console:

1. Fire up Blot or whatever you like to test this stuff with.
2. Type "two<space>words<return><return>".
3. Open the Character Palette.
4. Select the space between "two" and "words" with the mouse (semi-important, if only to reveal yet 
another unrelated problem - 2nd note below 7.).
5. Choose "Show Character Selected in Application".

Result: _attributeStringFromDOMRange returns an empty string instead of the space. So the character's 
not shown in the Character Palette.

Even more fun:
6. Place the insertion point at the end of the last return you typed.
7. Shift-Left-Arrow once.
- (note that there's no indication that anything's selected besides the fact the insertion point isn't 
visible - compare that to NSTextView).
- (note that the behavior here is different from what happens if you immediately did this after typing 
the 2nd return, before changing the selected text).
8. Choose "Show Character Selected in Application".

Result: Seems OK - CP shows 0x000A.

9. Shift-Left-Arrow one more time.
10. Choose "Show Character Selected in Application".

Result: Seems OK. CP shows 0x0001 - looks like a CP bug. _attributeStringFromDOMRange returns a 2-
character string "0x000A0x000A".

9. Shift-Left-Arrow one MORE time.
10. Choose "Show Character Selected in Application".

Result: Trouble._attributeStringFromDOMRange returns a 12-character string - the entire text, for a 3-
character selection (which arguably should only be two characters).

Looks like -convertToObjCDOMRange isn't quite doing what we need (returning too large a range), or 
else _attributeStringFromDOMRange is barfing on that range in some way.

Solution: No idea. This seems semi-promising:

NSAttributedString *attributedString2 = [bridge attributedStringFrom:[range startContainer] startOffset:
[range startOffset] to:[range endContainer] endOffset:[range endOffset]];

in that it returns the right thing for 4. above, when just the space is selected.

Doesn't seem to do the right thing for the multiple-line-endings-selected case, though the results are 
slightly different.

THERE'S more, the problem noted in 4. above is particularly nasty, as it relates to 
kEventTextInputOffsetToPos. If an input sends offsettopos while a space is selected (at least with this 
patch in-place), an exception is thrown when TSM tries to figure out the font/size/lineheight/leading, 
and control never even returns to the input method that sent it. Not good. PLUS the problem is 
compounded due to the strange range that's assembled by NSTSMInputContext when 
kEventTextInputOffsetToPos is called with a negative offset and there's no inline input going on 
(theRange.length is always 1 for negative offsets, but zero otherwise!).
Comment 10 Evan Gross 2005-10-01 02:05:38 PDT
(In reply to comment #9)
> THERE'S more, the problem noted in 4. above is particularly nasty, as it relates to 
> kEventTextInputOffsetToPos. If an input sends kEventTextInputOffsetToPos while a space is selected 
(at least with this 
> patch in-place), an exception is thrown when TSM tries to figure out the font/size/lineheight/
leading, 
> and control never even returns to the input method that sent it. Not good. PLUS the problem is 
> compounded due to the strange range that's assembled by NSTSMInputContext when 
> kEventTextInputOffsetToPos is called with a negative offset and there's no inline input going on 
> (theRange.length is always 1 for negative offsets, but zero otherwise!).
> 

More about this one:

If an input method sends kEventTextInputOffsetToPos with a negative offset (to get info for a character 
offset before the insertion point), firstRectForCharacterRange gets called with theRange.location set 
(seemingly) appropriately, but with theRange.length = 1. This is not the case with positive offsets 
(where theRange.length is always zero).

Not sure if this is as-designed or a NSTSMInputContext bug. Not quite sure what to make of this, there 
are a couple of issues that I suppose should be filed against firstRectForCharacterRange, though they 
intimately involve the results (or lack thereof) from attributedSubstringFromRange.

1. When kEventTextInputOffsetToPos is called with a positive offset, firstRectForCharacterRange gets 
called with theRange.length set to zero. Fine - sort of. When theRange.length == 0, and there's only an 
insertion point, attributedSubstringFromRange is NOT called to figure what to return for 
kEventParamTextInputReplyFMFont, kEventParamTextInputReplyPointSize, 
kEventParamTextInputReplyLineHeight or kEventParamTextInputReplyLineAscent. So the values 
returned are some sort of defaults - and in general, they're wrong.

2. When kEventTextInputOffsetToPos is called with a negative offset, firstRectForCharacterRange gets 
called with theRange.length set to one (no clue why).
- The Good: When theRange.length > 0, attributedSubstringFromRange IS called to calculate the four 
font/style/line-related parameters.
- The Bad: With this patch, anyway, IF the range given to firstRectForCharacterRange happens to be a 
space character (OK for non-breaking space, but this whole spaces/nbsp thing, while necessary, just 
seems a big mess - but I digress), attributedSubstringFromRange will return a zero-length string. This 
doesn't jive with what TSM seems to expect for the range it supplied (a string of length 
theRange.length, I'd guess). So somewhere in TSM, wherever it expects to be able to extract the above 
four return parameters for kEventTextInputOffsetToPos, it barfs on the zero-length string. An out-of-
bounds exception is thrown, and control never returns to the input method (arguably a TSM bug in that 
it should catch the exception so it can always return control back to the IM). Basically, this is akin to a 
crash.

Solutions?

+ The alternate method to obtain the attributed string:

NSAttributedString *attributedString = [bridge attributedStringFrom:[range startContainer] startOffset:
[range startOffset] to:[range endContainer] endOffset:[range endOffset]];

At least deals with the space character problem. Not sure it's good in all situations.

+ Seems we can avoid the exception being thrown simply by testing for a zero-length attributed string 
result for a non-zero theRange.length, and return nil instead of the empty string. May also apply to 
other length mismatches, at least if the attributed string length is less than theRange.length.

But it's an avoidance approach, not really a fix.

Suggestions, anyone?

Guess this more appropriate for a firstRectForCharacterRange report...so I will probably add this info 
over there as well.
Comment 11 Darin Adler 2005-10-02 23:30:12 PDT
I think it's a mistake to let one bug report start tracking all sorts of different problems discovered while 
working on it.

So to get some sanity here, we need this to represent one specific problem. I can't figure out which one, so 
for now I'm just going to clear the review flag and let Evan set it back once we get clear on this one.
Comment 12 Evan Gross 2005-10-04 02:12:05 PDT
Created attachment 4190 [details]
Updated patch

This new patch avoids calling -[self _attributeStringFromDOMRange:] because it
seems to have some problems and limitations, the source isn't available to
folks like myself, and it's not documented as to what it can and can't do. Best
to avoid, I suppose.

This patch doesn't have the problem with spaces returning an empty attributed
string. Seems to do the job, as long as convertToObjCDOMRange: takes care of
the basic range checking that the NSTextInput protocol docs mention (a
DOM-conversant person should know).

Basically, it's the same code used as a fallback in -attributedString, when the
test for a non-nil result from _attributeStringFromDOMRange fails.
Comment 13 Evan Gross 2005-10-04 02:14:15 PDT
(In reply to comment #11)
> I think it's a mistake to let one bug report start tracking all sorts of different problems discovered 
while working on it.

Yep, this is getting out-of-hand. I really think we need to consult with Mr. NSTSMContext (Aki Inoue, if 
I'm correct) to figure out where some of the above problems are. It would really be nice to know how 
TSM actually uses NSTextInput for the various TSM CarbonEvents.

> So to get some sanity here, we need this to represent one specific problem.

Solves just the one, original problem of getting an attributed string from an arbitrary NSRange.

I'll have to file bugs on the other stuff I've found (some are filed, some aren't, some solutions need 
guidance from "those that know").
Comment 14 Evan Gross 2005-10-04 02:29:14 PDT
(In reply to comment #12)
Arggh. Still not quite right. Still returns in certain cases (multiple returns in a row) an empty string when it 
shouldn't. Maybe tomorrow.
Comment 15 Evan Gross 2005-10-05 02:32:49 PDT
Created attachment 4208 [details]
Updated patch

Ok, I like this one better. Not entirely perfect, but it's doing the job for me
in various situations.

Yet more discoveries, it's quite heavily commented (not sure if there still may
be "too much information" supplied).
Comment 16 Evan Gross 2005-10-05 17:24:45 PDT
One way to test: Make any selection in (say) Blot, choose "Show Character Selected in Application" from the 
Character Palette's Action button. This seems to actually only show the character if you have only one 
selected, but it always calls attributedSubstringFromRange with the entire selection range.

To take the "fallback" route in the code, usually selecting a single space will do the trick, as will selecting 
two or three line endings in a row (use shift-left-arrow or something after placing the insertion at the end 
of a document that ends in a few linefeeds).
Comment 17 Darin Adler 2005-10-08 11:17:02 PDT
Comment on attachment 4208 [details]
Updated patch

There's a fundamental problem here because _attributeStringFromDOMRange: uses
an entirely different code path than our usual conversion to a string, and
there's no way to guarantee they are in sync.

I don't like the strategy here; I think instead we need to create a new
conversion to attributed string that *is* based on the WebCore text machinery
(rather than relying no the entirely-outside-WebKit one from AppKit we are
currently using) for this call.

I'm sorry; I know that's a bigger undertaking than this patch.
Comment 18 Evan Gross 2005-10-08 18:41:55 PDT
(In reply to comment #17)
No need to be sorry - I don't much like this either. Better than the first, still not good enough. After 
more testing with this patch, I see:

- performance is really bad, since _attributeStringFromDOMRange, while doing a good job in general, is 
slow, undocumented, and obviously buggy (empty string for a range consisting of a space).
- we may not actually need all the attributes returned by _attributeStringFromDOMRange, at least an 
input method or input manager doesn't.

We *could* simply using -[bridge attributedStringFrom:startOffset:to:endOffset:], however this gives 
only font and foreground color attributes (probably ok for input methods), it's faster, but still not 
always quite right. Seems to return bogus or empty strings for some ranges containing only whitespace 
or newlines.

The BEST way to go (from what I've seen so far), would be to extract the substring from the result of 
calling [self attributedString]. I don't see empty strings returned from any of the "problematic" ranges, 
it's always in-sync with what's considered as the entire attributed string. BUT it's a really bad performer. 
If there was some way to cache the result from [self attributedString] until it was modified, that could 
improve performance significantly, at least for those input methods that are inspecting what's in the 
document even when you're not changing it (Dictionary popup, for example - my product as well at 
certain times). I have no idea how to correctly do the above at this point in time, though.

Ideas?

Comment 19 Alexey Proskuryakov 2005-10-23 05:53:27 PDT
Created attachment 4447 [details]
Proposed patch

This is basically Evan's patch that used [WebCoreBridge
attributedStringFrom:startOffset:to:endOffset:], packaged with an automated
test.

Correct behavior relies on other patches that are currently in commit and
review queues - range checking is fixed in bug 4682; attributed strings testing
is implemented in bug 5303, and there may be other dependencies that I
overlooked.

There are indeed some issues with multiple returns - but they seem to be caused
by deeper problems, and should be handled separately IMO.

WebHTMLView uses _attributeStringFromDOMRange in a couple other places -
shouldn't they be changed to attributedStringFrom:startOffset:to:endOffset:, as
well? Then someone from Apple would need to add support for more attributes,
perhaps by copying it from [NSAttributedString _initWithDOMRange:].
Comment 20 Evan Gross 2005-10-30 01:17:14 PST
Created attachment 4533 [details]
Proposed patch - safer?

This is Alexey's most recent patch (without the tests), pretty much identical
but wrapped in an NS_DURING/NS_ENDHANDLER. Since it's usually the case that
this gets called from a CarbonEvent handler, we should probably make sure that
exceptions don't propagate into the toolbox. Same may well go for other
TextInput methods as well...

I don't think we should count on NSTSMDocumentAccessEventHandler or
NSTSMTextInputEventHandler to do this for us. At this point in time (10.4.2),
if an exception is raised but not caught, SendTextInputEvent() will not return
to the input method, nasty things can happen.

I'd recommend that this be done for all the top-level NSTextInput methods that
make calls to anything that could conceivably raise an exception. Realize that
I have no idea if this is going to be adequate for exceptions that originate in
lower-level C++ layers. All I know for sure is that  if you call +[NSException
raise:format] (or whatever) in this or (probably) any other NSTextInput method,
the above nastiness happens.

OK, that said, I've done quite a bit of real-world testing with this and the
most recent versions of any other proposed patches (almost all are required to
enable the new functionality I'm working on in my product). I know quite a bit
more now than I did a few weeks ago, at least wrt where most of the problems
I've described actually originate. More bugs will be filed shortly...

Provided the bugs that affect the proper behavior of this patch are fixed
(there are a bunch, both at the WebCore level below this, and the AppKit level
above), I think this patch will do what's needed. I'm not seeing any real issue
related to the fact that [WebCoreBridge
attributedStringFrom:startOffset:to:endOffset:] (in general) adds only font and
foreground color attributes. Seems enough for TSM events that make use of
attributes. Not sure if that's true for Cocoa input servers.
Comment 21 Alexey Proskuryakov 2005-11-04 14:44:55 PST
The problem with line breaks is bug 5631.

AFAIK, the methods used here aren't supposed to throw exceptions, so the decision on whether to 
safeguard with NS_DURING/NS_ENDHANDLER is not entirely obvious... I like to write my own code 
defensively, but WebKit seems to usually have fewer safeguards.
Comment 22 Alexey Proskuryakov 2005-11-12 00:47:42 PST
It doesn't look like WebNSTextInputSupport methods should be using TextIterator at all. TextIterator uses 
rendered text (see, for example, bug 3429), and may be eventually converted to use render tree (see its 
FIXME comments) - while TextInput is IMO supposed to provide DOM text.
Comment 23 Darin Adler 2005-11-23 17:26:17 PST
Comment on attachment 4533 [details]
Proposed patch - safer?

We don't need the NS_DURING code here, so this is no better than the other
patch (which still needs review).
Comment 24 Evan Gross 2005-11-30 22:25:11 PST
(In reply to comment #22)
> It doesn't look like WebNSTextInputSupport methods should be using TextIterator at all.

That may well be true...

> ...while TextInput is IMO supposed to provide DOM text.

Not sure about this statement yet. Absolutely when the given range lies within marked text! But it's not 
so clear-cut for something like the Dictionary pop-up. In fact, a quick test with the Dictionary pop-up 
with text-transform:uppercase applied to some lowercase text, shows that on 10.4.3 it displays the 
DOM (lowercase) text, but with the nightly build from a couple days ago, it displays the rendered text!

Since the Dictionary pop-up doesn't have the ability to modify the document's text, I think showing the 
rendered text is the right thing to do.

BUT for an input method that can modify any range of text via UAIA w/replacerange, it could go either 
way. I waffle between the two myself, since my input method can (not the shipping version!) do both - 
draw what's in the document in an overlay, AND replace that text via UAIA.
Comment 25 Evan Gross 2005-11-30 22:41:13 PST
Created attachment 4889 [details]
More rigorous layout test for attributedSubstringFromRange to show what doesn't work

The layout test in attachment 4447 [details] is a bit "too easy." It doesn't show the
current troubles with ranges that enclose some sort of line break.

This one creates 3 lines of text, and does it more like it would actually
occur, via typing commands. It will display both the attributed string and its
length, so it's more obvious when things go wrong (i.e. string with wrong
length is returned).

The expected results in this patch were created using this version of
attributedSubstringFromRange:

- (NSAttributedString *)attributedSubstringFromRange:(NSRange)theRange
{
    WebBridge *bridge = [self _bridge];
    NSAttributedString *attributedString = [bridge attributedStringFrom:[bridge
DOMDocument] startOffset:0 to:nil endOffset:0];
    
    return !NSLocationInRange(theRange.location, NSMakeRange(0,
[attributedString length])) ? nil : [attributedString
attributedSubstringFromRange:NSIntersectionRange(theRange, NSMakeRange(0,
[attributedString length]))];
}

This seems to always give the correct results (according to -[bridge
attributedStringFrom:startOffset:to:endOffset:] with current patches applied).

With the latest proposed patch, it's easy to see the failures when you run this
test.

Not sure if there's some better way to package this, I've been building with
the above version of attributedSubstringFromRange to get expected results, then
again with the latest proposed patch to find the problems. A bit of a pain, but
you'll see how screwy the results can be...
Comment 26 Darin Adler 2005-12-03 12:33:28 PST
Comment on attachment 4447 [details]
Proposed patch

OK, lets land this even though we know it won't work in all cases. We need to
figure out how to make this work properly, though. We should come up with test
cases that show AppKit HTML conversion not matching WebKit's plain text
conversion, I guess.
Comment 27 Evan Gross 2005-12-03 18:59:38 PST
Created attachment 4924 [details]
The actual results from this test, using the proposed patch

Forgot to also upload the actual results I'm getting, so here they are. The
main issue is that ranges that include a line break, or maybe even start at the
beginning of a line, give totally wrong results. Take a look at the results for
range (0,7) for instance - the entire text is returned.

I don't think this patch is the problem, but that bug 5610 is the reason it's
failing. I don't really think that additional layout tests are necessary, at
least until we can fix things so this one passes. Once we get a "sane"
attributed string for any arbitrary NSRange, then we might want to worry about
whether the text actually matches up with the results from other plain text
methods.
Comment 28 Darin Adler 2005-12-16 08:50:59 PST
I checked this in, but the layout tests are not giving the results that were in the patch.
Comment 29 Alexey Proskuryakov 2005-12-16 12:17:50 PST
The new results appear to be quite wrong :-(. I'm going to investigate.
Comment 30 Alexey Proskuryakov 2005-12-16 12:45:32 PST
Ok, it's because the results were depending on the patch in bug 4682, as noted in comment 19. I've 
noticed no other differences in attributed-substring-from-range; haven't checked attributed-substring-
from-range-lines (but this one wasn't giving correct results anyway).