RESOLVED FIXED Bug 4680
WebHTMLView (WebNSTextInputSupport) - attributedSubstringFromRange "not yet implemented"
https://bugs.webkit.org/show_bug.cgi?id=4680
Summary WebHTMLView (WebNSTextInputSupport) - attributedSubstringFromRange "not yet i...
Evan Gross
Reported 2005-08-26 23:05:35 PDT
This method needs to be implemented in order for certain TSM DocAccess events to work properly. Specifically, kEventClassTSMDocumentAccess/kEventTSMDocumentAccessGetFont fails horribly. Without knowing a whole lot about how TSM hooks into Cocoa's NSTextInput protocol, this simple bit of code *seems* to have mostly solved the problem my input method was having: - (NSAttributedString *)attributedSubstringFromRange:(NSRange)theRange { return [self _attributeStringFromDOMRange:[[self _bridge] convertToObjCDOMRange:theRange]]; } Ok, no range checking is being done, and probably needs to. Is it really this simple, or am I missing something? I kind of doubt many input methods are using this at the moment, but it's blocking future development of my product (and along with it it's usefulness in Tiger Mail). There are other TSM problems as well I need to write up (they are all in radar), most seem fairly easy to fix. But then again, the glue between TSM and NSTextInput is private, and I don't really know how it all works.
Attachments
proposed patch (795 bytes, patch)
2005-09-12 01:10 PDT, Evan Gross
no flags
Updated patch (1016 bytes, patch)
2005-10-04 02:12 PDT, Evan Gross
no flags
Updated patch (3.28 KB, patch)
2005-10-05 02:32 PDT, Evan Gross
darin: review-
Proposed patch (7.99 KB, patch)
2005-10-23 05:53 PDT, Alexey Proskuryakov
darin: review+
Proposed patch - safer? (1.09 KB, patch)
2005-10-30 01:17 PST, Evan Gross
no flags
More rigorous layout test for attributedSubstringFromRange to show what doesn't work (26.82 KB, patch)
2005-11-30 22:41 PST, Evan Gross
no flags
The actual results from this test, using the proposed patch (20.25 KB, text/plain)
2005-12-03 18:59 PST, Evan Gross
no flags
Alexey Proskuryakov
Comment 1 2005-09-04 08:46:44 PDT
This method also needs to be implemented for Character Palette's "Insert with Font" to work. I haven't checked if the proposed fix helps, though.
Evan Gross
Comment 2 2005-09-04 17:55:15 PDT
(In reply to comment #1) > This method also needs to be implemented for Character Palette's "Insert with Font" to work. I haven't > checked if the proposed fix helps, though. Actually, this is needed for the CP's "Show Character Selected in Application" feature to work. The fix I propose DOES do the trick here. I don't think it helps at all with "Insert with Font", though. Probably WebKit needs to support the Glyph Access protocol (maybe it says it does, but actually doesn't right now - haven't taken a look). No idea how these TSM calls (glyph access) wind their way down into AppKit's TextInput management, though. Figuring out some stuff as I go along here, but it's a totally undocumented area. Should have patch files for this and other related bugs in the next couple of days...
Alexey Proskuryakov
Comment 3 2005-09-04 21:17:11 PDT
(In reply to comment #2) > I don't think it helps at all with "Insert with Font", though. I've just seen this method being called after clicking "Insert with Font"... Though it does appear illogical, I agree :)
Evan Gross
Comment 4 2005-09-05 02:35:36 PDT
(In reply to comment #3) > (In reply to comment #2) > > I don't think it helps at all with "Insert with Font", though. > > I've just seen this method being called after clicking "Insert with Font"... Though it does appear illogical, I > agree :) Sounds plausible, but this fix alone doesn't get "Insert with Font" working. Something else is obviously needed as well.
Evan Gross
Comment 5 2005-09-12 01:10:31 PDT
Created attachment 3871 [details] proposed patch Not sure if we need to check theRange, or test for a nil result from _attributeStringFromDOMRange (as is done the other places it's called).
Alexey Proskuryakov
Comment 6 2005-09-25 11:49:47 PDT
This patch breaks the build for me (gcc 3.3, Development style): _attributeStringFromDOMRange needs a forward declaration. Other than that, it worked perfectly in my tests, enabling a lot of features in Kotoeri, Character Palette and my own input method. Unfortunately, I also do not know the reason for the nil checks in other places where _attributeStringFromDOMRange is called. BTW, you may want to set a review? flag (by clicking an "Edit" link for the attachment) when you think a patch is ready to be reviewed.
Darin Adler
Comment 7 2005-09-25 22:08:57 PDT
Comment on attachment 3871 [details] proposed patch Looks fine, although I don't think we need to put the attributedString into a local variable before returning it. (I'd probably just do this as a one-liner.) r=me.
Alexey Proskuryakov
Comment 8 2005-09-28 10:39:10 PDT
*** Bug 4708 has been marked as a duplicate of this bug. ***
Evan Gross
Comment 9 2005-10-01 01:01:53 PDT
(In reply to comment #7) > Looks fine, although I don't think we need to put the attributedString into a > local variable before returning it. (I'd probably just do this as a one-liner.) (In reply to comment #7) > Looks fine, although I don't think we need to put the attributedString into a > local variable before returning it. (I'd probably just do this as a one-liner.) Oh boy. More testing has revealed a number of problems. Not quite sure even where to start. No idea what exactly is broken. I suspect a few things might be... I was wondering whether _attributeStringFromDOMRange could fail. Indeed it can, but not in the manner you'd expect - i.e. returning nil. It fails by returning a bogus or empty NSAttributedString. Easy to reproduce, though it may be instructive to stick an NSLog statement in there to print the DOMRange we obtained and the NSAttributedString result to the Console: 1. Fire up Blot or whatever you like to test this stuff with. 2. Type "two<space>words<return><return>". 3. Open the Character Palette. 4. Select the space between "two" and "words" with the mouse (semi-important, if only to reveal yet another unrelated problem - 2nd note below 7.). 5. Choose "Show Character Selected in Application". Result: _attributeStringFromDOMRange returns an empty string instead of the space. So the character's not shown in the Character Palette. Even more fun: 6. Place the insertion point at the end of the last return you typed. 7. Shift-Left-Arrow once. - (note that there's no indication that anything's selected besides the fact the insertion point isn't visible - compare that to NSTextView). - (note that the behavior here is different from what happens if you immediately did this after typing the 2nd return, before changing the selected text). 8. Choose "Show Character Selected in Application". Result: Seems OK - CP shows 0x000A. 9. Shift-Left-Arrow one more time. 10. Choose "Show Character Selected in Application". Result: Seems OK. CP shows 0x0001 - looks like a CP bug. _attributeStringFromDOMRange returns a 2- character string "0x000A0x000A". 9. Shift-Left-Arrow one MORE time. 10. Choose "Show Character Selected in Application". Result: Trouble._attributeStringFromDOMRange returns a 12-character string - the entire text, for a 3- character selection (which arguably should only be two characters). Looks like -convertToObjCDOMRange isn't quite doing what we need (returning too large a range), or else _attributeStringFromDOMRange is barfing on that range in some way. Solution: No idea. This seems semi-promising: NSAttributedString *attributedString2 = [bridge attributedStringFrom:[range startContainer] startOffset: [range startOffset] to:[range endContainer] endOffset:[range endOffset]]; in that it returns the right thing for 4. above, when just the space is selected. Doesn't seem to do the right thing for the multiple-line-endings-selected case, though the results are slightly different. THERE'S more, the problem noted in 4. above is particularly nasty, as it relates to kEventTextInputOffsetToPos. If an input sends offsettopos while a space is selected (at least with this patch in-place), an exception is thrown when TSM tries to figure out the font/size/lineheight/leading, and control never even returns to the input method that sent it. Not good. PLUS the problem is compounded due to the strange range that's assembled by NSTSMInputContext when kEventTextInputOffsetToPos is called with a negative offset and there's no inline input going on (theRange.length is always 1 for negative offsets, but zero otherwise!).
Evan Gross
Comment 10 2005-10-01 02:05:38 PDT
(In reply to comment #9) > THERE'S more, the problem noted in 4. above is particularly nasty, as it relates to > kEventTextInputOffsetToPos. If an input sends kEventTextInputOffsetToPos while a space is selected (at least with this > patch in-place), an exception is thrown when TSM tries to figure out the font/size/lineheight/ leading, > and control never even returns to the input method that sent it. Not good. PLUS the problem is > compounded due to the strange range that's assembled by NSTSMInputContext when > kEventTextInputOffsetToPos is called with a negative offset and there's no inline input going on > (theRange.length is always 1 for negative offsets, but zero otherwise!). > More about this one: If an input method sends kEventTextInputOffsetToPos with a negative offset (to get info for a character offset before the insertion point), firstRectForCharacterRange gets called with theRange.location set (seemingly) appropriately, but with theRange.length = 1. This is not the case with positive offsets (where theRange.length is always zero). Not sure if this is as-designed or a NSTSMInputContext bug. Not quite sure what to make of this, there are a couple of issues that I suppose should be filed against firstRectForCharacterRange, though they intimately involve the results (or lack thereof) from attributedSubstringFromRange. 1. When kEventTextInputOffsetToPos is called with a positive offset, firstRectForCharacterRange gets called with theRange.length set to zero. Fine - sort of. When theRange.length == 0, and there's only an insertion point, attributedSubstringFromRange is NOT called to figure what to return for kEventParamTextInputReplyFMFont, kEventParamTextInputReplyPointSize, kEventParamTextInputReplyLineHeight or kEventParamTextInputReplyLineAscent. So the values returned are some sort of defaults - and in general, they're wrong. 2. When kEventTextInputOffsetToPos is called with a negative offset, firstRectForCharacterRange gets called with theRange.length set to one (no clue why). - The Good: When theRange.length > 0, attributedSubstringFromRange IS called to calculate the four font/style/line-related parameters. - The Bad: With this patch, anyway, IF the range given to firstRectForCharacterRange happens to be a space character (OK for non-breaking space, but this whole spaces/nbsp thing, while necessary, just seems a big mess - but I digress), attributedSubstringFromRange will return a zero-length string. This doesn't jive with what TSM seems to expect for the range it supplied (a string of length theRange.length, I'd guess). So somewhere in TSM, wherever it expects to be able to extract the above four return parameters for kEventTextInputOffsetToPos, it barfs on the zero-length string. An out-of- bounds exception is thrown, and control never returns to the input method (arguably a TSM bug in that it should catch the exception so it can always return control back to the IM). Basically, this is akin to a crash. Solutions? + The alternate method to obtain the attributed string: NSAttributedString *attributedString = [bridge attributedStringFrom:[range startContainer] startOffset: [range startOffset] to:[range endContainer] endOffset:[range endOffset]]; At least deals with the space character problem. Not sure it's good in all situations. + Seems we can avoid the exception being thrown simply by testing for a zero-length attributed string result for a non-zero theRange.length, and return nil instead of the empty string. May also apply to other length mismatches, at least if the attributed string length is less than theRange.length. But it's an avoidance approach, not really a fix. Suggestions, anyone? Guess this more appropriate for a firstRectForCharacterRange report...so I will probably add this info over there as well.
Darin Adler
Comment 11 2005-10-02 23:30:12 PDT
I think it's a mistake to let one bug report start tracking all sorts of different problems discovered while working on it. So to get some sanity here, we need this to represent one specific problem. I can't figure out which one, so for now I'm just going to clear the review flag and let Evan set it back once we get clear on this one.
Evan Gross
Comment 12 2005-10-04 02:12:05 PDT
Created attachment 4190 [details] Updated patch This new patch avoids calling -[self _attributeStringFromDOMRange:] because it seems to have some problems and limitations, the source isn't available to folks like myself, and it's not documented as to what it can and can't do. Best to avoid, I suppose. This patch doesn't have the problem with spaces returning an empty attributed string. Seems to do the job, as long as convertToObjCDOMRange: takes care of the basic range checking that the NSTextInput protocol docs mention (a DOM-conversant person should know). Basically, it's the same code used as a fallback in -attributedString, when the test for a non-nil result from _attributeStringFromDOMRange fails.
Evan Gross
Comment 13 2005-10-04 02:14:15 PDT
(In reply to comment #11) > I think it's a mistake to let one bug report start tracking all sorts of different problems discovered while working on it. Yep, this is getting out-of-hand. I really think we need to consult with Mr. NSTSMContext (Aki Inoue, if I'm correct) to figure out where some of the above problems are. It would really be nice to know how TSM actually uses NSTextInput for the various TSM CarbonEvents. > So to get some sanity here, we need this to represent one specific problem. Solves just the one, original problem of getting an attributed string from an arbitrary NSRange. I'll have to file bugs on the other stuff I've found (some are filed, some aren't, some solutions need guidance from "those that know").
Evan Gross
Comment 14 2005-10-04 02:29:14 PDT
(In reply to comment #12) Arggh. Still not quite right. Still returns in certain cases (multiple returns in a row) an empty string when it shouldn't. Maybe tomorrow.
Evan Gross
Comment 15 2005-10-05 02:32:49 PDT
Created attachment 4208 [details] Updated patch Ok, I like this one better. Not entirely perfect, but it's doing the job for me in various situations. Yet more discoveries, it's quite heavily commented (not sure if there still may be "too much information" supplied).
Evan Gross
Comment 16 2005-10-05 17:24:45 PDT
One way to test: Make any selection in (say) Blot, choose "Show Character Selected in Application" from the Character Palette's Action button. This seems to actually only show the character if you have only one selected, but it always calls attributedSubstringFromRange with the entire selection range. To take the "fallback" route in the code, usually selecting a single space will do the trick, as will selecting two or three line endings in a row (use shift-left-arrow or something after placing the insertion at the end of a document that ends in a few linefeeds).
Darin Adler
Comment 17 2005-10-08 11:17:02 PDT
Comment on attachment 4208 [details] Updated patch There's a fundamental problem here because _attributeStringFromDOMRange: uses an entirely different code path than our usual conversion to a string, and there's no way to guarantee they are in sync. I don't like the strategy here; I think instead we need to create a new conversion to attributed string that *is* based on the WebCore text machinery (rather than relying no the entirely-outside-WebKit one from AppKit we are currently using) for this call. I'm sorry; I know that's a bigger undertaking than this patch.
Evan Gross
Comment 18 2005-10-08 18:41:55 PDT
(In reply to comment #17) No need to be sorry - I don't much like this either. Better than the first, still not good enough. After more testing with this patch, I see: - performance is really bad, since _attributeStringFromDOMRange, while doing a good job in general, is slow, undocumented, and obviously buggy (empty string for a range consisting of a space). - we may not actually need all the attributes returned by _attributeStringFromDOMRange, at least an input method or input manager doesn't. We *could* simply using -[bridge attributedStringFrom:startOffset:to:endOffset:], however this gives only font and foreground color attributes (probably ok for input methods), it's faster, but still not always quite right. Seems to return bogus or empty strings for some ranges containing only whitespace or newlines. The BEST way to go (from what I've seen so far), would be to extract the substring from the result of calling [self attributedString]. I don't see empty strings returned from any of the "problematic" ranges, it's always in-sync with what's considered as the entire attributed string. BUT it's a really bad performer. If there was some way to cache the result from [self attributedString] until it was modified, that could improve performance significantly, at least for those input methods that are inspecting what's in the document even when you're not changing it (Dictionary popup, for example - my product as well at certain times). I have no idea how to correctly do the above at this point in time, though. Ideas?
Alexey Proskuryakov
Comment 19 2005-10-23 05:53:27 PDT
Created attachment 4447 [details] Proposed patch This is basically Evan's patch that used [WebCoreBridge attributedStringFrom:startOffset:to:endOffset:], packaged with an automated test. Correct behavior relies on other patches that are currently in commit and review queues - range checking is fixed in bug 4682; attributed strings testing is implemented in bug 5303, and there may be other dependencies that I overlooked. There are indeed some issues with multiple returns - but they seem to be caused by deeper problems, and should be handled separately IMO. WebHTMLView uses _attributeStringFromDOMRange in a couple other places - shouldn't they be changed to attributedStringFrom:startOffset:to:endOffset:, as well? Then someone from Apple would need to add support for more attributes, perhaps by copying it from [NSAttributedString _initWithDOMRange:].
Evan Gross
Comment 20 2005-10-30 01:17:14 PST
Created attachment 4533 [details] Proposed patch - safer? This is Alexey's most recent patch (without the tests), pretty much identical but wrapped in an NS_DURING/NS_ENDHANDLER. Since it's usually the case that this gets called from a CarbonEvent handler, we should probably make sure that exceptions don't propagate into the toolbox. Same may well go for other TextInput methods as well... I don't think we should count on NSTSMDocumentAccessEventHandler or NSTSMTextInputEventHandler to do this for us. At this point in time (10.4.2), if an exception is raised but not caught, SendTextInputEvent() will not return to the input method, nasty things can happen. I'd recommend that this be done for all the top-level NSTextInput methods that make calls to anything that could conceivably raise an exception. Realize that I have no idea if this is going to be adequate for exceptions that originate in lower-level C++ layers. All I know for sure is that if you call +[NSException raise:format] (or whatever) in this or (probably) any other NSTextInput method, the above nastiness happens. OK, that said, I've done quite a bit of real-world testing with this and the most recent versions of any other proposed patches (almost all are required to enable the new functionality I'm working on in my product). I know quite a bit more now than I did a few weeks ago, at least wrt where most of the problems I've described actually originate. More bugs will be filed shortly... Provided the bugs that affect the proper behavior of this patch are fixed (there are a bunch, both at the WebCore level below this, and the AppKit level above), I think this patch will do what's needed. I'm not seeing any real issue related to the fact that [WebCoreBridge attributedStringFrom:startOffset:to:endOffset:] (in general) adds only font and foreground color attributes. Seems enough for TSM events that make use of attributes. Not sure if that's true for Cocoa input servers.
Alexey Proskuryakov
Comment 21 2005-11-04 14:44:55 PST
The problem with line breaks is bug 5631. AFAIK, the methods used here aren't supposed to throw exceptions, so the decision on whether to safeguard with NS_DURING/NS_ENDHANDLER is not entirely obvious... I like to write my own code defensively, but WebKit seems to usually have fewer safeguards.
Alexey Proskuryakov
Comment 22 2005-11-12 00:47:42 PST
It doesn't look like WebNSTextInputSupport methods should be using TextIterator at all. TextIterator uses rendered text (see, for example, bug 3429), and may be eventually converted to use render tree (see its FIXME comments) - while TextInput is IMO supposed to provide DOM text.
Darin Adler
Comment 23 2005-11-23 17:26:17 PST
Comment on attachment 4533 [details] Proposed patch - safer? We don't need the NS_DURING code here, so this is no better than the other patch (which still needs review).
Evan Gross
Comment 24 2005-11-30 22:25:11 PST
(In reply to comment #22) > It doesn't look like WebNSTextInputSupport methods should be using TextIterator at all. That may well be true... > ...while TextInput is IMO supposed to provide DOM text. Not sure about this statement yet. Absolutely when the given range lies within marked text! But it's not so clear-cut for something like the Dictionary pop-up. In fact, a quick test with the Dictionary pop-up with text-transform:uppercase applied to some lowercase text, shows that on 10.4.3 it displays the DOM (lowercase) text, but with the nightly build from a couple days ago, it displays the rendered text! Since the Dictionary pop-up doesn't have the ability to modify the document's text, I think showing the rendered text is the right thing to do. BUT for an input method that can modify any range of text via UAIA w/replacerange, it could go either way. I waffle between the two myself, since my input method can (not the shipping version!) do both - draw what's in the document in an overlay, AND replace that text via UAIA.
Evan Gross
Comment 25 2005-11-30 22:41:13 PST
Created attachment 4889 [details] More rigorous layout test for attributedSubstringFromRange to show what doesn't work The layout test in attachment 4447 [details] is a bit "too easy." It doesn't show the current troubles with ranges that enclose some sort of line break. This one creates 3 lines of text, and does it more like it would actually occur, via typing commands. It will display both the attributed string and its length, so it's more obvious when things go wrong (i.e. string with wrong length is returned). The expected results in this patch were created using this version of attributedSubstringFromRange: - (NSAttributedString *)attributedSubstringFromRange:(NSRange)theRange { WebBridge *bridge = [self _bridge]; NSAttributedString *attributedString = [bridge attributedStringFrom:[bridge DOMDocument] startOffset:0 to:nil endOffset:0]; return !NSLocationInRange(theRange.location, NSMakeRange(0, [attributedString length])) ? nil : [attributedString attributedSubstringFromRange:NSIntersectionRange(theRange, NSMakeRange(0, [attributedString length]))]; } This seems to always give the correct results (according to -[bridge attributedStringFrom:startOffset:to:endOffset:] with current patches applied). With the latest proposed patch, it's easy to see the failures when you run this test. Not sure if there's some better way to package this, I've been building with the above version of attributedSubstringFromRange to get expected results, then again with the latest proposed patch to find the problems. A bit of a pain, but you'll see how screwy the results can be...
Darin Adler
Comment 26 2005-12-03 12:33:28 PST
Comment on attachment 4447 [details] Proposed patch OK, lets land this even though we know it won't work in all cases. We need to figure out how to make this work properly, though. We should come up with test cases that show AppKit HTML conversion not matching WebKit's plain text conversion, I guess.
Evan Gross
Comment 27 2005-12-03 18:59:38 PST
Created attachment 4924 [details] The actual results from this test, using the proposed patch Forgot to also upload the actual results I'm getting, so here they are. The main issue is that ranges that include a line break, or maybe even start at the beginning of a line, give totally wrong results. Take a look at the results for range (0,7) for instance - the entire text is returned. I don't think this patch is the problem, but that bug 5610 is the reason it's failing. I don't really think that additional layout tests are necessary, at least until we can fix things so this one passes. Once we get a "sane" attributed string for any arbitrary NSRange, then we might want to worry about whether the text actually matches up with the results from other plain text methods.
Darin Adler
Comment 28 2005-12-16 08:50:59 PST
I checked this in, but the layout tests are not giving the results that were in the patch.
Alexey Proskuryakov
Comment 29 2005-12-16 12:17:50 PST
The new results appear to be quite wrong :-(. I'm going to investigate.
Alexey Proskuryakov
Comment 30 2005-12-16 12:45:32 PST
Ok, it's because the results were depending on the patch in bug 4682, as noted in comment 19. I've noticed no other differences in attributed-substring-from-range; haven't checked attributed-substring- from-range-lines (but this one wasn't giving correct results anyway).
Note You need to log in before you can comment on or make changes to this bug.