Bug 53696

Summary: Caret is rendered at an incorrect position at the boundary of Arabic number in a LTR context
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Layout and RenderingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED INVALID    
Severity: Normal CC: adele, darin, ddavidso, hyatt, leviw, mitz, playmobil, xji
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 49111    
Attachments:
Description Flags
demo
none
work in progress
none
work in progress 2
none
fix the bug (hopefully correct)
none
fix the bug (removed eol-style changes)
none
Rewrote while loops as for loops none

Description Ryosuke Niwa 2011-02-03 11:03:28 PST
Created attachment 81089 [details]
demo

In the attachment, offset 2 is rendered on the right of Arabic numbers but it should be rendered on the left of Arabic numbers (between Arabic text and Arabic numbers).
Comment 1 Ryosuke Niwa 2011-02-03 17:06:17 PST
Created attachment 81148 [details]
work in progress
Comment 2 Ryosuke Niwa 2011-02-03 17:14:12 PST
This bug seems to caused by Position::getInlineBoxAndOffset.  One bug was that in tertiary run cases, we were using the opposite offsets.  When we're on the right edge and setting to the left edge of that run, we must be setting to the right edge of the previous leaf child.

The other but is in the case of inlineBox->direction() == primaryDirection.  We weren't considering next/previous leaf children with a higher bidi-level when previous/leaf leaf children were not present.  But I'm less confident about that fix.

Dan, Xiaomei, could you take a look at my work in progress patch and do the sanity check?

I don't know how to test all the edges though.  It seems like I must be adding 10-20 pixel tests because there seems to be no way of obtaining the caret position from JavaScript.
Comment 3 Ryosuke Niwa 2011-02-04 11:40:48 PST
Created attachment 81256 [details]
work in progress 2
Comment 4 Ryosuke Niwa 2011-02-04 16:29:32 PST
Created attachment 81317 [details]
fix the bug (hopefully correct)
Comment 5 Ryosuke Niwa 2011-02-04 16:33:01 PST
Created attachment 81318 [details]
fix the bug (removed eol-style changes)
Comment 6 Ryosuke Niwa 2011-02-04 16:52:07 PST
Comment on attachment 81318 [details]
fix the bug (removed eol-style changes)

View in context: https://bugs.webkit.org/attachment.cgi?id=81318&action=review

> Source/WebCore/dom/Position.cpp:1126
> -            if (!nextBox || nextBox->bidiLevel() >= level)
> +            if (nextBox && nextBox->bidiLevel() >= level)
>                  return;
>  
> +            if (!nextBox) {
> +                InlineBox* prevBox = inlineBox;
> +                do {
> +                    if (prevBox->bidiLevel() < level) {
> +                        inlineBox = prevBox;
> +                        caretOffset = inlineBox->caretRightmostOffset();
> +                        break;
> +                    }
> +                    prevBox = prevBox->prevLeafChild();
> +                } while (prevBox);
> +                return;
> +            }
> +

I'm not confident about this change.  I think it makes sense but I might be missing some cases since I'm not that familiar with this code.
Comment 7 Ryosuke Niwa 2011-02-22 18:46:44 PST
Hi, could someone review my patch or give me some feedback?
Comment 8 mitz 2011-02-23 12:11:36 PST
Going to take a look at this today.
Comment 9 mitz 2011-02-23 13:48:38 PST
Comment on attachment 81318 [details]
fix the bug (removed eol-style changes)

View in context: https://bugs.webkit.org/attachment.cgi?id=81318&action=review

> Source/WebCore/dom/Position.cpp:1123
> +                InlineBox* prevBox = inlineBox;
> +                do {
> +                    if (prevBox->bidiLevel() < level) {
> +                        inlineBox = prevBox;
> +                        caretOffset = inlineBox->caretRightmostOffset();
> +                        break;
> +                    }
> +                    prevBox = prevBox->prevLeafChild();
> +                } while (prevBox);

Is there a reason not to write this as a for() loop?
Comment 10 mitz 2011-02-23 13:56:19 PST
The current behavior in attachment 81089 [details] matches the behavior of NSTextView in Mac OS X 10.6. Why do you deem it incorrect?
Comment 11 Xiaomei Ji 2011-02-23 15:00:34 PST
(In reply to comment #10)
> The current behavior in attachment 81089 [details] matches the behavior of NSTextView in Mac OS X 10.6. Why do you deem it incorrect?

It is not possible to place cursor between the Arabic letters and Arabic numbers in current behavior. Is it normal?
Comment 12 mitz 2011-02-23 15:05:24 PST
(In reply to comment #11)
> (In reply to comment #10)
> > The current behavior in attachment 81089 [details] [details] matches the behavior of NSTextView in Mac OS X 10.6. Why do you deem it incorrect?
> 
> It is not possible to place cursor between the Arabic letters and Arabic numbers in current behavior. Is it normal?

I am not sure what exactly “normal” means. Not being able to place the cursor between the letters and the numerals is a difference between WebKit and AppKit’s NSTextView. In NSTextView, the caret renders between the letters and the numerals when it’s at offset 0 (i.e. before the numerals).
Comment 13 mitz 2011-02-23 15:07:28 PST
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > The current behavior in attachment 81089 [details] matches the behavior of NSTextView in Mac OS X 10.6. Why do you deem it incorrect?
> > 
> > It is not possible to place cursor between the Arabic letters and Arabic numbers in current behavior. Is it normal?
> 
> I am not sure what exactly “normal” means. Not being able to place the cursor between the letters and the numerals is a difference between WebKit and AppKit’s NSTextView. In NSTextView, the caret renders between the letters and the numerals when it’s at offset 0 (i.e. before the numerals).

I guess this sounds like a contradiction. I should clarify that comment #10 was about where the cursor is rendered when it’s at offset 2, and not about the entire test case.
Comment 14 Ryosuke Niwa 2011-02-23 17:04:53 PST
(In reply to comment #10)
> The current behavior in attachment 81089 [details] matches the behavior of NSTextView in Mac OS X 10.6. Why do you deem it incorrect?

The problem is that both offsets 2 and 4 correspond to the rightmost location visually, and there's no logical offset that corresponds to the location between Arabic letters and numbers.

Of offsets 2 and 4, offset 4 is the one that needs to be on the rightmost location because we're in a LTR block and the location is at the end of a RTL run.  Offset 2 must be between the letters and the numbers (i.e. on the left of a LTR run of Arabic numbers) because it's at the end of a LTR run inside a RTL run.
Comment 15 Ryosuke Niwa 2011-02-23 17:52:07 PST
Created attachment 83593 [details]
Rewrote while loops as for loops
Comment 16 Ryosuke Niwa 2011-02-23 17:52:53 PST
(In reply to comment #9)
> (From update of attachment 81318 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81318&action=review
> 
> > Source/WebCore/dom/Position.cpp:1123
> > +                InlineBox* prevBox = inlineBox;
> > +                do {
> > +                    if (prevBox->bidiLevel() < level) {
> > +                        inlineBox = prevBox;
> > +                        caretOffset = inlineBox->caretRightmostOffset();
> > +                        break;
> > +                    }
> > +                    prevBox = prevBox->prevLeafChild();
> > +                } while (prevBox);
> 
> Is there a reason not to write this as a for() loop?

Very good point.  Fixed.
Comment 17 Ryosuke Niwa 2011-02-23 17:55:53 PST
(In reply to comment #12)
> (In reply to comment #11)
> > It is not possible to place cursor between the Arabic letters and Arabic numbers in current behavior. Is it normal?
> 
> I am not sure what exactly “normal” means. Not being able to place the cursor between the letters and the numerals is a difference between WebKit and AppKit’s NSTextView. In NSTextView, the caret renders between the letters and the numerals when it’s at offset 0 (i.e. before the numerals).

Ah, that's interesting.  NSTextView seems to follow Firefox/Internet Explorer convention then.  For WebKit, we can't use offset 0 to mean the location between the letters and the numbers because the offset 0 corresponds and only corresponds to the leftmost boundary of the line.
Comment 18 mitz 2011-02-23 17:59:18 PST
(In reply to comment #17)
 
>  For WebKit, we can't use offset 0 to mean the location between the letters and the numbers because the offset 0 corresponds and only corresponds to the leftmost boundary of the line.

I don’t understand this explanation (or maybe it’s not an explanation?).
Comment 19 Ryosuke Niwa 2011-02-23 18:12:16 PST
(In reply to comment #18)
> (In reply to comment #17)
> 
> >  For WebKit, we can't use offset 0 to mean the location between the letters and the numbers because the offset 0 corresponds and only corresponds to the leftmost boundary of the line.
> 
> I don’t understand this explanation (or maybe it’s not an explanation?).

Let me try again.  So in WebKit, we use the convention that the offsets at the boundary of a run follows the direction of the surrounding run.  For example, if we have CBA (RTL text) in a LTR run, then we have offsets as (0)C(2)B(1)A(3) while we have natural (3)C(2)B(1)A(0) if it were to appear in a RTL run.

Now, the text in question is CBAabc.  Since "abc" is inside a RTL run of CBA, offsets at the boundaries follows RTL conventions as in (3)a(1)b(2)c(0).  However, the outer RTL run is in a LTR block, which means offset at the boundary of the RTL run has to follow LTR convention of the outer block hence we have: (0)CBA(3)a(1)b(2)c.  Now because CBA(abc) is a RTL run inside a LTR block, the offsets at its boundary follows LTR convention, resulting in (0)C(5)B(4)A(3)a(1)b(2)c(6).

Does this make sense?  Or am I misunderstanding something?
Comment 20 mitz 2011-02-24 09:15:40 PST
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > 
> > >  For WebKit, we can't use offset 0 to mean the location between the letters and the numbers because the offset 0 corresponds and only corresponds to the leftmost boundary of the line.
> > 
> > I don’t understand this explanation (or maybe it’s not an explanation?).
> 
> Let me try again.  So in WebKit, we use the convention that the offsets at the boundary of a run follows the direction of the surrounding run.

Perhaps this is the wrong “convention” to use in this case. What the code currently does was arrived at, I think, through trying to match the native text system. If I or someone else got some cases wrong, then that should be fixed. There is no apparent reason to diverge from the native behavior.
Comment 21 Ryosuke Niwa 2011-02-24 16:59:21 PST
(In reply to comment #20)
> Perhaps this is the wrong “convention” to use in this case. What the code currently does was arrived at, I think, through trying to match the native text system. If I or someone else got some cases wrong, then that should be fixed. There is no apparent reason to diverge from the native behavior.

Mn... what is the right convention to use in this case then?
Comment 22 mitz 2011-02-24 17:15:39 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Perhaps this is the wrong “convention” to use in this case. What the code currently does was arrived at, I think, through trying to match the native text system. If I or someone else got some cases wrong, then that should be fixed. There is no apparent reason to diverge from the native behavior.
> 
> Mn... what is the right convention to use in this case then?

In this case, NSTextView maps offset 0 to the position between the letters and the numerals. I don’t know yet what the general principle is, nor whether this is a bug. I am going to try to find out more.
Comment 23 Ryosuke Niwa 2011-02-24 18:10:40 PST
(In reply to comment #22)
> (In reply to comment #21)
> > Mn... what is the right convention to use in this case then?
> 
> In this case, NSTextView maps offset 0 to the position between the letters and the numerals. I don’t know yet what the general principle is, nor whether this is a bug. I am going to try to find out more.

Ok, thanks!  Is there some application I can use to test NSTextView behavior myself?  In particular, how do I know which offset corresponds to which insertion point location?
Comment 24 mitz 2011-02-24 19:24:24 PST
(In reply to comment #23)
> (In reply to comment #22)
> > (In reply to comment #21)
> > > Mn... what is the right convention to use in this case then?
> > 
> > In this case, NSTextView maps offset 0 to the position between the letters and the numerals. I don’t know yet what the general principle is, nor whether this is a bug. I am going to try to find out more.
> 
> Ok, thanks!  Is there some application I can use to test NSTextView behavior myself? 

Yes, TextEdit. Make sure to explicitly set the paragraph direction, since the default is “automatic” (natural). Also, it’s less confusing if you disable the split caret in International & Text preferences.

> In particular, how do I know which offset corresponds to which insertion point location?

I usually just press Return to insert a newline to see where I was :)
Comment 25 Ryosuke Niwa 2011-02-27 22:05:41 PST
(In reply to comment #24)
> (In reply to comment #23)
> > In particular, how do I know which offset corresponds to which insertion point location?
> 
> I usually just press Return to insert a newline to see where I was :)

Using this method, I found that the caret is at offset 2 if the text direction is explicitly set to RTL but it's at offset 0 if the text direction is explicitly set to LTR.

Now, if I set the text direction to LTR, and put the caret on the rightmost boundary by clicking on the far right of the text, then the split caret shows up on the upper rightmost corner and the lower leftmost corner.  If I then press left key (assuming selection is on default mode), then the lower caret moves to the boundary between arabic letters and numerals.  And pressing enter here results in inserting \n at offset 2 as well.  And this lower caret is what I was trying to implement.

Now, TextEdit seems to have various insertion point bugs with this particular input.  In particular, putting the insertion point between alphabet and numeral, and then moving to the left results in infinite loop on the single letter ‎ر.  Given that I'm not even sure if we should try to mimic TextEdit or NSTextView.
Comment 26 Ryosuke Niwa 2011-03-03 22:20:10 PST
Hi Dan,

I tested TextEdit's behavior by your method and it seems that TextEdit follows Firefox / Internet Explorer's offset behavior here: (4)B(3)A(0)1(1)2(2).  But even WebKit follows this convention when inserting new line because the enclosing block in this case is LTR.

One potential problem with this research method is the location at which character is inserted/removed depends on the natural direction of the character and the direction of the enclosing block.  For example, if I change the direction of the document to RTL, then the apparent offsets on TextEdit change to (4)B(3)A(2)1(1)2(0).

Anyway, if you think we should align WebKit's behavior to that of TextEdit, then it seems like we need to reopen and fix https://bugs.webkit.org/show_bug.cgi?id=52176.
Comment 27 Ryosuke Niwa 2011-03-07 18:41:23 PST
After some discussion with Xiaomei, I'm little more convinced that my current approach (at least offset assignments) is indeed correct.  We considered several possibilites for offsets in CBA123 (arabic numbers followed by arabic letters in a LTR block) here:

First, the following is the offsets assigned by Firefox/IE.
Internet Explorer: (0)(1)(7)C(6)B(5)A(4)1(2)2(3)3
Firefox: (0)(7)C(6)B(5)A(1)(4)1(2)2(3)3

Note that neither assignments seem ideal in this case and both of them have offset 7 even though there are only 6 letters.  Now, let us consider 4 offset assignments:

(0)C(5)B(4)A(1)1(2)2(3)3(6) - The offsets between ABC contradict with those of Firefox/IE.  Also it's odd that even though (1) corresponds to the position before 2 but it appears between A and 1.

(0)C(6)B(5)A(1)1(2)2(3)3(4) - This is a little better in that offsets between CBA match with those of Firefox/IE.

(6)C(5)B(4)A(0)1(1)2(2)3(3) - This is similar to what TextEdit, Firefox, Internet Explorer does but contradicts with the offset assignments in WebKit when CBA appeared in a LTR block: (0)C(2)B(1)A(3).  Also, the offsets between ABC contradict with those of Firefox/IE.

(0)C(5)B(4)A(3)1(1)2(2)3(6) - This is similar to the above but all offsets except (0) and (6) flows in the same order as those of Firefox and Internet Explorer except (1) and we're off by 1.
Comment 28 Xiaomei Ji 2011-03-07 18:58:07 PST
I think we need the general principal here so that the visible positions are anticipated, if not able to be standardized.

For example, given a LTR box inside a LTR block.

There are 9 basic scenarios considering the box's surrounding box's directionality:

xLx (x menas NULL box).
xLL
xLR

LLx
LLL
LLR

RLx
RLL
RLR

In each scenario, how the visible positions should be laid out when surrounding boxes' bidi level is same as/less/greater than the box.

Same for a RTL box in LTR block, LTR/RTL box in RTL block.

Then, can the rule be generalized?
Comment 29 Ryosuke Niwa 2011-03-31 02:05:30 PDT
After long attempts to fix this bug, I figured out that our current behavior matches NSTextView.  So this bug is actually invalid.  We need to implement split caret / hinted-caret on Mac / non-Mac in order to allow carets between letters.
Comment 30 Eric Seidel (no email) 2011-04-06 10:35:34 PDT
Comment on attachment 83593 [details]
Rewrote while loops as for loops

Cleared review? from attachment 83593 [details] so that this bug does not appear in http://webkit.org/pending-review.  If you would like this patch reviewed, please attach it to a new bug (or re-open this bug before marking it for review again).