Bug 5172 - Bidi algorithm not applied correctly across line breaks
Summary: Bidi algorithm not applied correctly across line breaks
Status: VERIFIED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Dave Hyatt
URL:
Keywords:
: 5840 (view as bug list)
Depends on: 5532
Blocks: 5840
  Show dependency treegraph
 
Reported: 2005-09-28 14:07 PDT by mitz
Modified: 2005-11-28 08:24 PST (History)
1 user (show)

See Also:


Attachments
testcase (2.20 KB, text/html)
2005-09-28 14:08 PDT, mitz
no flags Details
better testcase (2.99 KB, text/html)
2005-09-30 16:21 PDT, mitz
no flags Details
work in progress (22.76 KB, patch)
2005-11-04 08:24 PST, mitz
no flags Details | Formatted Diff | Diff
Proposed patch (35.96 KB, patch)
2005-11-05 09:40 PST, mitz
no flags Details | Formatted Diff | Diff
Alternative patch (36.59 KB, patch)
2005-11-05 10:11 PST, mitz
no flags Details | Formatted Diff | Diff
Revised patch (38.25 KB, patch)
2005-11-06 16:29 PST, mitz
hyatt: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mitz 2005-09-28 14:07:48 PDT
Summary: WebKit sometimes assigns incorrect bidi embedding levels to text runs immediately before 
or immediately after a line break, resulting in incorrect line order.

To reproduce: Open the testcase in Safari.

Expected: the Test column to match the Reference column. Firefox 1.0.4 behaves as expected.

Actual: in rows 2, 4 and 10, the columns mismatch.

Analysis: When a line ends, it is sometimes impossible to know the embedding level for the last run in 
the line without looking ahead into the next line. When a line begins, the embedding level of the first 
run may depend on the last run in the previous line. Currently, WebKit treats each line essentially as a 
separate paragraph, using the embedding direction for sor and eor.
Comment 1 mitz 2005-09-28 14:08:19 PDT
Created attachment 4083 [details]
testcase
Comment 2 mitz 2005-09-30 16:07:37 PDT
Using <br/> to force line breaks in the testcase was a bad idea, since IE and NSTextView actually treat line 
separators like paragraph separators in terms of the bidi algorithm. Firefox treats them as whitespace 
(which is according to spec). WebKit should probably match IE and the rest of OS X. I should make a 
testcase where lines wrap naturally.
Comment 3 mitz 2005-09-30 16:21:16 PDT
Created attachment 4121 [details]
better testcase
Comment 4 mitz 2005-11-04 08:24:58 PST
Created attachment 4593 [details]
work in progress

This doesn't address partial relayout (the bidi status and context at the end
of each line need to be cached for that to work) and currently treats <br> as
whitespace.
Comment 5 mitz 2005-11-05 09:40:01 PST
Created attachment 4607 [details]
Proposed patch
Comment 6 mitz 2005-11-05 10:11:47 PST
Created attachment 4608 [details]
Alternative patch
Comment 7 Darin Adler 2005-11-05 17:22:09 PST
Comment on attachment 4608 [details]
Alternative patch

I would suggest using SharedPtr for m_lineBreakContext and for the context
field of BidiState.

This patch looks greate otherwise, although I think the real proof is in the
test cases, r=me.
Comment 8 Dave Hyatt 2005-11-06 13:51:48 PST
+inline bool operator!=(const BidiStatus& status1, const BidiStatus& status2)
+{
+    return status1.eor != status2.eor || status1.last != status2.last || status1.lastStrong != 
status2.lastStrong;
+}
+

Couldn't this just be implemented in terms of the == operator?

Also, this patch increases the size of all root lines by 4 bytes.  Is there any way to avoid this?
Comment 9 Dave Hyatt 2005-11-06 13:53:48 PST
+            if (previousLineBrokeCleanly) {
+                // A deviation from the Unicode Bidi Algorithm in order to match
+                // Mac OS X text and WinIE: a hard line break resets bidi state.

Did you verify this with both preformatted text newlines as well as brs?
Comment 10 mitz 2005-11-06 14:03:16 PST
(In reply to comment #8)

> Couldn't this just be implemented in terms of the == operator?

Not when partial relayout takes place: you pick up the context from the last clean line and during 
relayout of subsequent lines you need to push a new context onto the stack, then you reach the first 
clean line with a context that's different pointer-wise from the saved context for that line, but it's 
actually the same. The "Proposed patch" uses pointer comparison (that's the only difference between it 
and the alternative). It might save a little on the comparisons but cost more in doing unnecessary 
relayout of all the lines up to the end of the block.

>  Also, this patch increases the size of all root lines by 4 bytes.  Is there any way to avoid this?

I counted more than 4 bytes (the BidiContext* and the BidiStatus which is at least 3 bytes). As far as I 
can tell, all of this is necessary if you want to start relayout in the middle.
Comment 11 mitz 2005-11-06 14:13:08 PST
(In reply to comment #9)

> Did you verify this with both preformatted text newlines as well as brs?

Oops, WinIE does that with preformatted text newlines but the patch doesn't. I expected 
previousLineBrokeCleanly to be true in that case as well but it's not :-/
Comment 12 mitz 2005-11-06 14:35:51 PST
(In reply to comment #8)

Oops again! In comment #10 I answered a totally different question from what I was asked!

Yes, != can be implemented in terms of ==. I just followed the BidiIterator != example.
Comment 13 Dave Hyatt 2005-11-06 15:57:21 PST
Can you not infer the appropriate information based off the cached line break object?  Does that not give 
you enough information to effectively recover the bidi state?

Comment 14 mitz 2005-11-06 16:14:33 PST
(In reply to comment #13)
> Can you not infer the appropriate information based off the cached line break object?  Does that not 
give 
> you enough information to effectively recover the bidi state?

No. The stack of embedding levels at the end of the line depends on all the embedding changes prior to 
that point. Also, the last strong character may have appeared before the last object on the line, and the 
direction at eor may not be possible to determine based on that object alone.
Comment 15 mitz 2005-11-06 16:29:07 PST
Created attachment 4616 [details]
Revised patch

Changed to treat newlines in preformatted text like brs, redefined != using ==,
and changed to use SharedPointer where Darin has suggested doing so.
Comment 16 Dave Hyatt 2005-11-06 17:22:36 PST
Hmmm, actually I'm a bit surprised I didn't set "previousLineBrokeCleanly" in the preformatted text case.  
That is probably worth investigating.  The variable name certainly sounds like something you'd expect to 
be set in the preformatted text case as well.
Comment 17 Dave Hyatt 2005-11-08 15:17:06 PST
Comment on attachment 4616 [details]
Revised patch

r=me, i encourage you to test with all five white-space values and with br and
\n (so 10 total combinations).
Comment 18 Eric Seidel (no email) 2005-11-27 13:28:12 PST
*** Bug 5840 has been marked as a duplicate of this bug. ***