Bug 52128 - ISO-8859-8 Hebrew text displayed reversed with dir="rtl"
Summary: ISO-8859-8 Hebrew text displayed reversed with dir="rtl"
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL: http://berry.co.il/disco/chalordeluna...
Keywords:
: 42048 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-01-09 01:52 PST by Jeremy Moskovich
Modified: 2011-02-14 12:22 PST (History)
9 users (show)

See Also:


Attachments
Testcase by Ilya Konstantionov (141 bytes, text/html; charset=ISO-8859-8)
2011-01-09 02:20 PST, Jeremy Moskovich
no flags Details
patch w/ layout test (11.82 KB, patch)
2011-01-12 16:33 PST, Xiaomei Ji
mitz: review-
Details | Formatted Diff | Diff
patch w/ layout test (10.04 KB, patch)
2011-02-01 12:05 PST, Xiaomei Ji
mitz: review-
Details | Formatted Diff | Diff
patch w/ layout test (15.01 KB, patch)
2011-02-08 18:15 PST, Xiaomei Ji
mitz: review-
Details | Formatted Diff | Diff
patch w/ layout test (14.93 KB, patch)
2011-02-10 11:41 PST, Xiaomei Ji
mitz: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Moskovich 2011-01-09 01:52:38 PST
Chromium bug: http://crbug.com/60403

This page declares it's encoding as ISO-8859-8 using a meta tag.  Content is in Logical order.

Safari/Chrome incorrectly display the text in Visual order.
IE/FF/Opera display the text correctly in Logical order.
Comment 1 Jeremy Moskovich 2011-01-09 02:19:36 PST
*** Bug 42048 has been marked as a duplicate of this bug. ***
Comment 2 Jeremy Moskovich 2011-01-09 02:20:44 PST
Created attachment 78345 [details]
Testcase by Ilya Konstantionov
Comment 3 Jeremy Moskovich 2011-01-10 01:33:23 PST
ap: This bug is about interpreting Visual Hebrew as Logical Hebrew - i.e. purely encoding related, so if my understanding is correct then dir=rtl doesn't seem relevant.

FYI: Mozilla has code to detect logical/visual Hebrew: 
http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/src/base/nsHebrewProber.cpp

Wikipedia says that the ISO-8859-8 encoding can represent either logical or visual Hebrew so my guess is that we need to add similar logic in WebKit.

While this doesn't affect many modern sites, there is a long tail of legacy sites that are affecte. It appears that WebKit is the only major browsing engine that doesn't display these correctly.
Comment 4 Jeremy Moskovich 2011-01-10 01:34:08 PST
ap: Whoops make that "interpreting logical Hebrew as visual Hebrew"
Comment 5 Alexey Proskuryakov 2011-01-10 02:03:45 PST
We match Firefox if there's no dir="rtl". I'm no expert, but it seems relevant to me.
Comment 6 Opher Shachar 2011-01-10 12:05:44 PST
jm: I, humbly, refer you to my comment in my bug report https://bugs.webkit.org/show_bug.cgi?id=42048#c0
My before last sentence reads:
     When an element has 'dir=rtl' attribute IE and FF treat the "primary display 
     direction" as being Right-To-Left. Chrome does not.
Comment 7 Opher Shachar 2011-01-10 12:31:09 PST
jm: The Wikipedia article is wrong in making that point.
ISO-8859-8 in HTML has *always* and *still* refer exclusively to "visual order".

It's true that in XML it would, probably, be treated as visual ordering. But, I have yet to see an XML document encoded 'iso-8859-8' outside of textbook examples.
Comment 8 Xiaomei Ji 2011-01-11 15:47:48 PST
Following change fixes the problem.

Index: RenderBlockLineLayout.cpp
===================================================================
--- RenderBlockLineLayout.cpp   (revision 75315)
+++ RenderBlockLineLayout.cpp   (working copy)
@@ -269,7 +269,7 @@
         parentBox->addToLine(box);

         bool visuallyOrdered = r->m_object->style()->visuallyOrdered();
-        box->setBidiLevel(visuallyOrdered ? 0 : r->level());
+        box->setBidiLevel(visuallyOrdered ? (style()->isLeftToRightDirection()
? 0 : 1) : r->level());

         if (box->isInlineTextBox()) {
             InlineTextBox* text = static_cast<InlineTextBox*>(box);




One question is how to handle the following case:
<div>שנב <span dir=rtl>גקכ</span></div>

FF and IE give different result on displaying גקכ. FF displays it LTR, while IE displays it RTL. 

RFC1556 says:
Visual

   Visual directionality is a presentation method that displays text
   according to the primary display direction only, which is left to
   right.  All text is viewed in the same direction which is the primary
   display direction.  The displaying application is not aware of the
   contents direction and displays the text as if it were a uni-
   directional text.  The composing application needs to prepare the
   text in such a way that it will be displayed correctly.  No control
   characters or algorithms are used to determine how the data is to be
   displayed. This is the simplest of all methods and the default method
   for use with MIME encoded texts.


Looks like using the block's directionality as primary display directionality is more reasonable. Above fix matches the FF behavior.
Comment 9 Jeremy Moskovich 2011-01-12 04:24:29 PST
Another URL that's actually logical: https://service.isracard.co.il/templates/isracardEntry/contactus.jsp
Comment 10 Xiaomei Ji 2011-01-12 16:33:04 PST
Created attachment 78755 [details]
patch w/ layout test
Comment 11 mitz 2011-02-01 10:35:14 PST
Comment on attachment 78755 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=78755&action=review

> Source/WebCore/ChangeLog:15
> +        Fix rendering of ISO-8859-8 when dir="rtl".
> +        https://bugs.webkit.org/show_bug.cgi?id=52128
> +
> +        Test: fast/dom/iso-8859-8.html
> +
> +        * platform/text/BidiResolver.h:
> +        (WebCore::::createBidiRunsForLine):
> +        * rendering/RenderBlockLineLayout.cpp:
> +        (WebCore::RenderBlock::constructLine):
> +        (WebCore::RenderBlock::layoutInlineChildren):
> +

This should explain what is changing and why. I can’t even review the patch without knowing what it’s trying to do.

> Source/WebCore/platform/text/BidiResolver.h:965
> +    } else if (direction == RTL) {
> +        reverseRuns(0, runCount() - 1);
>      }

There shouldn’t be braces around a single-line block.

> LayoutTests/ChangeLog:9
> +        * fast/dom/iso-8859-8.html: Added.

This is not a DOM change. Please don’t put the test in fast/dom.

> LayoutTests/fast/dom/iso-8859-8.html:55
> +    function checkRenderingResult(sel, node, offset, direction, element, index) {
> +        for (var i = 0; i < element.innerText.length; ++i) {
> +            sel.setBaseAndExtent(node, offset, node, offset);
> +            if (direction == "ltr") {
> +                sel.modify("move", "right", "character");
> +                sel.modify("extend", "left", "character");
> +            } else {
> +                sel.modify("move", "left", "character");
> +                sel.modify("extend", "right", "character");
> +            }
> +            if (direction == "ltr") {
> +                assertEqual("Test " + index + " " + direction + " the " + i + "th character from left", sel.toString(), element.innerText[i]);
> +            } else {
> +                assertEqual("Test " + index + " " + direction + " the " + i + "th character from right", sel.toString(), element.innerText[i]);
> +            }
> +            node = sel.baseNode;
> +            offset = sel.baseOffset;
> +        }
> +    }

Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects().
Comment 12 Xiaomei Ji 2011-02-01 12:05:07 PST
Created attachment 80800 [details]
patch w/ layout test

(In reply to comment #11)
> (From update of attachment 78755 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78755&action=review
> 
> > Source/WebCore/ChangeLog:15
> > +        Fix rendering of ISO-8859-8 when dir="rtl".
> > +        https://bugs.webkit.org/show_bug.cgi?id=52128
> > +
> > +        Test: fast/dom/iso-8859-8.html
> > +
> > +        * platform/text/BidiResolver.h:
> > +        (WebCore::::createBidiRunsForLine):
> > +        * rendering/RenderBlockLineLayout.cpp:
> > +        (WebCore::RenderBlock::constructLine):
> > +        (WebCore::RenderBlock::layoutInlineChildren):
> > +
> 
> This should explain what is changing and why. I can’t even review the patch without knowing what it’s trying to do.

Done.


> 
> > Source/WebCore/platform/text/BidiResolver.h:965
> > +    } else if (direction == RTL) {
> > +        reverseRuns(0, runCount() - 1);
> >      }
> 
> There shouldn’t be braces around a single-line block.

Done.
Thought it needs to be braces due to the braces in "if" part.

> 
> > LayoutTests/ChangeLog:9
> > +        * fast/dom/iso-8859-8.html: Added.
> 
> This is not a DOM change. Please don’t put the test in fast/dom.

Moved to fast/text/international/

> 
> > LayoutTests/fast/dom/iso-8859-8.html:55
> > +    function checkRenderingResult(sel, node, offset, direction, element, index) {
> > +        for (var i = 0; i < element.innerText.length; ++i) {
> > +            sel.setBaseAndExtent(node, offset, node, offset);
> > +            if (direction == "ltr") {
> > +                sel.modify("move", "right", "character");
> > +                sel.modify("extend", "left", "character");
> > +            } else {
> > +                sel.modify("move", "left", "character");
> > +                sel.modify("extend", "right", "character");
> > +            }
> > +            if (direction == "ltr") {
> > +                assertEqual("Test " + index + " " + direction + " the " + i + "th character from left", sel.toString(), element.innerText[i]);
> > +            } else {
> > +                assertEqual("Test " + index + " " + direction + " the " + i + "th character from right", sel.toString(), element.innerText[i]);
> > +            }
> > +            node = sel.baseNode;
> > +            offset = sel.baseOffset;
> > +        }
> > +    }
> 
> Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects().

You're right. Changed to use dumpRenderTree. I was thinking to make the expected result cross-platform.
Comment 13 Xiaomei Ji 2011-02-01 12:06:57 PST
Comment on attachment 80800 [details]
patch w/ layout test

Thanks for the review!
Comment 14 mitz 2011-02-01 15:07:59 PST
Comment on attachment 80800 [details]
patch w/ layout test

This is not a very elegant way to achieve this behavior (whose benefit is questionable anyway), and it’s unfortunate that BidiResolver is even used (and goes through all the work of resolving bidi levels) for lines that override bidi reordering. It’s especially ugly that this comes in the form of an extra parameter which you can’t even tell only affects the visualOrder == true case. At the very least, I think you should instead of adding a parameter, change the existing parameter to an enum type such as DirectionOverride { NoOverride, LeftToRightOverride, RightToLeftOverride }.
Comment 15 mitz 2011-02-01 15:11:26 PST
(In reply to comment #12)

> > Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects().
> 
> You're right. Changed to use dumpRenderTree. I was thinking to make the expected result cross-platform.

For a cross-platform test, you can use getClientRects() and compare coordinates to determine whether one range (which can be a single character) is to the left or to the right of another.
Comment 16 Xiaomei Ji 2011-02-08 18:15:09 PST
Created attachment 81727 [details]
patch w/ layout test

Thanks for the review!
Updated patch per feedback.

The 2nd part of the diff (about reordering runs) inside void BidiResolver<Iterator, Run>::createBidiRunsForLine() is format change (decrease indent) due to the removal of condition "if (!visualOrder)".

I tested the bidi resolver part for lines with visual ordering using
https://service.isracard.co.il/templates/isracardEntry/contactus.jsp and http://berry.co.il/disco/chalordeluna/playme.php
they works correctly.

Dan, Jeremy, if you have more real webpage that uses iso-8859-8, please pass them to me for test.
Comment 17 mitz 2011-02-09 10:33:19 PST
Comment on attachment 81727 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=81727&action=review

Very nice, but needs some improvement.

> Source/WebCore/ChangeLog:11
> +        (WebCore::::createBidiRunsForLine): For lines with visually direction override, create bidi runs without resolving bidi levels (one run per render object), set bidi level as 0 or 1 depends on LTR or RTL override, and reverse run s for RTL override.

typo: “run s”

> Source/WebCore/platform/text/BidiResolver.h:133
> +enum VisuallyDirectionOverride {
> +    NoVisuallyOverride,
> +    VisuallyLeftToRightOverride,
> +    VisuallyRightToLeftOverride
> +};

I don’t think the adverb “visually” is the right part of speech here. You could use “visual” but really “DirectionOverride” is best.

> Source/WebCore/platform/text/BidiResolver.h:531
> +        while (true) {

What if current == end or current.atEnd() when you enter this loop? Won’t you miss it? I.e. is there a reason why this isn’t a while(current != end && !current.atEnd()) loop?

> Source/WebCore/platform/text/BidiResolver.h:535
> +            eor = current;

Is it necessary to change eor inside the loop? Does increment() depend on it?

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:680
> +                enum VisuallyDirectionOverride override = (style()->visuallyOrdered() ? (style()->direction() == LTR ? VisuallyLeftToRightOverride : VisuallyRightToLeftOverride) : NoVisuallyOverride);

You shouldn’t need “enum” here.
Comment 18 Xiaomei Ji 2011-02-09 18:01:35 PST
Thanks for the quick review!
Please see my comments inline.


(In reply to comment #17)
> (From update of attachment 81727 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81727&action=review
> 
> > Source/WebCore/platform/text/BidiResolver.h:133
> > +enum VisuallyDirectionOverride {
> > +    NoVisuallyOverride,
> > +    VisuallyLeftToRightOverride,
> > +    VisuallyRightToLeftOverride
> > +};
> 
> I don’t think the adverb “visually” is the right part of speech here. You could use “visual” but really “DirectionOverride” is best.


LeftToRightOverride and RightToLeftOverride are defined in WTF::Unicode.
If I reuse the names here, I need to prepend every usage of the existing names with WTF::Unicode, which I think is not neat and not consistent. That is why "Visually" is prepended here.

> 
> > Source/WebCore/platform/text/BidiResolver.h:531
> > +        while (true) {
> 
> What if current == end or current.atEnd() when you enter this loop? Won’t you miss it? I.e. is there a reason why this isn’t a while(current != end && !current.atEnd()) loop?

You are right.
It should be:
        eor = Iterator();
        while(current != end && !current.atEnd()) {
            eor = current;
            increment();
        }


> 
> > Source/WebCore/platform/text/BidiResolver.h:535
> > +            eor = current;
> 
> Is it necessary to change eor inside the loop? Does increment() depend on it?

increment() does not depend or update it.
But eor need to be assigned inside the loop to be the inclusive end of the line.
On exit of the loop, current will be the exclusive end of the line (similar as string.length() vs. string.length() -1).
Comment 19 Xiaomei Ji 2011-02-10 11:41:01 PST
Created attachment 82015 [details]
patch w/ layout test
Comment 20 mitz 2011-02-13 13:10:13 PST
Comment on attachment 82015 [details]
patch w/ layout test

View in context: https://bugs.webkit.org/attachment.cgi?id=82015&action=review

> Source/WebCore/ChangeLog:11
> +        (WebCore::::createBidiRunsForLine): For lines with visually direction override, create bidi runs without resolving bidi levels (one run per render object), set bidi level as 0 or 1 depends on LTR or RTL override, and reverse runs for RTL override.

You may want to use shorter lines in the change log (~100 characters seems to be the norm). Also change “visually” to “visual” and “depends” to “depending”.

> LayoutTests/fast/text/international/iso-8859-8.html:4
> +<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-8"/>

The / before the > is not necessary. And in HTML5 this can be expressed simply as <meta charset="ISO-8859-8">.

> LayoutTests/fast/text/international/iso-8859-8.html:13
> +<div contenteditable class="test">ùðá</div>
> +<div contenteditable class="test">ùðá â÷ë </div>
> +<div contenteditable class="test">ùðá abc â÷ë</div>
> +<div contenteditable class="test">abc ùðá def</div>
> +<div contenteditable class="test">ùðá <span dir=ltr>â÷ë</span></div>
> +<div contenteditable class="test">ùðá <span dir=rtl>â÷ë</span></div>

Do these really need to be contenteditable?

> LayoutTests/fast/text/international/iso-8859-8.html:24
> +    function log(str)
> +    {
> +        var li = document.createElement("li");
> +        li.appendChild(document.createTextNode(str));
> +        var console = document.getElementById("console");
> +        console.appendChild(li);
> +    }

Since you’re already including js-test-pre.js, why not use testPassed() and testFailed()?
Comment 21 Xiaomei Ji 2011-02-14 11:21:41 PST
Committed r78491: <http://trac.webkit.org/changeset/78491>
Comment 22 Xiaomei Ji 2011-02-14 11:23:15 PST
updated patch per Dan's comment in #20 and committed.
Comment 23 WebKit Review Bot 2011-02-14 12:22:33 PST
http://trac.webkit.org/changeset/78491 might have broken GTK Linux 32-bit Release and GTK Linux 64-bit Debug