RESOLVED FIXED Bug 52128
ISO-8859-8 Hebrew text displayed reversed with dir="rtl"
https://bugs.webkit.org/show_bug.cgi?id=52128
Summary ISO-8859-8 Hebrew text displayed reversed with dir="rtl"
Jeremy Moskovich
Reported 2011-01-09 01:52:38 PST
Chromium bug: http://crbug.com/60403 This page declares it's encoding as ISO-8859-8 using a meta tag. Content is in Logical order. Safari/Chrome incorrectly display the text in Visual order. IE/FF/Opera display the text correctly in Logical order.
Attachments
Testcase by Ilya Konstantionov (141 bytes, text/html; charset=ISO-8859-8)
2011-01-09 02:20 PST, Jeremy Moskovich
no flags
patch w/ layout test (11.82 KB, patch)
2011-01-12 16:33 PST, Xiaomei Ji
mitz: review-
patch w/ layout test (10.04 KB, patch)
2011-02-01 12:05 PST, Xiaomei Ji
mitz: review-
patch w/ layout test (15.01 KB, patch)
2011-02-08 18:15 PST, Xiaomei Ji
mitz: review-
patch w/ layout test (14.93 KB, patch)
2011-02-10 11:41 PST, Xiaomei Ji
mitz: review+
Jeremy Moskovich
Comment 1 2011-01-09 02:19:36 PST
*** Bug 42048 has been marked as a duplicate of this bug. ***
Jeremy Moskovich
Comment 2 2011-01-09 02:20:44 PST
Created attachment 78345 [details] Testcase by Ilya Konstantionov
Jeremy Moskovich
Comment 3 2011-01-10 01:33:23 PST
ap: This bug is about interpreting Visual Hebrew as Logical Hebrew - i.e. purely encoding related, so if my understanding is correct then dir=rtl doesn't seem relevant. FYI: Mozilla has code to detect logical/visual Hebrew: http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/src/base/nsHebrewProber.cpp Wikipedia says that the ISO-8859-8 encoding can represent either logical or visual Hebrew so my guess is that we need to add similar logic in WebKit. While this doesn't affect many modern sites, there is a long tail of legacy sites that are affecte. It appears that WebKit is the only major browsing engine that doesn't display these correctly.
Jeremy Moskovich
Comment 4 2011-01-10 01:34:08 PST
ap: Whoops make that "interpreting logical Hebrew as visual Hebrew"
Alexey Proskuryakov
Comment 5 2011-01-10 02:03:45 PST
We match Firefox if there's no dir="rtl". I'm no expert, but it seems relevant to me.
Opher Shachar
Comment 6 2011-01-10 12:05:44 PST
jm: I, humbly, refer you to my comment in my bug report https://bugs.webkit.org/show_bug.cgi?id=42048#c0 My before last sentence reads: When an element has 'dir=rtl' attribute IE and FF treat the "primary display direction" as being Right-To-Left. Chrome does not.
Opher Shachar
Comment 7 2011-01-10 12:31:09 PST
jm: The Wikipedia article is wrong in making that point. ISO-8859-8 in HTML has *always* and *still* refer exclusively to "visual order". It's true that in XML it would, probably, be treated as visual ordering. But, I have yet to see an XML document encoded 'iso-8859-8' outside of textbook examples.
Xiaomei Ji
Comment 8 2011-01-11 15:47:48 PST
Following change fixes the problem. Index: RenderBlockLineLayout.cpp =================================================================== --- RenderBlockLineLayout.cpp (revision 75315) +++ RenderBlockLineLayout.cpp (working copy) @@ -269,7 +269,7 @@ parentBox->addToLine(box); bool visuallyOrdered = r->m_object->style()->visuallyOrdered(); - box->setBidiLevel(visuallyOrdered ? 0 : r->level()); + box->setBidiLevel(visuallyOrdered ? (style()->isLeftToRightDirection() ? 0 : 1) : r->level()); if (box->isInlineTextBox()) { InlineTextBox* text = static_cast<InlineTextBox*>(box); One question is how to handle the following case: <div>שנב <span dir=rtl>גקכ</span></div> FF and IE give different result on displaying גקכ. FF displays it LTR, while IE displays it RTL. RFC1556 says: Visual Visual directionality is a presentation method that displays text according to the primary display direction only, which is left to right. All text is viewed in the same direction which is the primary display direction. The displaying application is not aware of the contents direction and displays the text as if it were a uni- directional text. The composing application needs to prepare the text in such a way that it will be displayed correctly. No control characters or algorithms are used to determine how the data is to be displayed. This is the simplest of all methods and the default method for use with MIME encoded texts. Looks like using the block's directionality as primary display directionality is more reasonable. Above fix matches the FF behavior.
Jeremy Moskovich
Comment 9 2011-01-12 04:24:29 PST
Xiaomei Ji
Comment 10 2011-01-12 16:33:04 PST
Created attachment 78755 [details] patch w/ layout test
mitz
Comment 11 2011-02-01 10:35:14 PST
Comment on attachment 78755 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=78755&action=review > Source/WebCore/ChangeLog:15 > + Fix rendering of ISO-8859-8 when dir="rtl". > + https://bugs.webkit.org/show_bug.cgi?id=52128 > + > + Test: fast/dom/iso-8859-8.html > + > + * platform/text/BidiResolver.h: > + (WebCore::::createBidiRunsForLine): > + * rendering/RenderBlockLineLayout.cpp: > + (WebCore::RenderBlock::constructLine): > + (WebCore::RenderBlock::layoutInlineChildren): > + This should explain what is changing and why. I can’t even review the patch without knowing what it’s trying to do. > Source/WebCore/platform/text/BidiResolver.h:965 > + } else if (direction == RTL) { > + reverseRuns(0, runCount() - 1); > } There shouldn’t be braces around a single-line block. > LayoutTests/ChangeLog:9 > + * fast/dom/iso-8859-8.html: Added. This is not a DOM change. Please don’t put the test in fast/dom. > LayoutTests/fast/dom/iso-8859-8.html:55 > + function checkRenderingResult(sel, node, offset, direction, element, index) { > + for (var i = 0; i < element.innerText.length; ++i) { > + sel.setBaseAndExtent(node, offset, node, offset); > + if (direction == "ltr") { > + sel.modify("move", "right", "character"); > + sel.modify("extend", "left", "character"); > + } else { > + sel.modify("move", "left", "character"); > + sel.modify("extend", "right", "character"); > + } > + if (direction == "ltr") { > + assertEqual("Test " + index + " " + direction + " the " + i + "th character from left", sel.toString(), element.innerText[i]); > + } else { > + assertEqual("Test " + index + " " + direction + " the " + i + "th character from right", sel.toString(), element.innerText[i]); > + } > + node = sel.baseNode; > + offset = sel.baseOffset; > + } > + } Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects().
Xiaomei Ji
Comment 12 2011-02-01 12:05:07 PST
Created attachment 80800 [details] patch w/ layout test (In reply to comment #11) > (From update of attachment 78755 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=78755&action=review > > > Source/WebCore/ChangeLog:15 > > + Fix rendering of ISO-8859-8 when dir="rtl". > > + https://bugs.webkit.org/show_bug.cgi?id=52128 > > + > > + Test: fast/dom/iso-8859-8.html > > + > > + * platform/text/BidiResolver.h: > > + (WebCore::::createBidiRunsForLine): > > + * rendering/RenderBlockLineLayout.cpp: > > + (WebCore::RenderBlock::constructLine): > > + (WebCore::RenderBlock::layoutInlineChildren): > > + > > This should explain what is changing and why. I can’t even review the patch without knowing what it’s trying to do. Done. > > > Source/WebCore/platform/text/BidiResolver.h:965 > > + } else if (direction == RTL) { > > + reverseRuns(0, runCount() - 1); > > } > > There shouldn’t be braces around a single-line block. Done. Thought it needs to be braces due to the braces in "if" part. > > > LayoutTests/ChangeLog:9 > > + * fast/dom/iso-8859-8.html: Added. > > This is not a DOM change. Please don’t put the test in fast/dom. Moved to fast/text/international/ > > > LayoutTests/fast/dom/iso-8859-8.html:55 > > + function checkRenderingResult(sel, node, offset, direction, element, index) { > > + for (var i = 0; i < element.innerText.length; ++i) { > > + sel.setBaseAndExtent(node, offset, node, offset); > > + if (direction == "ltr") { > > + sel.modify("move", "right", "character"); > > + sel.modify("extend", "left", "character"); > > + } else { > > + sel.modify("move", "left", "character"); > > + sel.modify("extend", "right", "character"); > > + } > > + if (direction == "ltr") { > > + assertEqual("Test " + index + " " + direction + " the " + i + "th character from left", sel.toString(), element.innerText[i]); > > + } else { > > + assertEqual("Test " + index + " " + direction + " the " + i + "th character from right", sel.toString(), element.innerText[i]); > > + } > > + node = sel.baseNode; > > + offset = sel.baseOffset; > > + } > > + } > > Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects(). You're right. Changed to use dumpRenderTree. I was thinking to make the expected result cross-platform.
Xiaomei Ji
Comment 13 2011-02-01 12:06:57 PST
Comment on attachment 80800 [details] patch w/ layout test Thanks for the review!
mitz
Comment 14 2011-02-01 15:07:59 PST
Comment on attachment 80800 [details] patch w/ layout test This is not a very elegant way to achieve this behavior (whose benefit is questionable anyway), and it’s unfortunate that BidiResolver is even used (and goes through all the work of resolving bidi levels) for lines that override bidi reordering. It’s especially ugly that this comes in the form of an extra parameter which you can’t even tell only affects the visualOrder == true case. At the very least, I think you should instead of adding a parameter, change the existing parameter to an enum type such as DirectionOverride { NoOverride, LeftToRightOverride, RightToLeftOverride }.
mitz
Comment 15 2011-02-01 15:11:26 PST
(In reply to comment #12) > > Using the non-trivial logic for modifying selection is a very indirect way to test this change. The best test for this would be a render tree dump. Those include the directionality (and override state) of individual text runs, which should suffice. If you can’t do that, then you can test directly for the visual position of characters using Range.getClientRects(). > > You're right. Changed to use dumpRenderTree. I was thinking to make the expected result cross-platform. For a cross-platform test, you can use getClientRects() and compare coordinates to determine whether one range (which can be a single character) is to the left or to the right of another.
Xiaomei Ji
Comment 16 2011-02-08 18:15:09 PST
Created attachment 81727 [details] patch w/ layout test Thanks for the review! Updated patch per feedback. The 2nd part of the diff (about reordering runs) inside void BidiResolver<Iterator, Run>::createBidiRunsForLine() is format change (decrease indent) due to the removal of condition "if (!visualOrder)". I tested the bidi resolver part for lines with visual ordering using https://service.isracard.co.il/templates/isracardEntry/contactus.jsp and http://berry.co.il/disco/chalordeluna/playme.php they works correctly. Dan, Jeremy, if you have more real webpage that uses iso-8859-8, please pass them to me for test.
mitz
Comment 17 2011-02-09 10:33:19 PST
Comment on attachment 81727 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=81727&action=review Very nice, but needs some improvement. > Source/WebCore/ChangeLog:11 > + (WebCore::::createBidiRunsForLine): For lines with visually direction override, create bidi runs without resolving bidi levels (one run per render object), set bidi level as 0 or 1 depends on LTR or RTL override, and reverse run s for RTL override. typo: “run s” > Source/WebCore/platform/text/BidiResolver.h:133 > +enum VisuallyDirectionOverride { > + NoVisuallyOverride, > + VisuallyLeftToRightOverride, > + VisuallyRightToLeftOverride > +}; I don’t think the adverb “visually” is the right part of speech here. You could use “visual” but really “DirectionOverride” is best. > Source/WebCore/platform/text/BidiResolver.h:531 > + while (true) { What if current == end or current.atEnd() when you enter this loop? Won’t you miss it? I.e. is there a reason why this isn’t a while(current != end && !current.atEnd()) loop? > Source/WebCore/platform/text/BidiResolver.h:535 > + eor = current; Is it necessary to change eor inside the loop? Does increment() depend on it? > Source/WebCore/rendering/RenderBlockLineLayout.cpp:680 > + enum VisuallyDirectionOverride override = (style()->visuallyOrdered() ? (style()->direction() == LTR ? VisuallyLeftToRightOverride : VisuallyRightToLeftOverride) : NoVisuallyOverride); You shouldn’t need “enum” here.
Xiaomei Ji
Comment 18 2011-02-09 18:01:35 PST
Thanks for the quick review! Please see my comments inline. (In reply to comment #17) > (From update of attachment 81727 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=81727&action=review > > > Source/WebCore/platform/text/BidiResolver.h:133 > > +enum VisuallyDirectionOverride { > > + NoVisuallyOverride, > > + VisuallyLeftToRightOverride, > > + VisuallyRightToLeftOverride > > +}; > > I don’t think the adverb “visually” is the right part of speech here. You could use “visual” but really “DirectionOverride” is best. LeftToRightOverride and RightToLeftOverride are defined in WTF::Unicode. If I reuse the names here, I need to prepend every usage of the existing names with WTF::Unicode, which I think is not neat and not consistent. That is why "Visually" is prepended here. > > > Source/WebCore/platform/text/BidiResolver.h:531 > > + while (true) { > > What if current == end or current.atEnd() when you enter this loop? Won’t you miss it? I.e. is there a reason why this isn’t a while(current != end && !current.atEnd()) loop? You are right. It should be: eor = Iterator(); while(current != end && !current.atEnd()) { eor = current; increment(); } > > > Source/WebCore/platform/text/BidiResolver.h:535 > > + eor = current; > > Is it necessary to change eor inside the loop? Does increment() depend on it? increment() does not depend or update it. But eor need to be assigned inside the loop to be the inclusive end of the line. On exit of the loop, current will be the exclusive end of the line (similar as string.length() vs. string.length() -1).
Xiaomei Ji
Comment 19 2011-02-10 11:41:01 PST
Created attachment 82015 [details] patch w/ layout test
mitz
Comment 20 2011-02-13 13:10:13 PST
Comment on attachment 82015 [details] patch w/ layout test View in context: https://bugs.webkit.org/attachment.cgi?id=82015&action=review > Source/WebCore/ChangeLog:11 > + (WebCore::::createBidiRunsForLine): For lines with visually direction override, create bidi runs without resolving bidi levels (one run per render object), set bidi level as 0 or 1 depends on LTR or RTL override, and reverse runs for RTL override. You may want to use shorter lines in the change log (~100 characters seems to be the norm). Also change “visually” to “visual” and “depends” to “depending”. > LayoutTests/fast/text/international/iso-8859-8.html:4 > +<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-8"/> The / before the > is not necessary. And in HTML5 this can be expressed simply as <meta charset="ISO-8859-8">. > LayoutTests/fast/text/international/iso-8859-8.html:13 > +<div contenteditable class="test">ùðá</div> > +<div contenteditable class="test">ùðá â÷ë </div> > +<div contenteditable class="test">ùðá abc â÷ë</div> > +<div contenteditable class="test">abc ùðá def</div> > +<div contenteditable class="test">ùðá <span dir=ltr>â÷ë</span></div> > +<div contenteditable class="test">ùðá <span dir=rtl>â÷ë</span></div> Do these really need to be contenteditable? > LayoutTests/fast/text/international/iso-8859-8.html:24 > + function log(str) > + { > + var li = document.createElement("li"); > + li.appendChild(document.createTextNode(str)); > + var console = document.getElementById("console"); > + console.appendChild(li); > + } Since you’re already including js-test-pre.js, why not use testPassed() and testFailed()?
Xiaomei Ji
Comment 21 2011-02-14 11:21:41 PST
Xiaomei Ji
Comment 22 2011-02-14 11:23:15 PST
updated patch per Dan's comment in #20 and committed.
WebKit Review Bot
Comment 23 2011-02-14 12:22:33 PST
http://trac.webkit.org/changeset/78491 might have broken GTK Linux 32-bit Release and GTK Linux 64-bit Debug
Note You need to log in before you can comment on or make changes to this bug.