Summary: | RTL: Directionality always reset on hard line break | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Jeremy Moskovich <playmobil> | ||||||||||||||||||
Component: | Layout and Rendering | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||
Severity: | Normal | CC: | abarth, aharon, ap, eric, hyatt, ian, leviw, mitz, progame+wk, rniwa, simon.fraser, webkit.review.bot, xji, zimmermann | ||||||||||||||||||
Priority: | P2 | Keywords: | GoogleBug | ||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||
Hardware: | Mac | ||||||||||||||||||||
OS: | OS X 10.5 | ||||||||||||||||||||
Bug Depends on: | |||||||||||||||||||||
Bug Blocks: | 50910 | ||||||||||||||||||||
Attachments: |
|
Description
Jeremy Moskovich
2009-01-05 13:47:18 PST
Created attachment 26439 [details]
Test case
This seems to be about hard line breaks. A reduction with fewer elements: data:text/html,<span dir="rtl">א.<br>א.</span> I suspect that this behavior is related to the special treatment of hard line breaks in BidiResolver::createBidiRunsForLine(): // A deviation from the Unicode Bidi Algorithm in order to match // Mac OS X text and WinIE: a hard line break resets bidi state. Perhaps the behavior of IE should be studied more carefully and the above should be refined. Maybe it should only affect embedding using Unicode control characters, and not embedding using CSS properties. this is not only a mac issue of course Created attachment 30956 [details]
Test Case
Tests case of both dir=rtl & unicode dir override chars.
Created attachment 30959 [details]
TestCase rendering in IE,WebKit,FF & Opera
IE: Only resets directionality in the case of Unicode override chars.
FF,Opera: Never reset directionality.
WebKit: Resets directionality in both cases.
If we want to match IE here, looks like the right thing to do is to change the current behavior to not reset directionality in the case of css specified directionality per mitz's comment.
Created attachment 30972 [details]
Only reset directionality if caused by Unicode Control Char
Comment on attachment 30972 [details]
Only reset directionality if caused by Unicode Control Char
WebKit uses CamelCase for variable names.
Mitz should review this for content though. You probably want to upload a new patch in the meanwhile so he doesn't have to think about the style issues.
Also:
\ No newline at end of file
Created attachment 31050 [details]
Only reset directionality if caused by Unicode Control Char
Fixed var naming + Newline.
Comment on attachment 31050 [details] Only reset directionality if caused by Unicode Control Char > + Direction current_context_direction = context()->dir(); WebKit variable naming style is to use camelCase. Is it correct to pop out of all explicit embedding levels, which can be a mixture of style-originated and control-character-originated, just because the topmost was started by a control character? I think the test should cover such cases. When creating test cases for this bug I came across the following case: ‮ א.<br> א.<br> <span dir="rtl"></span> ‬ WebKit display this as: .a .a While IE displays this as: .a a. Removing the span causes WebKit to match IE. Created attachment 31485 [details]
Added more thorough layout test
* layout test added to cover the case of top level unicode control char.
* fixed style issues.
* Added comment.
* Included layout test results.
Mitz: Are there any other tests you'd like me to add here? If so I'd be grateful if you could mention them explicitly.
This patch changes the output of a couple of layout tests but they all still appear to pass.
Rendering of the new "complex" layout test is identical to IE except for the strange behavior I highlighted in my previous comment (i.e. it looks different for reasons unrelated to this patch).
mitz: Could you take another look please? If you have any other concerns or there's anything else you'd like me to check please let me know. How should a case like this behave? <span dir="rtl">A ‮ B<span dir="ltr">C <br> D </span> E ‬ F</span> Namely, what should the context stack look like after the <br>, after the inner span is closed, and after the PDF? Comment on attachment 31485 [details]
Added more thorough layout test
I don’t think I can review this patch without knowing the expected behavior for nested cases like the ones I’ve mentioned before.
I haven't been able to work on this for a while, will get back to you with that information as soon as I can. Thanks! Adding GoogleBug keyword since this affects: http://www.google.co.il/m/search?q=%D7%AA%D7%9C-%D7%90%D7%91%D7%99%D7%91 The issues is with the way the URL's are displayed on the page. A few thoughts about this issue. - The most important test case for the effect of <br> on the Unicode BiDi Alogorithm is as follows: 1. Start with א. 2. ב is the next step. The period on the first line is should come out to the right of the aleph and the "2." on the second line should come out to the left of the bet. That is where Firefox and Opera go wrong. The second most important test case is that in <span dir=rtl>א.<br>א.</span>, the two lines should comeout the same (with the period to the left of the aleph). This is where WebKit is currently going wrong. The issue of the effect of <br> on embedding levels started via Unicode BiDi formatting characters like LRE is of secondary importance, since these really aren't supposed to be used in HTML documents except where mark-up can't be used. As far as I am concerned, the only legitimate reason for these characters to appear in HTML documents otherwise is as a result of including plain text in it. One HTML-escapes the plain text, of course, and translates newlines to <br>s - but what is one supposed to do with the LREs etc.? What usually happens is that the LREs are ignored, i.e. left in the HTML document as is. It is good in such cases for the <br>s (that used to be newlines) to close them, as they had done in the original plain-text context, where a newline is a paragraph break. - However <br> behaves in this whole matter, <hr> and embedded block elements (e.g. <div></div>) should behave the same. - The Unicode BiDi algorithm unfortunately does not deal with a good parrallel to <br>. The obvious candidate would, of course, be paragraph breaks, except that paragraph breaks are supposed to terminate all embedding levels, and it makes no sense for <br> to terminate the effect of the dir attributes of its ancestors. It is the lack of a good parallel that caused all the browsers to do such different things. - Given that the Unicode BiDi Algorithm gives no clearcut answer on what the effect of <br> should be, and that IE, Firefox, and WebKit currently all do different things, I do not think that WebKit needs to do exactly what IE or Firefox does (unless that's what makes sense). - IE's approach of digging into the stack and killing the embedding levels started with LRE, RLE, LRO,and RLO, but not those started by HTML elements within the scope of these characters and their matching PDFs gives me the willies. I would like to offer an alternative: close embedding levels up to, but not including the topmost one started by mark-up. That way, we don't have to do surgery on the stack, but we still close LREs started in included plain text. Here is a test case: <div dir=ltr> ==> ‫ =>> <span dir=ltr> ==> ‫ =>> ‪ ==> <br> ==> ‬ ==> ‬ ==> </span> =>> ‬ ==> </div> The result should be that all the ==> arrows point left, while all the =>> arrows point right. Aharon proposed to treat <br> as bidi paragraph separator to HTML5. Following is his proposal: http://www.w3.org/TR/html-bidi/#br-as-separator (Amendament: Although IE7 continued to treat <br> as a bidi separator, IE8 and and IE9, when in standards mode, treat <br> as bidi whitespace, as per the HTML4 spec.) And he filed the following bug to HTML5: http://www.w3.org/Bugs/Public/show_bug.cgi?id=10828 The proposal was adopted in HTML5 which defines <br> as a bidi paragraph separator. http://dev.w3.org/html5/spec/Overview.html#the-br-element Just as a heads-up, the spec might change again. (In reply to comment #3) > I suspect that this behavior is related to the special treatment of hard line breaks in BidiResolver::createBidiRunsForLine(): > > // A deviation from the Unicode Bidi Algorithm in order to match > // Mac OS X text and WinIE: a hard line break resets bidi state. For those following along at home, this comment can be found at: http://trac.webkit.org/browser/trunk/Source/WebCore/platform/text/BidiResolver.h#L560 Created attachment 87328 [details]
Patch
Comment on attachment 87328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=87328&action=review I think this is good,b ut needs another round of cleanup. > Source/WebCore/platform/text/BidiContext.cpp:62 > +static inline PassRefPtr<BidiContext> clone(BidiContext* context, BidiContext* parent) I would separate this into two functions. One which is a copy() function on the BidiContext, and a second where you set the level by calcuating a new level in a local helper fuction. > Source/WebCore/platform/text/BidiContext.cpp:69 > + // Go to the least greater odd integer > + newLevel += 1; > + newLevel |= 1; Just put these into helper functions. nextGreaterOdd() > Source/WebCore/platform/text/BidiContext.cpp:78 > +// The BidiContext stack must be immutable -- they're re-used for repainting -- so They're re-used when we re-layout after dom modification/editing. > Source/WebCore/platform/text/BidiContext.cpp:80 > +PassRefPtr<BidiContext> BidiContext::clearUnicodeEmbeddingContexts() copyStackRemovingUnicodeEmbeddingContexts? > Source/WebCore/platform/text/BidiContext.cpp:82 > + Vector<BidiContext*> contexts; Either this should have inline capacity of 64 (to avoid ever mallocing) or you shoudl do this in two passes. One which makes a clone of the list w/o unicodeEmbeddings, and a second walk which fixes the levels. I think I'm slightly more in favor of doing two passes. But it's really up to you. A linked list walk is also expensive, but probably less expensive than a malloc. Although we can avoid the malloc with the inline capcity. so again, up to you. > Source/WebCore/platform/text/BidiContext.h:33 > +enum BidiEmbeddingClass { This seems confusing to me. Maybe BidiSource? ContextSource? Donno. But currently confusing. > Source/WebCore/platform/text/BidiContext.h:34 > + DOMEmbedding, could be FromDOM and FromUnicode? BidiEmbeddingSource? maybe? > Source/WebCore/platform/text/BidiContext.h:65 > + BidiEmbeddingClass m_embeddingClass : 1; There may be some concern about makign sure this is unsigned on MSVC. I dont' rmember the exact rules. but enums + bitfields + msvc = confusion. > Source/WebCore/platform/text/BidiResolver.h:80 > + BidiEmbedding(WTF::Unicode::Direction direction, BidiEmbeddingClass embeddingClass = UnicodeEmbedding) I'm not sure you want a default argumetn value here. > Source/WebCore/platform/text/BidiResolver.h:82 > + : m_direction(direction) > + , m_embeddingClass(embeddingClass) indent. > Source/WebCore/platform/text/BidiResolver.h:610 > + embed(BidiEmbedding(dirCurrent)); This can be implicitly constructed, no? I geuss it's better to be clear. > Source/WebCore/rendering/InlineIterator.h:161 > + resolver->embed(BidiEmbedding(d, DOMEmbedding)); I think it's cleaner to make embed just take two params, and ahve it construct the Embedding thing. Comment on attachment 87328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=87328&action=review > Source/WebCore/platform/text/BidiContext.h:35 > + DOMEmbedding, > + UnicodeEmbedding I think this is confusing. DOMEmbedding sounds like it would only come from DOM (<span dir="...) attributes, but inline styles would affect this as well (<span style=".."). Please correct me if I'm wrong. Comment on attachment 87328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=87328&action=review Thanks for the review. I'll correct your points :) >> Source/WebCore/platform/text/BidiContext.cpp:82 >> + Vector<BidiContext*> contexts; > > Either this should have inline capacity of 64 (to avoid ever mallocing) or you shoudl do this in two passes. One which makes a clone of the list w/o unicodeEmbeddings, and a second walk which fixes the levels. > > I think I'm slightly more in favor of doing two passes. But it's really up to you. A linked list walk is also expensive, but probably less expensive than a malloc. Although we can avoid the malloc with the inline capcity. so again, up to you. I'd love to do a 2-pass solution, but I don't believe we can do this properly, since each context derives its level from those below it, and we need a parallel data structure to walk from hte bottom to the top. Comment on attachment 87328 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=87328&action=review >> Source/WebCore/platform/text/BidiContext.h:35 >> + UnicodeEmbedding > > I think this is confusing. DOMEmbedding sounds like it would only come from DOM (<span dir="...) attributes, but inline styles would affect this as well (<span style=".."). > Please correct me if I'm wrong. You're right that style-based directions would also be classified here as originating from the "DOM." Perhaps FromUnicode and FromStyleOrDOM? You could jsut do FromStyle (or FromDOM). DOM just gets mapped into Style anyway. (In reply to comment #28) > You could jsut do FromStyle (or FromDOM). DOM just gets mapped into Style anyway. FromStyle / FromUnicode is fine. But I still find it confusing, I propose: ExternalEmbedding / IntrinsicEmbedding, where external is any DOM attribute or CSS style attribute, either defined on the element or any of its parents (when inheritance applies) and intrinsic denotes that the directional information is included within the text. Created attachment 87354 [details]
Patch
Comment on attachment 87354 [details]
Patch
This looks great to me. You might want to give mitz overnight to see if he has any thoughts.
Committed r82424: <http://trac.webkit.org/changeset/82424> http://trac.webkit.org/changeset/82424 might have broken GTK Linux 32-bit Debug |