Bug 53203 - CSS 2.1 failure: bidi-breaking-002.htm test fails (PARAGRAPH SEPARATOR does not break line)
Summary: CSS 2.1 failure: bidi-breaking-002.htm test fails (PARAGRAPH SEPARATOR does n...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Levi Weintraub
URL: https://trac.webkit.org/wiki/CSS21Res...
Keywords:
Depends on:
Blocks: 47141 50910
  Show dependency treegraph
 
Reported: 2011-01-26 16:47 PST by Levi Weintraub
Modified: 2013-10-02 12:27 PDT (History)
14 users (show)

See Also:


Attachments
Test Case (bidi-breaking-002.htm) (3.29 KB, text/html)
2011-01-26 16:47 PST, Levi Weintraub
no flags Details
Make lines break on LINE and PARAGRAPH SEPARATOR characters (3.12 KB, patch)
2011-01-27 16:20 PST, Levi Weintraub
no flags Details | Formatted Diff | Diff
Patch (50.03 KB, patch)
2011-01-31 13:57 PST, Levi Weintraub
no flags Details | Formatted Diff | Diff
Patch (51.55 KB, patch)
2011-02-01 14:41 PST, Levi Weintraub
no flags Details | Formatted Diff | Diff
Patch (52.83 KB, patch)
2011-05-02 18:12 PDT, Levi Weintraub
no flags Details | Formatted Diff | Diff
Patch (52.78 KB, patch)
2011-06-21 14:48 PDT, Levi Weintraub
eric: review-
Details | Formatted Diff | Diff
the W3C test case that currently tests for this (http://test.csswg.org/source/approved/css2.1/src/bidi-text/bidi-breaking-003.xht) (80 bytes, text/plain)
2011-08-03 05:41 PDT, Aharon (Vladimir) Lanin
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Levi Weintraub 2011-01-26 16:47:03 PST
Created attachment 80264 [details]
Test Case (bidi-breaking-002.htm)

html4/bidi-breaking-002 fails
Comment 1 Levi Weintraub 2011-01-27 16:20:24 PST
Created attachment 80374 [details]
Make lines break on LINE and PARAGRAPH SEPARATOR characters

This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm.

There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening!
Comment 2 Levi Weintraub 2011-01-27 17:09:29 PST
(In reply to comment #1)
> Created an attachment (id=80374) [details]
> Make lines break on LINE and PARAGRAPH SEPARATOR characters
> 
> This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm.
> 
> There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening!

Got help finding it from Evan Martin. Spinning a patch now.
Comment 3 Levi Weintraub 2011-01-31 13:57:55 PST
Created attachment 80678 [details]
Patch
Comment 4 Levi Weintraub 2011-01-31 13:59:33 PST
(In reply to comment #3)
> Created an attachment (id=80678) [details]
> Patch

This renders properly and doesn't break any known layout tests, but we admittedly don't have real coverage for line and paragraph separators.
Comment 5 Darin Adler 2011-01-31 15:39:33 PST
Comment on attachment 80678 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711
> +                bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen)));

I think this is going to be a huge slowdown. The category function is expensive.
Comment 6 Levi Weintraub 2011-01-31 15:43:25 PST
(In reply to comment #5)
> (From update of attachment 80678 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review
> 
> > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711
> > +                bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen)));
> 
> I think this is going to be a huge slowdown. The category function is expensive.

Thanks for taking a look!

I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise.
Comment 7 Levi Weintraub 2011-01-31 16:40:40 PST
(In reply to comment #6)
> (In reply to comment #5)
> > (From update of attachment 80678 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review
> > 
> > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711
> > > +                bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen)));
> > 
> > I think this is going to be a huge slowdown. The category function is expensive.
> 
> Thanks for taking a look!
> 
> I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise.

I also realized that the constant re-use of category(c) & (Line_Separator & Paragraph_Separator) should be replaced with a function. Now I need an optimization to make that check efficient :)
Comment 8 WebKit Review Bot 2011-01-31 16:52:53 PST
Attachment 80678 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7687037
Comment 9 Levi Weintraub 2011-01-31 17:09:39 PST
Comment on attachment 80678 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:1796
> +                    if (c == '\n' && preserveNewline || category(c) & (Separator_Line | Separator_Paragraph)) {

This needs parenthesis. Also removing r? while I figure out how to make this fast enough to be usable.
Comment 10 Early Warning System Bot 2011-01-31 17:41:30 PST
Attachment 80678 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7682487
Comment 11 Levi Weintraub 2011-02-01 14:41:15 PST
Created attachment 80831 [details]
Patch
Comment 12 Levi Weintraub 2011-02-01 14:45:30 PST
(In reply to comment #11)
> Created an attachment (id=80831) [details]
> Patch

I'm now doing a direct comparison with the UChar values for Line and Paragraph Separator. Assuming this is acceptable, this should resolve the performance issue present before.
Comment 13 Levi Weintraub 2011-02-08 15:45:38 PST
(In reply to comment #11)
> Created an attachment (id=80831) [details]
> Patch

Can someone take a look at this and see if it's acceptable? 

Also, if my method for checking against Paragraph and Line Separators is correct, I'd love to stick that re-occurring code into a function, but I'm not sure where the best place for it is.
Comment 14 Levi Weintraub 2011-02-22 17:47:33 PST
Anyone? Review please?
Comment 15 Eric Seidel (no email) 2011-02-24 03:00:08 PST
Mitz or Ap are your best bet here.
Comment 16 Levi Weintraub 2011-05-02 18:12:12 PDT
Created attachment 92025 [details]
Patch
Comment 17 Levi Weintraub 2011-05-02 18:15:22 PDT
(In reply to comment #16)
> Created an attachment (id=92025) [details]
> Patch

Updating since the original patch grew stale. Anyone willing to take a look before it happens again? You know you want WebKit to support Unicode Line and Paragraph Separators!
Comment 18 WebKit Review Bot 2011-05-02 20:47:13 PDT
Attachment 92025 [details] did not pass chromium-ews:
Output: http://queues.webkit.org/results/8530853
New failing tests:
fast/css/bidi-breaking-002.html
Comment 19 Adam Barth 2011-05-02 22:44:32 PDT
(In reply to comment #18)
> Attachment 92025 [details] did not pass chromium-ews:
> Output: http://queues.webkit.org/results/8530853
> New failing tests:
> fast/css/bidi-breaking-002.html

Looks like this patch will cause this test to fail on Chromium Linux.
Comment 20 Levi Weintraub 2011-05-04 14:08:10 PDT
(In reply to comment #19)
> (In reply to comment #18)
> > Attachment 92025 [details] [details] did not pass chromium-ews:
> > Output: http://queues.webkit.org/results/8530853
> > New failing tests:
> > fast/css/bidi-breaking-002.html
> 
> Looks like this patch will cause this test to fail on Chromium Linux.

Are the results available? This is the new test I'm adding with the patch :p
Comment 21 Levi Weintraub 2011-06-10 15:58:03 PDT
The disinterest in fixing this makes me sad :(
Comment 22 Eric Seidel (no email) 2011-06-10 17:53:14 PDT
You and I will make a date to go through this in person next week.  Ok?
Comment 23 Levi Weintraub 2011-06-10 18:19:48 PDT
(In reply to comment #22)
> You and I will make a date to go through this in person next week.  Ok?

Deal, thanks! :)
Comment 24 Levi Weintraub 2011-06-21 14:48:03 PDT
Created attachment 98059 [details]
Patch
Comment 25 Eric Seidel (no email) 2011-06-21 15:05:08 PDT
Comment on attachment 98059 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98059&action=review

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2117
>                  bool betweenWords = c == '\n' || (currWS != PRE && !atStart && isBreakable(lineBreakIteratorInfo.second, current.m_pos, current.m_nextBreakablePosition, breakNBSP)
> -                    && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen)));
> +                    && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen))) || isUnicodeLineBreak(c);

So this would be better if the \n check was rolled into the unicode one.

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2198
> +                    if ((c == '\n' && preserveNewline) || isUnicodeLineBreak(c)) {

We need a comment here as to why unicode linebreaks ignore preserveNewline.

> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2206
> -                        lineInfo.setPreviousLineBrokeCleanly(true);
> +                        lineInfo.setPreviousLineBrokeCleanly(c != WTF::Unicode::lineSeparator);

This needs a comment as well. :)

If this only affects the bidi algorithm, we should have a better name for this.  Like bidiAlgorithResetsOnNextLine... (not really that much better, but I'm sure we can come up with something better!)

> Source/WebCore/rendering/RenderText.cpp:706
> +            while (i + linelen < len && text[i + linelen] != '\n' && !(isUnicodeLineBreak(text[i + linelen])))

Good use for a function which includes both \n and the unicode ones.

> Source/WebCore/rendering/RenderText.cpp:821
> +        } else if (isUnicodeLineBreak(c)) {
> +            m_hasBreak = true;
> +            isNewline = true;
> +            isSpace = false;

Seems OK.  Should we be rolling this into the \n check above?

> Source/WebCore/rendering/RenderText.cpp:852
> +        while (c != '\n' && !isSpaceAccordingToStyle(c, style()) && c != '\t' && c != softHyphen && !(isUnicodeLineBreak(c))) {

This should again use an inline whihc checks all of \n and the unicode newlines.  Checking \n first of course. :)

> Source/WebCore/rendering/break_lines.h:44
> +inline bool isUnicodeLineBreak(UChar c)
> +{
> +    return c == WTF::Unicode::lineSeparator || c == WTF::Unicode::paragraphSeparator;
> +}
> +

I believe we should have a isLineBreak inline which includes \n and possibly \r.  And then use that everywhere instead of calling this unicode-only one in addition to the explicit \n checks.

When you add a second function here, we may need to add comments about when to use each of the two newline-checking-functions.
Comment 26 Aharon (Vladimir) Lanin 2011-07-18 14:14:10 PDT
This bug should block #50910 (Master bug for HTML5 Bidi).
Comment 27 Aharon (Vladimir) Lanin 2011-08-03 05:41:04 PDT
Created attachment 102771 [details]
the W3C test case that currently tests for this (http://test.csswg.org/source/approved/css2.1/src/bidi-text/bidi-breaking-003.xht)
Comment 28 Aharon (Vladimir) Lanin 2011-08-03 05:42:54 PDT
(In reply to comment #26)
> This bug should block #50910 (Master bug for HTML5 Bidi).

Levi, could you please mark this bug as blocking 50910.
Comment 29 Levi Weintraub 2011-08-03 09:48:30 PDT
Of course :)
Comment 30 Randy Hudson 2013-04-24 09:42:08 PDT
The behavior requested by this bug is incorrect.  This character is used in BiDi text.  The purpose of the paragraph separator is to break apart unrelated segments of text within the SAME line/paragraph.  For example:

"A description of something that happened – 12 hours ago"

When the "description" ends with RTL content, in order to make " – 12 hours ago" always appear at the trailing end of the line, a paragraph separator can be inserted in front of it.  The only effect is should have is on bidi layout.

The following should all render the same way ("WERBEH" is used to mean Hebrew, spelled using the Hebrew alphabet):

A description ending in WERBEH &#x2029;- 12 hours ago
&#x202a;A description ending in WERBEH&#x202c; - 12 hours ago
<span style="unicode-bidi:isolate;direction=ltr;">A description ending in WERBEH </span>- 12 hours ago
<bdi>A description ending in WERBEH </bdi>- 12 hours ago

(202a is LRE, 202c is PDF)

This character does not belong to the "line separator" category of the unicode specification.
Comment 31 Aharon (Vladimir) Lanin 2013-04-30 05:56:28 PDT
(In reply to comment #30)
> The purpose of the paragraph separator is to break apart unrelated segments
> of text within the SAME line
> [...]
> The only effect is should have is on bidi layout
> [as opposed to causing a new line to be started -
> if I understood you correctly]

Can you give any reference for this?

I believe this statement is incorrect. UTR#20 (http://www.unicode.org/reports/tr20/#Line) says that PARAGRAPH SEPARATOR is supposed to be used as an unambiguous replacement for newline (which implies it should indeed break a line), and that in HTML it should be replaced with the use of <xhtml:p> and </xhtml:p> (which, of course, will cause line breaks by default).

(I obviously agree that in HTML, mark-up should be used to denote paragraphs. The test is about what should happen when a page nevertheless does use PARAGRAPH SEPARATOR, despite UTR#20 discouraging its use.)