Created attachment 80264 [details] Test Case (bidi-breaking-002.htm) html4/bidi-breaking-002 fails
Created attachment 80374 [details] Make lines break on LINE and PARAGRAPH SEPARATOR characters This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm. There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening!
(In reply to comment #1) > Created an attachment (id=80374) [details] > Make lines break on LINE and PARAGRAPH SEPARATOR characters > > This adds the LINE and PARAGRAPH SEPARATOR characters as valid line break characters for the RenderBlock::findNextLineBreak algorithm. > > There is still a bug that the line is being measured as if the line break isn't happening. I'd love a hint about where to look to find how this is happening! Got help finding it from Evan Martin. Spinning a patch now.
Created attachment 80678 [details] Patch
(In reply to comment #3) > Created an attachment (id=80678) [details] > Patch This renders properly and doesn't break any known layout tests, but we admittedly don't have real coverage for line and paragraph separators.
Comment on attachment 80678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); I think this is going to be a huge slowdown. The category function is expensive.
(In reply to comment #5) > (From update of attachment 80678 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); > > I think this is going to be a huge slowdown. The category function is expensive. Thanks for taking a look! I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise.
(In reply to comment #6) > (In reply to comment #5) > > (From update of attachment 80678 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > > > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1711 > > > + bool betweenWords = c == '\n' || category(c) & (Separator_Line | Separator_Paragraph) || (currWS != PRE && !atStart && isBreakable(str, pos, strlen, nextBreakable, breakNBSP) && (style->hyphens() != HyphensNone || (pos && str[pos - 1] != softHyphen))); > > > > I think this is going to be a huge slowdown. The category function is expensive. > > Thanks for taking a look! > > I worried that could be an issue... Any advice on an optimization? I want to keep things clean and I'm afraid that Unicode category stuff is outside of my expertise. I also realized that the constant re-use of category(c) & (Line_Separator & Paragraph_Separator) should be replaced with a function. Now I need an optimization to make that check efficient :)
Attachment 80678 [details] did not build on chromium: Build output: http://queues.webkit.org/results/7687037
Comment on attachment 80678 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=80678&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1796 > + if (c == '\n' && preserveNewline || category(c) & (Separator_Line | Separator_Paragraph)) { This needs parenthesis. Also removing r? while I figure out how to make this fast enough to be usable.
Attachment 80678 [details] did not build on qt: Build output: http://queues.webkit.org/results/7682487
Created attachment 80831 [details] Patch
(In reply to comment #11) > Created an attachment (id=80831) [details] > Patch I'm now doing a direct comparison with the UChar values for Line and Paragraph Separator. Assuming this is acceptable, this should resolve the performance issue present before.
(In reply to comment #11) > Created an attachment (id=80831) [details] > Patch Can someone take a look at this and see if it's acceptable? Also, if my method for checking against Paragraph and Line Separators is correct, I'd love to stick that re-occurring code into a function, but I'm not sure where the best place for it is.
Anyone? Review please?
Mitz or Ap are your best bet here.
Created attachment 92025 [details] Patch
(In reply to comment #16) > Created an attachment (id=92025) [details] > Patch Updating since the original patch grew stale. Anyone willing to take a look before it happens again? You know you want WebKit to support Unicode Line and Paragraph Separators!
Attachment 92025 [details] did not pass chromium-ews: Output: http://queues.webkit.org/results/8530853 New failing tests: fast/css/bidi-breaking-002.html
(In reply to comment #18) > Attachment 92025 [details] did not pass chromium-ews: > Output: http://queues.webkit.org/results/8530853 > New failing tests: > fast/css/bidi-breaking-002.html Looks like this patch will cause this test to fail on Chromium Linux.
(In reply to comment #19) > (In reply to comment #18) > > Attachment 92025 [details] [details] did not pass chromium-ews: > > Output: http://queues.webkit.org/results/8530853 > > New failing tests: > > fast/css/bidi-breaking-002.html > > Looks like this patch will cause this test to fail on Chromium Linux. Are the results available? This is the new test I'm adding with the patch :p
The disinterest in fixing this makes me sad :(
You and I will make a date to go through this in person next week. Ok?
(In reply to comment #22) > You and I will make a date to go through this in person next week. Ok? Deal, thanks! :)
Created attachment 98059 [details] Patch
Comment on attachment 98059 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98059&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2117 > bool betweenWords = c == '\n' || (currWS != PRE && !atStart && isBreakable(lineBreakIteratorInfo.second, current.m_pos, current.m_nextBreakablePosition, breakNBSP) > - && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen))); > + && (style->hyphens() != HyphensNone || (current.previousInSameNode() != softHyphen))) || isUnicodeLineBreak(c); So this would be better if the \n check was rolled into the unicode one. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2198 > + if ((c == '\n' && preserveNewline) || isUnicodeLineBreak(c)) { We need a comment here as to why unicode linebreaks ignore preserveNewline. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2206 > - lineInfo.setPreviousLineBrokeCleanly(true); > + lineInfo.setPreviousLineBrokeCleanly(c != WTF::Unicode::lineSeparator); This needs a comment as well. :) If this only affects the bidi algorithm, we should have a better name for this. Like bidiAlgorithResetsOnNextLine... (not really that much better, but I'm sure we can come up with something better!) > Source/WebCore/rendering/RenderText.cpp:706 > + while (i + linelen < len && text[i + linelen] != '\n' && !(isUnicodeLineBreak(text[i + linelen]))) Good use for a function which includes both \n and the unicode ones. > Source/WebCore/rendering/RenderText.cpp:821 > + } else if (isUnicodeLineBreak(c)) { > + m_hasBreak = true; > + isNewline = true; > + isSpace = false; Seems OK. Should we be rolling this into the \n check above? > Source/WebCore/rendering/RenderText.cpp:852 > + while (c != '\n' && !isSpaceAccordingToStyle(c, style()) && c != '\t' && c != softHyphen && !(isUnicodeLineBreak(c))) { This should again use an inline whihc checks all of \n and the unicode newlines. Checking \n first of course. :) > Source/WebCore/rendering/break_lines.h:44 > +inline bool isUnicodeLineBreak(UChar c) > +{ > + return c == WTF::Unicode::lineSeparator || c == WTF::Unicode::paragraphSeparator; > +} > + I believe we should have a isLineBreak inline which includes \n and possibly \r. And then use that everywhere instead of calling this unicode-only one in addition to the explicit \n checks. When you add a second function here, we may need to add comments about when to use each of the two newline-checking-functions.
This bug should block #50910 (Master bug for HTML5 Bidi).
Created attachment 102771 [details] the W3C test case that currently tests for this (http://test.csswg.org/source/approved/css2.1/src/bidi-text/bidi-breaking-003.xht)
(In reply to comment #26) > This bug should block #50910 (Master bug for HTML5 Bidi). Levi, could you please mark this bug as blocking 50910.
Of course :)
The behavior requested by this bug is incorrect. This character is used in BiDi text. The purpose of the paragraph separator is to break apart unrelated segments of text within the SAME line/paragraph. For example: "A description of something that happened – 12 hours ago" When the "description" ends with RTL content, in order to make " – 12 hours ago" always appear at the trailing end of the line, a paragraph separator can be inserted in front of it. The only effect is should have is on bidi layout. The following should all render the same way ("WERBEH" is used to mean Hebrew, spelled using the Hebrew alphabet): A description ending in WERBEH 
- 12 hours ago ‪A description ending in WERBEH‬ - 12 hours ago <span style="unicode-bidi:isolate;direction=ltr;">A description ending in WERBEH </span>- 12 hours ago <bdi>A description ending in WERBEH </bdi>- 12 hours ago (202a is LRE, 202c is PDF) This character does not belong to the "line separator" category of the unicode specification.
(In reply to comment #30) > The purpose of the paragraph separator is to break apart unrelated segments > of text within the SAME line > [...] > The only effect is should have is on bidi layout > [as opposed to causing a new line to be started - > if I understood you correctly] Can you give any reference for this? I believe this statement is incorrect. UTR#20 (http://www.unicode.org/reports/tr20/#Line) says that PARAGRAPH SEPARATOR is supposed to be used as an unambiguous replacement for newline (which implies it should indeed break a line), and that in HTML it should be replaced with the use of <xhtml:p> and </xhtml:p> (which, of course, will cause line breaks by default). (I obviously agree that in HTML, mark-up should be used to denote paragraphs. The test is about what should happen when a page nevertheless does use PARAGRAPH SEPARATOR, despite UTR#20 discouraging its use.)
Created attachment 460391 [details] Updated Status in Safari 15.5 I am unclear on expected result across following test (attached) but please refer to attached screenshot for updated result across all browsers (Chrome Canary 105, Firefox Nightly) and Safari 15.5. Thanks!