17427 – Line breaking opportunities at the end of a text node are missed

RESOLVED FIXED 17427

Line breaking opportunities at the end of a text node are missed

https://bugs.webkit.org/show_bug.cgi?id=17427

Summary Line breaking opportunities at the end of a text node are missed

mitz

Reported 2008-02-18 11:54:45 PST

A line breaking opportunity at the end of a text node is ignored. See the attached test case for example (there is a line-breaking opportunity after a question mark).

Attachments
Test case (226 bytes, text/html) 2008-02-18 11:55 PST, mitz	no flags	Details
Proposed patch (6.73 KB, patch) 2012-09-16 17:25 PDT, Alex Henrie	alexhenrie24: review- webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Proposed patch v2 (10.22 KB, patch) 2012-09-18 20:45 PDT, Alex Henrie	darin: review- webkit.review.bot: commit-queue-	Details Formatted Diff Diff
Proposed patch v3 (20.64 KB, patch) 2012-09-24 21:02 PDT, Alex Henrie	no flags	Details Formatted Diff Diff
Proposed patch v4 (11.87 KB, patch) 2012-09-24 21:28 PDT, Alex Henrie	no flags	Details Formatted Diff Diff
Proposed patch v5 (11.80 KB, patch) 2012-09-24 21:38 PDT, Alex Henrie	buildbot: commit-queue-	Details Formatted Diff Diff
Proposed patch v6 (11.66 KB, patch) 2012-11-04 22:07 PST, Alex Henrie	no flags	Details Formatted Diff Diff
Patch (8.39 KB, patch) 2013-02-24 15:13 PST, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (8.39 KB, patch) 2013-02-25 07:30 PST, Glenn Adams	no flags	Details Formatted Diff Diff
perf results - attachment 190055 (26.90 KB, text/html) 2013-03-06 08:01 PST, Glenn Adams	no flags	Details
Patch (8.97 KB, patch) 2013-03-10 15:35 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch (11.88 KB, patch) 2013-03-10 19:25 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch for landing (11.82 KB, patch) 2013-03-10 19:54 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch for landing (11.82 KB, patch) 2013-03-10 19:59 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Patch for landing (11.84 KB, patch) 2013-03-10 20:03 PDT, Glenn Adams	no flags	Details Formatted Diff Diff
Show Obsolete (13) View All Add attachment proposed patch, testcase, etc.

mitz

Comment 1 2008-02-18 11:55:26 PST

Created attachment 19193 [details] Test case

Xianzhu Wang

Comment 2 2011-09-06 18:49:34 PDT

For now the line breaking iterator can only see the text in the current element. The solution might be also passing the text of the previous element to the line breaking iterator, if the previous element is in the same paragraph of the current element. For ICU, we can avoid the extra string allocation by passing the texts in UText. Solution to bug 67022 will also use UText.

Darin Adler

Comment 3 2011-09-07 14:24:27 PDT

If we do keep a UText object around, it should be done through an abstraction that would let other platforms do the same kind of thing with whatever context object they would use.

Xianzhu Wang

Comment 4 2011-09-07 18:50:09 PDT

I think at WebKit side, we can change the interfaces of TextBreakIterator by adding a 'context' parameter besides the original text parameter for the xxxxIterator() functions. In TextBreakIteratorICU.cpp we can combine the two parameters into one UText and pass to ICU. For other ports use their own proper methods if possible, or just allocate a new string to copy the context and the text into it, or maybe just ignore the context.

mitz

Comment 5 2012-05-18 22:19:46 PDT

<rdar://problem/11489433>

Alex Henrie

Comment 6 2012-09-16 17:25:33 PDT

Created attachment 164327 [details] Proposed patch I've written a patch that fixes this bug. It determines whether or not a line break is possible immediately before a text node by running the line-breaking algorithm on a temporary string made from the first two UChars of that text node and the last two UChars of the preceding text node. I chose this approach over adding a 'context' parameter to TextBreakIterator for two reasons: First, I could not figure out how to combine two UChar* strings into a UText without allocating a new string and copying them both into it, and second, because it would have required much more intrusive changes for a dubious benefit. I noticed that my patch causes LayoutTests/fast/text/international/003.html and LayoutTests/fast/text/international/text-combine-image-test.html to "fail". I believe that my patch actually makes the rendering of 003.html more correct because it correctly allows a line break between the number 20 and the character 日. As far as text-combine-image-test.html, I do not believe that this page renders correctly with or without the patch because the text should render in a single vertical column, the same as if it had no markup. For what it's worth, I think the page looks a little nicer with the patch than without. Even if you think there is a better approach, could this patch be accepted as an interim solution? This bug needs to be fixed to make the best use of limited screen space on mobile devices, for consistency with other web browsers and applications in general, and for web applications such as [1] that require line breaking to be consistent. [1] http://meta.wikimedia.org/wiki/User:Remember_the_dot/Syntax_highlighter

WebKit Review Bot

Comment 7 2012-09-16 21:16:08 PDT

Comment on attachment 164327 [details] Proposed patch Attachment 164327 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13874271 New failing tests: fast/text/international/003.html fast/text/line-break-before-text-node.html fast/css/word-spacing-characters.html fast/writing-mode/Kusa-Makura-background-canvas.html

Ryosuke Niwa

Comment 8 2012-09-17 14:07:14 PDT

Comment on attachment 164327 [details] Proposed patch Why are we using 4 characters?

Darin Adler

Comment 9 2012-09-17 17:44:30 PDT

Comment on attachment 164327 [details] Proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=164327&action=review > Source/WebCore/ChangeLog:13 > + Adding semantic markup to text should not affect the line-breaking > + algorithm. The standard line-breaking algorithm is defined in Unicode > + Annex 14 (UAX14) and strict adherence to it is necessary to make the > + best use of limited screen space on mobile devices, for consistency > + with other web browsers and applications in general, and for web > + applications that require line breaking to be consistent. This paragraph is not needed in ChangeLog. > Source/WebCore/ChangeLog:18 > + This patch determines whether or not a line break is possible > + immediately before a text node by running the line-breaking algorithm on > + a temporary string made from the first two UChars of that text node and > + the last two UChars of the preceding text node. Why are two characters before and two character after sufficient?

Alex Henrie

Comment 10 2012-09-17 20:24:14 PDT

Comment on attachment 164327 [details] Proposed patch Okay, a few notes. text-combine-image-test.html is the test with the string "20日". I looked at this test again and without any markup there is no line break, so the line break my patch causes must be incorrect. However, I wasn't able to investigate it further because I spent all my free time today trying to recompile WebKit and when I run the GTK launcher all I get is a segmentation fault. GDB reports: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff4645a91 in JSC::ArrayConstructor::finishCreation(JSC::ExecState*, JSC::ArrayPrototype*) () from /home/alex/webkit/WebKitBuild/Release/.libs/libjavascriptcoregtk-3.0.so.0 I used four characters because I thought that that was always going to be enough context to determine line breaking, and in everything I've tested, it is. However, reading UAX14 more closely it looks like Southeast Asian languages (such as Thai) require dictionary-based line-breaking algorithms, which would require more context. Do you know of a way to provide this context to the line-breaking algorithm without allocating a new string and copying the contents of the previous and current text nodes into it? Will Thai require more context than just the first previous text node? Any tips on how to fix the segfault?

Ryosuke Niwa

Comment 11 2012-09-17 20:34:32 PDT

> Do you know of a way to provide this context to the line-breaking algorithm without allocating a new string and copying the contents of the previous and current text nodes into it? Use UText as suggested in comments #2-#4? > Will Thai require more context than just the first previous text node? It appears that line breaking in Thai requires dictionary lookups so I'd expect it to require much more than just few characters to determine the line breaking.

Alex Henrie

Comment 12 2012-09-17 20:41:11 PDT

Do you know the syntax for creating a UText object without allocating a new string? I could not find it anywhere.

Alex Henrie

Comment 13 2012-09-18 20:45:28 PDT

Created attachment 164653 [details] Proposed patch v2 Okay, here is version 2 of the patch (the segfault seems to have been fixed in the latest SVN revision). I added a test for Thai line breaking this time around. Also, I got around the failure on fast/text/line-break-before-text-node.html by not running the new code if -webkit-text-combine is present. I know that the string manipulation in this patch is highly inefficient; help would really be appreciated. [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical. Any other information about how you want the solution to look would also be helpful. [1] http://userguide.icu-project.org/strings/utext [2] http://icu-project.org/apiref/icu4c/utext_8h.html

WebKit Review Bot

Comment 14 2012-09-18 20:48:36 PDT

Attachment 164653 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/rendering/RenderBlockLineLayout.cpp:2445: This { should be at the end of the previous line [whitespace/braces] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2450: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2451: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2460: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2461: More than one command on the same line in if [whitespace/parens] [4] Total errors found: 5 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Review Bot

Comment 15 2012-09-19 03:27:06 PDT

Comment on attachment 164653 [details] Proposed patch v2 Attachment 164653 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13890488 New failing tests: fast/forms/access-key-for-all-elements.html css3/filters/custom/custom-filter-property-computed-style.html fast/canvas/webgl/data-view-test.html fast/loader/unload-form-post-about-blank.html fast/forms/text-control-intrinsic-widths.html fast/hidpi/video-controls-in-hidpi.html editing/selection/extend-selection-home-end.html editing/selection/home-end.html editing/selection/extend-selection-character.html css3/filters/custom/custom-filter-property-parsing.html fast/js/comparison-operators-less.html css3/filters/filter-property-parsing.html fast/js/comparison-operators-greater.html fast/js/comparison-operators.html fast/canvas/webgl/array-unit-tests.html editing/selection/programmatic-selection-on-mac-is-directionless.html fast/canvas/webgl/array-message-passing.html css3/filters/filter-property-computed-style.html fast/canvas/webgl/object-deletion-behaviour.html fast/dom/Window/window-lookup-precedence.html fast/canvas/webgl/type-conversion-test.html fast/js/array-enumerators-functions.html fast/js/kde/encode_decode_uri.html fast/js/kde/Number.html http/tests/xmlhttprequest/web-apps/001.html fast/dom/wrapper-classes.html fast/css/word-spacing-characters.html fast/canvas/webgl/tex-image-and-sub-image-2d-with-array-buffer-view.html fast/forms/option-strip-unicode-spaces.html fast/loader/local-CSS-from-local.html

Darin Adler

Comment 16 2012-09-19 10:00:40 PDT

Comment on attachment 164653 [details] Proposed patch v2 View in context: https://bugs.webkit.org/attachment.cgi?id=164653&action=review Please fix the style complaints from the style checking. They are all correctly expressing WebKit coding style. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2446 > + // Thai and other Southeast Asian languages use dictionary-based line breaking algorithms that require a lot of context Please add a period to this sentence-style comment. The comment is a bit mysterious. This says “a lot of context”, but does not explain how much is enough. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2447 > + String textWithContext; Just using a StringBuilder instead of a String will be significantly more efficient. But perhaps we can do even better. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2468 > + current.m_nextBreakablePosition = 0; // Yes, a line break may occur immediately before I find this comment confusing. Maybe you could rewrite it to be a bit more explicit?

Darin Adler

Comment 17 2012-09-19 10:01:35 PDT

(In reply to comment #13) > [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical. It is not theoretical. In fact Michael Saboff is doing something interesting with this to make Latin-1 work with ICU without making UTF-16 copies of the strings.

Alex Henrie

Comment 18 2012-09-24 21:02:32 PDT

Created attachment 165510 [details] Proposed patch v3 Try 3. I used a StringBuilder this time, and made the code only concatenate the context together once for an entire set of runs. This works quite nicely. I tried a few different ways and finally settled on using a hash table to determine where each text node lies in the context string. I'll email Michael Saboff to ask what would have to be done to change this code to use UText instead of StringBuilder, but I think this patch would be fine as-is.

Alex Henrie

Comment 19 2012-09-24 21:28:38 PDT

Created attachment 165512 [details] Proposed patch v4 Apologies, the tests were repeated multiple times in the last patch. The patch is now fixed.

Alex Henrie

Comment 20 2012-09-24 21:38:05 PDT

Created attachment 165516 [details] Proposed patch v5 I really apologize. Try 4 still had a duplicate <meta> tag on the Thai test. I will definitely be more careful to make sure I don't accidentally duplicate the tests again in any future patches.

Build Bot

Comment 21 2012-09-24 22:29:30 PDT

Comment on attachment 165516 [details] Proposed patch v5 Attachment 165516 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13992883 New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/text/line-break-in-marked-up-thai.html

WebKit Review Bot

Comment 22 2012-09-25 07:55:58 PDT

Comment on attachment 165516 [details] Proposed patch v5 Attachment 165516 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14021139 New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/writing-mode/Kusa-Makura-background-canvas.html fast/text/line-break-in-marked-up-thai.html

Michael Saboff

Comment 23 2012-09-25 09:28:03 PDT

(In reply to comment #17) > (In reply to comment #13) > > [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical. > > It is not theoretical. In fact Michael Saboff is doing something interesting with this to make Latin-1 work with ICU without making UTF-16 copies of the strings. You can write your own UText "object", basically a UText struct with function pointers to "virtual" functions for your UText. See http://icu-project.org/apiref/icu4c/structUText.html for some details. You can also look at my patch out for review at https://bugs.webkit.org/attachment.cgi?id=165221&action=review. I do copy chunks because I'm starting with 8 bit source data and the ICU breaking code working with UText expects UChar data. For read only uses, you don't need all the UText functions. For breaking, I think you'll need clone (shallow only I think), nativeLength, access, extract, mapOffsetToNative, mapNativeIndexToUTF16 and close. Your functions need to be set in pFuncs along with all the other UText fields, likely in your own makeUTextForCombinedRenderText method. Given that you have data across multiple elements, your access function will need to map in the appropriate element (RenderText) and it's offsets into the UText structure as part of access processing. Using this method you shouldn't need to copy RenderText data. THis method will only work on platforms that have version 4 or later of ICU, IIRC. Hope this helps. You changes and my changes are touching similar code. I plan on changing RenderText so that the m_text String can contain 8 bit data.

Alex Henrie

Comment 24 2012-09-25 16:22:38 PDT

Michael, thanks for the explanation. I think I understand how your patch works. So my next question is, how would we make the the line-breaking functions take a UText if available and a String if not? Would it require changes to all platforms? Also, given that UText is not available on all platforms, and what we want to do with it requires ICU 4 or later, would you be okay with using the StringBuilder-based implementation to fix the rendering bugs now and adding optimizations for ICU platforms afterwards in a separate patch?

Michael Saboff

Comment 25 2012-09-26 09:48:21 PDT

(In reply to comment #24) > Michael, thanks for the explanation. I think I understand how your patch works. So my next question is, how would we make the the line-breaking functions take a UText if available and a String if not? Would it require changes to all platforms? > > Also, given that UText is not available on all platforms, and what we want to do with it requires ICU 4 or later, would you be okay with using the StringBuilder-based implementation to fix the rendering bugs now and adding optimizations for ICU platforms afterwards in a separate patch? I think all the platforms that use ICU are at ICU 4 or greater. We just updated Mac to ICU 4.6. I think Chromium is already at 4.6 or higher. My proposed patch that used UText built on all platforms, although those changes are in Source/WebCore/platform/text/TextBreakIteratorICU.cpp. I would suggest that you make your changes in an existing or new "ICU" file. No sense to land a StringBuilder fix when we can make things better for ICU capable platforms.

Alex Henrie

Comment 26 2012-09-26 18:25:29 PDT

I agree that optimizations for ICU platforms would be nice. A few other considerations also come to mind. First, fixing rendering bugs (which are causing problems in real-world applications, see comments #6 and #11) is generally more important than platform-specific optimizations. Second, I am a volunteer: I have limited time, and I only have access to Linux machines. I could probably come up with a UText solution but I would need help. Third, the line-breaking functions in TextBreakIterator.h currently expect a UChar*, not a UText: TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale); void LazyLineBreakIterator::reset(const UChar* string, int length, const AtomicString& locale); How would you suggest changing these functions so that they accept a UText when available and a String or UChar* when not? Won't any change here affect all platforms? If it affects all platforms, how can I test my changes? Thanks for the help.

Michael Saboff

Comment 27 2012-09-27 11:48:23 PDT

(In reply to comment #26) > I agree that optimizations for ICU platforms would be nice. A few other considerations also come to mind. First, fixing rendering bugs (which are causing problems in real-world applications, see comments #6 and #11) is generally more important than platform-specific optimizations. Second, I am a volunteer: I have limited time, and I only have access to Linux machines. I could probably come up with a UText solution but I would need help. Third, the line-breaking functions in TextBreakIterator.h currently expect a UChar*, not a UText: > > TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale); > void LazyLineBreakIterator::reset(const UChar* string, int length, const AtomicString& locale); > > How would you suggest changing these functions so that they accept a UText when available and a String or UChar* when not? Won't any change here affect all platforms? If it affects all platforms, how can I test my changes? > > Thanks for the help. I think the best way to do this is to create a new acquireLineBreakIterator() and LazyLineBreakIterator::reset() that take the list of RenderText's or TextNode's that contain all of the strings. The implementation of those new methods could be in a ICU file. It may be sufficient to provide this implementation and leave other implementations to other platforms. You might need to #if the definitions.

Alex Henrie

Comment 28 2012-10-01 21:14:31 PDT

layoutRunsAndFloatsInRange calls acquireLineBreakIterator once for each RenderText object in the set, using the locale of the current RenderText object to configure any special line-breaking behavior. With ICU, acquireLineBreakIterator would work best if given a precomputed mapping from indices to RenderText objects. Without ICU, acquireLineBreakIterator would work best if given a precomputed context string. In both cases, layoutRunsAndFloatsInRange also needs a mapping from RenderText objects to indices. So, to optimize for both cases, we would need to write code for both in layoutRunsAndFloatsInRange and use #ifdef WTF_USE_ICU_UNICODE to switch between the two. Likewise, #ifdef WTF_USE_ICU_UNICODE would add to TextBreakIterator.h a function declaration for an acquireLineBreakIterator that takes (instead of a string) mappings between indices and RenderText objects. The same goes for struct RenderTextInfo, which would have m_textWithContext on non-ICU platforms and the mapping of indices to RenderText objects on ICU. If this explanation doesn't make sense at first, look at what the StringBuilder patch has to do and think about how it would have to be different to optimize for ICU. Can you think of a better solution?

Darin Adler

Comment 29 2012-10-02 09:54:02 PDT

(In reply to comment #28) > With ICU, acquireLineBreakIterator would work best if given a precomputed mapping from indices to RenderText objects. Without ICU, acquireLineBreakIterator would work best if given a precomputed context string. Can the non-ICU acquireLineBreakIterator function compute and cache the context string? I don’t understand why the platform independent code needs to build the string, just in case the line break code needs to consolidate everything into a single string.

Alex Henrie

Comment 30 2012-10-02 21:44:31 PDT

How would acquireLineBreakIterator know when the list of RenderText objects or their contents has changed? If it takes a pointer to the list, there is no guarantee that the data at the pointer's address is still the same. I suppose we could add a boolean parameter to acquireLineBreakIterator to tell it when to recompute the context string. What did you have in mind?

Darin Adler

Comment 31 2012-10-03 09:07:59 PDT

(In reply to comment #30) > How would acquireLineBreakIterator know when the list of RenderText objects or their contents has changed? If it takes a pointer to the list, there is no guarantee that the data at the pointer's address is still the same. I suppose we could add a boolean parameter to acquireLineBreakIterator to tell it when to recompute the context string. What did you have in mind? I didn’t have anything specific in mind, but I suspect we can come up with something. The point is to look for a way to efficiently abstract away these differences in interface rather than putting #if into the platform-independent code.

Alex Henrie

Comment 32 2012-10-03 21:43:25 PDT

I want to see this bug fixed and I've written a cross-platform solution, but I don't see a nice clean way to add ICU optimizations. The WebKit architects need to either agree on a messier solution (so that the solution can be coded up) or leave ICU optimization for later.

Michael Saboff

Comment 33 2012-10-04 13:03:40 PDT

(In reply to comment #32) > I want to see this bug fixed and I've written a cross-platform solution, but I don't see a nice clean way to add ICU optimizations. The WebKit architects need to either agree on a messier solution (so that the solution can be coded up) or leave ICU optimization for later. I think the best way to make a cross-platform solution is to create an API that can be used for both an ICU solution and a non-ICU solution. Put this definition in a common .h file, like TextBreakIterator.h. Then put the corresponding implementation in 2 or more platform specific .cpp files. This new API will likely need to walk through the nodes and therefore need a list of nodes. For the ICU version, you create the UText routines that walk through the node list and then you use your implementation for non-ICU platforms.

Alex Henrie

Comment 34 2012-10-07 22:32:55 PDT

I thought of a worse problem: If acquireLineBreakIterator takes a list of RenderText objects and on some platforms needs to generate a reusable string, where would that string be stored in memory? It seems like the most logical way to solve this problem would be to rewrite all the line-breaking functions to be part of a line-breaker class, and then the class could store whatever variables it needs to maintain its state. However, this would amount to a complete redesign of the line-breaking API on all platforms, and I am unable to test any platform but Linux. I could really use some specific help designing and implementing a solution that makes everyone happy.

Michael Saboff

Comment 35 2012-10-08 16:56:47 PDT

(In reply to comment #34) > I thought of a worse problem: If acquireLineBreakIterator takes a list of RenderText objects and on some platforms needs to generate a reusable string, where would that string be stored in memory? Since TextBreakIterator is really just a pointer type and internal objects are cast to and from it, you could create your own type on those platforms that need a string. > It seems like the most logical way to solve this problem would be to rewrite all the line-breaking functions to be part of a line-breaker class, and then the class could store whatever variables it needs to maintain its state. However, this would amount to a complete redesign of the line-breaking API on all platforms, and I am unable to test any platform but Linux. > > I could really use some specific help designing and implementing a solution that makes everyone happy.

Alex Henrie

Comment 36 2012-10-09 22:05:39 PDT

Changing the non-ICU platforms may not be as daunting a task as I thought. Looking through Source/WebCore/platform/text, it seems that only WinCE and possibly GTK do not use ICU. Is GTK ever used with ICU disabled? If not then only ICU and WinCE need to be supported. I think we're getting closer to figuring out how to code this. Regardless, I still don't have a way to test changes to WinCE. Any ideas?

Alex Henrie

Comment 37 2012-11-04 22:07:41 PST

Created attachment 172268 [details] Proposed patch v6 I've spent more time on this and, unfortunately, have not come up with anything workable. The entire system, from acquireLineBreakIterator to LazyLineBreakIterator to nextBreakablePosition, is designed to work with a single string. If you wrap the nodes in a UText and take away the ability to see a single string then multiple problems arise, the most obvious being that there is no way to have the algorithm treat NBSP as a simple space. If you try to fix this problem by passing around the list of nodes as well as the iterator over them, you run into the difficulty of having to find a way to remember and tell all these functions which node and which index within that node they are on, as well as the difficulty of rewriting everything to use a list instead of a string. Furthermore, writing the necessary functions for pFuncs is nontrivial even with Michael's code as an example. In the implementation of UTextAccess, if the last character of a chunk is requested "going forward" then a new, temporary chunk must be created from the end of the current chunk and the start of the next chunk. According to the ICU documentation, this new chunk MUST "begin and end on code point boundaries" and "a single code point comprised of multiple storage units must never span a chunk boundary." The same complication arises when the first character of a chunk is requested "going backward". I also learned recently that in addition to the previously listed problems that this bug is causing, the combination of this bug and bug 76131 causes incorrect rendering of Google Translate's yellow highlighting if, at a non-space line break in the textbox, a newline is added and then deleted. I've rebased the StringBuilder patch against the current tree. I realize that a more efficient ICU solution is theoretically possible, but due to the massive complexity of such a solution and lack of willing developers I just don't see it happening. StringBuilder is designed to minimize memory reallocations, so the performance shouldn't be terrible. Performance might be further improved by changing LazyLineBreakIterator to take a pointer to the string instead of recopying the string. If there is someone out there who can come up with something better that works, please come forward...

WebKit Review Bot

Comment 38 2012-11-04 22:08:22 PST

Comment on attachment 172268 [details] Proposed patch v6 Rejecting attachment 172268 [details] from review queue. alexhenrie24@gmail.com does not have reviewer permissions according to http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py. - If you do not have reviewer rights please read http://webkit.org/coding/contributing.html for instructions on how to use bugzilla flags. - If you have reviewer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your reviewer rights.

WebKit Review Bot

Comment 39 2012-11-04 22:08:58 PST

Comment on attachment 172268 [details] Proposed patch v6 Rejecting attachment 172268 [details] from commit-queue. alexhenrie24@gmail.com does not have committer permissions according to http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py. - If you do not have committer rights please read http://webkit.org/coding/contributing.html for instructions on how to use bugzilla flags. - If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your committer rights.

Build Bot

Comment 40 2013-01-15 19:47:58 PST

Comment on attachment 172268 [details] Proposed patch v6 Attachment 172268 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://queues.webkit.org/results/15903195 New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/writing-mode/Kusa-Makura-background-canvas.html fast/text/line-break-in-marked-up-thai.html

Lu jing

Comment 41 2013-02-22 22:00:40 PST

In RenderBlockLineLayout.cpp, Change the following code: ==== from ==== // If the next item on the line is text, and if we did not end with // a space, then the next text run continues our word (and so it needs to // keep adding to the uncommitted width. Just update and continue. checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline())); ==== to ==== checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline()) || isBreakable(renderTextInfo.m_lineBreakIterator, current.m_pos, current.m_nextBreakablePosition, true)); then I can get the expected result. As the comment above, breaking opportunities are missed here. How about this solution? I am going to check the side effects.

Lu jing

Comment 42 2013-02-22 23:19:22 PST

I just found some incorrect result, the inline nodes like span will always cause a line breaking. Perhaps more check is needed near isBreakable...

Glenn Adams

Comment 43 2013-02-24 15:13:32 PST

Created attachment 189990 [details] Patch

Glenn Adams

Comment 44 2013-02-24 17:35:55 PST

(In reply to comment #41) > In RenderBlockLineLayout.cpp, > > Change the following code: > > ==== from ==== > > // If the next item on the line is text, and if we did not end with > // a space, then the next text run continues our word (and so it needs to > // keep adding to the uncommitted width. Just update and continue. > checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline())); > > ==== to ==== > > checkForBreak = !currentCharacterIsSpace && > (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline()) > || isBreakable(renderTextInfo.m_lineBreakIterator, current.m_pos, current.m_nextBreakablePosition, true)); > > then I can get the expected result. > As the comment above, breaking opportunities are missed here. > How about this solution? I am going to check the side effects. Could you provide a test file that demonstrates what you were trying to fix with this change?

Glenn Adams

Comment 45 2013-02-25 07:30:55 PST

Created attachment 190055 [details] Patch Resubmit same for EWS retry.

Ryosuke Niwa

Comment 46 2013-02-25 12:34:23 PST

Comment on attachment 190055 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=190055&action=review > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1563 > + StringBuilder contextBuilder; > + for (InlineIterator i = resolver.position(); i.m_obj; i.moveToStartOf(bidiNextSkippingEmptyInlines(this, i.m_obj))) { > + if (i.m_obj->isText()) { > + renderTextInfo.m_positionInContext.add(i.m_obj, contextBuilder.length()); > + contextBuilder.append(toRenderText(i.m_obj)->text()); > + } > + } The most important question is how much of performance impact this change has. > LayoutTests/TestExpectations:8 > +# pending rebaseline on passing ports > +webkit.org/b/17427 fast/text/international/003.html [ Skip ] > +webkit.org/b/17427 fast/writing-mode/Kusa-Makura-background-canvas.html [ Skip ] > + Please generate pixel results on at least one port.

Glenn Adams

Comment 47 2013-02-25 12:49:39 PST

(In reply to comment #46) > (From update of attachment 190055 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=190055&action=review > > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1563 > > + StringBuilder contextBuilder; > > + for (InlineIterator i = resolver.position(); i.m_obj; i.moveToStartOf(bidiNextSkippingEmptyInlines(this, i.m_obj))) { > > + if (i.m_obj->isText()) { > > + renderTextInfo.m_positionInContext.add(i.m_obj, contextBuilder.length()); > > + contextBuilder.append(toRenderText(i.m_obj)->text()); > > + } > > + } > > The most important question is how much of performance impact this change has. Thanks for the pre-review, though I didn't r? yet. I'm testing performance now and expect to need to optimize the original proposed code (above). > > > LayoutTests/TestExpectations:8 > > +# pending rebaseline on passing ports > > +webkit.org/b/17427 fast/text/international/003.html [ Skip ] > > +webkit.org/b/17427 fast/writing-mode/Kusa-Makura-background-canvas.html [ Skip ] > > + > > Please generate pixel results on at least one port. Will do. Just want to see if I could get this through EWS once.

Glenn Adams

Comment 48 2013-03-06 08:01:20 PST

Created attachment 191752 [details] perf results - attachment 190055 [details] Running perftest on apple mac (release) port (MBP R hw) shows: * 0.85% regression on line-layout * 0.75% regression on html5-full-render Is this too high to warrant a commit?

Ryosuke Niwa

Comment 49 2013-03-06 09:08:20 PST

(In reply to comment #48) > Created an attachment (id=191752) [details] > perf results - attachment 190055 [details] > > Running perftest on apple mac (release) port (MBP R hw) shows: > > * 0.85% regression on line-layout > * 0.75% regression on html5-full-render I don't think 1% regression is acceptable.

Ryosuke Niwa

Comment 50 2013-03-06 09:10:26 PST

On the other hand, it seems like you got only 3 samples? We need something like 30-50 samples to get a statistically significant result.

Glenn Adams

Comment 51 2013-03-10 15:35:33 PDT

Created attachment 192388 [details] Patch

Glenn Adams

Comment 52 2013-03-10 15:39:27 PDT

(In reply to comment #51) > Created an attachment (id=192388) [details] > Patch Reimplement patch so as to eliminate any performance regression (in run time and memory allocation). This patch handles the ASCII shortcut path through nextBreakPosition() and doesn't change (breaking or performance) behavior for non-ASCII path. A subsequent patch against bug 105692 will address the non-ASCII path.

Darin Adler

Comment 53 2013-03-10 17:48:49 PDT

Comment on attachment 192388 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192388&action=review Please look into the single-character-node issue I mention below. That’s the only reason I gave this a review- instead of review+. > Source/WebCore/platform/text/TextBreakIterator.h:63 > + , m_lastChar(0) > + , m_lastLastChar(0) I know that our existing code calls these lastCh and lastLastCh. But, in WebKit code we normally prefer to use words rather than abbreviations and there is a normal English phrase for the “last last character” that I think is clear. I would prefer m_lastCharacter, m_secondToLastCharacter, setLastTwoCharacters, and resetLastTwoCharacters. All those seem easy to read and less like jargon. > Source/WebCore/platform/text/TextBreakIterator.h:95 > + inline UChar lastChar() const { return m_lastChar; } > + inline UChar lastLastChar() const { return m_lastLastChar; } > + inline void setLastChars(UChar last, UChar lastLast) > + { > + m_lastChar = last; > + m_lastLastChar = lastLast; > + } > + inline void resetLastChars() > + { > + m_lastChar = 0; > + m_lastLastChar = 0; > + } The “inline” here are redundant and have no effect. All function definitions that are inside a class definition are treated as if “inline” is specified automatically. Please omit them. > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2873 > + UChar lastCharInNode = 0; lastCharacterInNode, please > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2874 > + UChar lastLastCharInNode = 0; secondToLastCharacterInNode, please > Source/WebCore/rendering/RenderBlockLineLayout.cpp:2917 > + goto nextChar; nextCharacter, please > Source/WebCore/rendering/RenderBlockLineLayout.cpp:3083 > + renderTextInfo.m_lineBreakIterator.setLastChars(lastCharInNode, lastLastCharInNode); This does not seem to correctly handle a text node that has only one character in it (or zero characters, for that matter). I think that in that case it would overwrite a characters from the second to last node with a zero. > Source/WebCore/rendering/break_lines.cpp:157 > + CharacterType lastLastCh = pos > 1 ? str[pos - 2] : CharacterType(lazyBreakIterator.lastLastChar()); > + CharacterType lastCh = pos > 0 ? str[pos - 1] : CharacterType(lazyBreakIterator.lastChar()); Please use static_cast<CharacterType>(x) instead of CharacterType(x).

Glenn Adams

Comment 54 2013-03-10 17:57:05 PDT

(In reply to comment #53) > (From update of attachment 192388 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=192388&action=review > > Please look into the single-character-node issue I mention below. That’s the only reason I gave this a review- instead of review+. > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:3083 > > + renderTextInfo.m_lineBreakIterator.setLastChars(lastCharInNode, lastLastCharInNode); > > This does not seem to correctly handle a text node that has only one character in it (or zero characters, for that matter). I think that in that case it would overwrite a characters from the second to last node with a zero. yes, i can see that... of course the current code (without this patch) doesn't handle any way, so this patch wouldn't regress that; i wonder if it could be addressed in a follow-up bug/patch, just to separate the behavioral impact a bit? or do you think it should be fixed here before moving forward (after i can address the other comments)?

Darin Adler

Comment 55 2013-03-10 18:15:25 PDT

I think it would be best to handle that case, too, in this patch. It seems likely to be a small delta on the patch you have already put together.

Glenn Adams

Comment 56 2013-03-10 19:25:18 PDT

Created attachment 192396 [details] Patch

Glenn Adams

Comment 57 2013-03-10 19:26:12 PDT

(In reply to comment #56) > Created an attachment (id=192396) [details] > Patch address comment #53 (darin)

Glenn Adams

Comment 58 2013-03-10 19:54:29 PDT

Created attachment 192397 [details] Patch for landing

Glenn Adams

Comment 59 2013-03-10 19:55:08 PDT

(In reply to comment #58) > Created an attachment (id=192397) [details] > Patch for landing tweaked ChangeLog

WebKit Review Bot

Comment 60 2013-03-10 19:58:24 PDT

Comment on attachment 192397 [details] Patch for landing Rejecting attachment 192397 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-03', 'validate-changelog', '--non-interactive', 192397, '--port=chromium-xvfb']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue darin found in /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog does not appear to be a valid reviewer according to committers.py. /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive). Full output: http://webkit-commit-queue.appspot.com/results/16991738

Glenn Adams

Comment 61 2013-03-10 19:59:53 PDT

Created attachment 192398 [details] Patch for landing

WebKit Review Bot

Comment 62 2013-03-10 20:01:48 PDT

Comment on attachment 192398 [details] Patch for landing Rejecting attachment 192398 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-01', 'validate-changelog', '--non-interactive', 192398, '--port=chromium-xvfb']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue darin found in /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog does not appear to be a valid reviewer according to committers.py. /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive). Full output: http://webkit-commit-queue.appspot.com/results/17114338

Glenn Adams

Comment 63 2013-03-10 20:03:13 PDT

Created attachment 192399 [details] Patch for landing

WebKit Review Bot

Comment 64 2013-03-10 20:33:10 PDT

Comment on attachment 192399 [details] Patch for landing Clearing flags on attachment: 192399 Committed r145338: <http://trac.webkit.org/changeset/145338>

WebKit Review Bot

Comment 65 2013-03-10 20:33:18 PDT

All reviewed patches have been landed. Closing bug.

Alexey Proskuryakov

Comment 66 2024-05-06 14:43:37 PDT

*** Bug 63894 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS OS X 10.5

Product WebKit

Component Text

Assignee

Glenn Adams

Reported

2008-02-18 11:54 PST

Modified

2024-05-06 14:43 PDT History

CC List

22 users Show

URL

Keywords InRadar

Duplicates (1)

63894 View as bug list

Depends on

Blocks

105692 110322 110468

Dependencies

tree graph

alexhenrie24
ap
buildbot
darin
dglazkov
dujid0
edwardjsabol
enrica
eric
esprehn+autocc
glenn
hgneng
kojii
mitz
msaboff
ojan.autocc
rniwa
syoichi
wangxianzhu
webkit.review.bot
xji
xz911jp