WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
17427
Line breaking opportunities at the end of a text node are missed
https://bugs.webkit.org/show_bug.cgi?id=17427
Summary
Line breaking opportunities at the end of a text node are missed
mitz
Reported
2008-02-18 11:54:45 PST
A line breaking opportunity at the end of a text node is ignored. See the attached test case for example (there is a line-breaking opportunity after a question mark).
Attachments
Test case
(226 bytes, text/html)
2008-02-18 11:55 PST
,
mitz
no flags
Details
Proposed patch
(6.73 KB, patch)
2012-09-16 17:25 PDT
,
Alex Henrie
alexhenrie24
: review-
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch v2
(10.22 KB, patch)
2012-09-18 20:45 PDT
,
Alex Henrie
darin
: review-
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch v3
(20.64 KB, patch)
2012-09-24 21:02 PDT
,
Alex Henrie
no flags
Details
Formatted Diff
Diff
Proposed patch v4
(11.87 KB, patch)
2012-09-24 21:28 PDT
,
Alex Henrie
no flags
Details
Formatted Diff
Diff
Proposed patch v5
(11.80 KB, patch)
2012-09-24 21:38 PDT
,
Alex Henrie
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch v6
(11.66 KB, patch)
2012-11-04 22:07 PST
,
Alex Henrie
no flags
Details
Formatted Diff
Diff
Patch
(8.39 KB, patch)
2013-02-24 15:13 PST
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(8.39 KB, patch)
2013-02-25 07:30 PST
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
perf results - attachment 190055
(26.90 KB, text/html)
2013-03-06 08:01 PST
,
Glenn Adams
no flags
Details
Patch
(8.97 KB, patch)
2013-03-10 15:35 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch
(11.88 KB, patch)
2013-03-10 19:25 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch for landing
(11.82 KB, patch)
2013-03-10 19:54 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch for landing
(11.82 KB, patch)
2013-03-10 19:59 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Patch for landing
(11.84 KB, patch)
2013-03-10 20:03 PDT
,
Glenn Adams
no flags
Details
Formatted Diff
Diff
Show Obsolete
(13)
View All
Add attachment
proposed patch, testcase, etc.
mitz
Comment 1
2008-02-18 11:55:26 PST
Created
attachment 19193
[details]
Test case
Xianzhu Wang
Comment 2
2011-09-06 18:49:34 PDT
For now the line breaking iterator can only see the text in the current element. The solution might be also passing the text of the previous element to the line breaking iterator, if the previous element is in the same paragraph of the current element. For ICU, we can avoid the extra string allocation by passing the texts in UText. Solution to
bug 67022
will also use UText.
Darin Adler
Comment 3
2011-09-07 14:24:27 PDT
If we do keep a UText object around, it should be done through an abstraction that would let other platforms do the same kind of thing with whatever context object they would use.
Xianzhu Wang
Comment 4
2011-09-07 18:50:09 PDT
I think at WebKit side, we can change the interfaces of TextBreakIterator by adding a 'context' parameter besides the original text parameter for the xxxxIterator() functions. In TextBreakIteratorICU.cpp we can combine the two parameters into one UText and pass to ICU. For other ports use their own proper methods if possible, or just allocate a new string to copy the context and the text into it, or maybe just ignore the context.
mitz
Comment 5
2012-05-18 22:19:46 PDT
<
rdar://problem/11489433
>
Alex Henrie
Comment 6
2012-09-16 17:25:33 PDT
Created
attachment 164327
[details]
Proposed patch I've written a patch that fixes this bug. It determines whether or not a line break is possible immediately before a text node by running the line-breaking algorithm on a temporary string made from the first two UChars of that text node and the last two UChars of the preceding text node. I chose this approach over adding a 'context' parameter to TextBreakIterator for two reasons: First, I could not figure out how to combine two UChar* strings into a UText without allocating a new string and copying them both into it, and second, because it would have required much more intrusive changes for a dubious benefit. I noticed that my patch causes LayoutTests/fast/text/international/003.html and LayoutTests/fast/text/international/text-combine-image-test.html to "fail". I believe that my patch actually makes the rendering of 003.html more correct because it correctly allows a line break between the number 20 and the character 日. As far as text-combine-image-test.html, I do not believe that this page renders correctly with or without the patch because the text should render in a single vertical column, the same as if it had no markup. For what it's worth, I think the page looks a little nicer with the patch than without. Even if you think there is a better approach, could this patch be accepted as an interim solution? This bug needs to be fixed to make the best use of limited screen space on mobile devices, for consistency with other web browsers and applications in general, and for web applications such as [1] that require line breaking to be consistent. [1]
http://meta.wikimedia.org/wiki/User:Remember_the_dot/Syntax_highlighter
WebKit Review Bot
Comment 7
2012-09-16 21:16:08 PDT
Comment on
attachment 164327
[details]
Proposed patch
Attachment 164327
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13874271
New failing tests: fast/text/international/003.html fast/text/line-break-before-text-node.html fast/css/word-spacing-characters.html fast/writing-mode/Kusa-Makura-background-canvas.html
Ryosuke Niwa
Comment 8
2012-09-17 14:07:14 PDT
Comment on
attachment 164327
[details]
Proposed patch Why are we using 4 characters?
Darin Adler
Comment 9
2012-09-17 17:44:30 PDT
Comment on
attachment 164327
[details]
Proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=164327&action=review
> Source/WebCore/ChangeLog:13 > + Adding semantic markup to text should not affect the line-breaking > + algorithm. The standard line-breaking algorithm is defined in Unicode > + Annex 14 (UAX14) and strict adherence to it is necessary to make the > + best use of limited screen space on mobile devices, for consistency > + with other web browsers and applications in general, and for web > + applications that require line breaking to be consistent.
This paragraph is not needed in ChangeLog.
> Source/WebCore/ChangeLog:18 > + This patch determines whether or not a line break is possible > + immediately before a text node by running the line-breaking algorithm on > + a temporary string made from the first two UChars of that text node and > + the last two UChars of the preceding text node.
Why are two characters before and two character after sufficient?
Alex Henrie
Comment 10
2012-09-17 20:24:14 PDT
Comment on
attachment 164327
[details]
Proposed patch Okay, a few notes. text-combine-image-test.html is the test with the string "20日". I looked at this test again and without any markup there is no line break, so the line break my patch causes must be incorrect. However, I wasn't able to investigate it further because I spent all my free time today trying to recompile WebKit and when I run the GTK launcher all I get is a segmentation fault. GDB reports: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff4645a91 in JSC::ArrayConstructor::finishCreation(JSC::ExecState*, JSC::ArrayPrototype*) () from /home/alex/webkit/WebKitBuild/Release/.libs/libjavascriptcoregtk-3.0.so.0 I used four characters because I thought that that was always going to be enough context to determine line breaking, and in everything I've tested, it is. However, reading UAX14 more closely it looks like Southeast Asian languages (such as Thai) require dictionary-based line-breaking algorithms, which would require more context. Do you know of a way to provide this context to the line-breaking algorithm without allocating a new string and copying the contents of the previous and current text nodes into it? Will Thai require more context than just the first previous text node? Any tips on how to fix the segfault?
Ryosuke Niwa
Comment 11
2012-09-17 20:34:32 PDT
> Do you know of a way to provide this context to the line-breaking algorithm without allocating a new string and copying the contents of the previous and current text nodes into it?
Use UText as suggested in comments #2-#4?
> Will Thai require more context than just the first previous text node?
It appears that line breaking in Thai requires dictionary lookups so I'd expect it to require much more than just few characters to determine the line breaking.
Alex Henrie
Comment 12
2012-09-17 20:41:11 PDT
Do you know the syntax for creating a UText object without allocating a new string? I could not find it anywhere.
Alex Henrie
Comment 13
2012-09-18 20:45:28 PDT
Created
attachment 164653
[details]
Proposed patch v2 Okay, here is version 2 of the patch (the segfault seems to have been fixed in the latest SVN revision). I added a test for Thai line breaking this time around. Also, I got around the failure on fast/text/line-break-before-text-node.html by not running the new code if -webkit-text-combine is present. I know that the string manipulation in this patch is highly inefficient; help would really be appreciated. [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical. Any other information about how you want the solution to look would also be helpful. [1]
http://userguide.icu-project.org/strings/utext
[2]
http://icu-project.org/apiref/icu4c/utext_8h.html
WebKit Review Bot
Comment 14
2012-09-18 20:48:36 PDT
Attachment 164653
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1 Source/WebCore/rendering/RenderBlockLineLayout.cpp:2445: This { should be at the end of the previous line [whitespace/braces] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2450: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2451: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2460: More than one command on the same line in if [whitespace/parens] [4] Source/WebCore/rendering/RenderBlockLineLayout.cpp:2461: More than one command on the same line in if [whitespace/parens] [4] Total errors found: 5 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 15
2012-09-19 03:27:06 PDT
Comment on
attachment 164653
[details]
Proposed patch v2
Attachment 164653
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13890488
New failing tests: fast/forms/access-key-for-all-elements.html css3/filters/custom/custom-filter-property-computed-style.html fast/canvas/webgl/data-view-test.html fast/loader/unload-form-post-about-blank.html fast/forms/text-control-intrinsic-widths.html fast/hidpi/video-controls-in-hidpi.html editing/selection/extend-selection-home-end.html editing/selection/home-end.html editing/selection/extend-selection-character.html css3/filters/custom/custom-filter-property-parsing.html fast/js/comparison-operators-less.html css3/filters/filter-property-parsing.html fast/js/comparison-operators-greater.html fast/js/comparison-operators.html fast/canvas/webgl/array-unit-tests.html editing/selection/programmatic-selection-on-mac-is-directionless.html fast/canvas/webgl/array-message-passing.html css3/filters/filter-property-computed-style.html fast/canvas/webgl/object-deletion-behaviour.html fast/dom/Window/window-lookup-precedence.html fast/canvas/webgl/type-conversion-test.html fast/js/array-enumerators-functions.html fast/js/kde/encode_decode_uri.html fast/js/kde/Number.html http/tests/xmlhttprequest/web-apps/001.html fast/dom/wrapper-classes.html fast/css/word-spacing-characters.html fast/canvas/webgl/tex-image-and-sub-image-2d-with-array-buffer-view.html fast/forms/option-strip-unicode-spaces.html fast/loader/local-CSS-from-local.html
Darin Adler
Comment 16
2012-09-19 10:00:40 PDT
Comment on
attachment 164653
[details]
Proposed patch v2 View in context:
https://bugs.webkit.org/attachment.cgi?id=164653&action=review
Please fix the style complaints from the style checking. They are all correctly expressing WebKit coding style.
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2446 > + // Thai and other Southeast Asian languages use dictionary-based line breaking algorithms that require a lot of context
Please add a period to this sentence-style comment. The comment is a bit mysterious. This says “a lot of context”, but does not explain how much is enough.
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2447 > + String textWithContext;
Just using a StringBuilder instead of a String will be significantly more efficient. But perhaps we can do even better.
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2468 > + current.m_nextBreakablePosition = 0; // Yes, a line break may occur immediately before
I find this comment confusing. Maybe you could rewrite it to be a bit more explicit?
Darin Adler
Comment 17
2012-09-19 10:01:35 PDT
(In reply to
comment #13
)
> [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical.
It is not theoretical. In fact Michael Saboff is doing something interesting with this to make Latin-1 work with ICU without making UTF-16 copies of the strings.
Alex Henrie
Comment 18
2012-09-24 21:02:32 PDT
Created
attachment 165510
[details]
Proposed patch v3 Try 3. I used a StringBuilder this time, and made the code only concatenate the context together once for an entire set of runs. This works quite nicely. I tried a few different ways and finally settled on using a hash table to determine where each text node lies in the context string. I'll email Michael Saboff to ask what would have to be done to change this code to use UText instead of StringBuilder, but I think this patch would be fine as-is.
Alex Henrie
Comment 19
2012-09-24 21:28:38 PDT
Created
attachment 165512
[details]
Proposed patch v4 Apologies, the tests were repeated multiple times in the last patch. The patch is now fixed.
Alex Henrie
Comment 20
2012-09-24 21:38:05 PDT
Created
attachment 165516
[details]
Proposed patch v5 I really apologize. Try 4 still had a duplicate <meta> tag on the Thai test. I will definitely be more careful to make sure I don't accidentally duplicate the tests again in any future patches.
Build Bot
Comment 21
2012-09-24 22:29:30 PDT
Comment on
attachment 165516
[details]
Proposed patch v5
Attachment 165516
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13992883
New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/text/line-break-in-marked-up-thai.html
WebKit Review Bot
Comment 22
2012-09-25 07:55:58 PDT
Comment on
attachment 165516
[details]
Proposed patch v5
Attachment 165516
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/14021139
New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/writing-mode/Kusa-Makura-background-canvas.html fast/text/line-break-in-marked-up-thai.html
Michael Saboff
Comment 23
2012-09-25 09:28:03 PDT
(In reply to
comment #17
)
> (In reply to
comment #13
) > > [1] and [2] talk about creating UText objects from discontinuous strings but it is not clear whether this feature is anything more than theoretical. > > It is not theoretical. In fact Michael Saboff is doing something interesting with this to make Latin-1 work with ICU without making UTF-16 copies of the strings.
You can write your own UText "object", basically a UText struct with function pointers to "virtual" functions for your UText. See
http://icu-project.org/apiref/icu4c/structUText.html
for some details. You can also look at my patch out for review at
https://bugs.webkit.org/attachment.cgi?id=165221&action=review
. I do copy chunks because I'm starting with 8 bit source data and the ICU breaking code working with UText expects UChar data. For read only uses, you don't need all the UText functions. For breaking, I think you'll need clone (shallow only I think), nativeLength, access, extract, mapOffsetToNative, mapNativeIndexToUTF16 and close. Your functions need to be set in pFuncs along with all the other UText fields, likely in your own makeUTextForCombinedRenderText method. Given that you have data across multiple elements, your access function will need to map in the appropriate element (RenderText) and it's offsets into the UText structure as part of access processing. Using this method you shouldn't need to copy RenderText data. THis method will only work on platforms that have version 4 or later of ICU, IIRC. Hope this helps. You changes and my changes are touching similar code. I plan on changing RenderText so that the m_text String can contain 8 bit data.
Alex Henrie
Comment 24
2012-09-25 16:22:38 PDT
Michael, thanks for the explanation. I think I understand how your patch works. So my next question is, how would we make the the line-breaking functions take a UText if available and a String if not? Would it require changes to all platforms? Also, given that UText is not available on all platforms, and what we want to do with it requires ICU 4 or later, would you be okay with using the StringBuilder-based implementation to fix the rendering bugs now and adding optimizations for ICU platforms afterwards in a separate patch?
Michael Saboff
Comment 25
2012-09-26 09:48:21 PDT
(In reply to
comment #24
)
> Michael, thanks for the explanation. I think I understand how your patch works. So my next question is, how would we make the the line-breaking functions take a UText if available and a String if not? Would it require changes to all platforms? > > Also, given that UText is not available on all platforms, and what we want to do with it requires ICU 4 or later, would you be okay with using the StringBuilder-based implementation to fix the rendering bugs now and adding optimizations for ICU platforms afterwards in a separate patch?
I think all the platforms that use ICU are at ICU 4 or greater. We just updated Mac to ICU 4.6. I think Chromium is already at 4.6 or higher. My proposed patch that used UText built on all platforms, although those changes are in Source/WebCore/platform/text/TextBreakIteratorICU.cpp. I would suggest that you make your changes in an existing or new "ICU" file. No sense to land a StringBuilder fix when we can make things better for ICU capable platforms.
Alex Henrie
Comment 26
2012-09-26 18:25:29 PDT
I agree that optimizations for ICU platforms would be nice. A few other considerations also come to mind. First, fixing rendering bugs (which are causing problems in real-world applications, see comments #6 and #11) is generally more important than platform-specific optimizations. Second, I am a volunteer: I have limited time, and I only have access to Linux machines. I could probably come up with a UText solution but I would need help. Third, the line-breaking functions in TextBreakIterator.h currently expect a UChar*, not a UText: TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale); void LazyLineBreakIterator::reset(const UChar* string, int length, const AtomicString& locale); How would you suggest changing these functions so that they accept a UText when available and a String or UChar* when not? Won't any change here affect all platforms? If it affects all platforms, how can I test my changes? Thanks for the help.
Michael Saboff
Comment 27
2012-09-27 11:48:23 PDT
(In reply to
comment #26
)
> I agree that optimizations for ICU platforms would be nice. A few other considerations also come to mind. First, fixing rendering bugs (which are causing problems in real-world applications, see comments #6 and #11) is generally more important than platform-specific optimizations. Second, I am a volunteer: I have limited time, and I only have access to Linux machines. I could probably come up with a UText solution but I would need help. Third, the line-breaking functions in TextBreakIterator.h currently expect a UChar*, not a UText: > > TextBreakIterator* acquireLineBreakIterator(const UChar*, int length, const AtomicString& locale); > void LazyLineBreakIterator::reset(const UChar* string, int length, const AtomicString& locale); > > How would you suggest changing these functions so that they accept a UText when available and a String or UChar* when not? Won't any change here affect all platforms? If it affects all platforms, how can I test my changes? > > Thanks for the help.
I think the best way to do this is to create a new acquireLineBreakIterator() and LazyLineBreakIterator::reset() that take the list of RenderText's or TextNode's that contain all of the strings. The implementation of those new methods could be in a ICU file. It may be sufficient to provide this implementation and leave other implementations to other platforms. You might need to #if the definitions.
Alex Henrie
Comment 28
2012-10-01 21:14:31 PDT
layoutRunsAndFloatsInRange calls acquireLineBreakIterator once for each RenderText object in the set, using the locale of the current RenderText object to configure any special line-breaking behavior. With ICU, acquireLineBreakIterator would work best if given a precomputed mapping from indices to RenderText objects. Without ICU, acquireLineBreakIterator would work best if given a precomputed context string. In both cases, layoutRunsAndFloatsInRange also needs a mapping from RenderText objects to indices. So, to optimize for both cases, we would need to write code for both in layoutRunsAndFloatsInRange and use #ifdef WTF_USE_ICU_UNICODE to switch between the two. Likewise, #ifdef WTF_USE_ICU_UNICODE would add to TextBreakIterator.h a function declaration for an acquireLineBreakIterator that takes (instead of a string) mappings between indices and RenderText objects. The same goes for struct RenderTextInfo, which would have m_textWithContext on non-ICU platforms and the mapping of indices to RenderText objects on ICU. If this explanation doesn't make sense at first, look at what the StringBuilder patch has to do and think about how it would have to be different to optimize for ICU. Can you think of a better solution?
Darin Adler
Comment 29
2012-10-02 09:54:02 PDT
(In reply to
comment #28
)
> With ICU, acquireLineBreakIterator would work best if given a precomputed mapping from indices to RenderText objects. Without ICU, acquireLineBreakIterator would work best if given a precomputed context string.
Can the non-ICU acquireLineBreakIterator function compute and cache the context string? I don’t understand why the platform independent code needs to build the string, just in case the line break code needs to consolidate everything into a single string.
Alex Henrie
Comment 30
2012-10-02 21:44:31 PDT
How would acquireLineBreakIterator know when the list of RenderText objects or their contents has changed? If it takes a pointer to the list, there is no guarantee that the data at the pointer's address is still the same. I suppose we could add a boolean parameter to acquireLineBreakIterator to tell it when to recompute the context string. What did you have in mind?
Darin Adler
Comment 31
2012-10-03 09:07:59 PDT
(In reply to
comment #30
)
> How would acquireLineBreakIterator know when the list of RenderText objects or their contents has changed? If it takes a pointer to the list, there is no guarantee that the data at the pointer's address is still the same. I suppose we could add a boolean parameter to acquireLineBreakIterator to tell it when to recompute the context string. What did you have in mind?
I didn’t have anything specific in mind, but I suspect we can come up with something. The point is to look for a way to efficiently abstract away these differences in interface rather than putting #if into the platform-independent code.
Alex Henrie
Comment 32
2012-10-03 21:43:25 PDT
I want to see this bug fixed and I've written a cross-platform solution, but I don't see a nice clean way to add ICU optimizations. The WebKit architects need to either agree on a messier solution (so that the solution can be coded up) or leave ICU optimization for later.
Michael Saboff
Comment 33
2012-10-04 13:03:40 PDT
(In reply to
comment #32
)
> I want to see this bug fixed and I've written a cross-platform solution, but I don't see a nice clean way to add ICU optimizations. The WebKit architects need to either agree on a messier solution (so that the solution can be coded up) or leave ICU optimization for later.
I think the best way to make a cross-platform solution is to create an API that can be used for both an ICU solution and a non-ICU solution. Put this definition in a common .h file, like TextBreakIterator.h. Then put the corresponding implementation in 2 or more platform specific .cpp files. This new API will likely need to walk through the nodes and therefore need a list of nodes. For the ICU version, you create the UText routines that walk through the node list and then you use your implementation for non-ICU platforms.
Alex Henrie
Comment 34
2012-10-07 22:32:55 PDT
I thought of a worse problem: If acquireLineBreakIterator takes a list of RenderText objects and on some platforms needs to generate a reusable string, where would that string be stored in memory? It seems like the most logical way to solve this problem would be to rewrite all the line-breaking functions to be part of a line-breaker class, and then the class could store whatever variables it needs to maintain its state. However, this would amount to a complete redesign of the line-breaking API on all platforms, and I am unable to test any platform but Linux. I could really use some specific help designing and implementing a solution that makes everyone happy.
Michael Saboff
Comment 35
2012-10-08 16:56:47 PDT
(In reply to
comment #34
)
> I thought of a worse problem: If acquireLineBreakIterator takes a list of RenderText objects and on some platforms needs to generate a reusable string, where would that string be stored in memory?
Since TextBreakIterator is really just a pointer type and internal objects are cast to and from it, you could create your own type on those platforms that need a string.
> It seems like the most logical way to solve this problem would be to rewrite all the line-breaking functions to be part of a line-breaker class, and then the class could store whatever variables it needs to maintain its state. However, this would amount to a complete redesign of the line-breaking API on all platforms, and I am unable to test any platform but Linux. > > I could really use some specific help designing and implementing a solution that makes everyone happy.
Alex Henrie
Comment 36
2012-10-09 22:05:39 PDT
Changing the non-ICU platforms may not be as daunting a task as I thought. Looking through Source/WebCore/platform/text, it seems that only WinCE and possibly GTK do not use ICU. Is GTK ever used with ICU disabled? If not then only ICU and WinCE need to be supported. I think we're getting closer to figuring out how to code this. Regardless, I still don't have a way to test changes to WinCE. Any ideas?
Alex Henrie
Comment 37
2012-11-04 22:07:41 PST
Created
attachment 172268
[details]
Proposed patch v6 I've spent more time on this and, unfortunately, have not come up with anything workable. The entire system, from acquireLineBreakIterator to LazyLineBreakIterator to nextBreakablePosition, is designed to work with a single string. If you wrap the nodes in a UText and take away the ability to see a single string then multiple problems arise, the most obvious being that there is no way to have the algorithm treat NBSP as a simple space. If you try to fix this problem by passing around the list of nodes as well as the iterator over them, you run into the difficulty of having to find a way to remember and tell all these functions which node and which index within that node they are on, as well as the difficulty of rewriting everything to use a list instead of a string. Furthermore, writing the necessary functions for pFuncs is nontrivial even with Michael's code as an example. In the implementation of UTextAccess, if the last character of a chunk is requested "going forward" then a new, temporary chunk must be created from the end of the current chunk and the start of the next chunk. According to the ICU documentation, this new chunk MUST "begin and end on code point boundaries" and "a single code point comprised of multiple storage units must never span a chunk boundary." The same complication arises when the first character of a chunk is requested "going backward". I also learned recently that in addition to the previously listed problems that this bug is causing, the combination of this bug and
bug 76131
causes incorrect rendering of Google Translate's yellow highlighting if, at a non-space line break in the textbox, a newline is added and then deleted. I've rebased the StringBuilder patch against the current tree. I realize that a more efficient ICU solution is theoretically possible, but due to the massive complexity of such a solution and lack of willing developers I just don't see it happening. StringBuilder is designed to minimize memory reallocations, so the performance shouldn't be terrible. Performance might be further improved by changing LazyLineBreakIterator to take a pointer to the string instead of recopying the string. If there is someone out there who can come up with something better that works, please come forward...
WebKit Review Bot
Comment 38
2012-11-04 22:08:22 PST
Comment on
attachment 172268
[details]
Proposed patch v6 Rejecting
attachment 172268
[details]
from review queue.
alexhenrie24@gmail.com
does not have reviewer permissions according to
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py
. - If you do not have reviewer rights please read
http://webkit.org/coding/contributing.html
for instructions on how to use bugzilla flags. - If you have reviewer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your reviewer rights.
WebKit Review Bot
Comment 39
2012-11-04 22:08:58 PST
Comment on
attachment 172268
[details]
Proposed patch v6 Rejecting
attachment 172268
[details]
from commit-queue.
alexhenrie24@gmail.com
does not have committer permissions according to
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py
. - If you do not have committer rights please read
http://webkit.org/coding/contributing.html
for instructions on how to use bugzilla flags. - If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed). The commit-queue restarts itself every 2 hours. After restart the commit-queue will correctly respect your committer rights.
Build Bot
Comment 40
2013-01-15 19:47:58 PST
Comment on
attachment 172268
[details]
Proposed patch v6
Attachment 172268
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://queues.webkit.org/results/15903195
New failing tests: fast/text/international/003.html fast/text/line-break-between-text-nodes.html fast/writing-mode/Kusa-Makura-background-canvas.html fast/text/line-break-in-marked-up-thai.html
Lu jing
Comment 41
2013-02-22 22:00:40 PST
In RenderBlockLineLayout.cpp, Change the following code: ==== from ==== // If the next item on the line is text, and if we did not end with // a space, then the next text run continues our word (and so it needs to // keep adding to the uncommitted width. Just update and continue. checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline())); ==== to ==== checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline()) || isBreakable(renderTextInfo.m_lineBreakIterator, current.m_pos, current.m_nextBreakablePosition, true)); then I can get the expected result. As the comment above, breaking opportunities are missed here. How about this solution? I am going to check the side effects.
Lu jing
Comment 42
2013-02-22 23:19:22 PST
I just found some incorrect result, the inline nodes like span will always cause a line breaking. Perhaps more check is needed near isBreakable...
Glenn Adams
Comment 43
2013-02-24 15:13:32 PST
Created
attachment 189990
[details]
Patch
Glenn Adams
Comment 44
2013-02-24 17:35:55 PST
(In reply to
comment #41
)
> In RenderBlockLineLayout.cpp, > > Change the following code: > > ==== from ==== > > // If the next item on the line is text, and if we did not end with > // a space, then the next text run continues our word (and so it needs to > // keep adding to the uncommitted width. Just update and continue. > checkForBreak = !currentCharacterIsSpace && (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline())); > > ==== to ==== > > checkForBreak = !currentCharacterIsSpace && > (c == ' ' || c == '\t' || (c == '\n' && !next->preservesNewline()) > || isBreakable(renderTextInfo.m_lineBreakIterator, current.m_pos, current.m_nextBreakablePosition, true)); > > then I can get the expected result. > As the comment above, breaking opportunities are missed here. > How about this solution? I am going to check the side effects.
Could you provide a test file that demonstrates what you were trying to fix with this change?
Glenn Adams
Comment 45
2013-02-25 07:30:55 PST
Created
attachment 190055
[details]
Patch Resubmit same for EWS retry.
Ryosuke Niwa
Comment 46
2013-02-25 12:34:23 PST
Comment on
attachment 190055
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=190055&action=review
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:1563 > + StringBuilder contextBuilder; > + for (InlineIterator i = resolver.position(); i.m_obj; i.moveToStartOf(bidiNextSkippingEmptyInlines(this, i.m_obj))) { > + if (i.m_obj->isText()) { > + renderTextInfo.m_positionInContext.add(i.m_obj, contextBuilder.length()); > + contextBuilder.append(toRenderText(i.m_obj)->text()); > + } > + }
The most important question is how much of performance impact this change has.
> LayoutTests/TestExpectations:8 > +# pending rebaseline on passing ports > +
webkit.org/b/17427
fast/text/international/003.html [ Skip ] > +
webkit.org/b/17427
fast/writing-mode/Kusa-Makura-background-canvas.html [ Skip ] > +
Please generate pixel results on at least one port.
Glenn Adams
Comment 47
2013-02-25 12:49:39 PST
(In reply to
comment #46
)
> (From update of
attachment 190055
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=190055&action=review
> > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:1563 > > + StringBuilder contextBuilder; > > + for (InlineIterator i = resolver.position(); i.m_obj; i.moveToStartOf(bidiNextSkippingEmptyInlines(this, i.m_obj))) { > > + if (i.m_obj->isText()) { > > + renderTextInfo.m_positionInContext.add(i.m_obj, contextBuilder.length()); > > + contextBuilder.append(toRenderText(i.m_obj)->text()); > > + } > > + } > > The most important question is how much of performance impact this change has.
Thanks for the pre-review, though I didn't r? yet. I'm testing performance now and expect to need to optimize the original proposed code (above).
> > > LayoutTests/TestExpectations:8 > > +# pending rebaseline on passing ports > > +
webkit.org/b/17427
fast/text/international/003.html [ Skip ] > > +
webkit.org/b/17427
fast/writing-mode/Kusa-Makura-background-canvas.html [ Skip ] > > + > > Please generate pixel results on at least one port.
Will do. Just want to see if I could get this through EWS once.
Glenn Adams
Comment 48
2013-03-06 08:01:20 PST
Created
attachment 191752
[details]
perf results -
attachment 190055
[details]
Running perftest on apple mac (release) port (MBP R hw) shows: * 0.85% regression on line-layout * 0.75% regression on html5-full-render Is this too high to warrant a commit?
Ryosuke Niwa
Comment 49
2013-03-06 09:08:20 PST
(In reply to
comment #48
)
> Created an attachment (id=191752) [details] > perf results -
attachment 190055
[details]
> > Running perftest on apple mac (release) port (MBP R hw) shows: > > * 0.85% regression on line-layout > * 0.75% regression on html5-full-render
I don't think 1% regression is acceptable.
Ryosuke Niwa
Comment 50
2013-03-06 09:10:26 PST
On the other hand, it seems like you got only 3 samples? We need something like 30-50 samples to get a statistically significant result.
Glenn Adams
Comment 51
2013-03-10 15:35:33 PDT
Created
attachment 192388
[details]
Patch
Glenn Adams
Comment 52
2013-03-10 15:39:27 PDT
(In reply to
comment #51
)
> Created an attachment (id=192388) [details] > Patch
Reimplement patch so as to eliminate any performance regression (in run time and memory allocation). This patch handles the ASCII shortcut path through nextBreakPosition() and doesn't change (breaking or performance) behavior for non-ASCII path. A subsequent patch against
bug 105692
will address the non-ASCII path.
Darin Adler
Comment 53
2013-03-10 17:48:49 PDT
Comment on
attachment 192388
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=192388&action=review
Please look into the single-character-node issue I mention below. That’s the only reason I gave this a review- instead of review+.
> Source/WebCore/platform/text/TextBreakIterator.h:63 > + , m_lastChar(0) > + , m_lastLastChar(0)
I know that our existing code calls these lastCh and lastLastCh. But, in WebKit code we normally prefer to use words rather than abbreviations and there is a normal English phrase for the “last last character” that I think is clear. I would prefer m_lastCharacter, m_secondToLastCharacter, setLastTwoCharacters, and resetLastTwoCharacters. All those seem easy to read and less like jargon.
> Source/WebCore/platform/text/TextBreakIterator.h:95 > + inline UChar lastChar() const { return m_lastChar; } > + inline UChar lastLastChar() const { return m_lastLastChar; } > + inline void setLastChars(UChar last, UChar lastLast) > + { > + m_lastChar = last; > + m_lastLastChar = lastLast; > + } > + inline void resetLastChars() > + { > + m_lastChar = 0; > + m_lastLastChar = 0; > + }
The “inline” here are redundant and have no effect. All function definitions that are inside a class definition are treated as if “inline” is specified automatically. Please omit them.
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2873 > + UChar lastCharInNode = 0;
lastCharacterInNode, please
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2874 > + UChar lastLastCharInNode = 0;
secondToLastCharacterInNode, please
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:2917 > + goto nextChar;
nextCharacter, please
> Source/WebCore/rendering/RenderBlockLineLayout.cpp:3083 > + renderTextInfo.m_lineBreakIterator.setLastChars(lastCharInNode, lastLastCharInNode);
This does not seem to correctly handle a text node that has only one character in it (or zero characters, for that matter). I think that in that case it would overwrite a characters from the second to last node with a zero.
> Source/WebCore/rendering/break_lines.cpp:157 > + CharacterType lastLastCh = pos > 1 ? str[pos - 2] : CharacterType(lazyBreakIterator.lastLastChar()); > + CharacterType lastCh = pos > 0 ? str[pos - 1] : CharacterType(lazyBreakIterator.lastChar());
Please use static_cast<CharacterType>(x) instead of CharacterType(x).
Glenn Adams
Comment 54
2013-03-10 17:57:05 PDT
(In reply to
comment #53
)
> (From update of
attachment 192388
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=192388&action=review
> > Please look into the single-character-node issue I mention below. That’s the only reason I gave this a review- instead of review+. > > Source/WebCore/rendering/RenderBlockLineLayout.cpp:3083 > > + renderTextInfo.m_lineBreakIterator.setLastChars(lastCharInNode, lastLastCharInNode); > > This does not seem to correctly handle a text node that has only one character in it (or zero characters, for that matter). I think that in that case it would overwrite a characters from the second to last node with a zero.
yes, i can see that... of course the current code (without this patch) doesn't handle any way, so this patch wouldn't regress that; i wonder if it could be addressed in a follow-up bug/patch, just to separate the behavioral impact a bit? or do you think it should be fixed here before moving forward (after i can address the other comments)?
Darin Adler
Comment 55
2013-03-10 18:15:25 PDT
I think it would be best to handle that case, too, in this patch. It seems likely to be a small delta on the patch you have already put together.
Glenn Adams
Comment 56
2013-03-10 19:25:18 PDT
Created
attachment 192396
[details]
Patch
Glenn Adams
Comment 57
2013-03-10 19:26:12 PDT
(In reply to
comment #56
)
> Created an attachment (id=192396) [details] > Patch
address
comment #53
(darin)
Glenn Adams
Comment 58
2013-03-10 19:54:29 PDT
Created
attachment 192397
[details]
Patch for landing
Glenn Adams
Comment 59
2013-03-10 19:55:08 PDT
(In reply to
comment #58
)
> Created an attachment (id=192397) [details] > Patch for landing
tweaked ChangeLog
WebKit Review Bot
Comment 60
2013-03-10 19:58:24 PDT
Comment on
attachment 192397
[details]
Patch for landing Rejecting
attachment 192397
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-03', 'validate-changelog', '--non-interactive', 192397, '--port=chromium-xvfb']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue darin found in /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog does not appear to be a valid reviewer according to committers.py. /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive). Full output:
http://webkit-commit-queue.appspot.com/results/16991738
Glenn Adams
Comment 61
2013-03-10 19:59:53 PDT
Created
attachment 192398
[details]
Patch for landing
WebKit Review Bot
Comment 62
2013-03-10 20:01:48 PDT
Comment on
attachment 192398
[details]
Patch for landing Rejecting
attachment 192398
[details]
from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-01', 'validate-changelog', '--non-interactive', 192398, '--port=chromium-xvfb']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue darin found in /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog does not appear to be a valid reviewer according to committers.py. /mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive). Full output:
http://webkit-commit-queue.appspot.com/results/17114338
Glenn Adams
Comment 63
2013-03-10 20:03:13 PDT
Created
attachment 192399
[details]
Patch for landing
WebKit Review Bot
Comment 64
2013-03-10 20:33:10 PDT
Comment on
attachment 192399
[details]
Patch for landing Clearing flags on attachment: 192399 Committed
r145338
: <
http://trac.webkit.org/changeset/145338
>
WebKit Review Bot
Comment 65
2013-03-10 20:33:18 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 66
2024-05-06 14:43:37 PDT
***
Bug 63894
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug