96935 – Add Latin-1 Line Break Iterator to TextBreakIteratorICU.cpp

RESOLVED FIXED 96935

Add Latin-1 Line Break Iterator to TextBreakIteratorICU.cpp

https://bugs.webkit.org/show_bug.cgi?id=96935

Summary Add Latin-1 Line Break Iterator to TextBreakIteratorICU.cpp

Michael Saboff

Reported 2012-09-17 11:24:12 PDT

Currently there is a 16 bit line break iterator provided by the function acquireLineBreakIterator() provided in TextBreakIteratorICU.cpp. To support 8-bit rendering, a 8 bit line break iterator should be added.

Attachments
Draft patch without build changes for non-Mac platforms (16.55 KB, patch) 2012-09-17 11:54 PDT, Michael Saboff	no flags	Details Formatted Diff Diff
Updated Patch (16.48 KB, patch) 2012-09-21 16:01 PDT, Michael Saboff	webkit-ews: commit-queue-	Details Formatted Diff Diff
Patch again (16.48 KB, patch) 2012-09-21 16:15 PDT, Michael Saboff	ggaren: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Michael Saboff

Comment 1 2012-09-17 11:54:11 PDT

Created attachment 164430 [details] Draft patch without build changes for non-Mac platforms Will post an updated patch with changes to other platform that use TextBreakIteratorICU.cpp. The change is neutral in performance on the Layout tests.

Eric Seidel (no email)

Comment 2 2012-09-17 12:25:05 PDT

I assume by LayoutTests you mean PLT? or PerformanceTests? Do we have a complex-text line-breaking perf test? PerformanceTests/Layout/line-breaking.html only does the ascii case.

Eric Seidel (no email)

Comment 3 2012-09-17 12:25:51 PDT

We probably should add a complex line-breaking microbenchmark to PerformanceTests. I'm sure there is a lot there we could work through yet. :)

Benjamin Poulain

Comment 4 2012-09-17 12:29:51 PDT

Comment on attachment 164430 [details] Draft patch without build changes for non-Mac platforms View in context: https://bugs.webkit.org/attachment.cgi?id=164430&action=review > Source/WebCore/platform/text/TextBreakIterator.h:62 > + : m_string(String()) > + , m_locale(AtomicString()) You can remove those. > Source/WebCore/rendering/break_lines.cpp:151 > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak> > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos) Chances are we should fully specialize that function instead. I can do that after your patch if you want.

Michael Saboff

Comment 5 2012-09-17 12:45:27 PDT

(In reply to comment #2) > I assume by LayoutTests you mean PLT? or PerformanceTests? > > Do we have a complex-text line-breaking perf test? PerformanceTests/Layout/line-breaking.html only does the ascii case. I mean PerformanceTests/Layout. I don't know if we have a complex text test for layout. This patch doesn't enable using the 8 bit code path as Text nodes and thus RenderText objects are 16 bit until we enable the 8 bit path in the HTML parser.

Michael Saboff

Comment 6 2012-09-17 12:49:12 PDT

(In reply to comment #4) > (From update of attachment 164430 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=164430&action=review > > > Source/WebCore/platform/text/TextBreakIterator.h:62 > > + : m_string(String()) > > + , m_locale(AtomicString()) > > You can remove those. Done. > > Source/WebCore/rendering/break_lines.cpp:151 > > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak> > > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos) > > Chances are we should fully specialize that function instead. > > I can do that after your patch if you want. Please elaborate.

Eric Seidel (no email)

Comment 7 2012-09-17 13:14:45 PDT

(In reply to comment #5) > (In reply to comment #2) > > I assume by LayoutTests you mean PLT? or PerformanceTests? > > > > Do we have a complex-text line-breaking perf test? PerformanceTests/Layout/line-breaking.html only does the ascii case. > > I mean PerformanceTests/Layout. I don't know if we have a complex text test for layout. > > This patch doesn't enable using the 8 bit code path as Text nodes and thus RenderText objects are 16 bit until we enable the 8 bit path in the HTML parser. I see. So this path is as of yet unused. Thanks. We should probably catch mitz or jungshik's ear about adding some complex-text Layout PerformanceTests as thats an area we could micro-benchmark more and probably eek more performance out of.

Benjamin Poulain

Comment 8 2012-09-17 13:53:12 PDT

> > > Source/WebCore/rendering/break_lines.cpp:151 > > > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak> > > > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos) > > > > Chances are we should fully specialize that function instead. > > > > I can do that after your patch if you want. > > Please elaborate. If you look at nextBreakablePosition(), there is fast path (first if()), and a slower patch for non-ASCII characters (second if()). Chances are we could generalize the fast path and get rid of the second if() for Latin1. What takes time in that particular loop is the number of branches needed for each character. If we could get rid of the second if(), it would likely be a big win.

Michael Saboff

Comment 9 2012-09-17 14:55:37 PDT

(In reply to comment #8) > > > > Source/WebCore/rendering/break_lines.cpp:151 > > > > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak> > > > > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos) > > > > > > Chances are we should fully specialize that function instead. > > > > > > I can do that after your patch if you want. > > > > Please elaborate. > > If you look at nextBreakablePosition(), there is fast path (first if()), and a slower patch for non-ASCII characters (second if()). > > Chances are we could generalize the fast path and get rid of the second if() for Latin1. > > What takes time in that particular loop is the number of branches needed for each character. If we could get rid of the second if(), it would likely be a big win. We could do that in this patch or as a follow on. We would have to generate the new entries for the line break table.

Michael Saboff

Comment 10 2012-09-21 16:01:27 PDT

Created attachment 165218 [details] Updated Patch Removed the unneeded explicit initializations.

Early Warning System Bot

Comment 11 2012-09-21 16:06:46 PDT

Comment on attachment 165218 [details] Updated Patch Attachment 165218 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13957821

Gyuyoung Kim

Comment 12 2012-09-21 16:07:10 PDT

Comment on attachment 165218 [details] Updated Patch Attachment 165218 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13954780

Early Warning System Bot

Comment 13 2012-09-21 16:08:51 PDT

Comment on attachment 165218 [details] Updated Patch Attachment 165218 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13949796

Michael Saboff

Comment 14 2012-09-21 16:15:11 PDT

Created attachment 165221 [details] Patch again

Geoffrey Garen

Comment 15 2012-09-25 23:35:38 PDT

Comment on attachment 165221 [details] Patch again r=me

Michael Saboff

Comment 16 2012-09-26 09:54:17 PDT

Committed r129662: <http://trac.webkit.org/changeset/129662>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Layout and Rendering

Assignee

Michael Saboff

Reported

2012-09-17 11:24 PDT

Modified

2022-02-27 23:34 PST History

CC List

8 users Show

URL

Keywords

Depends on

Blocks