Bug 96935 - Add Latin-1 Line Break Iterator to TextBreakIteratorICU.cpp
Summary: Add Latin-1 Line Break Iterator to TextBreakIteratorICU.cpp
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-17 11:24 PDT by Michael Saboff
Modified: 2022-02-27 23:34 PST (History)
8 users (show)

See Also:


Attachments
Draft patch without build changes for non-Mac platforms (16.55 KB, patch)
2012-09-17 11:54 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated Patch (16.48 KB, patch)
2012-09-21 16:01 PDT, Michael Saboff
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
Patch again (16.48 KB, patch)
2012-09-21 16:15 PDT, Michael Saboff
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2012-09-17 11:24:12 PDT
Currently there is a 16 bit line break iterator provided by the function acquireLineBreakIterator() provided in TextBreakIteratorICU.cpp.  To support 8-bit rendering, a 8 bit line break iterator should be added.
Comment 1 Michael Saboff 2012-09-17 11:54:11 PDT
Created attachment 164430 [details]
Draft patch without build changes for non-Mac platforms

Will post an updated patch with changes to other platform that use TextBreakIteratorICU.cpp.

The change is neutral in performance on the Layout tests.
Comment 2 Eric Seidel (no email) 2012-09-17 12:25:05 PDT
I assume by LayoutTests you mean PLT? or PerformanceTests?

Do we have a complex-text line-breaking perf test?  PerformanceTests/Layout/line-breaking.html only does the ascii case.
Comment 3 Eric Seidel (no email) 2012-09-17 12:25:51 PDT
We probably should add a complex line-breaking microbenchmark to PerformanceTests.  I'm sure there is a lot there we could work through yet. :)
Comment 4 Benjamin Poulain 2012-09-17 12:29:51 PDT
Comment on attachment 164430 [details]
Draft patch without build changes for non-Mac platforms

View in context: https://bugs.webkit.org/attachment.cgi?id=164430&action=review

> Source/WebCore/platform/text/TextBreakIterator.h:62
> +        : m_string(String())
> +        , m_locale(AtomicString())

You can remove those.

> Source/WebCore/rendering/break_lines.cpp:151
> +template<typename CharacterType, bool treatNoBreakSpaceAsBreak>
> +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos)

Chances are we should fully specialize that function instead.

I can do that after your patch if you want.
Comment 5 Michael Saboff 2012-09-17 12:45:27 PDT
(In reply to comment #2)
> I assume by LayoutTests you mean PLT? or PerformanceTests?
> 
> Do we have a complex-text line-breaking perf test?  PerformanceTests/Layout/line-breaking.html only does the ascii case.

I mean PerformanceTests/Layout.  I don't know if we have a complex text test for layout.

This patch doesn't enable using the 8 bit code path as Text nodes and thus RenderText objects are 16 bit until we enable the 8 bit path in the HTML parser.
Comment 6 Michael Saboff 2012-09-17 12:49:12 PDT
(In reply to comment #4)
> (From update of attachment 164430 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=164430&action=review
> 
> > Source/WebCore/platform/text/TextBreakIterator.h:62
> > +        : m_string(String())
> > +        , m_locale(AtomicString())
> 
> You can remove those.

Done.
 
> > Source/WebCore/rendering/break_lines.cpp:151
> > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak>
> > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos)
> 
> Chances are we should fully specialize that function instead.
> 
> I can do that after your patch if you want.

Please elaborate.
Comment 7 Eric Seidel (no email) 2012-09-17 13:14:45 PDT
(In reply to comment #5)
> (In reply to comment #2)
> > I assume by LayoutTests you mean PLT? or PerformanceTests?
> > 
> > Do we have a complex-text line-breaking perf test?  PerformanceTests/Layout/line-breaking.html only does the ascii case.
> 
> I mean PerformanceTests/Layout.  I don't know if we have a complex text test for layout.
> 
> This patch doesn't enable using the 8 bit code path as Text nodes and thus RenderText objects are 16 bit until we enable the 8 bit path in the HTML parser.

I see.  So this path is as of yet unused.  Thanks.

We should probably catch mitz or jungshik's ear about adding some complex-text Layout PerformanceTests as thats an area we could micro-benchmark more and probably eek more performance out of.
Comment 8 Benjamin Poulain 2012-09-17 13:53:12 PDT
> > > Source/WebCore/rendering/break_lines.cpp:151
> > > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak>
> > > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos)
> > 
> > Chances are we should fully specialize that function instead.
> > 
> > I can do that after your patch if you want.
> 
> Please elaborate.

If you look at nextBreakablePosition(), there is fast path (first if()), and a slower patch for non-ASCII characters (second if()).

Chances are we could generalize the fast path and get rid of the second if() for Latin1.

What takes time in that particular loop is the number of branches needed for each character. If we could get rid of the second if(), it would likely be a big win.
Comment 9 Michael Saboff 2012-09-17 14:55:37 PDT
(In reply to comment #8)
> > > > Source/WebCore/rendering/break_lines.cpp:151
> > > > +template<typename CharacterType, bool treatNoBreakSpaceAsBreak>
> > > > +static inline int nextBreakablePosition(LazyLineBreakIterator& lazyBreakIterator, const CharacterType* str, unsigned length, int pos)
> > > 
> > > Chances are we should fully specialize that function instead.
> > > 
> > > I can do that after your patch if you want.
> > 
> > Please elaborate.
> 
> If you look at nextBreakablePosition(), there is fast path (first if()), and a slower patch for non-ASCII characters (second if()).
> 
> Chances are we could generalize the fast path and get rid of the second if() for Latin1.
> 
> What takes time in that particular loop is the number of branches needed for each character. If we could get rid of the second if(), it would likely be a big win.

We could do that in this patch or as a follow on.  We would have to generate the new entries for the line break table.
Comment 10 Michael Saboff 2012-09-21 16:01:27 PDT
Created attachment 165218 [details]
Updated Patch

Removed the unneeded explicit initializations.
Comment 11 Early Warning System Bot 2012-09-21 16:06:46 PDT
Comment on attachment 165218 [details]
Updated Patch

Attachment 165218 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13957821
Comment 12 Gyuyoung Kim 2012-09-21 16:07:10 PDT
Comment on attachment 165218 [details]
Updated Patch

Attachment 165218 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13954780
Comment 13 Early Warning System Bot 2012-09-21 16:08:51 PDT
Comment on attachment 165218 [details]
Updated Patch

Attachment 165218 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13949796
Comment 14 Michael Saboff 2012-09-21 16:15:11 PDT
Created attachment 165221 [details]
Patch again
Comment 15 Geoffrey Garen 2012-09-25 23:35:38 PDT
Comment on attachment 165221 [details]
Patch again

r=me
Comment 16 Michael Saboff 2012-09-26 09:54:17 PDT
Committed r129662: <http://trac.webkit.org/changeset/129662>