WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
116877
Optimize RenderText::offsetNext for 8 bit strings
https://bugs.webkit.org/show_bug.cgi?id=116877
Summary
Optimize RenderText::offsetNext for 8 bit strings
Adam Barth
Reported
2013-05-28 11:58:50 PDT
Optimize RenderText::offsetNext for 8 bit strings
Attachments
Patch
(2.54 KB, patch)
2013-05-28 12:00 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch
(59.88 KB, patch)
2013-05-28 22:01 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2013-05-28 12:00:56 PDT
Created
attachment 203074
[details]
Patch
Adam Barth
Comment 2
2013-05-28 12:02:34 PDT
@benjaminp: I figured you'd be a good reviewer for the 8-bit strings aspect of this change, but I'm unsure who would be a good reviewer for the RenderText aspects. Would you be willing to either review this patch or suggest another reviewer? Thanks!
Ryosuke Niwa
Comment 3
2013-05-28 12:33:50 PDT
Comment on
attachment 203074
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=203074&action=review
> Source/WebCore/rendering/RenderText.cpp:1731 > + if (isAllASCII() || m_text.is8Bit()) > + return current - 1;
I don't understand how this works. Why is it okay to never call textBreakPreceding in this case?
Adam Barth
Comment 4
2013-05-28 13:11:17 PDT
Comment on
attachment 203074
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=203074&action=review
>> Source/WebCore/rendering/RenderText.cpp:1731 >> + return current - 1; > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case?
The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed?
Glenn Adams
Comment 5
2013-05-28 13:20:17 PDT
(In reply to
comment #4
)
> (From update of
attachment 203074
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=203074&action=review
> > >> Source/WebCore/rendering/RenderText.cpp:1731 > >> + return current - 1; > > > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? > > The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed?
Any issue with ligatures (ff, fi, fl, ffi, ffl)?
Darin Adler
Comment 6
2013-05-28 14:06:42 PDT
Comment on
attachment 203074
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=203074&action=review
>>>> Source/WebCore/rendering/RenderText.cpp:1731 >>>> + return current - 1; >>> >>> I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? >> >> The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? > > Any issue with ligatures (ff, fi, fl, ffi, ffl)?
The only characters I can think of that that might be a problem: Control characters 0000-001F and 0080-009F, Soft hyphen 00AD. Others should be fine. And no, I don’t see any issue with ligatures.
Darin Adler
Comment 7
2013-05-28 14:09:21 PDT
Comment on
attachment 203074
[details]
Patch We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. Would be nice to have a test to be sure. The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value.
Adam Barth
Comment 8
2013-05-28 14:19:24 PDT
Comment on
attachment 203074
[details]
Patch (In reply to
comment #7
)
> (From update of
attachment 203074
[details]
) > We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply.
It looks like there is an issue with soft hyphen.
> Would be nice to have a test to be sure.
I verified that soft hypen has non-trivial cursor movement by making the Wikipedia page for soft hypen contenteditable and moving the cursor around the page.
> The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value.
Indeed. /me will work on a test for the soft hypen issue. Thanks for the review!
Adam Barth
Comment 9
2013-05-28 21:59:37 PDT
Experimenting with a brute force test, I believe this optimization is valid for characters below U+0300 (at least when arranged in an ascending sequence). U+0300 is COMBINING GRAVE ACCENT:
http://www.fileformat.info/info/unicode/char/300/index.htm
I'll upload a new patch with a test momentarily.
Adam Barth
Comment 10
2013-05-28 22:01:17 PDT
Created
attachment 203120
[details]
Patch
Adam Barth
Comment 11
2013-05-28 22:03:04 PDT
I also experimented with a RELEASE_ASSERT and a test case involving a bunch of different line breaking scenarios for soft hyphen. The RELEASE_ASSERT didn't fire (except for test cases involving characters >= U+0300), and the interactive behavior was unchanged.
Alexey Proskuryakov
Comment 12
2013-05-28 22:38:51 PDT
The behavior may depend on locale. I do not have a specific example where ICU does that (because I don't know how to search for it), but have a look at digraphs section at <
https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet
>.
Ryosuke Niwa
Comment 13
2013-05-29 01:01:00 PDT
Related blink change:
https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b
Adam Barth
Comment 14
2013-05-29 11:53:07 PDT
(In reply to
comment #13
)
> Related blink change: >
https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b
The most closely related change is of course
https://chromium.googlesource.com/chromium/blink/+/c02a5a95086dd4ac48bc46d64518db1b083cac11
Anyway, I thought you all might be interested in these optimizations as well.
Darin Adler
Comment 15
2013-05-29 12:21:22 PDT
Comment on
attachment 203120
[details]
Patch OK. I’m satisfied this performance optimization probably doesn’t change behavior. We have looked and haven’t found any problems. Seems like someone motivated could investigate if we have good behavior for soft hyphens. Aside from this specific function, do we edit them in the way we would like?
Glenn Adams
Comment 16
2013-05-29 12:22:54 PDT
Have you done any perf tests to show what gain is achieved?
Darin Adler
Comment 17
2013-05-29 12:24:17 PDT
The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”.
WebKit Commit Bot
Comment 18
2013-05-29 12:26:03 PDT
Comment on
attachment 203120
[details]
Patch Clearing flags on attachment: 203120 Committed
r150922
: <
http://trac.webkit.org/changeset/150922
>
WebKit Commit Bot
Comment 19
2013-05-29 12:26:07 PDT
All reviewed patches have been landed. Closing bug.
Glenn Adams
Comment 20
2013-05-29 12:29:44 PDT
(In reply to
comment #17
)
> The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”.
I was just wondering about the (time) performance impact of adding the isAllASCII() call.
Ryosuke Niwa
Comment 21
2013-05-29 12:31:01 PDT
(In reply to
comment #20
)
> (In reply to
comment #17
) > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > I was just wondering about the (time) performance impact of adding the isAllASCII() call.
bool isAllASCII() const { return m_isAllASCII; } There should be no performance impact.
Adam Barth
Comment 22
2013-05-29 12:31:45 PDT
> Aside from this specific function, do we edit them in the way we would like?
I'm not sure how users expect to be able to edit them, but the behavior isn't what I would have guessed. foo-bar In this case, the - is invisible between foo and bar, but if you try to move the caret through the text (e.g., with the right arrow key), you need to press right-arrow an "extra" time (visually) to get over the invisible hypen. foo- bar In this case, if you have the caret at fo|o, then moving pressing right-arrow once moves the caret after the soft hyphen: foo-|. If you type "a", the character is inserted after the last o and the caret remains after the hypen: fooa-| I don't know enough about editing to know whether those are sensible behaviors, but they don't appear to be affected by this patch.
> I hesitate to say this, but “the memory gain is dramatic and obvious”.
I profiled all the sources of 16 bit strings in Mobile Gmail in Blink, and this was one of the top sources (aside from a bunch of Chromium-specific sources, which aren't relevant for WebKit). I don't have exact numbers, unfortunately, because I only recently got my memory benchmark to be stable enough to generate repeatable numbers.
Glenn Adams
Comment 23
2013-05-29 12:45:34 PDT
(In reply to
comment #21
)
> (In reply to
comment #20
) > > (In reply to
comment #17
) > > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. > > bool isAllASCII() const { return m_isAllASCII; } > > There should be no performance impact.
Excellent! I was worried it was iterating over the text (again).
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug