Bug 116877

Summary:

Optimize RenderText::offsetNext for 8 bit strings

Product:

WebKit

Reporter:

Adam Barth <abarth>

Component:

New Bugs

Assignee:

Adam Barth <abarth>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

ap, benjamin, commit-queue, darin, esprehn+autocc, glenn, ojan, rniwa

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Patch	none
Patch	none

Adam Barth

Reported 2013-05-28 11:58:50 PDT

Optimize RenderText::offsetNext for 8 bit strings

Attachments
Patch (2.54 KB, patch) 2013-05-28 12:00 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Patch (59.88 KB, patch) 2013-05-28 22:01 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Adam Barth

Comment 1 2013-05-28 12:00:56 PDT

Created attachment 203074 [details] Patch

Adam Barth

Comment 2 2013-05-28 12:02:34 PDT

@benjaminp: I figured you'd be a good reviewer for the 8-bit strings aspect of this change, but I'm unsure who would be a good reviewer for the RenderText aspects. Would you be willing to either review this patch or suggest another reviewer? Thanks!

Ryosuke Niwa

Comment 3 2013-05-28 12:33:50 PDT

Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > Source/WebCore/rendering/RenderText.cpp:1731 > + if (isAllASCII() || m_text.is8Bit()) > + return current - 1; I don't understand how this works. Why is it okay to never call textBreakPreceding in this case?

Adam Barth

Comment 4 2013-05-28 13:11:17 PDT

Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >> Source/WebCore/rendering/RenderText.cpp:1731 >> + return current - 1; > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed?

Glenn Adams

Comment 5 2013-05-28 13:20:17 PDT

(In reply to comment #4) > (From update of attachment 203074 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > > >> Source/WebCore/rendering/RenderText.cpp:1731 > >> + return current - 1; > > > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? > > The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? Any issue with ligatures (ff, fi, fl, ffi, ffl)?

Darin Adler

Comment 6 2013-05-28 14:06:42 PDT

Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >>>> Source/WebCore/rendering/RenderText.cpp:1731 >>>> + return current - 1; >>> >>> I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? >> >> The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? > > Any issue with ligatures (ff, fi, fl, ffi, ffl)? The only characters I can think of that that might be a problem: Control characters 0000-001F and 0080-009F, Soft hyphen 00AD. Others should be fine. And no, I don’t see any issue with ligatures.

Darin Adler

Comment 7 2013-05-28 14:09:21 PDT

Comment on attachment 203074 [details] Patch We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. Would be nice to have a test to be sure. The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value.

Adam Barth

Comment 8 2013-05-28 14:19:24 PDT

Comment on attachment 203074 [details] Patch (In reply to comment #7) > (From update of attachment 203074 [details]) > We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. It looks like there is an issue with soft hyphen. > Would be nice to have a test to be sure. I verified that soft hypen has non-trivial cursor movement by making the Wikipedia page for soft hypen contenteditable and moving the cursor around the page. > The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value. Indeed. /me will work on a test for the soft hypen issue. Thanks for the review!

Adam Barth

Comment 9 2013-05-28 21:59:37 PDT

Experimenting with a brute force test, I believe this optimization is valid for characters below U+0300 (at least when arranged in an ascending sequence). U+0300 is COMBINING GRAVE ACCENT: http://www.fileformat.info/info/unicode/char/300/index.htm I'll upload a new patch with a test momentarily.

Adam Barth

Comment 10 2013-05-28 22:01:17 PDT

Created attachment 203120 [details] Patch

Adam Barth

Comment 11 2013-05-28 22:03:04 PDT

I also experimented with a RELEASE_ASSERT and a test case involving a bunch of different line breaking scenarios for soft hyphen. The RELEASE_ASSERT didn't fire (except for test cases involving characters >= U+0300), and the interactive behavior was unchanged.

Alexey Proskuryakov

Comment 12 2013-05-28 22:38:51 PDT

The behavior may depend on locale. I do not have a specific example where ICU does that (because I don't know how to search for it), but have a look at digraphs section at <https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet>.

Ryosuke Niwa

Comment 13 2013-05-29 01:01:00 PDT

Related blink change: https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b

Adam Barth

Comment 14 2013-05-29 11:53:07 PDT

(In reply to comment #13) > Related blink change: > https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b The most closely related change is of course https://chromium.googlesource.com/chromium/blink/+/c02a5a95086dd4ac48bc46d64518db1b083cac11 Anyway, I thought you all might be interested in these optimizations as well.

Darin Adler

Comment 15 2013-05-29 12:21:22 PDT

Comment on attachment 203120 [details] Patch OK. I’m satisfied this performance optimization probably doesn’t change behavior. We have looked and haven’t found any problems. Seems like someone motivated could investigate if we have good behavior for soft hyphens. Aside from this specific function, do we edit them in the way we would like?

Glenn Adams

Comment 16 2013-05-29 12:22:54 PDT

Have you done any perf tests to show what gain is achieved?

Darin Adler

Comment 17 2013-05-29 12:24:17 PDT

The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”.

WebKit Commit Bot

Comment 18 2013-05-29 12:26:03 PDT

Comment on attachment 203120 [details] Patch Clearing flags on attachment: 203120 Committed r150922: <http://trac.webkit.org/changeset/150922>

WebKit Commit Bot

Comment 19 2013-05-29 12:26:07 PDT

All reviewed patches have been landed. Closing bug.

Glenn Adams

Comment 20 2013-05-29 12:29:44 PDT

(In reply to comment #17) > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. I was just wondering about the (time) performance impact of adding the isAllASCII() call.

Ryosuke Niwa

Comment 21 2013-05-29 12:31:01 PDT

(In reply to comment #20) > (In reply to comment #17) > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. bool isAllASCII() const { return m_isAllASCII; } There should be no performance impact.

Adam Barth

Comment 22 2013-05-29 12:31:45 PDT

> Aside from this specific function, do we edit them in the way we would like? I'm not sure how users expect to be able to edit them, but the behavior isn't what I would have guessed. foo-bar In this case, the - is invisible between foo and bar, but if you try to move the caret through the text (e.g., with the right arrow key), you need to press right-arrow an "extra" time (visually) to get over the invisible hypen. foo- bar In this case, if you have the caret at fo|o, then moving pressing right-arrow once moves the caret after the soft hyphen: foo-|. If you type "a", the character is inserted after the last o and the caret remains after the hypen: fooa-| I don't know enough about editing to know whether those are sensible behaviors, but they don't appear to be affected by this patch. > I hesitate to say this, but “the memory gain is dramatic and obvious”. I profiled all the sources of 16 bit strings in Mobile Gmail in Blink, and this was one of the top sources (aside from a bunch of Chromium-specific sources, which aren't relevant for WebKit). I don't have exact numbers, unfortunately, because I only recently got my memory benchmark to be stable enough to generate repeatable numbers.

Glenn Adams

Comment 23 2013-05-29 12:45:34 PDT

(In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #17) > > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. > > bool isAllASCII() const { return m_isAllASCII; } > > There should be no performance impact. Excellent! I was worried it was iterating over the text (again).

Note You need to log in before you can comment on or make changes to this bug.