Bug 116877

Summary: Optimize RenderText::offsetNext for 8 bit strings
Product: WebKit Reporter: Adam Barth <abarth>
Component: New BugsAssignee: Adam Barth <abarth>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, benjamin, commit-queue, darin, esprehn+autocc, glenn, ojan, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Adam Barth
Reported 2013-05-28 11:58:50 PDT
Optimize RenderText::offsetNext for 8 bit strings
Attachments
Patch (2.54 KB, patch)
2013-05-28 12:00 PDT, Adam Barth
no flags
Patch (59.88 KB, patch)
2013-05-28 22:01 PDT, Adam Barth
no flags
Adam Barth
Comment 1 2013-05-28 12:00:56 PDT
Adam Barth
Comment 2 2013-05-28 12:02:34 PDT
@benjaminp: I figured you'd be a good reviewer for the 8-bit strings aspect of this change, but I'm unsure who would be a good reviewer for the RenderText aspects. Would you be willing to either review this patch or suggest another reviewer? Thanks!
Ryosuke Niwa
Comment 3 2013-05-28 12:33:50 PDT
Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > Source/WebCore/rendering/RenderText.cpp:1731 > + if (isAllASCII() || m_text.is8Bit()) > + return current - 1; I don't understand how this works. Why is it okay to never call textBreakPreceding in this case?
Adam Barth
Comment 4 2013-05-28 13:11:17 PDT
Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >> Source/WebCore/rendering/RenderText.cpp:1731 >> + return current - 1; > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed?
Glenn Adams
Comment 5 2013-05-28 13:20:17 PDT
(In reply to comment #4) > (From update of attachment 203074 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > > >> Source/WebCore/rendering/RenderText.cpp:1731 > >> + return current - 1; > > > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? > > The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? Any issue with ligatures (ff, fi, fl, ffi, ffl)?
Darin Adler
Comment 6 2013-05-28 14:06:42 PDT
Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >>>> Source/WebCore/rendering/RenderText.cpp:1731 >>>> + return current - 1; >>> >>> I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? >> >> The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? > > Any issue with ligatures (ff, fi, fl, ffi, ffl)? The only characters I can think of that that might be a problem: Control characters 0000-001F and 0080-009F, Soft hyphen 00AD. Others should be fine. And no, I don’t see any issue with ligatures.
Darin Adler
Comment 7 2013-05-28 14:09:21 PDT
Comment on attachment 203074 [details] Patch We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. Would be nice to have a test to be sure. The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value.
Adam Barth
Comment 8 2013-05-28 14:19:24 PDT
Comment on attachment 203074 [details] Patch (In reply to comment #7) > (From update of attachment 203074 [details]) > We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. It looks like there is an issue with soft hyphen. > Would be nice to have a test to be sure. I verified that soft hypen has non-trivial cursor movement by making the Wikipedia page for soft hypen contenteditable and moving the cursor around the page. > The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value. Indeed. /me will work on a test for the soft hypen issue. Thanks for the review!
Adam Barth
Comment 9 2013-05-28 21:59:37 PDT
Experimenting with a brute force test, I believe this optimization is valid for characters below U+0300 (at least when arranged in an ascending sequence). U+0300 is COMBINING GRAVE ACCENT: http://www.fileformat.info/info/unicode/char/300/index.htm I'll upload a new patch with a test momentarily.
Adam Barth
Comment 10 2013-05-28 22:01:17 PDT
Adam Barth
Comment 11 2013-05-28 22:03:04 PDT
I also experimented with a RELEASE_ASSERT and a test case involving a bunch of different line breaking scenarios for soft hyphen. The RELEASE_ASSERT didn't fire (except for test cases involving characters >= U+0300), and the interactive behavior was unchanged.
Alexey Proskuryakov
Comment 12 2013-05-28 22:38:51 PDT
The behavior may depend on locale. I do not have a specific example where ICU does that (because I don't know how to search for it), but have a look at digraphs section at <https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet>.
Adam Barth
Comment 14 2013-05-29 11:53:07 PDT
(In reply to comment #13) > Related blink change: > https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b The most closely related change is of course https://chromium.googlesource.com/chromium/blink/+/c02a5a95086dd4ac48bc46d64518db1b083cac11 Anyway, I thought you all might be interested in these optimizations as well.
Darin Adler
Comment 15 2013-05-29 12:21:22 PDT
Comment on attachment 203120 [details] Patch OK. I’m satisfied this performance optimization probably doesn’t change behavior. We have looked and haven’t found any problems. Seems like someone motivated could investigate if we have good behavior for soft hyphens. Aside from this specific function, do we edit them in the way we would like?
Glenn Adams
Comment 16 2013-05-29 12:22:54 PDT
Have you done any perf tests to show what gain is achieved?
Darin Adler
Comment 17 2013-05-29 12:24:17 PDT
The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”.
WebKit Commit Bot
Comment 18 2013-05-29 12:26:03 PDT
Comment on attachment 203120 [details] Patch Clearing flags on attachment: 203120 Committed r150922: <http://trac.webkit.org/changeset/150922>
WebKit Commit Bot
Comment 19 2013-05-29 12:26:07 PDT
All reviewed patches have been landed. Closing bug.
Glenn Adams
Comment 20 2013-05-29 12:29:44 PDT
(In reply to comment #17) > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. I was just wondering about the (time) performance impact of adding the isAllASCII() call.
Ryosuke Niwa
Comment 21 2013-05-29 12:31:01 PDT
(In reply to comment #20) > (In reply to comment #17) > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. bool isAllASCII() const { return m_isAllASCII; } There should be no performance impact.
Adam Barth
Comment 22 2013-05-29 12:31:45 PDT
> Aside from this specific function, do we edit them in the way we would like? I'm not sure how users expect to be able to edit them, but the behavior isn't what I would have guessed. foo-bar In this case, the - is invisible between foo and bar, but if you try to move the caret through the text (e.g., with the right arrow key), you need to press right-arrow an "extra" time (visually) to get over the invisible hypen. foo- bar In this case, if you have the caret at fo|o, then moving pressing right-arrow once moves the caret after the soft hyphen: foo-|. If you type "a", the character is inserted after the last o and the caret remains after the hypen: fooa-| I don't know enough about editing to know whether those are sensible behaviors, but they don't appear to be affected by this patch. > I hesitate to say this, but “the memory gain is dramatic and obvious”. I profiled all the sources of 16 bit strings in Mobile Gmail in Blink, and this was one of the top sources (aside from a bunch of Chromium-specific sources, which aren't relevant for WebKit). I don't have exact numbers, unfortunately, because I only recently got my memory benchmark to be stable enough to generate repeatable numbers.
Glenn Adams
Comment 23 2013-05-29 12:45:34 PDT
(In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #17) > > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. > > bool isAllASCII() const { return m_isAllASCII; } > > There should be no performance impact. Excellent! I was worried it was iterating over the text (again).
Note You need to log in before you can comment on or make changes to this bug.