Summary: | Optimize RenderText::offsetNext for 8 bit strings | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Adam Barth <abarth> | ||||||
Component: | New Bugs | Assignee: | Adam Barth <abarth> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | ap, benjamin, commit-queue, darin, esprehn+autocc, glenn, ojan, rniwa | ||||||
Priority: | P2 | ||||||||
Version: | 528+ (Nightly build) | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Attachments: |
|
Description
Adam Barth
2013-05-28 11:58:50 PDT
Created attachment 203074 [details]
Patch
@benjaminp: I figured you'd be a good reviewer for the 8-bit strings aspect of this change, but I'm unsure who would be a good reviewer for the RenderText aspects. Would you be willing to either review this patch or suggest another reviewer? Thanks! Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > Source/WebCore/rendering/RenderText.cpp:1731 > + if (isAllASCII() || m_text.is8Bit()) > + return current - 1; I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >> Source/WebCore/rendering/RenderText.cpp:1731 >> + return current - 1; > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? (In reply to comment #4) > (From update of attachment 203074 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review > > >> Source/WebCore/rendering/RenderText.cpp:1731 > >> + return current - 1; > > > > I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? > > The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? Any issue with ligatures (ff, fi, fl, ffi, ffl)? Comment on attachment 203074 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203074&action=review >>>> Source/WebCore/rendering/RenderText.cpp:1731 >>>> + return current - 1; >>> >>> I don't understand how this works. Why is it okay to never call textBreakPreceding in this case? >> >> The idea is that the cursorMovementIterator will always move one character at a time for Latin-1 strings. Perhaps that idea is flawed? > > Any issue with ligatures (ff, fi, fl, ffi, ffl)? The only characters I can think of that that might be a problem: Control characters 0000-001F and 0080-009F, Soft hyphen 00AD. Others should be fine. And no, I don’t see any issue with ligatures. Comment on attachment 203074 [details]
Patch
We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. Would be nice to have a test to be sure. The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value.
Comment on attachment 203074 [details] Patch (In reply to comment #7) > (From update of attachment 203074 [details]) > We obviously want something like this; if there is an issue with soft hyphen we should be able to deal with it simply. It looks like there is an issue with soft hyphen. > Would be nice to have a test to be sure. I verified that soft hypen has non-trivial cursor movement by making the Wikipedia page for soft hypen contenteditable and moving the cursor around the page. > The isAllASCII part of this patch could be landed separately and already delivers the bulk of the value. Indeed. /me will work on a test for the soft hypen issue. Thanks for the review! Experimenting with a brute force test, I believe this optimization is valid for characters below U+0300 (at least when arranged in an ascending sequence). U+0300 is COMBINING GRAVE ACCENT: http://www.fileformat.info/info/unicode/char/300/index.htm I'll upload a new patch with a test momentarily. Created attachment 203120 [details]
Patch
I also experimented with a RELEASE_ASSERT and a test case involving a bunch of different line breaking scenarios for soft hyphen. The RELEASE_ASSERT didn't fire (except for test cases involving characters >= U+0300), and the interactive behavior was unchanged. The behavior may depend on locale. I do not have a specific example where ICU does that (because I don't know how to search for it), but have a look at digraphs section at <https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet>. Related blink change: https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b (In reply to comment #13) > Related blink change: > https://chromium.googlesource.com/chromium/blink/+/441ee6b89876e80754147f3aeccc09b5da423c6b The most closely related change is of course https://chromium.googlesource.com/chromium/blink/+/c02a5a95086dd4ac48bc46d64518db1b083cac11 Anyway, I thought you all might be interested in these optimizations as well. Comment on attachment 203120 [details]
Patch
OK. I’m satisfied this performance optimization probably doesn’t change behavior. We have looked and haven’t found any problems.
Seems like someone motivated could investigate if we have good behavior for soft hyphens. Aside from this specific function, do we edit them in the way we would like?
Have you done any perf tests to show what gain is achieved? The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. Comment on attachment 203120 [details] Patch Clearing flags on attachment: 203120 Committed r150922: <http://trac.webkit.org/changeset/150922> All reviewed patches have been landed. Closing bug. (In reply to comment #17) > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. I was just wondering about the (time) performance impact of adding the isAllASCII() call. (In reply to comment #20) > (In reply to comment #17) > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. bool isAllASCII() const { return m_isAllASCII; } There should be no performance impact. > Aside from this specific function, do we edit them in the way we would like? I'm not sure how users expect to be able to edit them, but the behavior isn't what I would have guessed. foo-bar In this case, the - is invisible between foo and bar, but if you try to move the caret through the text (e.g., with the right arrow key), you need to press right-arrow an "extra" time (visually) to get over the invisible hypen. foo- bar In this case, if you have the caret at fo|o, then moving pressing right-arrow once moves the caret after the soft hyphen: foo-|. If you type "a", the character is inserted after the last o and the caret remains after the hypen: fooa-| I don't know enough about editing to know whether those are sensible behaviors, but they don't appear to be affected by this patch. > I hesitate to say this, but “the memory gain is dramatic and obvious”. I profiled all the sources of 16 bit strings in Mobile Gmail in Blink, and this was one of the top sources (aside from a bunch of Chromium-specific sources, which aren't relevant for WebKit). I don't have exact numbers, unfortunately, because I only recently got my memory benchmark to be stable enough to generate repeatable numbers. (In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #17) > > > The main gain here is memory; all these 8-bit strings were converted to 16-bit strings. I hesitate to say this, but “the memory gain is dramatic and obvious”. > > > > I was just wondering about the (time) performance impact of adding the isAllASCII() call. > > bool isAllASCII() const { return m_isAllASCII; } > > There should be no performance impact. Excellent! I was worried it was iterating over the text (again). |