Bug 234380 - [LFC][IFC] Simple RTL content may need visual reordering
Summary: [LFC][IFC] Simple RTL content may need visual reordering
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: zalan
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-12-15 19:36 PST by zalan
Modified: 2021-12-17 07:19 PST (History)
6 users (show)

See Also:


Attachments
Patch (3.69 KB, patch)
2021-12-15 19:40 PST, zalan
no flags Details | Formatted Diff | Diff
Patch (2.88 KB, patch)
2021-12-16 07:56 PST, zalan
no flags Details | Formatted Diff | Diff
Patch (2.42 KB, patch)
2021-12-16 19:16 PST, zalan
no flags Details | Formatted Diff | Diff
Patch (2.41 KB, patch)
2021-12-17 06:17 PST, zalan
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description zalan 2021-12-15 19:36:41 PST
as it turns out there are certain unicode categories that could make the simple (8bit) content visually reordered.
Comment 1 zalan 2021-12-15 19:40:09 PST
Created attachment 447318 [details]
Patch
Comment 2 Antti Koivisto 2021-12-16 06:20:45 PST
Comment on attachment 447318 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=447318&action=review

> Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279
> +        , paragraphContentBuilder.is8Bit() ? StringView(paragraphContentBuilder).upconvertedCharacters() : paragraphContentBuilder.characters16()

I think you can just call upconvertedCharacters unconditionally. It handles all cases efficiently.

I'd keep this call as a single and use variables if needed.
Comment 3 zalan 2021-12-16 07:56:46 PST
Created attachment 447355 [details]
Patch
Comment 4 EWS 2021-12-16 09:37:49 PST
Committed r287142 (245326@main): <https://commits.webkit.org/245326@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 447355 [details].
Comment 5 Radar WebKit Bug Importer 2021-12-16 09:38:17 PST
<rdar://problem/86580841>
Comment 6 Darin Adler 2021-12-16 13:24:49 PST
Comment on attachment 447355 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=447355&action=review

> Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279
> +        , StringView(paragraphContentBuilder).upconvertedCharacters()

Is the lifetime OK here? I think it might not be. As soon as the ubidi_setPara function is done, the temporary buffer containing the unconverted characters will be deallocated.

Also worried about the performance implications; the upconvertedCharacters code path is not necessarily the most efficient way to get 16-bit characters for a purpose like this.
Comment 7 zalan 2021-12-16 13:37:40 PST
(In reply to Darin Adler from comment #6)
> Comment on attachment 447355 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=447355&action=review
> 
> > Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279
> > +        , StringView(paragraphContentBuilder).upconvertedCharacters()
> 
> Is the lifetime OK here? I think it might not be. As soon as the
> ubidi_setPara function is done, the temporary buffer containing the
> unconverted characters will be deallocated.
You are right. According to the documentation
"...a pointer to the text that the Bidi algorithm will be performed on. This pointer is stored in the UBiDi object and can be retrieved with ubidi_getText()."
Will fix. Thanks!

> 
> Also worried about the performance implications; the upconvertedCharacters
> code path is not necessarily the most efficient way to get 16-bit characters
> for a purpose like this.
Do you have any suggestion? In most cases we already have a 16bit content here, only a very simple, 8bit content with RTL direction needs upconverting.
Comment 8 Darin Adler 2021-12-16 13:42:46 PST
(In reply to zalan from comment #7)
> Will fix. Thanks!

Can fix it by putting the upconvertedCharacters result into a local variable, but it needs to be of type UpconvertedCharacters (likely done with "auto"), not a const UChar*.

> > Also worried about the performance implications; the upconvertedCharacters
> > code path is not necessarily the most efficient way to get 16-bit characters
> > for a purpose like this.
>
> Do you have any suggestion? In most cases we already have a 16bit content
> here, only a very simple, 8bit content with RTL direction needs upconverting.

Oh, I didn’t realize that. If we’ll skip this code path in the common case and not do it for all cases of 8-bit characters, then I withdraw my objection.

One example of how this is not necessarily best optimized for heavier uses is that UpconvertedCharacters puts the characters on the stack rather than doing a malloc/free pair if there are <= 32 characters. Explicitly-written code could set a different threshold, which would be valuable if longer strings are common, and the malloc/free cost is a significant part of the total cost.
Comment 9 zalan 2021-12-16 13:54:01 PST
(In reply to Darin Adler from comment #8)
> (In reply to zalan from comment #7)
> > Will fix. Thanks!
> 
> Can fix it by putting the upconvertedCharacters result into a local
> variable, but it needs to be of type UpconvertedCharacters (likely done with
> "auto"), not a const UChar*.
Thanks.

> 
> > > Also worried about the performance implications; the upconvertedCharacters
> > > code path is not necessarily the most efficient way to get 16-bit characters
> > > for a purpose like this.
> >
> > Do you have any suggestion? In most cases we already have a 16bit content
> > here, only a very simple, 8bit content with RTL direction needs upconverting.
> 
> Oh, I didn’t realize that. If we’ll skip this code path in the common case
> and not do it for all cases of 8-bit characters, then I withdraw my
> objection.
> 
> One example of how this is not necessarily best optimized for heavier uses
> is that UpconvertedCharacters puts the characters on the stack rather than
> doing a malloc/free pair if there are <= 32 characters. Explicitly-written
> code could set a different threshold, which would be valuable if longer
> strings are common, and the malloc/free cost is a significant part of the
> total cost.
Makes sense. For the common codepath this would certainly need an explicitly-written code (luckily this is not the case here -also, on ToT we always land with a 16bit content here (RTL is not yet enabled)).
Comment 10 zalan 2021-12-16 19:16:03 PST
Reopening to attach new patch.
Comment 11 zalan 2021-12-16 19:16:05 PST
Created attachment 447411 [details]
Patch
Comment 12 Darin Adler 2021-12-16 21:37:13 PST
Comment on attachment 447411 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=447411&action=review

> Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:281
> +        , bidiContent.get()

Should work without the explicit call to get()
Comment 13 zalan 2021-12-17 06:17:58 PST
Created attachment 447452 [details]
Patch
Comment 14 EWS 2021-12-17 07:19:14 PST
Committed r287184 (245351@main): <https://commits.webkit.org/245351@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 447452 [details].