as it turns out there are certain unicode categories that could make the simple (8bit) content visually reordered.
Created attachment 447318 [details] Patch
Comment on attachment 447318 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=447318&action=review > Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279 > + , paragraphContentBuilder.is8Bit() ? StringView(paragraphContentBuilder).upconvertedCharacters() : paragraphContentBuilder.characters16() I think you can just call upconvertedCharacters unconditionally. It handles all cases efficiently. I'd keep this call as a single and use variables if needed.
Created attachment 447355 [details] Patch
Committed r287142 (245326@main): <https://commits.webkit.org/245326@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 447355 [details].
<rdar://problem/86580841>
Comment on attachment 447355 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=447355&action=review > Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279 > + , StringView(paragraphContentBuilder).upconvertedCharacters() Is the lifetime OK here? I think it might not be. As soon as the ubidi_setPara function is done, the temporary buffer containing the unconverted characters will be deallocated. Also worried about the performance implications; the upconvertedCharacters code path is not necessarily the most efficient way to get 16-bit characters for a purpose like this.
(In reply to Darin Adler from comment #6) > Comment on attachment 447355 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=447355&action=review > > > Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:279 > > + , StringView(paragraphContentBuilder).upconvertedCharacters() > > Is the lifetime OK here? I think it might not be. As soon as the > ubidi_setPara function is done, the temporary buffer containing the > unconverted characters will be deallocated. You are right. According to the documentation "...a pointer to the text that the Bidi algorithm will be performed on. This pointer is stored in the UBiDi object and can be retrieved with ubidi_getText()." Will fix. Thanks! > > Also worried about the performance implications; the upconvertedCharacters > code path is not necessarily the most efficient way to get 16-bit characters > for a purpose like this. Do you have any suggestion? In most cases we already have a 16bit content here, only a very simple, 8bit content with RTL direction needs upconverting.
(In reply to zalan from comment #7) > Will fix. Thanks! Can fix it by putting the upconvertedCharacters result into a local variable, but it needs to be of type UpconvertedCharacters (likely done with "auto"), not a const UChar*. > > Also worried about the performance implications; the upconvertedCharacters > > code path is not necessarily the most efficient way to get 16-bit characters > > for a purpose like this. > > Do you have any suggestion? In most cases we already have a 16bit content > here, only a very simple, 8bit content with RTL direction needs upconverting. Oh, I didn’t realize that. If we’ll skip this code path in the common case and not do it for all cases of 8-bit characters, then I withdraw my objection. One example of how this is not necessarily best optimized for heavier uses is that UpconvertedCharacters puts the characters on the stack rather than doing a malloc/free pair if there are <= 32 characters. Explicitly-written code could set a different threshold, which would be valuable if longer strings are common, and the malloc/free cost is a significant part of the total cost.
(In reply to Darin Adler from comment #8) > (In reply to zalan from comment #7) > > Will fix. Thanks! > > Can fix it by putting the upconvertedCharacters result into a local > variable, but it needs to be of type UpconvertedCharacters (likely done with > "auto"), not a const UChar*. Thanks. > > > > Also worried about the performance implications; the upconvertedCharacters > > > code path is not necessarily the most efficient way to get 16-bit characters > > > for a purpose like this. > > > > Do you have any suggestion? In most cases we already have a 16bit content > > here, only a very simple, 8bit content with RTL direction needs upconverting. > > Oh, I didn’t realize that. If we’ll skip this code path in the common case > and not do it for all cases of 8-bit characters, then I withdraw my > objection. > > One example of how this is not necessarily best optimized for heavier uses > is that UpconvertedCharacters puts the characters on the stack rather than > doing a malloc/free pair if there are <= 32 characters. Explicitly-written > code could set a different threshold, which would be valuable if longer > strings are common, and the malloc/free cost is a significant part of the > total cost. Makes sense. For the common codepath this would certainly need an explicitly-written code (luckily this is not the case here -also, on ToT we always land with a 16bit content here (RTL is not yet enabled)).
Reopening to attach new patch.
Created attachment 447411 [details] Patch
Comment on attachment 447411 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=447411&action=review > Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:281 > + , bidiContent.get() Should work without the explicit call to get()
Created attachment 447452 [details] Patch
Committed r287184 (245351@main): <https://commits.webkit.org/245351@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 447452 [details].