RESOLVED FIXED231606
ASSERT hit in surrogatePairAwareIndex and surrogatePairAwareStart lambdas for text with unpaired surrogates
https://bugs.webkit.org/show_bug.cgi?id=231606
Summary ASSERT hit in surrogatePairAwareIndex and surrogatePairAwareStart lambdas for...
Gabriel Nava Marino
Reported 2021-10-12 12:19:11 PDT
surrogatePairAwareIndex and surrogatePairAwareStart lambdas in TextUtil::MidWordBreak TextUtil::midWordBreak are assuming there are no unpaired surrogates but that is not guaranteed as they don't always come in pairs.
Attachments
Patch (12.34 KB, patch)
2021-10-12 14:23 PDT, Gabriel Nava Marino
no flags
Patch (3.87 KB, patch)
2021-10-13 15:46 PDT, Gabriel Nava Marino
no flags
Patch (2.98 KB, patch)
2021-10-13 16:17 PDT, Gabriel Nava Marino
no flags
Patch (2.93 KB, patch)
2021-10-13 16:43 PDT, Gabriel Nava Marino
no flags
Gabriel Nava Marino
Comment 1 2021-10-12 14:23:35 PDT
Myles C. Maxfield
Comment 2 2021-10-12 14:31:55 PDT
Comment on attachment 440982 [details] Patch Awesome 💯
Darin Adler
Comment 3 2021-10-12 15:36:11 PDT
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 > + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]); Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 > - RELEASE_ASSERT(index + 1 < text.length()); Why remove this? > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 > return ++index; This is bizarre. It should return index + 1; why use ++? > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 > + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]); Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 > return --index; This is bizarre. It should return index - 1; why use --?
EWS
Comment 4 2021-10-12 15:47:43 PDT
Committed r284050 (242879@main): <https://commits.webkit.org/242879@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 440982 [details].
Radar WebKit Bug Importer
Comment 5 2021-10-12 15:48:39 PDT
Gabriel Nava Marino
Comment 6 2021-10-12 15:53:31 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 >> + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]); > > Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking. Thank you for the clarification. I will remove the range checks. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 >> - RELEASE_ASSERT(index + 1 < text.length()); > > Why remove this? This is enforced via early return in the if statement range check above. However, since I will remove the range checks, I will re-add this back here. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 >> return ++index; > > This is bizarre. It should return index + 1; why use ++? I will update to use index + 1. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 >> + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]); > > Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking. Thank you for the clarification. I will remove the range checks from here as well. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 >> return --index; > > This is bizarre. It should return index - 1; why use --? I will update to use index - 1.
Gabriel Nava Marino
Comment 7 2021-10-12 15:58:52 PDT
Reopened to address latest review comments.
Darin Adler
Comment 8 2021-10-12 16:06:20 PDT
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable. One other thought, not something we need to do, but maybe worth considering. The ICU library includes a macro for this specific operation, U16_SET_CP_START. We could just write: right = middle; U16_SET_CP_START(text, 0, right); Then we would not need to write the surrogatePairAwareStart function at all. Perhaps there is a suitable macro for the surrogatePairAwareIndex operation too; I didn’t find it.
Gabriel Nava Marino
Comment 9 2021-10-12 16:50:47 PDT
(In reply to Darin Adler from comment #8) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable. > > One other thought, not something we need to do, but maybe worth considering. > > The ICU library includes a macro for this specific operation, > U16_SET_CP_START. We could just write: > > right = middle; > U16_SET_CP_START(text, 0, right); > > Then we would not need to write the surrogatePairAwareStart function at all. Thank you for pointing this out. I will try it! > Perhaps there is a suitable macro for the surrogatePairAwareIndex operation > too; I didn’t find it. It looks like this could have been the closest one, but not quite: U16_SET_CP_LIMIT(s, start, i, length)
Darin Adler
Comment 10 2021-10-12 17:22:27 PDT
(In reply to Gabriel Nava Marino from comment #9) > It looks like this could have been the closest one, but not quite: > U16_SET_CP_LIMIT(s, start, i, length) Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use U16_SET_CP_LIMIT.
Darin Adler
Comment 11 2021-10-12 17:22:39 PDT
(In reply to Darin Adler from comment #10) > (In reply to Gabriel Nava Marino from comment #9) > > It looks like this could have been the closest one, but not quite: > > U16_SET_CP_LIMIT(s, start, i, length) > > Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use > U16_SET_CP_LIMIT. Simply and fairly elegantly.
Darin Adler
Comment 12 2021-10-12 17:35:52 PDT
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > It would be: auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); U16_SET_CP_LIMIT(text, 0, right, text.length()); auto middle = surrogatePairAwareIndex((left + right) / 2); U16_SET_CP_LIMIT(text, 0, middle, text.length());
Gabriel Nava Marino
Comment 13 2021-10-12 17:46:42 PDT
(In reply to Darin Adler from comment #12) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > > > > It would be: > > auto right = std::min<unsigned>(left + offset, (startPosition + length - > 1)); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > auto middle = surrogatePairAwareIndex((left + right) / 2); > U16_SET_CP_LIMIT(text, 0, middle, text.length()); Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ ++(i); \ } \ } UPRV_BLOCK_MACRO_END
Darin Adler
Comment 14 2021-10-12 17:50:54 PDT
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>> >> >> It would be: >> >> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >> U16_SET_CP_LIMIT(text, 0, right, text.length()); >> >> auto middle = surrogatePairAwareIndex((left + right) / 2); >> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > > Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > > #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > ++(i); \ > } \ > } UPRV_BLOCK_MACRO_END Oh, OK, good point. So my code is wrong. It would be more like this: auto right = std::min<unsigned>(left + offset + 1, startPosition + length); U16_SET_CP_LIMIT(text, 0, right, text.length()); --right;
Darin Adler
Comment 15 2021-10-12 17:53:08 PDT
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>>> >>> >>> It would be: >>> >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); >>> >>> auto middle = surrogatePairAwareIndex((left + right) / 2); >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); >> >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: >> >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ >> ++(i); \ >> } \ >> } UPRV_BLOCK_MACRO_END > > Oh, OK, good point. So my code is wrong. It would be more like this: > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > --right; And perhaps then could change the logic to get rid of the need for the "--right". Maybe a waste of time, but I do think that using the U16 macros is still a little easier to get correct than writing our own code.
Gabriel Nava Marino
Comment 16 2021-10-12 18:07:21 PDT
(In reply to Darin Adler from comment #15) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right". I will work to update the logic to avoid the extra decrement. Thank you for the suggestion Darin. > > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code. Yes, agreed! I will work to get this macro working. The upside being that the "safe" macro offers handling of unpaired surrogates and checks for string boundaries.
Gabriel Nava Marino
Comment 17 2021-10-13 15:41:46 PDT
(In reply to Darin Adler from comment #15) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right". I tried adjusting the logic so that we could get rid of the need for the "--right". However, because of the loop condition "while (left < right)" and need for two U16_SET_CP_LIMIT calls and then the subsequent U16_SET_CP_START, we would need at least one "--right" after U16_SET_CP_START. To avoid this, I have opted to keep surrogatePairAwareIndex and return index - 1 instead. Although we are still using this lambda, it is only using the U16 U16_SET_CP_LIMIT macro now. > > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code.
Darin Adler
Comment 18 2021-10-13 15:42:33 PDT
(In reply to Gabriel Nava Marino from comment #17) > To avoid this, I have opted to keep surrogatePairAwareIndex and return index > - 1 instead. Although we are still using this lambda, it is only using the > U16 U16_SET_CP_LIMIT macro now. OK, sounds pretty good.
Gabriel Nava Marino
Comment 19 2021-10-13 15:46:03 PDT
Darin Adler
Comment 20 2021-10-13 16:05:13 PDT
Comment on attachment 441150 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=441150&action=review Thanks for taking advantage of my suggestion to use higher level U16 macros instead of writing the code ourselves. I think it’s looking good. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > + U16_SET_CP_LIMIT(text, 0, index, text.length()); I think it’s a little bit less clear to modify an argument, index, than to modify a copy of it. It’s an informal coding style rule in WebKit that we typically don’t use arguments that way, even when they are passed by value and can be used that way. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length))); Why add the + 1 for the the argument to surrogatePairAwareIndex at each place it’s called? Seems like the + 1 could be done inside that function. Maybe you think the function makes more sense with these semantics? If so, that’s OK, but if not, I’d do it there.
Gabriel Nava Marino
Comment 21 2021-10-13 16:12:41 PDT
(In reply to Darin Adler from comment #20) > Comment on attachment 441150 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=441150&action=review > > Thanks for taking advantage of my suggestion to use higher level U16 macros > instead of writing the code ourselves. I think it’s looking good. > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > > + U16_SET_CP_LIMIT(text, 0, index, text.length()); > > I think it’s a little bit less clear to modify an argument, index, than to > modify a copy of it. It’s an informal coding style rule in WebKit that we > typically don’t use arguments that way, even when they are passed by value > and can be used that way. > Thank you for the suggestion. I will make a copy of the argument. > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length))); > > Why add the + 1 for the the argument to surrogatePairAwareIndex at each > place it’s called? Seems like the + 1 could be done inside that function. > > Maybe you think the function makes more sense with these semantics? If so, > that’s OK, but if not, I’d do it there. It was done primarily to avoid doing the + 1 inside the function, but if we are also making a copy of the argument, then I will do the + 1 inside the function at the same time. Thank you for the suggestion.
Gabriel Nava Marino
Comment 22 2021-10-13 16:17:18 PDT
Darin Adler
Comment 23 2021-10-13 16:32:08 PDT
Comment on attachment 441156 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=441156&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > + RELEASE_ASSERT(offset - 1 < text.length()); Not 100% sure we need this RELEASE_ASSERT.
Gabriel Nava Marino
Comment 24 2021-10-13 16:42:17 PDT
(In reply to Darin Adler from comment #23) > Comment on attachment 441156 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=441156&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > > + RELEASE_ASSERT(offset - 1 < text.length()); > > Not 100% sure we need this RELEASE_ASSERT. We don't need this. I will update this to remove it. Thank you.
Gabriel Nava Marino
Comment 25 2021-10-13 16:43:21 PDT
EWS
Comment 26 2021-10-13 17:57:22 PDT
Committed r284141 (242961@main): <https://commits.webkit.org/242961@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 441158 [details].
Note You need to log in before you can comment on or make changes to this bug.