Summary: | ASSERT hit in surrogatePairAwareIndex and surrogatePairAwareStart lambdas for text with unpaired surrogates | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Gabriel Nava Marino <gnavamarino> | ||||||||||
Component: | Text | Assignee: | Nobody <webkit-unassigned> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | darin, mmaxfield, webkit-bug-importer, zalan | ||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||
Version: | WebKit Nightly Build | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Attachments: |
|
Description
Gabriel Nava Marino
2021-10-12 12:19:11 PDT
Created attachment 440982 [details]
Patch
Comment on attachment 440982 [details]
Patch
Awesome 💯
Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 > + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]); Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 > - RELEASE_ASSERT(index + 1 < text.length()); Why remove this? > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 > return ++index; This is bizarre. It should return index + 1; why use ++? > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 > + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]); Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 > return --index; This is bizarre. It should return index - 1; why use --? Committed r284050 (242879@main): <https://commits.webkit.org/242879@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 440982 [details]. View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 >> + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]); > > Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking. Thank you for the clarification. I will remove the range checks. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 >> - RELEASE_ASSERT(index + 1 < text.length()); > > Why remove this? This is enforced via early return in the if statement range check above. However, since I will remove the range checks, I will re-add this back here. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 >> return ++index; > > This is bizarre. It should return index + 1; why use ++? I will update to use index + 1. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 >> + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]); > > Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking. Thank you for the clarification. I will remove the range checks from here as well. >> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 >> return --index; > > This is bizarre. It should return index - 1; why use --? I will update to use index - 1. Reopened to address latest review comments. Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable. One other thought, not something we need to do, but maybe worth considering. The ICU library includes a macro for this specific operation, U16_SET_CP_START. We could just write: right = middle; U16_SET_CP_START(text, 0, right); Then we would not need to write the surrogatePairAwareStart function at all. Perhaps there is a suitable macro for the surrogatePairAwareIndex operation too; I didn’t find it. (In reply to Darin Adler from comment #8) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable. > > One other thought, not something we need to do, but maybe worth considering. > > The ICU library includes a macro for this specific operation, > U16_SET_CP_START. We could just write: > > right = middle; > U16_SET_CP_START(text, 0, right); > > Then we would not need to write the surrogatePairAwareStart function at all. Thank you for pointing this out. I will try it! > Perhaps there is a suitable macro for the surrogatePairAwareIndex operation > too; I didn’t find it. It looks like this could have been the closest one, but not quite: U16_SET_CP_LIMIT(s, start, i, length) (In reply to Gabriel Nava Marino from comment #9) > It looks like this could have been the closest one, but not quite: > U16_SET_CP_LIMIT(s, start, i, length) Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use U16_SET_CP_LIMIT. (In reply to Darin Adler from comment #10) > (In reply to Gabriel Nava Marino from comment #9) > > It looks like this could have been the closest one, but not quite: > > U16_SET_CP_LIMIT(s, start, i, length) > > Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use > U16_SET_CP_LIMIT. Simply and fairly elegantly. Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > It would be: auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); U16_SET_CP_LIMIT(text, 0, right, text.length()); auto middle = surrogatePairAwareIndex((left + right) / 2); U16_SET_CP_LIMIT(text, 0, middle, text.length()); (In reply to Darin Adler from comment #12) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > > > > It would be: > > auto right = std::min<unsigned>(left + offset, (startPosition + length - > 1)); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > auto middle = surrogatePairAwareIndex((left + right) / 2); > U16_SET_CP_LIMIT(text, 0, middle, text.length()); Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ ++(i); \ } \ } UPRV_BLOCK_MACRO_END Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>> >> >> It would be: >> >> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >> U16_SET_CP_LIMIT(text, 0, right, text.length()); >> >> auto middle = surrogatePairAwareIndex((left + right) / 2); >> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > > Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > > #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > ++(i); \ > } \ > } UPRV_BLOCK_MACRO_END Oh, OK, good point. So my code is wrong. It would be more like this: auto right = std::min<unsigned>(left + offset + 1, startPosition + length); U16_SET_CP_LIMIT(text, 0, right, text.length()); --right; Comment on attachment 440982 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=440982&action=review >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>>> >>> >>> It would be: >>> >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); >>> >>> auto middle = surrogatePairAwareIndex((left + right) / 2); >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); >> >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: >> >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ >> ++(i); \ >> } \ >> } UPRV_BLOCK_MACRO_END > > Oh, OK, good point. So my code is wrong. It would be more like this: > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > --right; And perhaps then could change the logic to get rid of the need for the "--right". Maybe a waste of time, but I do think that using the U16 macros is still a little easier to get correct than writing our own code. (In reply to Darin Adler from comment #15) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right". I will work to update the logic to avoid the extra decrement. Thank you for the suggestion Darin. > > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code. Yes, agreed! I will work to get this macro working. The upside being that the "safe" macro offers handling of unpaired surrogates and checks for string boundaries. (In reply to Darin Adler from comment #15) > Comment on attachment 440982 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=440982&action=review > > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right". I tried adjusting the logic so that we could get rid of the need for the "--right". However, because of the loop condition "while (left < right)" and need for two U16_SET_CP_LIMIT calls and then the subsequent U16_SET_CP_START, we would need at least one "--right" after U16_SET_CP_START. To avoid this, I have opted to keep surrogatePairAwareIndex and return index - 1 instead. Although we are still using this lambda, it is only using the U16 U16_SET_CP_LIMIT macro now. > > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code. (In reply to Gabriel Nava Marino from comment #17) > To avoid this, I have opted to keep surrogatePairAwareIndex and return index > - 1 instead. Although we are still using this lambda, it is only using the > U16 U16_SET_CP_LIMIT macro now. OK, sounds pretty good. Created attachment 441150 [details]
Patch
Comment on attachment 441150 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=441150&action=review Thanks for taking advantage of my suggestion to use higher level U16 macros instead of writing the code ourselves. I think it’s looking good. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > + U16_SET_CP_LIMIT(text, 0, index, text.length()); I think it’s a little bit less clear to modify an argument, index, than to modify a copy of it. It’s an informal coding style rule in WebKit that we typically don’t use arguments that way, even when they are passed by value and can be used that way. > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length))); Why add the + 1 for the the argument to surrogatePairAwareIndex at each place it’s called? Seems like the + 1 could be done inside that function. Maybe you think the function makes more sense with these semantics? If so, that’s OK, but if not, I’d do it there. (In reply to Darin Adler from comment #20) > Comment on attachment 441150 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=441150&action=review > > Thanks for taking advantage of my suggestion to use higher level U16 macros > instead of writing the code ourselves. I think it’s looking good. > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > > + U16_SET_CP_LIMIT(text, 0, index, text.length()); > > I think it’s a little bit less clear to modify an argument, index, than to > modify a copy of it. It’s an informal coding style rule in WebKit that we > typically don’t use arguments that way, even when they are passed by value > and can be used that way. > Thank you for the suggestion. I will make a copy of the argument. > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length))); > > Why add the + 1 for the the argument to surrogatePairAwareIndex at each > place it’s called? Seems like the + 1 could be done inside that function. > > Maybe you think the function makes more sense with these semantics? If so, > that’s OK, but if not, I’d do it there. It was done primarily to avoid doing the + 1 inside the function, but if we are also making a copy of the argument, then I will do the + 1 inside the function at the same time. Thank you for the suggestion. Created attachment 441156 [details]
Patch
Comment on attachment 441156 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=441156&action=review > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > + RELEASE_ASSERT(offset - 1 < text.length()); Not 100% sure we need this RELEASE_ASSERT. (In reply to Darin Adler from comment #23) > Comment on attachment 441156 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=441156&action=review > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > > + RELEASE_ASSERT(offset - 1 < text.length()); > > Not 100% sure we need this RELEASE_ASSERT. We don't need this. I will update this to remove it. Thank you. Created attachment 441158 [details]
Patch
Committed r284141 (242961@main): <https://commits.webkit.org/242961@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 441158 [details]. |