WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
231606
ASSERT hit in surrogatePairAwareIndex and surrogatePairAwareStart lambdas for text with unpaired surrogates
https://bugs.webkit.org/show_bug.cgi?id=231606
Summary
ASSERT hit in surrogatePairAwareIndex and surrogatePairAwareStart lambdas for...
Gabriel Nava Marino
Reported
2021-10-12 12:19:11 PDT
surrogatePairAwareIndex and surrogatePairAwareStart lambdas in TextUtil::MidWordBreak TextUtil::midWordBreak are assuming there are no unpaired surrogates but that is not guaranteed as they don't always come in pairs.
Attachments
Patch
(12.34 KB, patch)
2021-10-12 14:23 PDT
,
Gabriel Nava Marino
no flags
Details
Formatted Diff
Diff
Patch
(3.87 KB, patch)
2021-10-13 15:46 PDT
,
Gabriel Nava Marino
no flags
Details
Formatted Diff
Diff
Patch
(2.98 KB, patch)
2021-10-13 16:17 PDT
,
Gabriel Nava Marino
no flags
Details
Formatted Diff
Diff
Patch
(2.93 KB, patch)
2021-10-13 16:43 PDT
,
Gabriel Nava Marino
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Gabriel Nava Marino
Comment 1
2021-10-12 14:23:35 PDT
Created
attachment 440982
[details]
Patch
Myles C. Maxfield
Comment 2
2021-10-12 14:31:55 PDT
Comment on
attachment 440982
[details]
Patch Awesome 💯
Darin Adler
Comment 3
2021-10-12 15:36:11 PDT
Comment on
attachment 440982
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 > + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]);
Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking.
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 > - RELEASE_ASSERT(index + 1 < text.length());
Why remove this?
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 > return ++index;
This is bizarre. It should return index + 1; why use ++?
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 > + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]);
Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking.
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 > return --index;
This is bizarre. It should return index - 1; why use --?
EWS
Comment 4
2021-10-12 15:47:43 PDT
Committed
r284050
(
242879@main
): <
https://commits.webkit.org/242879@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 440982
[details]
.
Radar WebKit Bug Importer
Comment 5
2021-10-12 15:48:39 PDT
<
rdar://problem/84170185
>
Gabriel Nava Marino
Comment 6
2021-10-12 15:53:31 PDT
View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:150 >> + bool isLead = U16_IS_LEAD(text[index]) && (index + 1) < text.length() && U16_IS_TRAIL(text[index + 1]); > > Given that this is WTF::String, we don’t need the rang checks on index. The subscript operator does enough range checking.
Thank you for the clarification. I will remove the range checks.
>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:-152 >> - RELEASE_ASSERT(index + 1 < text.length()); > > Why remove this?
This is enforced via early return in the if statement range check above. However, since I will remove the range checks, I will re-add this back here.
>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:153 >> return ++index; > > This is bizarre. It should return index + 1; why use ++?
I will update to use index + 1.
>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:172 >> + bool isTrail = index && U16_IS_LEAD(text[index - 1]) && index < text.length() && U16_IS_TRAIL(text[index]); > > Given that this is WTF::String, we don’t need the additional range checks on index. The subscript operator does enough range checking.
Thank you for the clarification. I will remove the range checks from here as well.
>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:176 >> return --index; > > This is bizarre. It should return index - 1; why use --?
I will update to use index - 1.
Gabriel Nava Marino
Comment 7
2021-10-12 15:58:52 PDT
Reopened to address latest review comments.
Darin Adler
Comment 8
2021-10-12 16:06:20 PDT
Comment on
attachment 440982
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable.
One other thought, not something we need to do, but maybe worth considering. The ICU library includes a macro for this specific operation, U16_SET_CP_START. We could just write: right = middle; U16_SET_CP_START(text, 0, right); Then we would not need to write the surrogatePairAwareStart function at all. Perhaps there is a suitable macro for the surrogatePairAwareIndex operation too; I didn’t find it.
Gabriel Nava Marino
Comment 9
2021-10-12 16:50:47 PDT
(In reply to Darin Adler from
comment #8
)
> Comment on
attachment 440982
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:178 > > // When the substring does not fit, the right side is supposed to be the start of the surrogate pair if applicable. > > One other thought, not something we need to do, but maybe worth considering. > > The ICU library includes a macro for this specific operation, > U16_SET_CP_START. We could just write: > > right = middle; > U16_SET_CP_START(text, 0, right); > > Then we would not need to write the surrogatePairAwareStart function at all.
Thank you for pointing this out. I will try it!
> Perhaps there is a suitable macro for the surrogatePairAwareIndex operation > too; I didn’t find it.
It looks like this could have been the closest one, but not quite: U16_SET_CP_LIMIT(s, start, i, length)
Darin Adler
Comment 10
2021-10-12 17:22:27 PDT
(In reply to Gabriel Nava Marino from
comment #9
)
> It looks like this could have been the closest one, but not quite: > U16_SET_CP_LIMIT(s, start, i, length)
Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use U16_SET_CP_LIMIT.
Darin Adler
Comment 11
2021-10-12 17:22:39 PDT
(In reply to Darin Adler from
comment #10
)
> (In reply to Gabriel Nava Marino from
comment #9
) > > It looks like this could have been the closest one, but not quite: > > U16_SET_CP_LIMIT(s, start, i, length) > > Yes, I do think clients of surrogatePairAwareIndex can be rewritten to use > U16_SET_CP_LIMIT.
Simply and fairly elegantly.
Darin Adler
Comment 12
2021-10-12 17:35:52 PDT
Comment on
attachment 440982
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >
It would be: auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); U16_SET_CP_LIMIT(text, 0, right, text.length()); auto middle = surrogatePairAwareIndex((left + right) / 2); U16_SET_CP_LIMIT(text, 0, middle, text.length());
Gabriel Nava Marino
Comment 13
2021-10-12 17:46:42 PDT
(In reply to Darin Adler from
comment #12
)
> Comment on
attachment 440982
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > > > > It would be: > > auto right = std::min<unsigned>(left + offset, (startPosition + length - > 1)); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > auto middle = surrogatePairAwareIndex((left + right) / 2); > U16_SET_CP_LIMIT(text, 0, middle, text.length());
Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ ++(i); \ } \ } UPRV_BLOCK_MACRO_END
Darin Adler
Comment 14
2021-10-12 17:50:54 PDT
Comment on
attachment 440982
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>> >> >> It would be: >> >> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >> U16_SET_CP_LIMIT(text, 0, right, text.length()); >> >> auto middle = surrogatePairAwareIndex((left + right) / 2); >> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > > Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > > #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > ++(i); \ > } \ > } UPRV_BLOCK_MACRO_END
Oh, OK, good point. So my code is wrong. It would be more like this: auto right = std::min<unsigned>(left + offset + 1, startPosition + length); U16_SET_CP_LIMIT(text, 0, right, text.length()); --right;
Darin Adler
Comment 15
2021-10-12 17:53:08 PDT
Comment on
attachment 440982
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
>>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 >>>> >>> >>> It would be: >>> >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); >>> >>> auto middle = surrogatePairAwareIndex((left + right) / 2); >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); >> >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: >> >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ >> ++(i); \ >> } \ >> } UPRV_BLOCK_MACRO_END > > Oh, OK, good point. So my code is wrong. It would be more like this: > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > U16_SET_CP_LIMIT(text, 0, right, text.length()); > --right;
And perhaps then could change the logic to get rid of the need for the "--right". Maybe a waste of time, but I do think that using the U16 macros is still a little easier to get correct than writing our own code.
Gabriel Nava Marino
Comment 16
2021-10-12 18:07:21 PDT
(In reply to Darin Adler from
comment #15
)
> Comment on
attachment 440982
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right".
I will work to update the logic to avoid the extra decrement. Thank you for the suggestion Darin.
> > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code.
Yes, agreed! I will work to get this macro working. The upside being that the "safe" macro offers handling of unpaired surrogates and checks for string boundaries.
Gabriel Nava Marino
Comment 17
2021-10-13 15:41:46 PDT
(In reply to Darin Adler from
comment #15
)
> Comment on
attachment 440982
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=440982&action=review
> > >>>> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:155 > >>>> > >>> > >>> It would be: > >>> > >>> auto right = std::min<unsigned>(left + offset, (startPosition + length - 1)); > >>> U16_SET_CP_LIMIT(text, 0, right, text.length()); > >>> > >>> auto middle = surrogatePairAwareIndex((left + right) / 2); > >>> U16_SET_CP_LIMIT(text, 0, middle, text.length()); > >> > >> Thank you, I will try this! Although I just notice that currently, surrogatePairAwareIndex returns the index at trail whereas U16_SET_CP_LIMIT returns the index past it: > >> > >> #define U16_SET_CP_LIMIT(s, start, i, length) UPRV_BLOCK_MACRO_BEGIN { \ > >> if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \ > >> ++(i); \ > >> } \ > >> } UPRV_BLOCK_MACRO_END > > > > Oh, OK, good point. So my code is wrong. It would be more like this: > > > > auto right = std::min<unsigned>(left + offset + 1, startPosition + length); > > U16_SET_CP_LIMIT(text, 0, right, text.length()); > > --right; > > And perhaps then could change the logic to get rid of the need for the > "--right".
I tried adjusting the logic so that we could get rid of the need for the "--right". However, because of the loop condition "while (left < right)" and need for two U16_SET_CP_LIMIT calls and then the subsequent U16_SET_CP_START, we would need at least one "--right" after U16_SET_CP_START. To avoid this, I have opted to keep surrogatePairAwareIndex and return index - 1 instead. Although we are still using this lambda, it is only using the U16 U16_SET_CP_LIMIT macro now.
> > Maybe a waste of time, but I do think that using the U16 macros is still a > little easier to get correct than writing our own code.
Darin Adler
Comment 18
2021-10-13 15:42:33 PDT
(In reply to Gabriel Nava Marino from
comment #17
)
> To avoid this, I have opted to keep surrogatePairAwareIndex and return index > - 1 instead. Although we are still using this lambda, it is only using the > U16 U16_SET_CP_LIMIT macro now.
OK, sounds pretty good.
Gabriel Nava Marino
Comment 19
2021-10-13 15:46:03 PDT
Created
attachment 441150
[details]
Patch
Darin Adler
Comment 20
2021-10-13 16:05:13 PDT
Comment on
attachment 441150
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=441150&action=review
Thanks for taking advantage of my suggestion to use higher level U16 macros instead of writing the code ourselves. I think it’s looking good.
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > + U16_SET_CP_LIMIT(text, 0, index, text.length());
I think it’s a little bit less clear to modify an argument, index, than to modify a copy of it. It’s an informal coding style rule in WebKit that we typically don’t use arguments that way, even when they are passed by value and can be used that way.
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length)));
Why add the + 1 for the the argument to surrogatePairAwareIndex at each place it’s called? Seems like the + 1 could be done inside that function. Maybe you think the function makes more sense with these semantics? If so, that’s OK, but if not, I’d do it there.
Gabriel Nava Marino
Comment 21
2021-10-13 16:12:41 PDT
(In reply to Darin Adler from
comment #20
)
> Comment on
attachment 441150
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=441150&action=review
> > Thanks for taking advantage of my suggestion to use higher level U16 macros > instead of writing the code ourselves. I think it’s looking good. > > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:149 > > + U16_SET_CP_LIMIT(text, 0, index, text.length()); > > I think it’s a little bit less clear to modify an argument, index, than to > modify a copy of it. It’s an informal coding style rule in WebKit that we > typically don’t use arguments that way, even when they are passed by value > and can be used that way. >
Thank you for the suggestion. I will make a copy of the argument.
> > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:159 > > + auto right = surrogatePairAwareIndex(std::min<unsigned>(left + offset + 1, (startPosition + length))); > > Why add the + 1 for the the argument to surrogatePairAwareIndex at each > place it’s called? Seems like the + 1 could be done inside that function. > > Maybe you think the function makes more sense with these semantics? If so, > that’s OK, but if not, I’d do it there.
It was done primarily to avoid doing the + 1 inside the function, but if we are also making a copy of the argument, then I will do the + 1 inside the function at the same time. Thank you for the suggestion.
Gabriel Nava Marino
Comment 22
2021-10-13 16:17:18 PDT
Created
attachment 441156
[details]
Patch
Darin Adler
Comment 23
2021-10-13 16:32:08 PDT
Comment on
attachment 441156
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=441156&action=review
> Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > + RELEASE_ASSERT(offset - 1 < text.length());
Not 100% sure we need this RELEASE_ASSERT.
Gabriel Nava Marino
Comment 24
2021-10-13 16:42:17 PDT
(In reply to Darin Adler from
comment #23
)
> Comment on
attachment 441156
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=441156&action=review
> > > Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:151 > > + RELEASE_ASSERT(offset - 1 < text.length()); > > Not 100% sure we need this RELEASE_ASSERT.
We don't need this. I will update this to remove it. Thank you.
Gabriel Nava Marino
Comment 25
2021-10-13 16:43:21 PDT
Created
attachment 441158
[details]
Patch
EWS
Comment 26
2021-10-13 17:57:22 PDT
Committed
r284141
(
242961@main
): <
https://commits.webkit.org/242961@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 441158
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug