Bug 225072

Summary: [iOS] Web content process occasionally crashes under VisibleSelection::adjustPositionForEnd
Product: WebKit Reporter: Wenson Hsieh <wenson_hsieh>
Component: HTML EditingAssignee: Wenson Hsieh <wenson_hsieh>
Status: RESOLVED FIXED    
Severity: Normal CC: bdakin, cdumez, darin, esprehn+autocc, ews-watchlist, hi, kangil.han, megan_gardner, rniwa, thorton, webkit-bug-importer, wenson_hsieh
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
thorton: review+
Patch
none
Patch none

Description Wenson Hsieh 2021-04-26 13:29:36 PDT
rdar://77159489
Comment 1 Wenson Hsieh 2021-04-26 15:32:25 PDT
Created attachment 427097 [details]
Patch
Comment 2 Tim Horton 2021-04-27 12:38:51 PDT
Please also get review from rniwa or darin :)
Comment 3 Darin Adler 2021-04-27 16:15:45 PDT
Comment on attachment 427097 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427097&action=review

> Source/WebCore/dom/Position.h:126
> -    TreeScope* treeScope() const { return m_anchorNode ? &m_anchorNode->treeScope() : nullptr; }
> +    TreeScope* treeScope() const
> +    {
> +        if (!m_anchorNode || !m_anchorNode->isInTreeScope())
> +            return nullptr;
> +
> +        return &m_anchorNode->treeScope();
> +    }

I don’t think this change should be made. Position::document returns the document a position’s node is associated with. It doesn’t return nullptr if the node happens to not be in the document at the time. The same behavior would make sense for Position::treeScope.

Callers can check isInTreeScope if they have reason to; having this return nullptr does not seem right.

> Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
> -        else if (targetNode && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> +        else if (auto treeScope = selectionStart.deepEquivalent().treeScope(); treeScope && targetNode && targetNode->isInTreeScope() && treeScope != &targetNode->treeScope())

I suggest we add an isInTreeScope for Position and use it here rather than changing the behavior of Position::treeScope.

We could also consider writing a "both in same tree scope" function to make the logic easier to read.
Comment 4 Wenson Hsieh 2021-04-27 16:19:07 PDT
(In reply to Darin Adler from comment #3)
> Comment on attachment 427097 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427097&action=review
> 
> > Source/WebCore/dom/Position.h:126
> > -    TreeScope* treeScope() const { return m_anchorNode ? &m_anchorNode->treeScope() : nullptr; }
> > +    TreeScope* treeScope() const
> > +    {
> > +        if (!m_anchorNode || !m_anchorNode->isInTreeScope())
> > +            return nullptr;
> > +
> > +        return &m_anchorNode->treeScope();
> > +    }
> 
> I don’t think this change should be made. Position::document returns the
> document a position’s node is associated with. It doesn’t return nullptr if
> the node happens to not be in the document at the time. The same behavior
> would make sense for Position::treeScope.
> 
> Callers can check isInTreeScope if they have reason to; having this return
> nullptr does not seem right.
> 
> > Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
> > -        else if (targetNode && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> > +        else if (auto treeScope = selectionStart.deepEquivalent().treeScope(); treeScope && targetNode && targetNode->isInTreeScope() && treeScope != &targetNode->treeScope())
> 
> I suggest we add an isInTreeScope for Position and use it here rather than
> changing the behavior of Position::treeScope.
> 
> We could also consider writing a "both in same tree scope" function to make
> the logic easier to read.

Sounds good! I think I'll add a helper on Position that's something like:

bool isInSameTreeScope(Node&);
Comment 5 Wenson Hsieh 2021-04-27 16:59:14 PDT
(In reply to Wenson Hsieh from comment #4)
> (In reply to Darin Adler from comment #3)
> > Comment on attachment 427097 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=427097&action=review
> > 
> > > Source/WebCore/dom/Position.h:126
> > > -    TreeScope* treeScope() const { return m_anchorNode ? &m_anchorNode->treeScope() : nullptr; }
> > > +    TreeScope* treeScope() const
> > > +    {
> > > +        if (!m_anchorNode || !m_anchorNode->isInTreeScope())
> > > +            return nullptr;
> > > +
> > > +        return &m_anchorNode->treeScope();
> > > +    }
> > 
> > I don’t think this change should be made. Position::document returns the
> > document a position’s node is associated with. It doesn’t return nullptr if
> > the node happens to not be in the document at the time. The same behavior
> > would make sense for Position::treeScope.
> > 
> > Callers can check isInTreeScope if they have reason to; having this return
> > nullptr does not seem right.
> > 
> > > Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
> > > -        else if (targetNode && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> > > +        else if (auto treeScope = selectionStart.deepEquivalent().treeScope(); treeScope && targetNode && targetNode->isInTreeScope() && treeScope != &targetNode->treeScope())
> > 
> > I suggest we add an isInTreeScope for Position and use it here rather than
> > changing the behavior of Position::treeScope.
> > 
> > We could also consider writing a "both in same tree scope" function to make
> > the logic easier to read.
> 
> Sounds good! I think I'll add a helper on Position that's something like:
> 
> bool isInSameTreeScope(Node&);

..on second thought, this'll need to be `bool isInDifferentTreeScope(Node&) const` since the call site cares about whether the position and Node both exist in different tree scopes.
Comment 6 Ryosuke Niwa 2021-04-27 17:03:09 PDT
(In reply to Wenson Hsieh from comment #5)
> (In reply to Wenson Hsieh from comment #4)
> > Sounds good! I think I'll add a helper on Position that's something like:
> > 
> > bool isInSameTreeScope(Node&);
> 
> ..on second thought, this'll need to be `bool isInDifferentTreeScope(Node&)
> const` since the call site cares about whether the position and Node both
> exist in different tree scopes.

But doesn't isInSameTreeScope returning false would mean that they're in different tree scopes??
Comment 7 Wenson Hsieh 2021-04-27 17:04:25 PDT
(In reply to Ryosuke Niwa from comment #6)
> (In reply to Wenson Hsieh from comment #5)
> > (In reply to Wenson Hsieh from comment #4)
> > > Sounds good! I think I'll add a helper on Position that's something like:
> > > 
> > > bool isInSameTreeScope(Node&);
> > 
> > ..on second thought, this'll need to be `bool isInDifferentTreeScope(Node&)
> > const` since the call site cares about whether the position and Node both
> > exist in different tree scopes.
> 
> But doesn't isInSameTreeScope returning false would mean that they're in
> different tree scopes??

So that would change behavior in the case where either the node or position is not in a tree scope, but I suppose I could roll the isInTreeScope check into the call site...
Comment 8 Wenson Hsieh 2021-04-27 17:56:03 PDT
Created attachment 427222 [details]
Patch
Comment 9 Wenson Hsieh 2021-04-27 17:58:29 PDT
(In reply to Wenson Hsieh from comment #8)
> Created attachment 427222 [details]
> Patch

(I ended up just adding `Position::isInTreeScope()`, since it's kind of ambiguous how `isInSameTreeScope()` (and `isInDifferentTreeScope()`) should behave in the case where either is not in a TreeScope)
Comment 10 Darin Adler 2021-04-27 18:00:10 PDT
Comment on attachment 427222 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427222&action=review

> Source/WebCore/dom/Position.h:127
> +    bool isInTreeScope() const { return m_anchorNode && m_anchorNode->isInTreeScope(); }

Note that this is just a convenience. Clients could call anchorNode() and do this. I’m OK with having this convenience function, though.
Comment 11 Wenson Hsieh 2021-04-27 19:41:20 PDT
Comment on attachment 427222 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427222&action=review

>> Source/WebCore/dom/Position.h:127
>> +    bool isInTreeScope() const { return m_anchorNode && m_anchorNode->isInTreeScope(); }
> 
> Note that this is just a convenience. Clients could call anchorNode() and do this. I’m OK with having this convenience function, though.

Thanks for the review! While purely for convenience, I do think it fits in well with the extant helper functions such as `treeScope()` and `document()`.
Comment 12 EWS 2021-04-27 19:54:48 PDT
Committed r276688 (237102@main): <https://commits.webkit.org/237102@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 427222 [details].
Comment 13 Ryosuke Niwa 2021-04-27 23:49:39 PDT
Comment on attachment 427222 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427222&action=review

> Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
> -        else if (targetNode && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> +        else if (targetNode && targetNode->isInTreeScope() && selectionStart.deepEquivalent().isInTreeScope() && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())

I don't think this is quite right for when targetNode and selectionStart belong to two different documents but one of them is disconnected from the document.
Comment 14 Wenson Hsieh 2021-04-28 08:42:31 PDT
Comment on attachment 427222 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427222&action=review

>> Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
>> +        else if (targetNode && targetNode->isInTreeScope() && selectionStart.deepEquivalent().isInTreeScope() && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> 
> I don't think this is quite right for when targetNode and selectionStart belong to two different documents but one of them is disconnected from the document.

Is that because you would expect us to enter this `else if` statement in that scenario? If so, then as `adjustPositionForEnd` and `adjustPositionForStart` currently are, I think that would trigger a crash when trying to access `startContainerNode->treeScope()` or `currentPosition.containerNode()->treeScope()`...

Or are you suggesting we should add some kind of check like this at the end of the function?

```
         range = makeSimpleRange(result, selectionEnd);
     }

+    if (!range || !range->startContainer().isConnected() || !range->endContainer().isConnected())
+        return WTF::nullopt;
+
     return range;
 }
```
Comment 15 Ryosuke Niwa 2021-04-28 11:53:06 PDT
(In reply to Wenson Hsieh from comment #14)
> Comment on attachment 427222 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427222&action=review
> 
> >> Source/WebKit/WebProcess/WebPage/ios/WebPageIOS.mm:1527
> >> +        else if (targetNode && targetNode->isInTreeScope() && selectionStart.deepEquivalent().isInTreeScope() && selectionStart.deepEquivalent().treeScope() != &targetNode->treeScope())
> > 
> > I don't think this is quite right for when targetNode and selectionStart belong to two different documents but one of them is disconnected from the document.
> 
> Is that because you would expect us to enter this `else if` statement in
> that scenario?

Yes.

> If so, then as `adjustPositionForEnd` and
> `adjustPositionForStart` currently are, I think that would trigger a crash
> when trying to access `startContainerNode->treeScope()` or
> `currentPosition.containerNode()->treeScope()`...

Why? treeScope() of a Node is never null. In the case a Node is disconnected from the document, it is Document.
Comment 16 Ryosuke Niwa 2021-04-28 11:53:43 PDT
As confusing as it is, !node.isInTreeScope() doesn't mean node.treeScope() is null. This is why it returns a reference instead of a pointer.
Comment 17 Darin Adler 2021-04-28 12:14:47 PDT
The abstraction "tree scope" makes this more confusing than it should be. If you put the word "document" in, then it makes more sense if you are familiar with DOM history:

Nodes are associated with a document. But at any given time, they might be in the document’s tree, or might not be. Being in the document’s tree is referred to as being "connected" and can be checked with Node::isConnected efficiently without following parent pointers up to the root to see if it’s the document.

When we abstracted document to "tree scope", we kept those same concepts, but with different names. The node is always associated with a "tree scope". But at any given time it might be in the tree scope’s tree, or might not be. Being in the tree scope’s tree is referred to as being "in the tree scope" and can be checked with isInTreeScope efficiently without following parent pointers up to the root to see if it’s the shadow tree root or the document.
Comment 18 Ryosuke Niwa 2021-04-28 12:23:13 PDT
(In reply to Darin Adler from comment #17)
> The abstraction "tree scope" makes this more confusing than it should be. If
> you put the word "document" in, then it makes more sense if you are familiar
> with DOM history:

Yes.

> When we abstracted document to "tree scope", we kept those same concepts,
> but with different names. The node is always associated with a "tree scope".
> But at any given time it might be in the tree scope’s tree, or might not be.
> Being in the tree scope’s tree is referred to as being "in the tree scope"
> and can be checked with isInTreeScope efficiently without following parent
> pointers up to the root to see if it’s the shadow tree root or the document.

I think we need to change "in the tree scope" to something like isInDocumentOrShadowTree.

Separately, we probably want to rename TreeScope. It's not really a scope. It's really about the root node. Maybe DocumentOrShadowRootBase?
Comment 19 Wenson Hsieh 2021-04-28 12:24:12 PDT
Okay — I understand now. Thanks for the explanation!

It sounds like instead of checking for whether or not we are in a tree scope (i.e. connected), we should just be null checking the `containerNode` before passing it into `adjustPositionForEnd` and `adjustPositionForStart`. Or alternately, if the `containerNode` is null, just bail with nullopt..

Is that accurate?
Comment 20 Ryosuke Niwa 2021-04-28 12:29:25 PDT
(In reply to Wenson Hsieh from comment #19)
> Okay — I understand now. Thanks for the explanation!
> 
> It sounds like instead of checking for whether or not we are in a tree scope
> (i.e. connected), we should just be null checking the `containerNode` before
> passing it into `adjustPositionForEnd` and `adjustPositionForStart`. Or
> alternately, if the `containerNode` is null, just bail with nullopt..

That is containerNode being nullptr anything to do with this bug??
Comment 21 Wenson Hsieh 2021-04-28 13:21:06 PDT
(In reply to Ryosuke Niwa from comment #20)
> (In reply to Wenson Hsieh from comment #19)
> > Okay — I understand now. Thanks for the explanation!
> > 
> > It sounds like instead of checking for whether or not we are in a tree scope
> > (i.e. connected), we should just be null checking the `containerNode` before
> > passing it into `adjustPositionForEnd` and `adjustPositionForStart`. Or
> > alternately, if the `containerNode` is null, just bail with nullopt..
> 
> That is containerNode being nullptr anything to do with this bug??

Ryosuke and I discussed this over Slack; in summary, the crash is really due to the fact that we pass in a null container node into either `adjustPositionForEnd` or `adjustPositionForStart`, which we immediately proceed to dereference.

We should be checking the nullity of the container beforehand instead.
Comment 22 Wenson Hsieh 2021-04-28 13:22:21 PDT
Reopening to attach new patch.
Comment 23 Wenson Hsieh 2021-04-28 13:22:23 PDT
Created attachment 427290 [details]
Patch
Comment 24 Darin Adler 2021-04-28 13:28:35 PDT
(Not wanting to distract from this bug, these comments are focusing on the name TreeScope and related naming confusion.) If it was up to me I would suggest that isConnected be renamed to isInDocumentTree and isOrphaned be renamed to !isInDocumentTree. We would still have that "is connected" name if it’s used in web-exposed API, and maybe we need the "is orphaned" name for some reason (not really sure about that). On the other hand, I really like it when our code uses the same terminology that the web standards documents do. So that’s a consideration too, if some of these terms are used there. Seems that TreeScope should be named DocumentOrShadowRoot. No need to have the word "base" in there. But not sure how to name all the different predicates.
Comment 25 Ryosuke Niwa 2021-04-28 13:49:28 PDT
(In reply to Darin Adler from comment #24)
> (Not wanting to distract from this bug, these comments are focusing on the
> name TreeScope and related naming confusion.) If it was up to me I would
> suggest that isConnected be renamed to isInDocumentTree

isInDocumentTree has a specific meaning in the DOM spec which is that it's a Document tree without consideration to any consideration to shadow roots / shadow trees.
https://dom.spec.whatwg.org/#concept-document-tree

isConnected means it's either in a Document tree (i.e. without consideration to any of shadow trees), or in a ShadowRoot tree which is also connected:
https://dom.spec.whatwg.org/#connected

What you're suggesting doesn't agree with these definitions

>  and isOrphaned be renamed to !isInDocumentTree. We would still have that "is connected" name
> if it’s used in web-exposed API, and maybe we need the "is orphaned" name
> for some reason (not really sure about that).

Orphaned is really a term we used to have some version of DOM spec, which is !isConnected() so we should probably match the terminology here and use isDisconnected() or something.
Comment 26 Darin Adler 2021-04-28 13:50:17 PDT
(In reply to Wenson Hsieh from comment #21)
> We should be checking the nullity of the container beforehand instead.

Fits with one of my rules of thumb. I suggest always thinking about crashes "mechanically" first. If it’s a null dereference, consider adding the check for null. Then think through whether that’s right or not, but start there.
Comment 27 Darin Adler 2021-04-28 13:58:55 PDT
I’m happy to use those terms from web specifications, if they are carefully chosen. That can sometimes make our code clearer. When the terms are super-strange they can make our code hard to understand. I’m having a hard time figuring that out one way or another about the shadow tree related ones. Can definitely say that "is in tree" better always mean mean "if you walk up to the root, you'd get to the root of the tree". But sadly there is more than one kind of parent pointer and more than one kind of "walk up" in the shadow-haunted DOM.
Comment 28 Ryosuke Niwa 2021-04-28 14:05:04 PDT
(In reply to Darin Adler from comment #27)
> I’m happy to use those terms from web specifications, if they are carefully
> chosen. That can sometimes make our code clearer. When the terms are
> super-strange they can make our code hard to understand. I’m having a hard
> time figuring that out one way or another about the shadow tree related
> ones. Can definitely say that "is in tree" better always mean mean "if you
> walk up to the root, you'd get to the root of the tree". But sadly there is
> more than one kind of parent pointer and more than one kind of "walk up" in
> the shadow-haunted DOM.

DOM spec almost never crosses shadow boundaries. When they need to do, they explicitly say "shadow-including" e.g. shadow-including inclusive-ancestor.
Comment 29 EWS 2021-04-28 17:25:40 PDT
Committed r276742 (237142@main): <https://commits.webkit.org/237142@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 427290 [details].