RESOLVED FIXED 139929
AX: Crash: com.apple.WebKit.WebContent at com.apple.WebCore: WebCore::AXObjectCache::clearTextMarkerNodesInUse + 149
https://bugs.webkit.org/show_bug.cgi?id=139929
Summary AX: Crash: com.apple.WebKit.WebContent at com.apple.WebCore: WebCore::AXObjec...
chris fleizach
Reported 2014-12-23 17:12:46 PST
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.WebCore 0x00007fff90ee7c55 WebCore::AXObjectCache::clearTextMarkerNodesInUse(WebCore::Document*) + 149 1 com.apple.WebCore 0x00007fff90d34bb6 WebCore::Frame::disconnectOwnerElement() + 54 2 com.apple.WebCore 0x00007fff90d349e6 WebCore::HTMLFrameOwnerElement::disconnectContentFrame() + 38 3 com.apple.WebCore 0x00007fff90f9e0cf WebCore::disconnectSubframes(WebCore::ContainerNode&, WebCore::SubframeDisconnectPolicy) + 271 4 com.apple.WebCore 0x00007fff90c3233e WebCore::Document::prepareForDestruction() + 30 5 com.apple.WebCore 0x00007fff90babf8e WebCore::Frame::setView(WTF::PassRefPtr<WebCore::FrameView>) + 62 6 com.apple.WebCore 0x00007fff90babd4c WebCore::Frame::createView(WebCore::IntSize const&, WebCore::Color const&, bool, WebCore::IntSize const&, WebCore::IntRect const&, bool, WebCore::ScrollbarMode, bool, WebCore::ScrollbarMode, bool) + 124 7 com.apple.WebKit 0x00007fff9a99b059 WebKit::WebFrameLoaderClient::transitionToCommittedForNewPage() + 241 8 com.apple.WebCore 0x00007fff9113a9c6 WebCore::FrameLoader::transitionToCommitted(WebCore::CachedPage*) + 598 9 com.apple.WebCore 0x00007fff90bab11a WebCore::FrameLoader::commitProvisionalLoad() + 378 10 com.apple.WebCore 0x00007fff90c4bc6d WebCore::DocumentLoader::commitLoad(char const*, int) + 77 11 com.apple.WebCore 0x00007fff90c4b88d WebCore::DocumentLoader::dataReceived(WebCore::CachedResource*, char const*, int) + 413 12 com.apple.WebCore 0x00007fff90c4b511 WebCore::CachedRawResource::notifyClientsDataWasReceived(char const*, unsigned int) + 177 13 com.apple.WebCore 0x00007fff90c4b0ea WebCore::CachedRawResource::addDataBuffer(WebCore::ResourceBuffer*) + 170 14 com.apple.WebCore 0x00007fff90c4adc2 WebCore::SubresourceLoader::didReceiveDataOrBuffer(char const*, int, WTF::PassRefPtr<WebCore::SharedBuffer>, long long, WebCore::DataPayloadType) + 210 15 com.apple.WebCore 0x00007fff918527a3 WebCore::SubresourceLoader::didReceiveData(char const*, unsigned int, long long, WebCore::DataPayloadType) + 35 16 com.apple.WebKit 0x00007fff9ab7d3f7 WebKit::WebResourceLoader::didReceiveWebResourceLoaderMessage(IPC::Connection*, IPC::MessageDecoder&) + 463 17 com.apple.WebKit 0x00007fff9aa2b16a IPC::Connection::dispatchMessage(std::__1::unique_ptr<IPC::MessageDecoder, std::__1::default_delete<IPC::MessageDecoder> >) + 94
Attachments
patch (3.76 KB, patch)
2014-12-23 17:23 PST, chris fleizach
no flags
patch (5.78 KB, patch)
2015-01-04 00:22 PST, chris fleizach
darin: review+
buildbot: commit-queue-
Archive of layout-test-results from ews101 for mac-mountainlion (514.12 KB, application/zip)
2015-01-04 01:08 PST, Build Bot
no flags
Archive of layout-test-results from ews106 for mac-mountainlion-wk2 (525.16 KB, application/zip)
2015-01-04 01:12 PST, Build Bot
no flags
final patch for running through EWS (9.08 KB, patch)
2015-01-06 18:41 PST, chris fleizach
buildbot: commit-queue-
Archive of layout-test-results from ews100 for mac-mountainlion (624.88 KB, application/zip)
2015-01-06 19:21 PST, Build Bot
no flags
Archive of layout-test-results from ews106 for mac-mountainlion-wk2 (918.74 KB, application/zip)
2015-01-06 19:34 PST, Build Bot
no flags
final patch with all the files (10.24 KB, patch)
2015-01-06 22:59 PST, chris fleizach
no flags
chris fleizach
Comment 1 2014-12-23 17:12:55 PST
chris fleizach
Comment 2 2014-12-23 17:23:24 PST
Created attachment 243715 [details] patch Patch is speculative. I was not able to reproduce this crash unfortunately and could not contrive a scenario that would cause it to happen
Darin Adler
Comment 3 2014-12-24 12:12:01 PST
Comment on attachment 243715 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=243715&action=review I’m going to say review+ but I have some serious reservations with the code, and I think I have a guess what the real problem is, too. > Source/WebCore/accessibility/AXObjectCache.cpp:1026 > - cache->setNodeInUse(domNode); > + cache->setNodeInUse(domNode, &domNode->document()); It seems like the setNodeInUse function could easily get at the document itself; no good reason to pass it in here. Also, no reason to change it to a Document* before passing it in. I would make the argument a Document&. > Source/WebCore/accessibility/AXObjectCache.cpp:1050 > + // [Bug: 139929] Note: this used to ask the node if it was in a document, but there > + // are some cases where nodes are not being cleaned up and that can crash. > + // Instead we can store the document and node, and when we clean up the document we can > + // remove all associated nodes. This problem with this comment is that having stale pointers to deleted nodes will cause other problems too. Just making this one crash go away doesn’t solve the whole problem. The comment doesn’t make that clear, even though it’s a long comment. I think the comment should focus on “why” rather than reiterating exactly what we do, since you can see the what part from reading the code. The “there are some cases” phrase here is not a good thing to write. We need to explain why, not simply state this as fact. The “[Bug: 139929] Note:” is not a format we have used elsewhere in WebKit and I’d prefer not to do that. > Source/WebCore/accessibility/AXObjectCache.cpp:1055 > + for (const auto& node : m_textMarkerNodes.keys()) { > + if (m_textMarkerNodes.get(node) == document) > nodesToDelete.add(node); > } When we see a loop like this, it usually indicates that the data structure is wrong. Normally we’d want to keep a set of nodes indexed by the document so we didn’t have to iterate the hash table just to find the nodes associated with a given document. That is a pre-existing problem with the data structure we were using. A good data structure for this might be a HashMap<Document*, HashSet<Node*>> alongside HashMap<Node*, Document*>. Should be straightforward to keep both up to date. If we do keep this data structure as is, the new code to iterate it is unnecessarily inefficient in another way. When we iterate a map we can get at both the key and value. Getting just the key and then calling get to see the value causes a hash table lookup each time, greatly increasing execution time. Doing it more efficiently would look like this: for (auto& entry : m_textMarkerNodes) { if (entry.value == document) nodesToDelete.add(entry.key); } > Source/WebCore/accessibility/AXObjectCache.h:227 > // This is a weak reference cache for knowing if Nodes used by TextMarkers are valid. I’m not sure I understand what a “weak reference cache” is exactly? There’s nothing here that makes these raw pointers act as safe weak references. It’s not safe to have pointers to nodes unless there is some kind of a lifetime guarantee. I guess that’s the whole point of this patch. Something bad like is happening because we did not plug the node lifetimes into this in an airtight way. The current implementation is a little messy, but mostly seems sound, although it’s really hard to find all the relevant lines of code. The hole I see is that when a subframe loads a new document, nothing calls clearTextMarkerNodesInUse. That code is in Frame::disconnectOwnerElement, which is called when a frame is being destroyed, not when a frame is loading a new document. This will leave all the nodes from a document in the main frame’s AXObjectCache, and they can then be destroyed. I suspect that’s the real bug here.
Darin Adler
Comment 4 2014-12-24 12:15:56 PST
Comment on attachment 243715 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=243715&action=review >> Source/WebCore/accessibility/AXObjectCache.h:227 >> // This is a weak reference cache for knowing if Nodes used by TextMarkers are valid. > > I’m not sure I understand what a “weak reference cache” is exactly? There’s nothing here that makes these raw pointers act as safe weak references. It’s not safe to have pointers to nodes unless there is some kind of a lifetime guarantee. I guess that’s the whole point of this patch. Something bad like is happening because we did not plug the node lifetimes into this in an airtight way. > > The current implementation is a little messy, but mostly seems sound, although it’s really hard to find all the relevant lines of code. The hole I see is that when a subframe loads a new document, nothing calls clearTextMarkerNodesInUse. That code is in Frame::disconnectOwnerElement, which is called when a frame is being destroyed, not when a frame is loading a new document. This will leave all the nodes from a document in the main frame’s AXObjectCache, and they can then be destroyed. I suspect that’s the real bug here. I would like to learn more about what “if nodes used by text markers are valid” means. There’s no way that a set can tell you if a Node* is valid. If the node has been destroyed, a new node might have been allocated with the same address. So if the true purpose here is “see if a Node* is still good or has already been deleted” then it’s based on a flawed premise. Doing that is simply not possible; we need some other solution.
chris fleizach
Comment 5 2014-12-24 12:59:25 PST
(In reply to comment #4) > Comment on attachment 243715 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=243715&action=review > > >> Source/WebCore/accessibility/AXObjectCache.h:227 > >> // This is a weak reference cache for knowing if Nodes used by TextMarkers are valid. > > > > I’m not sure I understand what a “weak reference cache” is exactly? There’s nothing here that makes these raw pointers act as safe weak references. It’s not safe to have pointers to nodes unless there is some kind of a lifetime guarantee. I guess that’s the whole point of this patch. Something bad like is happening because we did not plug the node lifetimes into this in an airtight way. > > > > The current implementation is a little messy, but mostly seems sound, although it’s really hard to find all the relevant lines of code. The hole I see is that when a subframe loads a new document, nothing calls clearTextMarkerNodesInUse. That code is in Frame::disconnectOwnerElement, which is called when a frame is being destroyed, not when a frame is loading a new document. This will leave all the nodes from a document in the main frame’s AXObjectCache, and they can then be destroyed. I suspect that’s the real bug here. > > I would like to learn more about what “if nodes used by text markers are > valid” means. There’s no way that a set can tell you if a Node* is valid. If > the node has been destroyed, a new node might have been allocated with the > same address. So if the true purpose here is “see if a Node* is still good > or has already been deleted” then it’s based on a flawed premise. Doing that > is simply not possible; we need some other solution. Background: The accessibility code exposes something called AXTextMarkers which encode a position and a Node pointer. Those get passed around to clients like VoiceOver. A problem in the past was that an old AXTextMarker would be used and the Node would be de-referenced directly, leading to a crash. To fix that we introduced a cache to keep track of which nodes we were using and which were still active. I want to look into your suggestion as to where certain sub-frame Nodes are not being removed from this cache. Hopefully that can solve the real problem, which I also would much prefer too. This current fix is only trying to paper over the problem.
chris fleizach
Comment 6 2014-12-24 12:59:59 PST
(In reply to comment #4) > Comment on attachment 243715 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=243715&action=review > > >> Source/WebCore/accessibility/AXObjectCache.h:227 > >> // This is a weak reference cache for knowing if Nodes used by TextMarkers are valid. > > > > I’m not sure I understand what a “weak reference cache” is exactly? There’s nothing here that makes these raw pointers act as safe weak references. It’s not safe to have pointers to nodes unless there is some kind of a lifetime guarantee. I guess that’s the whole point of this patch. Something bad like is happening because we did not plug the node lifetimes into this in an airtight way. > > > > The current implementation is a little messy, but mostly seems sound, although it’s really hard to find all the relevant lines of code. The hole I see is that when a subframe loads a new document, nothing calls clearTextMarkerNodesInUse. That code is in Frame::disconnectOwnerElement, which is called when a frame is being destroyed, not when a frame is loading a new document. This will leave all the nodes from a document in the main frame’s AXObjectCache, and they can then be destroyed. I suspect that’s the real bug here. > > I would like to learn more about what “if nodes used by text markers are > valid” means. There’s no way that a set can tell you if a Node* is valid. If > the node has been destroyed, a new node might have been allocated with the > same address. So if the true purpose here is “see if a Node* is still good > or has already been deleted” then it’s based on a flawed premise. Doing that > is simply not possible; we need some other solution. Background: The accessibility code exposes something called AXTextMarkers which encode a position and a Node pointer. Those get passed around to clients like VoiceOver. A problem in the past was that an old AXTextMarker would be used and the Node would be de-referenced directly, leading to a crash. To fix that we introduced a cache to keep track of which nodes we were using and which were still active. I want to look into your suggestion as to where certain sub-frame Nodes are not being removed from this cache. Hopefully that can solve the real problem, which I also would much prefer too. This current fix is only trying to paper over the problem.
chris fleizach
Comment 7 2015-01-04 00:22:48 PST
Created attachment 243923 [details] patch Based on Darin's comments, I took a new look at it and found definitively that Nodes would be left in the text marker cache when a frame was replaced. Putting in changes in setView seems to cover that case
Build Bot
Comment 8 2015-01-04 01:08:26 PST
Comment on attachment 243923 [details] patch Attachment 243923 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/4651466987929600 New failing tests: accessibility/frame-disconnect-textmarker-cache-crash.html
Build Bot
Comment 9 2015-01-04 01:08:34 PST
Created attachment 243925 [details] Archive of layout-test-results from ews101 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Build Bot
Comment 10 2015-01-04 01:12:39 PST
Comment on attachment 243923 [details] patch Attachment 243923 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/5316678636273664 New failing tests: accessibility/frame-disconnect-textmarker-cache-crash.html
Build Bot
Comment 11 2015-01-04 01:12:53 PST
Created attachment 243926 [details] Archive of layout-test-results from ews106 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
Darin Adler
Comment 12 2015-01-04 13:48:30 PST
Comment on attachment 243923 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=243923&action=review I am saying review+ but it looks like the new test is failing on the EWS machine. Please don’t land the patch until we diagnose and solve that problem. > Source/WebCore/dom/Document.cpp:2084 > + if (AXObjectCache* cache = existingAXObjectCache()) > + cache->clearTextMarkerNodesInUse(this); Is the call to this function in Frame::disconnectOwnerElement still needed, or can that be removed now that we have added this call? > Source/WebCore/page/FrameView.cpp:300 > + if (HTMLFrameOwnerElement* owner = frame().ownerElement()) > + cache->childrenChanged(owner->renderer()); Is this correct even if owner->renderer() is null? Is owner->renderer() guaranteed not to be null? > Source/WebCore/page/FrameView.cpp:302 > cache->remove(this); Maybe it would be slightly more efficient to do these two things in the reverse order?
Darin Adler
Comment 13 2015-01-04 13:50:23 PST
Comment on attachment 243923 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=243923&action=review >> Source/WebCore/dom/Document.cpp:2084 >> + cache->clearTextMarkerNodesInUse(this); > > Is the call to this function in Frame::disconnectOwnerElement still needed, or can that be removed now that we have added this call? It seems wasteful to do this for for the top level document. Maybe this should only be done if this != &topDocument()?
Darin Adler
Comment 14 2015-01-04 13:51:37 PST
We should consider testing in frames that are children of frames that are replaced; so not nodes in the frame that’s replaced itself, but nodes in one of its subframes. The code path there would be slightly different.
chris fleizach
Comment 15 2015-01-04 18:40:45 PST
(In reply to comment #12) > Comment on attachment 243923 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=243923&action=review > > I am saying review+ but it looks like the new test is failing on the EWS > machine. Please don’t land the patch until we diagnose and solve that > problem. > Yep, it looks like my arbitrary wait for a page to load isn't working. I'll need a better way of checking the frame load > > Source/WebCore/dom/Document.cpp:2084 > > + if (AXObjectCache* cache = existingAXObjectCache()) > > + cache->clearTextMarkerNodesInUse(this); > > Is the call to this function in Frame::disconnectOwnerElement still needed, > or can that be removed now that we have added this call? > I will double check. > > Source/WebCore/page/FrameView.cpp:300 > > + if (HTMLFrameOwnerElement* owner = frame().ownerElement()) > > + cache->childrenChanged(owner->renderer()); > > Is this correct even if owner->renderer() is null? Is owner->renderer() > guaranteed not to be null? If owner->renderer() is nil, this operation will not do anything. I verified that the methods inside can handle null correctly. Do you think it's better form to check anyway in case those other methods change one day? > > > Source/WebCore/page/FrameView.cpp:302 > > cache->remove(this); > > Maybe it would be slightly more efficient to do these two things in the > reverse order? I think it might be possible that if we do the remove first, the next childrenChanged call will fail because it will no longer think its a valid element
chris fleizach
Comment 16 2015-01-04 18:41:06 PST
(In reply to comment #13) > Comment on attachment 243923 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=243923&action=review > > >> Source/WebCore/dom/Document.cpp:2084 > >> + cache->clearTextMarkerNodesInUse(this); > > > > Is the call to this function in Frame::disconnectOwnerElement still needed, or can that be removed now that we have added this call? > > It seems wasteful to do this for for the top level document. Maybe this > should only be done if this != &topDocument()? Yes this seems like a good idea
chris fleizach
Comment 17 2015-01-04 18:41:33 PST
(In reply to comment #14) > We should consider testing in frames that are children of frames that are > replaced; so not nodes in the frame that’s replaced itself, but nodes in one > of its subframes. The code path there would be slightly different. Sounds good. I'll add that into the tests
Darin Adler
Comment 18 2015-01-04 21:03:40 PST
Comment on attachment 243923 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=243923&action=review >>> Source/WebCore/page/FrameView.cpp:300 >>> + cache->childrenChanged(owner->renderer()); >> >> Is this correct even if owner->renderer() is null? Is owner->renderer() guaranteed not to be null? > > If owner->renderer() is nil, this operation will not do anything. I verified that the methods inside can handle null correctly. Do you think it's better form to check anyway in case those other methods change one day? No need to check; glad you looked. The renderer being null is not a case that needs to be optimized for performance, I don’t think, so I don’t think you should add the additional check.
chris fleizach
Comment 19 2015-01-05 18:06:20 PST
(In reply to comment #15) > > > > Source/WebCore/dom/Document.cpp:2084 > > > + if (AXObjectCache* cache = existingAXObjectCache()) > > > + cache->clearTextMarkerNodesInUse(this); > > > > Is the call to this function in Frame::disconnectOwnerElement still needed, > > or can that be removed now that we have added this call? > > > > I will double check. > Follow up on this. It appears that disconnectOwnerFrame is redundant. I see prepareForDestruction being called in all cases that disconnectOwnerFrame is, as well as many others (during general web browsing)
chris fleizach
Comment 20 2015-01-06 18:41:06 PST
Created attachment 244117 [details] final patch for running through EWS
Build Bot
Comment 21 2015-01-06 19:21:19 PST
Comment on attachment 244117 [details] final patch for running through EWS Attachment 244117 [details] did not pass mac-ews (mac): Output: http://webkit-queues.appspot.com/results/6607957332590592 New failing tests: accessibility/frame-disconnect-textmarker-cache-crash.html
Build Bot
Comment 22 2015-01-06 19:21:25 PST
Created attachment 244124 [details] Archive of layout-test-results from ews100 for mac-mountainlion The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-mountainlion Platform: Mac OS X 10.8.5
Build Bot
Comment 23 2015-01-06 19:34:47 PST
Comment on attachment 244117 [details] final patch for running through EWS Attachment 244117 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/6334299196358656 New failing tests: accessibility/frame-disconnect-textmarker-cache-crash.html
Build Bot
Comment 24 2015-01-06 19:34:51 PST
Created attachment 244126 [details] Archive of layout-test-results from ews106 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5
chris fleizach
Comment 25 2015-01-06 22:59:05 PST
Created attachment 244141 [details] final patch with all the files
chris fleizach
Comment 26 2015-01-07 08:56:15 PST
Note You need to log in before you can comment on or make changes to this bug.