Summary: | REGRESSION (r272900): wpt.fyi loading performance is very slow (regressed, and slower than other browsers) | ||||||
---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Simon Fraser (smfr) <simon.fraser> | ||||
Component: | DOM | Assignee: | Nobody <webkit-unassigned> | ||||
Status: | RESOLVED FIXED | ||||||
Severity: | Normal | CC: | ap, cdumez, cgarcia, changseok, darin, esprehn+autocc, ews-watchlist, glenn, gsnedders, koivisto, kondapallykalyan, pdr, rniwa, simon.fraser, webkit-bug-importer | ||||
Priority: | P2 | Keywords: | InRadar | ||||
Version: | Safari Technology Preview | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Bug Depends on: | 229960 | ||||||
Bug Blocks: | 148695, 221386 | ||||||
Attachments: |
|
Description
Simon Fraser (smfr)
2021-08-30 12:24:17 PDT
Sample shows: + ! 1620 WebCore::Element::dispatchMouseEvent(WebCore::PlatformMouseEvent const&, WTF::AtomString const&, int, WebCore::Element*, WebCore::IsSyntheticClick) (in WebCore) + 802 [0x10814d432] + ! 1620 WebCore::EventDispatcher::dispatchEvent(WebCore::Node&, WebCore::Event&) (in WebCore) + 4771 [0x1081683a3] + ! 1620 WebCore::dispatchEventInDOM(WebCore::Event&, WebCore::EventPath const&) (in WebCore) + 240 [0x108168e40] + ! 1620 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WebCore::EventTarget::EventInvokePhase) (in WebCore) + 611 [0x10816c3e3] + ! 1620 WebCore::EventTarget::innerInvokeEventListeners(WebCore::Event&, WTF::Vector<WTF::RefPtr<WebCore::RegisteredEventListener, WTF::RawPtrTraits<WebCore::RegisteredEventListener>, WTF::DefaultRefDerefTraits<WebCore::RegisteredEventListener> >, 1ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>, WebCore::EventTarget::EventInvokePhase) (in WebCore) + 428 [0x10816cb6c] + ! 1507 WebCore::JSEventListener::handleEvent(WebCore::ScriptExecutionContext&, WebCore::Event&) (in WebCore) + 1965 [0x107e286ad] + ! : 1507 WebCore::JSExecState::~JSExecState() (in WebCore) + 214 [0x107e2c756] + ! : 1507 WebCore::MicrotaskQueue::performMicrotaskCheckpoint() (in WebCore) + 149 [0x108181d35] + ! : 1507 WTF::Detail::CallableWrapper<WebCore::WindowEventLoop::queueMutationObserverCompoundMicrotask()::$_0, void>::call() (in WebCore) + 71 [0x1081e6797] + ! : 1507 WebCore::MutationObserver::notifyMutationObservers(WebCore::WindowEventLoop&) (in WebCore) + 2255 [0x10818844f] + ! : 1507 WebCore::JSMutationCallback::handleEvent(WebCore::MutationObserver&, WTF::Vector<WTF::Ref<WebCore::MutationRecord, WTF::RawPtrTraits<WebCore::MutationRecord> >, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc> const&, WebCore::MutationObserver&) (in WebCore) + 515 [0x10758deb3] + ! : 1507 WebCore::JSCallbackData::invokeCallback(WebCore::JSDOMGlobalObject&, JSC::JSObject*, JSC::JSValue, JSC::MarkedArgumentBuffer&, WebCore::JSCallbackData::CallbackType, JSC::PropertyName, WTF::NakedPtr<JSC::Exception>&) (in WebCore) + 611 [0x107e10b33] + ! : 1507 JSC::profiledCall(JSC::JSGlobalObject*, JSC::ProfilingReason, JSC::JSValue, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&, WTF::NakedPtr<JSC::Exception>&) (in JavaScriptCore) + 174 [0x10bb8bdce] + ! : 1507 JSC::Interpreter::executeCall(JSC::JSGlobalObject*, JSC::JSObject*, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&) (in JavaScriptCore) + 555 [0x10b92c81b] + ! : 1507 vmEntryToJavaScript (in JavaScriptCore) + 216 [0x10b251326] + ! : 1501 ??? (in <unknown binary>) [0x3a716bc303aa] + ! : | 1501 ??? (in <unknown binary>) [0x3a716ba2c317] + ! : | 1499 ??? (in <unknown binary>) [0x3a716be87fca] + ! : | + 1452 ??? (in <unknown binary>) [0x3a716bc177d8] + ! : | + ! 1186 ??? (in <unknown binary>) [0x3a716bdfa779] + ! : | + ! : 1186 ??? (in <unknown binary>) [0x3a716ba011d8] + ! : | + ! : 1058 WebCore::jsNodePrototypeFunction_insertBefore(JSC::JSGlobalObject*, JSC::CallFrame*) (in WebCore) + 392 [0x1075bb4a8] + ! : | + ! : | 718 WebCore::ContainerNode::insertBefore(WebCore::Node&, WebCore::Node*) (in WebCore) + 1697 [0x1080ea671] + ! : | + ! : | + 718 WebCore::HTMLElement::childrenChanged(WebCore::ContainerNode::ChildChange const&) (in WebCore) + 18 [0x10834fcb2] + ! : | + ! : | + 697 WebCore::SlotAssignment::didChangeSlot(WTF::AtomString const&, WebCore::ShadowRoot&) (in WebCore) + 260 [0x1081c9f24] + ! : | + ! : | + ! 364 WebCore::RenderTreeUpdater::tearDownRenderers(WebCore::Element&, WebCore::RenderTreeUpdater::TeardownType, WebCore::RenderTreeBuilder&) (in WebCore) + 332 [0x108cd7f3c] + ! : | + ! : | + ! : 106 WebCore::ComposedTreeIterator::traverseNextInShadowTree() (in WebCore) + 96 [0x1080e33a0] + ! : | + ! : | + ! : | 106 WebCore::ElementAndTextDescendantIterator::traverseNext() (in WebCore) + 196,45,... [0x1080e37b4,0x1080e371d,...] + ! : | + ! : | + ! : 67 WebCore::ComposedTreeIterator::traverseNextInShadowTree() (in WebCore) + 192 [0x1080e3400] + ! : | + ! : | + ! : | 67 WebCore::HTMLSlotElement::assignedNodes() const (in WebCore) + 64 [0x1083dd940] + ! : | + ! : | + ! : | 67 WebCore::SlotAssignment::assignedNodesForSlot(WebCore::HTMLSlotElement const&, WebCore::ShadowRoot&) (in WebCore) + 353 [0x1081c6181] + ! : | + ! : | + ! : | 60 WebCore::SlotAssignment::assignSlots(WebCore::ShadowRoot&) (in WebCore) + 258 [0x1081c9032] + ! : | + ! : | + ! : | + 34 WebCore::SlotAssignment::assignToSlot(WebCore::Node&, WTF::AtomString const&) (in WebCore) + 730,340,... [0x1081ca63a,0x1081ca4b4,...] hmm, this happens because a lot of elements are added to the same shadow root, so didChangeDefaultSlot() and hostChildElementDidChange() are called too many times. Iterating the composed tree every time makes things super slow. I don't think we need to tear down renderers on every didChangeSlot call, only the ones coming from hostChildElementDidChange() and not in all those cases either. IIRC the problem was only when replacing a previous tree and there was a node with display contents. So, I think we need to find a more specific solution. Created attachment 436877 [details]
patch
Comment on attachment 436877 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=436877&action=review > Source/WebCore/rendering/updating/RenderTreeUpdater.cpp:544 > + if (!host.renderer() && !host.hasDisplayContents()) > + return; Oh, good catch! this is consistent with destroyRenderTreeIfNeeded(). > Oh, good catch! this is consistent with destroyRenderTreeIfNeeded().
This test should/could probably be in tearDownRenderers but lets do a minimal fix here.
Comment on attachment 436877 [details]
patch
Any good way to regression-test this performance optimization? Maybe we could make a test that is *super-slow* if this is broken.
> Any good way to regression-test this performance optimization? Maybe we
> could make a test that is *super-slow* if this is broken.
Those sort of benchmark is layout tests have often been problematic (hard to make reliable and speedy enough). We definitely should have coverage in our regular performance tests for Shadow DOM as it is getting increasingly popular. Speedometer3 is apparently planned to have some. It might also make sense to add microbenchmarks.
Committed r281813 (241149@main): <https://commits.webkit.org/241149@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 436877 [details]. (In reply to Antti Koivisto from comment #9) > > Any good way to regression-test this performance optimization? Maybe we > > could make a test that is *super-slow* if this is broken. > > Those sort of benchmark is layout tests have often been problematic (hard to > make reliable and speedy enough). We definitely should have coverage in our > regular performance tests for Shadow DOM as it is getting increasingly > popular. Speedometer3 is apparently planned to have some. It might also make > sense to add microbenchmarks. Seems to me that this optimization for this issue is so huge, it lends itself to a fast test that would simply hang without the optimization. Maybe that’s not right? Yeah, maybe. I'll try. Yeah, for anything like this where we're doing > O(n) work it's not too hard to write a regression test; I'm against writing tests for checking whether O(n) differs from O(n), but O(n) (or O(1) in this case) versus O(n^2) is pretty easy to detect. Perf test in 229960 |