Bug 303494
| Summary: | [Intel] Clang x86 optimizer needs help with std::optional::value_or branch prediction | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | Brent Fulgham <bfulgham> |
| Component: | New Bugs | Assignee: | Brent Fulgham <bfulgham> |
| Status: | RESOLVED FIXED | ||
| Severity: | Normal | CC: | webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | WebKit Nightly Build | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
Brent Fulgham
The code change in Bug 301103 introduced a mysterious performance regression on Intel architecture.
The only difference in behavior in that patch is that this code now uses the 'Or' path of the `value_or` Optional accessor, rather than just returning the value, since the passed usedZoom is now std::nullopt.
```
float LocalFrameView::documentToAbsoluteScaleFactor(std::optional<float> usedZoom) const
{
// If usedZoom is passed, it already factors in pageZoomFactor().
return usedZoom.value_or(m_frame->pageZoomFactor()) * m_frame->frameScaleFactor();
}
```
Which means we are always taking the opposite branch we used to in the template expansion:
```
template <class _Up>
_LIBCPP_HIDE_FROM_ABI constexpr _Tp value_or(_Up&& __v) const& {
static_assert(is_copy_constructible_v<_Tp>, "value_type has to be copy constructible");
static_assert(is_convertible_v<_Up, _Tp>, "argument has to be convertible to value_type");
return this->__has_val() ? this->__val() : static_cast<_Tp>(std::forward<_Up>(__v));
}
```
Prior to the patch: usedZoom always held the value of style.usedZoom() (a float), and used it.
After the patch: usedZoom never holds a value, and so the code does a std::forward of m_frame->pageZoomFactor() which is also just returning a float.
The difference is we always take the false branch for has_value(), when previously we took the true branch. PGO would have captured this, so we always take the wrong path and fail branch prediction.
Apple Silicon recovered performance after updating PGO, but Intel did not.
This patch helps the Intel optimizer with this branch prediction. I have also reached out to the Clang team asking them to investigate the Intel optimizer.
| Attachments | ||
|---|---|---|
| Add attachment proposed patch, testcase, etc. |
Brent Fulgham
<rdar://problem/164102621>
Brent Fulgham
Pull request: https://github.com/WebKit/WebKit/pull/54792
EWS
Committed 303933@main (def2f7e17e6c): <https://commits.webkit.org/303933@main>
Reviewed commits have been landed. Closing PR #54792 and removing active labels.