204361 – HTTPHeaderMap's operator== is not fully correct

RESOLVED FIXED204361

HTTPHeaderMap's operator== is not fully correct

https://bugs.webkit.org/show_bug.cgi?id=204361

Summary HTTPHeaderMap's operator== is not fully correct

Chris Dumez

Reported 2019-11-19 10:43:45 PST

HTTPHeaderMap's operator== is not fully correct.

Attachments
Simpler Patch (12.43 KB, patch) 2019-11-19 10:58 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (12.25 KB, patch) 2019-11-22 12:36 PST, Chris Dumez	cdumez: review-	Details Formatted Diff Diff
Alternative Patch dealing with large maps (14.99 KB, patch) 2019-11-22 15:04 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Alternative Patch dealing with large maps (12.82 KB, patch) 2019-11-22 15:06 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (11.01 KB, patch) 2019-11-22 15:22 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Chris Dumez

Comment 1 2019-11-19 10:44:34 PST

Noticed by Ben.

Chris Dumez

Comment 2 2019-11-19 10:58:11 PST

Created attachment 383883 [details] Simpler Patch

Ben Nham

Comment 3 2019-11-19 11:59:39 PST

I'm a bit nervous about switching to a quadratic time algorithm since the number of HTTP headers is unbounded. Maybe the fast path could be something like: - reject if # of headers is different - accept if vector comparison succeeds And the slow path could be: - build and sort temporary array of keys - accept/reject based on comparing sorted key/val pairs I removed the implementation of the function and the compiler told me this is used by: 1. ResourceRequestBase::equal 2. ResourceRequestBase::compare 3. operator== on NetworkLoadMetrics 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch I don't have enough experience in the code base to know the code paths that use these methods, but (4) is on the critical path for speculative revalidation and loading. It's probably a good idea to run whatever patch we use on the perf bots because (4) can definitely impact PLT times.

Ben Nham

Comment 4 2019-11-19 12:04:06 PST

My list should have been: 1. ResourceRequestBase::equal 2. ResourceResponseBase::compare 3. operator== on NetworkLoadMetrics 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch

Chris Dumez

Comment 5 2019-11-19 12:57:19 PST

(In reply to Ben Nham from comment #3) > I'm a bit nervous about switching to a quadratic time algorithm since the > number of HTTP headers is unbounded. Maybe the fast path could be something > like: > > - reject if # of headers is different > - accept if vector comparison succeeds > > And the slow path could be: > > - build and sort temporary array of keys > - accept/reject based on comparing sorted key/val pairs > > I removed the implementation of the function and the compiler told me this > is used by: > > 1. ResourceRequestBase::equal > 2. ResourceRequestBase::compare > 3. operator== on NetworkLoadMetrics > 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch > > I don't have enough experience in the code base to know the code paths that > use these methods, but (4) is on the critical path for speculative > revalidation and loading. > > It's probably a good idea to run whatever patch we use on the perf bots > because (4) can definitely impact PLT times. The number of headers is small which is why we switched from a hashmap to a vector in the first place. There was zero regression when we switched to a vector and started doing linear search for headers. I don’t think concerns are warranted here.

Chris Dumez

Comment 6 2019-11-20 15:01:41 PST

(In reply to Chris Dumez from comment #5) > (In reply to Ben Nham from comment #3) > > I'm a bit nervous about switching to a quadratic time algorithm since the > > number of HTTP headers is unbounded. Maybe the fast path could be something > > like: > > > > - reject if # of headers is different > > - accept if vector comparison succeeds > > > > And the slow path could be: > > > > - build and sort temporary array of keys > > - accept/reject based on comparing sorted key/val pairs > > > > I removed the implementation of the function and the compiler told me this > > is used by: > > > > 1. ResourceRequestBase::equal > > 2. ResourceRequestBase::compare > > 3. operator== on NetworkLoadMetrics > > 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch > > > > I don't have enough experience in the code base to know the code paths that > > use these methods, but (4) is on the critical path for speculative > > revalidation and loading. > > > > It's probably a good idea to run whatever patch we use on the perf bots > > because (4) can definitely impact PLT times. > > The number of headers is small which is why we switched from a hashmap to a > vector in the first place. There was zero regression when we switched to a > vector and started doing linear search for headers. > > I don’t think concerns are warranted here. Early PLT5 A/B testing results show no regression on macOS or iOS. Ping review?

Alex Christensen

Comment 7 2019-11-22 10:05:26 PST

Comment on attachment 383883 [details] Simpler Patch Instead of introducing a quadratic algorithm, why don't we just sort both vectors before comparison?

Chris Dumez

Comment 8 2019-11-22 10:23:53 PST

(In reply to Alex Christensen from comment #7) > Comment on attachment 383883 [details] > Patch > > Instead of introducing a quadratic algorithm, why don't we just sort both > vectors before comparison? The number of headers is small, there is no perf impact on PLT from this change so it did not seem worth the extra complexity.

Alex Christensen

Comment 9 2019-11-22 10:33:43 PST

Comment on attachment 383883 [details] Simpler Patch A call to std::sort doesn't add much code complexity. Even though it didn't affect a data set that doesn't have a lot of header fields, JavaScript can add an arbitrarily large number of header fields. https://fetch.spec.whatwg.org/#terminology-headers

Chris Dumez

Comment 10 2019-11-22 10:41:12 PST

(In reply to Alex Christensen from comment #9) > Comment on attachment 383883 [details] > Patch > > A call to std::sort doesn't add much code complexity. Even though it didn't > affect a data set that doesn't have a lot of header fields, JavaScript can > add an arbitrarily large number of header fields. > https://fetch.spec.whatwg.org/#terminology-headers If you insist, but the perf will be worse in the common case. I will have to copy 4 vectors, sort them and then compare them with ==.

Alex Christensen

Comment 11 2019-11-22 10:42:38 PST

Can't you just sort them in-place? That will make 0 copies, and most of the time the headers will be completely sorted or mostly sorted.

Chris Dumez

Comment 12 2019-11-22 10:44:26 PST

(In reply to Chris Dumez from comment #10) > (In reply to Alex Christensen from comment #9) > > Comment on attachment 383883 [details] > > Patch > > > > A call to std::sort doesn't add much code complexity. Even though it didn't > > affect a data set that doesn't have a lot of header fields, JavaScript can > > add an arbitrarily large number of header fields. > > https://fetch.spec.whatwg.org/#terminology-headers > > If you insist, but the perf will be worse in the common case. I will have to > copy 4 vectors, sort them and then compare them with ==. Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is comparing? Also, it could mean operator==() would take non-const to those HTTPHeaderMaps, which means some call sites will likely have to be updated to make copies

Chris Dumez

Comment 13 2019-11-22 10:46:22 PST

(In reply to Chris Dumez from comment #12) > (In reply to Chris Dumez from comment #10) > > (In reply to Alex Christensen from comment #9) > > > Comment on attachment 383883 [details] > > > Patch > > > > > > A call to std::sort doesn't add much code complexity. Even though it didn't > > > affect a data set that doesn't have a lot of header fields, JavaScript can > > > add an arbitrarily large number of header fields. > > > https://fetch.spec.whatwg.org/#terminology-headers > > > > If you insist, but the perf will be worse in the common case. I will have to > > copy 4 vectors, sort them and then compare them with ==. > > Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is > comparing? Also, it could mean operator==() would take non-const to those > HTTPHeaderMaps, which means some call sites will likely have to be updated > to make copies Or maybe I can work around this by making m_uncommonHeaders / m_commonHeaders mutable..

Alex Christensen

Comment 14 2019-11-22 10:46:44 PST

HTTPHeaderMap does not guarantee order right now. If it does, then its operator== is correct now. A little "mutable" or "const_cast" can make it so we never need to copy.

Chris Dumez

Comment 15 2019-11-22 12:14:23 PST

(In reply to Alex Christensen from comment #14) > HTTPHeaderMap does not guarantee order right now. If it does, then its > operator== is correct now. A little "mutable" or "const_cast" can make it > so we never need to copy. HTTPHeaderMap does have an iterator so this change is definitely not risk free, but let's try...

Chris Dumez

Comment 16 2019-11-22 12:36:54 PST

Created attachment 384183 [details] Patch

Alex Christensen

Comment 17 2019-11-22 12:40:25 PST

Comment on attachment 384183 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review > Source/WebCore/platform/network/HTTPHeaderMap.h:202 > + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.

Ben Nham

Comment 18 2019-11-22 12:52:58 PST

To me, const_cast-ing in operator== seems like it could lead to unexpected consequences if someone invokes operator== while iterating through the map (which is weird but could happen). After talking to Chris a bit more I am more convinced of his opinion that we should land his original patch to keep the complexity down. If this method becomes a problem in the future, then we could switch the whole implementation to use a hash map, or have some slow path that kicks in and builds a temporary hash map in the implementation of operator== for large amounts of headers.

Alex Christensen

Comment 19 2019-11-22 13:04:16 PST

Comment on attachment 384183 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review All right. >> Source/WebCore/platform/network/HTTPHeaderMap.h:202 >> + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); >> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > > You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.

Chris Dumez

Comment 20 2019-11-22 13:12:52 PST

(In reply to Alex Christensen from comment #19) > Comment on attachment 384183 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=384183&action=review > > All right. > > >> Source/WebCore/platform/network/HTTPHeaderMap.h:202 > >> + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > >> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > > > > You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > You should be able to just put WTF::codePointCompareLessThan instead of a > lambda that wraps it. The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).

Geoffrey Garen

Comment 21 2019-11-22 14:31:16 PST

Comment on attachment 384183 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review >>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202 >>>> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); >>> >>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. >> >> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. > > For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()). I don't want to stand in the way of landing this, but I tend to agree that modifying the object in == is weird, and the potential performance issue isn't limited to ==. I guess you could copy out the values and sort the copies to avoid weirdness. But maybe just accepting that get() is linear is OK in this case. 🤷🏻‍♂️

Chris Dumez

Comment 22 2019-11-22 14:37:12 PST

Comment on attachment 384183 [details] Patch This actually is not right anymore because header names are not case sensitive.

Chris Dumez

Comment 23 2019-11-22 14:45:25 PST

(In reply to Chris Dumez from comment #22) > Comment on attachment 384183 [details] > Patch > > This actually is not right anymore because header names are not case > sensitive. Will extend the test case to cover this.

Chris Dumez

Comment 24 2019-11-22 14:49:11 PST

(In reply to Geoffrey Garen from comment #21) > Comment on attachment 384183 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=384183&action=review > > >>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202 > >>>> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > >>> > >>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > >> > >> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > > > The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. > > > > For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()). > > I don't want to stand in the way of landing this, but I tend to agree that > modifying the object in == is weird, and the potential performance issue > isn't limited to ==. I guess you could copy out the values and sort the > copies to avoid weirdness. But maybe just accepting that get() is linear is > OK in this case. 🤷🏻‍♂️ If people feel strongly about avoiding the linear search, maybe we can have a code path that does the linear search only if the vectors are small (e.g. <= 16), and fallback to constructing a HashMap in operator==() otherwise, to avoid pathological cases.

Chris Dumez

Comment 25 2019-11-22 15:04:18 PST

Created attachment 384202 [details] Alternative Patch dealing with large maps

Chris Dumez

Comment 26 2019-11-22 15:06:25 PST

Created attachment 384203 [details] Alternative Patch dealing with large maps

Alex Christensen

Comment 27 2019-11-22 15:08:27 PST

Comment on attachment 384203 [details] Alternative Patch dealing with large maps View in context: https://bugs.webkit.org/attachment.cgi?id=384203&action=review > Source/WebCore/platform/network/HTTPHeaderMap.cpp:228 > + bool isVectorSmall = m_commonHeaders.size() <= 16; m_uncommonHeaders has the unbounded size. I like the other patch.

Alex Christensen

Comment 28 2019-11-22 15:09:45 PST

Comment on attachment 383883 [details] Simpler Patch View in context: https://bugs.webkit.org/attachment.cgi?id=383883&action=review > Source/WebCore/platform/network/HTTPHeaderMap.h:193 > + if (a.m_commonHeaders.size() != b.m_commonHeaders.size() || a.m_uncommonHeaders.size() != b.m_uncommonHeaders.size()) It might be worth adding another fast path for if the vectors are actually equal, to have a O(N) possibility that returns true.

Chris Dumez

Comment 29 2019-11-22 15:22:50 PST

Created attachment 384206 [details] Patch

Chris Dumez

Comment 30 2019-11-22 16:04:12 PST

Comment on attachment 384206 [details] Patch Clearing flags on attachment: 384206 Committed r252813: <https://trac.webkit.org/changeset/252813>

Chris Dumez

Comment 31 2019-11-22 16:04:14 PST

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 32 2019-11-22 16:05:29 PST

<rdar://problem/57444366>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Page Loading

Assignee

Chris Dumez

Reported

2019-11-19 10:43 PST

Modified

2019-11-22 16:15 PST History

CC List

7 users Show

URL

Keywords InRadar

Depends on

Blocks