Bug 204361 - HTTPHeaderMap's operator== is not fully correct
Summary: HTTPHeaderMap's operator== is not fully correct
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-11-19 10:43 PST by Chris Dumez
Modified: 2019-11-22 16:15 PST (History)
7 users (show)

See Also:


Attachments
Simpler Patch (12.43 KB, patch)
2019-11-19 10:58 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (12.25 KB, patch)
2019-11-22 12:36 PST, Chris Dumez
cdumez: review-
Details | Formatted Diff | Diff
Alternative Patch dealing with large maps (14.99 KB, patch)
2019-11-22 15:04 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Alternative Patch dealing with large maps (12.82 KB, patch)
2019-11-22 15:06 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (11.01 KB, patch)
2019-11-22 15:22 PST, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2019-11-19 10:43:45 PST
HTTPHeaderMap's operator== is not fully correct.
Comment 1 Chris Dumez 2019-11-19 10:44:34 PST
Noticed by Ben.
Comment 2 Chris Dumez 2019-11-19 10:58:11 PST
Created attachment 383883 [details]
Simpler Patch
Comment 3 Ben Nham 2019-11-19 11:59:39 PST
I'm a bit nervous about switching to a quadratic time algorithm since the number of HTTP headers is unbounded. Maybe the fast path could be something like:

 - reject if # of headers is different
 - accept if vector comparison succeeds

And the slow path could be:

 - build and sort temporary array of keys
 - accept/reject based on comparing sorted key/val pairs

I removed the implementation of the function and the compiler told me this is used by:

1. ResourceRequestBase::equal
2. ResourceRequestBase::compare
3. operator== on NetworkLoadMetrics
4. NetworkCacheSpeculativeLoad::requestsHeadersMatch

I don't have enough experience in the code base to know the code paths that use these methods, but (4) is on the critical path for speculative revalidation and loading.

It's probably a good idea to run whatever patch we use on the perf bots because (4) can definitely impact PLT times.
Comment 4 Ben Nham 2019-11-19 12:04:06 PST
My list should have been:

1. ResourceRequestBase::equal
2. ResourceResponseBase::compare
3. operator== on NetworkLoadMetrics
4. NetworkCacheSpeculativeLoad::requestsHeadersMatch
Comment 5 Chris Dumez 2019-11-19 12:57:19 PST
(In reply to Ben Nham from comment #3)
> I'm a bit nervous about switching to a quadratic time algorithm since the
> number of HTTP headers is unbounded. Maybe the fast path could be something
> like:
> 
>  - reject if # of headers is different
>  - accept if vector comparison succeeds
> 
> And the slow path could be:
> 
>  - build and sort temporary array of keys
>  - accept/reject based on comparing sorted key/val pairs
> 
> I removed the implementation of the function and the compiler told me this
> is used by:
> 
> 1. ResourceRequestBase::equal
> 2. ResourceRequestBase::compare
> 3. operator== on NetworkLoadMetrics
> 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch
> 
> I don't have enough experience in the code base to know the code paths that
> use these methods, but (4) is on the critical path for speculative
> revalidation and loading.
> 
> It's probably a good idea to run whatever patch we use on the perf bots
> because (4) can definitely impact PLT times.

The number of headers is small which is why we switched from a hashmap to a vector in the first place. There was zero regression when we switched to a vector and started doing linear search for headers.

I don’t think concerns are warranted here.
Comment 6 Chris Dumez 2019-11-20 15:01:41 PST
(In reply to Chris Dumez from comment #5)
> (In reply to Ben Nham from comment #3)
> > I'm a bit nervous about switching to a quadratic time algorithm since the
> > number of HTTP headers is unbounded. Maybe the fast path could be something
> > like:
> > 
> >  - reject if # of headers is different
> >  - accept if vector comparison succeeds
> > 
> > And the slow path could be:
> > 
> >  - build and sort temporary array of keys
> >  - accept/reject based on comparing sorted key/val pairs
> > 
> > I removed the implementation of the function and the compiler told me this
> > is used by:
> > 
> > 1. ResourceRequestBase::equal
> > 2. ResourceRequestBase::compare
> > 3. operator== on NetworkLoadMetrics
> > 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch
> > 
> > I don't have enough experience in the code base to know the code paths that
> > use these methods, but (4) is on the critical path for speculative
> > revalidation and loading.
> > 
> > It's probably a good idea to run whatever patch we use on the perf bots
> > because (4) can definitely impact PLT times.
> 
> The number of headers is small which is why we switched from a hashmap to a
> vector in the first place. There was zero regression when we switched to a
> vector and started doing linear search for headers.
> 
> I don’t think concerns are warranted here.

Early PLT5 A/B testing results show no regression on macOS or iOS.

Ping review?
Comment 7 Alex Christensen 2019-11-22 10:05:26 PST
Comment on attachment 383883 [details]
Simpler Patch

Instead of introducing a quadratic algorithm, why don't we just sort both vectors before comparison?
Comment 8 Chris Dumez 2019-11-22 10:23:53 PST
(In reply to Alex Christensen from comment #7)
> Comment on attachment 383883 [details]
> Patch
> 
> Instead of introducing a quadratic algorithm, why don't we just sort both
> vectors before comparison?

The number of headers is small, there is no perf impact on PLT from this change so it did not seem worth the extra complexity.
Comment 9 Alex Christensen 2019-11-22 10:33:43 PST
Comment on attachment 383883 [details]
Simpler Patch

A call to std::sort doesn't add much code complexity.  Even though it didn't affect a data set that doesn't have a lot of header fields, JavaScript can add an arbitrarily large number of header fields.  https://fetch.spec.whatwg.org/#terminology-headers
Comment 10 Chris Dumez 2019-11-22 10:41:12 PST
(In reply to Alex Christensen from comment #9)
> Comment on attachment 383883 [details]
> Patch
> 
> A call to std::sort doesn't add much code complexity.  Even though it didn't
> affect a data set that doesn't have a lot of header fields, JavaScript can
> add an arbitrarily large number of header fields. 
> https://fetch.spec.whatwg.org/#terminology-headers

If you insist, but the perf will be worse in the common case. I will have to copy 4 vectors, sort them and then compare them with ==.
Comment 11 Alex Christensen 2019-11-22 10:42:38 PST
Can't you just sort them in-place?  That will make 0 copies, and most of the time the headers will be completely sorted or mostly sorted.
Comment 12 Chris Dumez 2019-11-22 10:44:26 PST
(In reply to Chris Dumez from comment #10)
> (In reply to Alex Christensen from comment #9)
> > Comment on attachment 383883 [details]
> > Patch
> > 
> > A call to std::sort doesn't add much code complexity.  Even though it didn't
> > affect a data set that doesn't have a lot of header fields, JavaScript can
> > add an arbitrarily large number of header fields. 
> > https://fetch.spec.whatwg.org/#terminology-headers
> 
> If you insist, but the perf will be worse in the common case. I will have to
> copy 4 vectors, sort them and then compare them with ==.

Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is comparing? Also, it could mean operator==() would take non-const to those HTTPHeaderMaps, which means some call sites will likely have to be updated to make copies
Comment 13 Chris Dumez 2019-11-22 10:46:22 PST
(In reply to Chris Dumez from comment #12)
> (In reply to Chris Dumez from comment #10)
> > (In reply to Alex Christensen from comment #9)
> > > Comment on attachment 383883 [details]
> > > Patch
> > > 
> > > A call to std::sort doesn't add much code complexity.  Even though it didn't
> > > affect a data set that doesn't have a lot of header fields, JavaScript can
> > > add an arbitrarily large number of header fields. 
> > > https://fetch.spec.whatwg.org/#terminology-headers
> > 
> > If you insist, but the perf will be worse in the common case. I will have to
> > copy 4 vectors, sort them and then compare them with ==.
> 
> Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is
> comparing? Also, it could mean operator==() would take non-const to those
> HTTPHeaderMaps, which means some call sites will likely have to be updated
> to make copies

Or maybe I can work around this by making m_uncommonHeaders / m_commonHeaders mutable..
Comment 14 Alex Christensen 2019-11-22 10:46:44 PST
HTTPHeaderMap does not guarantee order right now.  If it does, then its operator== is correct now.  A little "mutable" or "const_cast" can make it so we never need to copy.
Comment 15 Chris Dumez 2019-11-22 12:14:23 PST
(In reply to Alex Christensen from comment #14)
> HTTPHeaderMap does not guarantee order right now.  If it does, then its
> operator== is correct now.  A little "mutable" or "const_cast" can make it
> so we never need to copy.

HTTPHeaderMap does have an iterator so this change is definitely not risk free, but let's try...
Comment 16 Chris Dumez 2019-11-22 12:36:54 PST
Created attachment 384183 [details]
Patch
Comment 17 Alex Christensen 2019-11-22 12:40:25 PST
Comment on attachment 384183 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review

> Source/WebCore/platform/network/HTTPHeaderMap.h:202
> +        std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
> +        std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });

You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
Comment 18 Ben Nham 2019-11-22 12:52:58 PST
To me, const_cast-ing in operator== seems like it could lead to unexpected consequences if someone invokes operator== while iterating through the map (which is weird but could happen).

After talking to Chris a bit more I am more convinced of his opinion that we should land his original patch to keep the complexity down. If this method becomes a problem in the future, then we could switch the whole implementation to use a hash map, or have some slow path that kicks in and builds a temporary hash map in the implementation of operator== for large amounts of headers.
Comment 19 Alex Christensen 2019-11-22 13:04:16 PST
Comment on attachment 384183 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review

All right.

>> Source/WebCore/platform/network/HTTPHeaderMap.h:202
>> +        std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
>> +        std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
> 
> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.

You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
Comment 20 Chris Dumez 2019-11-22 13:12:52 PST
(In reply to Alex Christensen from comment #19)
> Comment on attachment 384183 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=384183&action=review
> 
> All right.
> 
> >> Source/WebCore/platform/network/HTTPHeaderMap.h:202
> >> +        std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
> >> +        std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
> > 
> > You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
> 
> You should be able to just put WTF::codePointCompareLessThan instead of a
> lambda that wraps it.

The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct.

For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).
Comment 21 Geoffrey Garen 2019-11-22 14:31:16 PST
Comment on attachment 384183 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384183&action=review

>>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202
>>>> +        std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
>>> 
>>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
>> 
>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
> 
> The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct.
> 
> For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).

I don't want to stand in the way of landing this, but I tend to agree that modifying the object in == is weird, and the potential performance issue isn't limited to ==. I guess you could copy out the values and sort the copies to avoid weirdness. But maybe just accepting that get() is linear is OK in this case. 🤷🏻‍♂️
Comment 22 Chris Dumez 2019-11-22 14:37:12 PST
Comment on attachment 384183 [details]
Patch

This actually is not right anymore because header names are not case sensitive.
Comment 23 Chris Dumez 2019-11-22 14:45:25 PST
(In reply to Chris Dumez from comment #22)
> Comment on attachment 384183 [details]
> Patch
> 
> This actually is not right anymore because header names are not case
> sensitive.

Will extend the test case to cover this.
Comment 24 Chris Dumez 2019-11-22 14:49:11 PST
(In reply to Geoffrey Garen from comment #21)
> Comment on attachment 384183 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=384183&action=review
> 
> >>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202
> >>>> +        std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
> >>> 
> >>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
> >> 
> >> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
> > 
> > The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct.
> > 
> > For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).
> 
> I don't want to stand in the way of landing this, but I tend to agree that
> modifying the object in == is weird, and the potential performance issue
> isn't limited to ==. I guess you could copy out the values and sort the
> copies to avoid weirdness. But maybe just accepting that get() is linear is
> OK in this case. 🤷🏻‍♂️

If people feel strongly about avoiding the linear search, maybe we can have a code path that does the linear search only if the vectors are small (e.g. <= 16), and fallback to constructing a HashMap in operator==() otherwise, to avoid pathological cases.
Comment 25 Chris Dumez 2019-11-22 15:04:18 PST
Created attachment 384202 [details]
Alternative Patch dealing with large maps
Comment 26 Chris Dumez 2019-11-22 15:06:25 PST
Created attachment 384203 [details]
Alternative Patch dealing with large maps
Comment 27 Alex Christensen 2019-11-22 15:08:27 PST
Comment on attachment 384203 [details]
Alternative Patch dealing with large maps

View in context: https://bugs.webkit.org/attachment.cgi?id=384203&action=review

> Source/WebCore/platform/network/HTTPHeaderMap.cpp:228
> +    bool isVectorSmall = m_commonHeaders.size() <= 16;

m_uncommonHeaders has the unbounded size.  I like the other patch.
Comment 28 Alex Christensen 2019-11-22 15:09:45 PST
Comment on attachment 383883 [details]
Simpler Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=383883&action=review

> Source/WebCore/platform/network/HTTPHeaderMap.h:193
> +        if (a.m_commonHeaders.size() != b.m_commonHeaders.size() || a.m_uncommonHeaders.size() != b.m_uncommonHeaders.size())

It might be worth adding another fast path for if the vectors are actually equal, to have a O(N) possibility that returns true.
Comment 29 Chris Dumez 2019-11-22 15:22:50 PST
Created attachment 384206 [details]
Patch
Comment 30 Chris Dumez 2019-11-22 16:04:12 PST
Comment on attachment 384206 [details]
Patch

Clearing flags on attachment: 384206

Committed r252813: <https://trac.webkit.org/changeset/252813>
Comment 31 Chris Dumez 2019-11-22 16:04:14 PST
All reviewed patches have been landed.  Closing bug.
Comment 32 Radar WebKit Bug Importer 2019-11-22 16:05:29 PST
<rdar://problem/57444366>