WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
204361
HTTPHeaderMap's operator== is not fully correct
https://bugs.webkit.org/show_bug.cgi?id=204361
Summary
HTTPHeaderMap's operator== is not fully correct
Chris Dumez
Reported
2019-11-19 10:43:45 PST
HTTPHeaderMap's operator== is not fully correct.
Attachments
Simpler Patch
(12.43 KB, patch)
2019-11-19 10:58 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(12.25 KB, patch)
2019-11-22 12:36 PST
,
Chris Dumez
cdumez
: review-
Details
Formatted Diff
Diff
Alternative Patch dealing with large maps
(14.99 KB, patch)
2019-11-22 15:04 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Alternative Patch dealing with large maps
(12.82 KB, patch)
2019-11-22 15:06 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(11.01 KB, patch)
2019-11-22 15:22 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2019-11-19 10:44:34 PST
Noticed by Ben.
Chris Dumez
Comment 2
2019-11-19 10:58:11 PST
Created
attachment 383883
[details]
Simpler Patch
Ben Nham
Comment 3
2019-11-19 11:59:39 PST
I'm a bit nervous about switching to a quadratic time algorithm since the number of HTTP headers is unbounded. Maybe the fast path could be something like: - reject if # of headers is different - accept if vector comparison succeeds And the slow path could be: - build and sort temporary array of keys - accept/reject based on comparing sorted key/val pairs I removed the implementation of the function and the compiler told me this is used by: 1. ResourceRequestBase::equal 2. ResourceRequestBase::compare 3. operator== on NetworkLoadMetrics 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch I don't have enough experience in the code base to know the code paths that use these methods, but (4) is on the critical path for speculative revalidation and loading. It's probably a good idea to run whatever patch we use on the perf bots because (4) can definitely impact PLT times.
Ben Nham
Comment 4
2019-11-19 12:04:06 PST
My list should have been: 1. ResourceRequestBase::equal 2. ResourceResponseBase::compare 3. operator== on NetworkLoadMetrics 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch
Chris Dumez
Comment 5
2019-11-19 12:57:19 PST
(In reply to Ben Nham from
comment #3
)
> I'm a bit nervous about switching to a quadratic time algorithm since the > number of HTTP headers is unbounded. Maybe the fast path could be something > like: > > - reject if # of headers is different > - accept if vector comparison succeeds > > And the slow path could be: > > - build and sort temporary array of keys > - accept/reject based on comparing sorted key/val pairs > > I removed the implementation of the function and the compiler told me this > is used by: > > 1. ResourceRequestBase::equal > 2. ResourceRequestBase::compare > 3. operator== on NetworkLoadMetrics > 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch > > I don't have enough experience in the code base to know the code paths that > use these methods, but (4) is on the critical path for speculative > revalidation and loading. > > It's probably a good idea to run whatever patch we use on the perf bots > because (4) can definitely impact PLT times.
The number of headers is small which is why we switched from a hashmap to a vector in the first place. There was zero regression when we switched to a vector and started doing linear search for headers. I don’t think concerns are warranted here.
Chris Dumez
Comment 6
2019-11-20 15:01:41 PST
(In reply to Chris Dumez from
comment #5
)
> (In reply to Ben Nham from
comment #3
) > > I'm a bit nervous about switching to a quadratic time algorithm since the > > number of HTTP headers is unbounded. Maybe the fast path could be something > > like: > > > > - reject if # of headers is different > > - accept if vector comparison succeeds > > > > And the slow path could be: > > > > - build and sort temporary array of keys > > - accept/reject based on comparing sorted key/val pairs > > > > I removed the implementation of the function and the compiler told me this > > is used by: > > > > 1. ResourceRequestBase::equal > > 2. ResourceRequestBase::compare > > 3. operator== on NetworkLoadMetrics > > 4. NetworkCacheSpeculativeLoad::requestsHeadersMatch > > > > I don't have enough experience in the code base to know the code paths that > > use these methods, but (4) is on the critical path for speculative > > revalidation and loading. > > > > It's probably a good idea to run whatever patch we use on the perf bots > > because (4) can definitely impact PLT times. > > The number of headers is small which is why we switched from a hashmap to a > vector in the first place. There was zero regression when we switched to a > vector and started doing linear search for headers. > > I don’t think concerns are warranted here.
Early PLT5 A/B testing results show no regression on macOS or iOS. Ping review?
Alex Christensen
Comment 7
2019-11-22 10:05:26 PST
Comment on
attachment 383883
[details]
Simpler Patch Instead of introducing a quadratic algorithm, why don't we just sort both vectors before comparison?
Chris Dumez
Comment 8
2019-11-22 10:23:53 PST
(In reply to Alex Christensen from
comment #7
)
> Comment on
attachment 383883
[details]
> Patch > > Instead of introducing a quadratic algorithm, why don't we just sort both > vectors before comparison?
The number of headers is small, there is no perf impact on PLT from this change so it did not seem worth the extra complexity.
Alex Christensen
Comment 9
2019-11-22 10:33:43 PST
Comment on
attachment 383883
[details]
Simpler Patch A call to std::sort doesn't add much code complexity. Even though it didn't affect a data set that doesn't have a lot of header fields, JavaScript can add an arbitrarily large number of header fields.
https://fetch.spec.whatwg.org/#terminology-headers
Chris Dumez
Comment 10
2019-11-22 10:41:12 PST
(In reply to Alex Christensen from
comment #9
)
> Comment on
attachment 383883
[details]
> Patch > > A call to std::sort doesn't add much code complexity. Even though it didn't > affect a data set that doesn't have a lot of header fields, JavaScript can > add an arbitrarily large number of header fields. >
https://fetch.spec.whatwg.org/#terminology-headers
If you insist, but the perf will be worse in the common case. I will have to copy 4 vectors, sort them and then compare them with ==.
Alex Christensen
Comment 11
2019-11-22 10:42:38 PST
Can't you just sort them in-place? That will make 0 copies, and most of the time the headers will be completely sorted or mostly sorted.
Chris Dumez
Comment 12
2019-11-22 10:44:26 PST
(In reply to Chris Dumez from
comment #10
)
> (In reply to Alex Christensen from
comment #9
) > > Comment on
attachment 383883
[details]
> > Patch > > > > A call to std::sort doesn't add much code complexity. Even though it didn't > > affect a data set that doesn't have a lot of header fields, JavaScript can > > add an arbitrarily large number of header fields. > >
https://fetch.spec.whatwg.org/#terminology-headers
> > If you insist, but the perf will be worse in the common case. I will have to > copy 4 vectors, sort them and then compare them with ==.
Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is comparing? Also, it could mean operator==() would take non-const to those HTTPHeaderMaps, which means some call sites will likely have to be updated to make copies
Chris Dumez
Comment 13
2019-11-22 10:46:22 PST
(In reply to Chris Dumez from
comment #12
)
> (In reply to Chris Dumez from
comment #10
) > > (In reply to Alex Christensen from
comment #9
) > > > Comment on
attachment 383883
[details]
> > > Patch > > > > > > A call to std::sort doesn't add much code complexity. Even though it didn't > > > affect a data set that doesn't have a lot of header fields, JavaScript can > > > add an arbitrarily large number of header fields. > > >
https://fetch.spec.whatwg.org/#terminology-headers
> > > > If you insist, but the perf will be worse in the common case. I will have to > > copy 4 vectors, sort them and then compare them with ==. > > Wouldn't it be weird for operator== to modify the 2 HTTPHeaderMaps it is > comparing? Also, it could mean operator==() would take non-const to those > HTTPHeaderMaps, which means some call sites will likely have to be updated > to make copies
Or maybe I can work around this by making m_uncommonHeaders / m_commonHeaders mutable..
Alex Christensen
Comment 14
2019-11-22 10:46:44 PST
HTTPHeaderMap does not guarantee order right now. If it does, then its operator== is correct now. A little "mutable" or "const_cast" can make it so we never need to copy.
Chris Dumez
Comment 15
2019-11-22 12:14:23 PST
(In reply to Alex Christensen from
comment #14
)
> HTTPHeaderMap does not guarantee order right now. If it does, then its > operator== is correct now. A little "mutable" or "const_cast" can make it > so we never need to copy.
HTTPHeaderMap does have an iterator so this change is definitely not risk free, but let's try...
Chris Dumez
Comment 16
2019-11-22 12:36:54 PST
Created
attachment 384183
[details]
Patch
Alex Christensen
Comment 17
2019-11-22 12:40:25 PST
Comment on
attachment 384183
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=384183&action=review
> Source/WebCore/platform/network/HTTPHeaderMap.h:202 > + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); });
You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
Ben Nham
Comment 18
2019-11-22 12:52:58 PST
To me, const_cast-ing in operator== seems like it could lead to unexpected consequences if someone invokes operator== while iterating through the map (which is weird but could happen). After talking to Chris a bit more I am more convinced of his opinion that we should land his original patch to keep the complexity down. If this method becomes a problem in the future, then we could switch the whole implementation to use a hash map, or have some slow path that kicks in and builds a temporary hash map in the implementation of operator== for large amounts of headers.
Alex Christensen
Comment 19
2019-11-22 13:04:16 PST
Comment on
attachment 384183
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=384183&action=review
All right.
>> Source/WebCore/platform/network/HTTPHeaderMap.h:202 >> + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); >> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > > You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it.
Chris Dumez
Comment 20
2019-11-22 13:12:52 PST
(In reply to Alex Christensen from
comment #19
)
> Comment on
attachment 384183
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=384183&action=review
> > All right. > > >> Source/WebCore/platform/network/HTTPHeaderMap.h:202 > >> + std::sort(a.m_uncommonHeaders.begin(), a.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > >> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > > > > You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > You should be able to just put WTF::codePointCompareLessThan instead of a > lambda that wraps it.
The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).
Geoffrey Garen
Comment 21
2019-11-22 14:31:16 PST
Comment on
attachment 384183
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=384183&action=review
>>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202 >>>> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); >>> >>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. >> >> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. > > For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()).
I don't want to stand in the way of landing this, but I tend to agree that modifying the object in == is weird, and the potential performance issue isn't limited to ==. I guess you could copy out the values and sort the copies to avoid weirdness. But maybe just accepting that get() is linear is OK in this case. 🤷🏻♂️
Chris Dumez
Comment 22
2019-11-22 14:37:12 PST
Comment on
attachment 384183
[details]
Patch This actually is not right anymore because header names are not case sensitive.
Chris Dumez
Comment 23
2019-11-22 14:45:25 PST
(In reply to Chris Dumez from
comment #22
)
> Comment on
attachment 384183
[details]
> Patch > > This actually is not right anymore because header names are not case > sensitive.
Will extend the test case to cover this.
Chris Dumez
Comment 24
2019-11-22 14:49:11 PST
(In reply to Geoffrey Garen from
comment #21
)
> Comment on
attachment 384183
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=384183&action=review
> > >>>> Source/WebCore/platform/network/HTTPHeaderMap.h:202 > >>>> + std::sort(b.m_uncommonHeaders.begin(), b.m_uncommonHeaders.end(), [](auto& headerA, auto& headerB) { return WTF::codePointCompareLessThan(headerA.key, headerB.key); }); > >>> > >>> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > >> > >> You should be able to just put WTF::codePointCompareLessThan instead of a lambda that wraps it. > > > > The reason I need a lambda is to extra the 'key' data member from the CommonHeader / UncommonHeader struct. > > > > For the record, I don't really like where this patch ended up. I feel like modifying the internal representation in an operator==() is super scary and bound to bite us someday. All this for doubtful performance benefits. Also, if we had concerns about having a ton of headers, the fix would not be at operator==() level, but at get() level, because get() would be super slow too (which is why my previous implementation of operator==() relied on get()). > > I don't want to stand in the way of landing this, but I tend to agree that > modifying the object in == is weird, and the potential performance issue > isn't limited to ==. I guess you could copy out the values and sort the > copies to avoid weirdness. But maybe just accepting that get() is linear is > OK in this case. 🤷🏻♂️
If people feel strongly about avoiding the linear search, maybe we can have a code path that does the linear search only if the vectors are small (e.g. <= 16), and fallback to constructing a HashMap in operator==() otherwise, to avoid pathological cases.
Chris Dumez
Comment 25
2019-11-22 15:04:18 PST
Created
attachment 384202
[details]
Alternative Patch dealing with large maps
Chris Dumez
Comment 26
2019-11-22 15:06:25 PST
Created
attachment 384203
[details]
Alternative Patch dealing with large maps
Alex Christensen
Comment 27
2019-11-22 15:08:27 PST
Comment on
attachment 384203
[details]
Alternative Patch dealing with large maps View in context:
https://bugs.webkit.org/attachment.cgi?id=384203&action=review
> Source/WebCore/platform/network/HTTPHeaderMap.cpp:228 > + bool isVectorSmall = m_commonHeaders.size() <= 16;
m_uncommonHeaders has the unbounded size. I like the other patch.
Alex Christensen
Comment 28
2019-11-22 15:09:45 PST
Comment on
attachment 383883
[details]
Simpler Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=383883&action=review
> Source/WebCore/platform/network/HTTPHeaderMap.h:193 > + if (a.m_commonHeaders.size() != b.m_commonHeaders.size() || a.m_uncommonHeaders.size() != b.m_uncommonHeaders.size())
It might be worth adding another fast path for if the vectors are actually equal, to have a O(N) possibility that returns true.
Chris Dumez
Comment 29
2019-11-22 15:22:50 PST
Created
attachment 384206
[details]
Patch
Chris Dumez
Comment 30
2019-11-22 16:04:12 PST
Comment on
attachment 384206
[details]
Patch Clearing flags on attachment: 384206 Committed
r252813
: <
https://trac.webkit.org/changeset/252813
>
Chris Dumez
Comment 31
2019-11-22 16:04:14 PST
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 32
2019-11-22 16:05:29 PST
<
rdar://problem/57444366
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug