Bug 187427 - [Curl] Move HTTP Setup logic from CurlRequest to CurlHandle for reuse.
Summary: [Curl] Move HTTP Setup logic from CurlRequest to CurlHandle for reuse.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Basuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks: 172630
  Show dependency treegraph
 
Reported: 2018-07-06 21:14 PDT by Basuke Suzuki
Modified: 2018-07-13 16:19 PDT (History)
11 users (show)

See Also:


Attachments
PATCH (20.86 KB, patch)
2018-07-09 15:24 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff
Fix style (21.17 KB, patch)
2018-07-09 15:33 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff
FIX (20.77 KB, patch)
2018-07-09 16:58 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff
FIX (21.55 KB, patch)
2018-07-12 10:47 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Basuke Suzuki 2018-07-06 21:14:24 PDT
Move logic to setup CurlHandle for HTTP/HTTPS communication from CurlRequest to CurlHandle. Upcoming WebSocket impementation doesn't use CurlRequest which is for regular HTTP transaction.
Comment 1 Basuke Suzuki 2018-07-09 15:24:22 PDT
Created attachment 344628 [details]
PATCH
Comment 2 EWS Watchlist 2018-07-09 15:27:01 PDT
Attachment 344628 [details] did not pass style-queue:


ERROR: Source/WebCore/ChangeLog:8:  You should remove the 'No new tests' and either add and list tests, or explain why no new tests were possible.  [changelog/nonewtests] [5]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Basuke Suzuki 2018-07-09 15:28:23 PDT
Comment on attachment 344628 [details]
PATCH

View in context: https://bugs.webkit.org/attachment.cgi?id=344628&action=review

> Source/WebCore/platform/network/curl/CurlContext.cpp:808
> +        curl_easy_setopt(m_handle, CURLOPT_STDERR, log);

This is a bug fix.
Comment 4 Basuke Suzuki 2018-07-09 15:33:01 PDT
Created attachment 344630 [details]
Fix style
Comment 5 Basuke Suzuki 2018-07-09 16:58:48 PDT
Created attachment 344640 [details]
FIX
Comment 6 Fujii Hironori 2018-07-09 22:55:47 PDT
Comment on attachment 344640 [details]
FIX

View in context: https://bugs.webkit.org/attachment.cgi?id=344640&action=review

I'm not a reviewer. This is an informal review.

> Source/WebCore/ChangeLog:9
> +        in CurlRequest, which is only for regular HTTP/HTTPS transacction. This patch

transacction → transaction

> Source/WebCore/platform/network/curl/CurlContext.h:286
> +    std::optional<CertificateInfo> certificateInfo() const;

This returns a copy of CertificateInfo.
In general, getters return the reference of memeber variables.
Do you need to copy?
std::optional<CertificateInfo>& certificateInfo() const;

> Source/WebCore/platform/network/curl/CurlRequest.cpp:149
> +        if (CurlRequestClient* client = m_client)

Do you nee this variable?
if (m_client)

> Source/WebCore/platform/network/curl/CurlRequest.cpp:309
> +        m_certificateInfo = *info;

This code copies CertificateInfo.
I think CurlRequest don't need to have m_certificateInfo because m_curlHandle has m_sslVerifier and m_sslVerifier has CertificateInfo.

> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:77
> +    bool ok = CurlContext::singleton().sslHandle().isAllowedHTTPSCertificateHost(m_curlHandle.url().host().toString());

This callback is called in the curl thread. Is this thread-safe to access m_curlHandle.url()?

> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:79
> +    bool ok = CurlContext::singleton().sslHandle().canIgnoredHTTPSCertificate(m_curlHandle.url().host().toString(), m_certificateInfo.certificateChain());

Ditto.
Comment 7 Basuke Suzuki 2018-07-11 14:43:58 PDT
Comment on attachment 344640 [details]
FIX

View in context: https://bugs.webkit.org/attachment.cgi?id=344640&action=review

Thanks for comment.

>> Source/WebCore/platform/network/curl/CurlContext.h:286
>> +    std::optional<CertificateInfo> certificateInfo() const;
> 
> This returns a copy of CertificateInfo.
> In general, getters return the reference of memeber variables.
> Do you need to copy?
> std::optional<CertificateInfo>& certificateInfo() const;

That's right. Actually as you mentioned in other bug, In the upcoming patch, CertificateInfo will be moved to ResourceError and ResourceResponse. I'll open the bug right now for that improvement and wanna keep this part as is. The main purpose of this patch is moving the logic to share with others.

>> Source/WebCore/platform/network/curl/CurlRequest.cpp:149
>> +        if (CurlRequestClient* client = m_client)
> 
> Do you nee this variable?
> if (m_client)

This is std::atomic<CurlRequestClient*> so that this avoid fetching data from storage twice. I see this worth doing. https://godbolt.org/g/UNQ3A5

>> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:77
>> +    bool ok = CurlContext::singleton().sslHandle().isAllowedHTTPSCertificateHost(m_curlHandle.url().host().toString());
> 
> This callback is called in the curl thread. Is this thread-safe to access m_curlHandle.url()?

It is isolatedCopied by setUrl, which runs in curl thread.
Comment 8 Fujii Hironori 2018-07-12 02:45:15 PDT
Comment on attachment 344640 [details]
FIX

View in context: https://bugs.webkit.org/attachment.cgi?id=344640&action=review

>>> Source/WebCore/platform/network/curl/CurlRequest.cpp:149
>>> +        if (CurlRequestClient* client = m_client)
>> 
>> Do you nee this variable?
>> if (m_client)
> 
> This is std::atomic<CurlRequestClient*> so that this avoid fetching data from storage twice. I see this worth doing. https://godbolt.org/g/UNQ3A5

I didn't know m_client is an atomic.
This is a bad pattern. You can't use atomic in this case.
If m_client is modified in other thread concurrently, it would be destructed after the null-check and before accessing.
As far as I read the code, m_client is set only in the main thread. You shouldn't use atomic for m_client.

>>> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:77
>>> +    bool ok = CurlContext::singleton().sslHandle().isAllowedHTTPSCertificateHost(m_curlHandle.url().host().toString());
>> 
>> This callback is called in the curl thread. Is this thread-safe to access m_curlHandle.url()?
> 
> It is isolatedCopied by setUrl, which runs in curl thread.

It seems that CurlHandle::setUrl is called both in the main thread and the worker thread.

In the main thread
  CurlRequest::start
  → CurlRequest::setupTransfer
    → CurlHandle::setUrl

In the worker thread
  CurlRequestScheduler::executeTasks
  → lambda of CurlRequestScheduler::startTransfer
    → CurlRequest::setupTransfer
       → CurlHandle::setUrl

It is a synchronization problem if CurlHandle::m_url is written in one thread, and it is read in other thread.
URL::isolatedCopied doesn't help in this case.

But, I didn't check carefully. It might not be a problem due to synchronization piggybacking.
Comment 9 Basuke Suzuki 2018-07-12 10:34:55 PDT
Comment on attachment 344640 [details]
FIX

View in context: https://bugs.webkit.org/attachment.cgi?id=344640&action=review

>>>> Source/WebCore/platform/network/curl/CurlRequest.cpp:149
>>>> +        if (CurlRequestClient* client = m_client)
>>> 
>>> Do you nee this variable?
>>> if (m_client)
>> 
>> This is std::atomic<CurlRequestClient*> so that this avoid fetching data from storage twice. I see this worth doing. https://godbolt.org/g/UNQ3A5
> 
> I didn't know m_client is an atomic.
> This is a bad pattern. You can't use atomic in this case.
> If m_client is modified in other thread concurrently, it would be destructed after the null-check and before accessing.
> As far as I read the code, m_client is set only in the main thread. You shouldn't use atomic for m_client.

This code is not for thread safety. Thread safety is guarantied by the convention. It is passed as an argument of constructor and can be invaldated vie invalidClient() method. All usage is in this callClient which is executed in main thread (thanks to runOnMainThread). All access is serialized in main thread. That's the design. I don't know why it's stored in std::::atomic. I guess that came from old implementation by changing its name and location. I agree to remove atomic at this moment. And also noticed I forgot to add main thread assertion in invalidateClient() method. Thank you.

>>>> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:77
>>>> +    bool ok = CurlContext::singleton().sslHandle().isAllowedHTTPSCertificateHost(m_curlHandle.url().host().toString());
>>> 
>>> This callback is called in the curl thread. Is this thread-safe to access m_curlHandle.url()?
>> 
>> It is isolatedCopied by setUrl, which runs in curl thread.
> 
> It seems that CurlHandle::setUrl is called both in the main thread and the worker thread.
> 
> In the main thread
>   CurlRequest::start
>   → CurlRequest::setupTransfer
>     → CurlHandle::setUrl
> 
> In the worker thread
>   CurlRequestScheduler::executeTasks
>   → lambda of CurlRequestScheduler::startTransfer
>     → CurlRequest::setupTransfer
>        → CurlHandle::setUrl
> 
> It is a synchronization problem if CurlHandle::m_url is written in one thread, and it is read in other thread.
> URL::isolatedCopied doesn't help in this case.
> 
> But, I didn't check carefully. It might not be a problem due to synchronization piggybacking.

Right those difference happen in sync and async difference. For sync transfer, it stays in main thread. In async transfer it set in curl thread by the scheduler and used entirely in that thread.
Comment 10 Basuke Suzuki 2018-07-12 10:47:56 PDT
Created attachment 344860 [details]
FIX
Comment 11 Fujii Hironori 2018-07-13 05:00:15 PDT
(In reply to Basuke Suzuki from comment #9)
> Right those difference happen in sync and async difference. For sync
> transfer, it stays in main thread. In async transfer it set in curl thread
> by the scheduler and used entirely in that thread.

Thank you for the explanation. I get the idea.

IMO, this is badly designed. CurlContext and CurlSSLHandle, which is
actually a context even thought it's named 'Handle', are shared
among the main thread and the worker thread without locking mutex.

For example, CurlRequest::didReceiveHeader has a following code:

> CurlContext::singleton().setProxyAuthMethod(m_response.availableProxyAuth);

This setter might be called concurrently in other thread.No mutex locked.

I think the main thread should post a sync message to the worker thread for sync requests.
Comment 12 Fujii Hironori 2018-07-13 05:15:04 PDT
Comment on attachment 344860 [details]
FIX

View in context: https://bugs.webkit.org/attachment.cgi?id=344860&action=review

> Source/WebCore/platform/network/curl/CurlContext.cpp:360
> +    m_url = url.isolatedCopy();

IIUC, This isolatedCopy is not needed because CurlRequest::m_request is an isolated copy.

> Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:85
> +        m_curlHandle.setSslVerifyPeer(CurlHandle::VerifyPeer::Disable);

Just out of curiosity, Why is this code needed?
Comment 13 Basuke Suzuki 2018-07-13 10:14:38 PDT
Thanks for r+ and congrats for being a reviewer!

(In reply to Fujii Hironori from comment #12)
> Comment on attachment 344860 [details]
> FIX
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=344860&action=review
> 
> > Source/WebCore/platform/network/curl/CurlContext.cpp:360
> > +    m_url = url.isolatedCopy();
> 
> IIUC, This isolatedCopy is not needed because CurlRequest::m_request is an
> isolated copy.

That's right. Our intention is CurlHandle should be isolated from other implementation detail and should hold information required to be used. Next step may be moving out url from CurlRequest.


> > Source/WebCore/platform/network/curl/CurlSSLVerifier.cpp:85
> > +        m_curlHandle.setSslVerifyPeer(CurlHandle::VerifyPeer::Disable);
> 
> Just out of curiosity, Why is this code needed?

It came from old code and I didn't sure the intent at this moment. While I was figuring out the mis-implementation of Curl port at this bug https://bugs.webkit.org/show_bug.cgi?id=187611, It was a left over of old implementation that raw curl handle was setup to handle redirect by itself. If redirect happens, following peer check was bypassed by setting this flag. Current Curl port handle redirect in WebCore and never reuse curl handle. So this code can be removed safely. I'll do that on the patch.
Comment 14 Basuke Suzuki 2018-07-13 10:21:51 PDT
(In reply to Fujii Hironori from comment #11)
> (In reply to Basuke Suzuki from comment #9)
> > Right those difference happen in sync and async difference. For sync
> > transfer, it stays in main thread. In async transfer it set in curl thread
> > by the scheduler and used entirely in that thread.
> 
> Thank you for the explanation. I get the idea.
> 
> IMO, this is badly designed. CurlContext and CurlSSLHandle, which is
> actually a context even thought it's named 'Handle', are shared
> among the main thread and the worker thread without locking mutex.
> 
> For example, CurlRequest::didReceiveHeader has a following code:
> 
> > CurlContext::singleton().setProxyAuthMethod(m_response.availableProxyAuth);
> 
> This setter might be called concurrently in other thread.No mutex locked.
> 
> I think the main thread should post a sync message to the worker thread for
> sync requests.

This part always makes us uncomfortable. Yes, it's very hard to design the architecture for both sync and async.

We thought about your idea, but main thread has to handle callback request while waiting the result. With ResourceHandle implementation, that's too complected than doing everything in main thread.

But we see there must be better way to make the code simpler. We will figure out the way soon.
Comment 15 WebKit Commit Bot 2018-07-13 10:28:37 PDT
Comment on attachment 344860 [details]
FIX

Clearing flags on attachment: 344860

Committed r233803: <https://trac.webkit.org/changeset/233803>
Comment 16 WebKit Commit Bot 2018-07-13 10:28:39 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Radar WebKit Bug Importer 2018-07-13 10:29:19 PDT
<rdar://problem/42171198>
Comment 18 Fujii Hironori 2018-07-13 16:19:29 PDT
(In reply to Basuke Suzuki from comment #14)
> We thought about your idea, but main thread has to handle callback request
> while waiting the result. With ResourceHandle implementation, that's too
> complected than doing everything in main thread.

You can process other (restricted) messages while waiting sync-reply.
 
> But we see there must be better way to make the code simpler. We will figure
> out the way soon.

Great.