Summary: | HTTPS Upgrade: Use full sqlite upgrade list | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Vivek Seth <v_seth> | ||||||||||||||||||||
Component: | Page Loading | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||
Severity: | Normal | CC: | achristensen, aestes, beidson, cdumez, commit-queue, darin, dbates, ews-watchlist, ggaren, jberlin, mkwst, rhoule, sam, v_seth, webkit-bug-importer, youennf | ||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||
Version: | Other | ||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||
Attachments: |
|
Description
Vivek Seth
2018-12-14 19:28:12 PST
Created attachment 357378 [details]
Patch
Comment on attachment 357378 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357378&action=review A few comments, not a complete review > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:37 > +using namespace WebCore; This is not allowed; it’s incompatible with unified builds. Best solution would be to not us "using namespace" at all, or the using namespace could be put inside the WebKit namespace. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:42 > + : m_isSetupComplete(false) Typically better style is to initialize this in the data member definition in the class definition rather than hear. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:75 > + if (this->m_isSetupComplete) { No need for the "this->" here or in the many other places in this file. Please remove it wherever possible, everywhere if possible. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:132 > + CFURLRef databaseURL = CFBundleCopyResourceURL(webKitBundle, CFSTR("HTTPSUpgradeList"), CFSTR("db"), nullptr); > + String path = CFURLGetString(databaseURL); > + CFRelease(databaseURL); > + return path; Better style is to use adoptCF rather than an explicit CFRelease. For example: return CFURLGetString(adoptCF(CFBundleCopyResourceURL(webKitBundle, CFSTR("HTTPSUpgradeList"), CFSTR("db"), nullptr)).get()); Or, if you prefer: auto databaseURL = adoptCF(CFBundleCopyResourceURL(webKitBundle, CFSTR("HTTPSUpgradeList"), CFSTR("db"), nullptr)); return CFURLGetString(databaseURL.get()); > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:134 > +#endif // PLATFORM(COCOA) > + return { }; Need an else here. We don’t want double returns. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:137 > +const String NetworkHTTPSUpgradeChecker::queryDatabaseSQL() The "const" here isn’t really meaningful and should be omitted. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:139 > + return "SELECT host FROM hosts WHERE host = ?;"; Should use a _s suffix so we use ASCIILiteral here. But also, why have a function that returns this as a String? > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:33 > +#include <WebCore/ResourceRequest.h> > +#include <WebCore/SQLiteDatabase.h> > +#include <WebCore/SQLiteStatement.h> > +#include <wtf/CompletionHandler.h> > +#include <wtf/Ref.h> > +#include <wtf/WorkQueue.h> I don’t think we need complete includes of classes like SQLiteDatabase and SQLiteStatement. I believe we can compile with just forward declarations of most of these classes. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:48 > + static const String databasePath(); > + static const String queryDatabaseSQL(); The const here don’t mean anything and should be omitted. Not sure these need to be static class member functions; could just have them be free functions in the .cpp file. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:52 > + std::atomic<bool> m_isSetupComplete; I don’t think making this std::atomic is doing enough to synchronize here. To give one example, I think there is a race condition in ~NetworkHTTPSUpgradeChecker. It checks the boolean for true and then decides not to do work based on that, but the boolean could become true just after it checks. Doesn’t seem right. Created attachment 357505 [details]
Patch
Created attachment 357507 [details]
Patch
Comment on attachment 357507 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357507&action=review > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:28 > + #if ENABLE(HTTPS_UPGRADE) should go here. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:36 > +#if ENABLE(HTTPS_UPGRADE) Not here. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:43 > +auto kNetworkHTTPSUpgradeCheckerSQLQuery = "SELECT host FROM hosts WHERE host = ?;"_s; static constexpr auto httpsUpgradeCheckerSQLQuery = ... > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:45 > +String NetworkHTTPSUpgradeCheckerDatabasePath(); No need to forward declare. You can just move the function definition here. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:59 > + auto path = NetworkHTTPSUpgradeCheckerDatabasePath(); > + if (!path) { > + RELEASE_LOG_IF_ALLOWED("NetworkHTTPSUpgradeChecker() - Database does not exist, cannot do upgrades."); > + return; > + } Under what scenario would the HTTPS_UPGRADE feature be enabled with a missing database? Can this be an ASSERT instead? > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:79 > + // The database and statement need to be destroyed on the background thread. Don't think you need this comment. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:134 > +String NetworkHTTPSUpgradeCheckerDatabasePath() static const String& httpsUpgradeCheckerDatabasePath() > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:141 > +#if PLATFORM(COCOA) > + CFBundleRef webKitBundle = CFBundleGetBundleWithIdentifier(CFSTR("com.apple.WebKit")); > + path = CFURLGetString(adoptCF(CFBundleCopyResourceURL(webKitBundle, CFSTR("HTTPSUpgradeList"), CFSTR("db"), nullptr)).get()); > +#endif // PLATFORM(COCOA) It's expensive to compute this path. Since it never changes, you can cache it in a static NeverDestroyed<String>. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:47 > + // Returns `true` after internal setup is sucessfully completed. If there is an error with setup, or if setup is in-progress, it will return `false`. > + bool isSetupComplete() const { return m_isSetupComplete; }; Not sure you really need this function. I see where you call it below, but it seems unnecessary because query() will already do the right thing when m_isSetupComplete is false. The benefit of removing this function and letting query() check setup completion on the work queue is that you don't need to make m_isSetupComplete an std::atomic. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:55 > + std::atomic<bool> m_isSetupComplete { false }; This doesn't need to be atomic so long as you only access it on the work queue (which you can and should). > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:58 > + std::unique_ptr<WebCore::SQLiteDatabase> m_database; > + std::unique_ptr<WebCore::SQLiteStatement> m_statement; You can use WTF::UniqueRef here. > Source/WebKit/NetworkProcess/NetworkLoadChecker.cpp:216 > + // Do not wait for httpsUpgradeChecker to complete its setup. > + if (!httpsUpgradeChecker.isSetupComplete()) { > + handler(WTFMove(request)); > return; > + } I think you should let NetworkHTTPSUpgradeChecker::query() do this check on the work queue. > Source/WebKit/NetworkProcess/NetworkLoadChecker.h:44 > +#include "UserContentControllerIdentifier.h" > #include <WebCore/ContentExtensionActions.h> > +#include <WebCore/ContentSecurityPolicyResponseHeaders.h> > +#include <WebCore/FetchOptions.h> > #include <WebCore/NetworkLoadInformation.h> > #include <WebCore/ResourceError.h> > #include <wtf/CompletionHandler.h> > #include <wtf/Expected.h> > #include <wtf/Variant.h> > +#include <wtf/WeakPtr.h> > > namespace WebCore { > class ContentSecurityPolicy; > struct ContentSecurityPolicyClient; > +class SecurityOrigin; > +enum class PreflightPolicy : uint8_t; > +enum class StoredCredentialsPolicy : bool; Not sure you still need all this. > Source/WebKit/NetworkProcess/NetworkProcess.h:407 > +#if ENABLE(HTTPS_UPGRADE) > + NetworkHTTPSUpgradeChecker m_networkHTTPSUpgradeChecker; > +#endif // ENABLE(HTTPS_UPGRADE) Could you make NetworkHTTPSUpgradeChecker itself be a singleton, as opposed to adding a member to an existing singleton? Comment on attachment 357507 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357507&action=review > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:69 > + if (!this->m_database->open(path)) { > + RELEASE_LOG_ERROR_IF_ALLOWED("NetworkHTTPSUpgradeChecker() - Unable to open database."); > + return; > + } > + > + if (this->m_statement->prepare() != SQLITE_OK) { > + RELEASE_LOG_ERROR_IF_ALLOWED("NetworkHTTPSUpgradeChecker() - Unable to prepare statement."); > + return; > + } Under what scenario would the HTTPS_UPGRADE feature be enabled with a read-only database in the WebKit bundle that can't be opened? Same question for failures to prepare our hard-coded SQL query. Can these be assertions instead? > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:101 > + if (!this->m_isSetupComplete) { > + RELEASE_LOG_IF_ALLOWED("query - Setup is not complete, ignoring query."); > + runCallbackOnMainThread(false); > + return; > + } The logging here makes it sound like you are ignoring the query because setup is not yet complete, but that's impossible. By the time this lambda executes on the work queue setup will have completed (possibly with an error). I think what you really want to check is whether an error occurred in the setup lambda, which means m_isSetupComplete probably needs a better name. Comment on attachment 357507 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357507&action=review >> Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:59 >> + } > > Under what scenario would the HTTPS_UPGRADE feature be enabled with a missing database? Can this be an ASSERT instead? Can't think of one, I will make this an assert >> Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:47 >> + bool isSetupComplete() const { return m_isSetupComplete; }; > > Not sure you really need this function. I see where you call it below, but it seems unnecessary because query() will already do the right thing when m_isSetupComplete is false. The benefit of removing this function and letting query() check setup completion on the work queue is that you don't need to make m_isSetupComplete an std::atomic. Removing this and leaving query() as-is, would result in a slightly different behavior than I was going for. Since running "setup" can take a few milliseconds on some devices, we don't want to block network requests on that. We decided that if we make a network request before setup is complete, we will ignore upgrading those requests. That's why I think I still need property or something like it. In the past I considered, having "NetworkHTTPSUpgradeChecker" own this logic. If its internal setup is not yet complete, it bails out early without checking the database. I feel like this approach creates a weird API: only return correct answers after internal setup is complete, but clients have no way of knowing when that happens. It also makes this class hard to test for correctness. Another idea would be to give query() another param `isBlocking` to determine whether or not it should wait for setup, but I feel like this is weird API too. >> Source/WebKit/NetworkProcess/NetworkLoadChecker.cpp:216 >> + } > > I think you should let NetworkHTTPSUpgradeChecker::query() do this check on the work queue. If I do this check on the work queue, it means we have to wait for setup is complete. The point of having this check is to avoid waiting for setup to complete. >> Source/WebKit/NetworkProcess/NetworkLoadChecker.h:44 >> +enum class StoredCredentialsPolicy : bool; > > Not sure you still need all this. I think I had to add these as a side-effect of unified builds. After I added NetworkHTTPSUpgradeChecker.cpp to Sources.txt, I got a bunch of errors in this file. This is what I came up with after fixing all the errors. >> Source/WebKit/NetworkProcess/NetworkProcess.h:407 >> +#endif // ENABLE(HTTPS_UPGRADE) > > Could you make NetworkHTTPSUpgradeChecker itself be a singleton, as opposed to adding a member to an existing singleton? I can. This approach is what Chris suggested, but I don't feel strongly one way or another. Created attachment 357587 [details]
Patch
Comment on attachment 357587 [details]
Patch
Andy already approved this patch. Moving to commit-queue+
Comment on attachment 357587 [details]
Patch
Modified the wrong bug.
Comment on attachment 357587 [details]
Patch
Uploaded the wrong patch, obsoleting it.
Created attachment 357640 [details]
Patch
> > Could you make NetworkHTTPSUpgradeChecker itself be a singleton, as opposed to adding a member to an existing singleton?
>
> I can. This approach is what Chris suggested, but I don't feel strongly one
> way or another.
Having fewer singletons has some benefits, especially if one can meaningfully be owned by another.
For instance, in network process, we had to make a used-to-be singleton class non singleton.
I spoke to Andy in-person about this. We decided to leave NetworkHTTPSUpgradeChecker where it is now. Were any other formats for the database considered other than sqlite? Given that list seems to be immutable (at least from WebCore's perspective), something like a persistent on-disk trie might be more efficient here. > Were any other formats for the database considered other than sqlite? I considered 3 formats so far: A. simple sqlite table 1 column with index on it (what is implemented now) B. sqlite table with 2 columns: domain and 4 byte hash. The hash column is indexed. C. a custom implementation of "Hierarchies of Indexes" (Kärkkäinen J., Srinivasa Rao S. (2003) Full-Text Indexes in External Memory). Basically the format reduces disk seeks by using a small binary search tree to index into other binary search trees. All the options meet our performance threshold for devices with fast SSDs (iPhone, iPad, SSD Macs). They can all search a list of up to 10M domains in <1ms. Format A is the fastest, then B, then C. For devices with slower disks (SSD macs and Apple Watch) format C is the fastest, then B, then A. In terms of disk-space overhead format C is the best (6 bytes per domain), then B (28 bytes per domain), then A (40 bytes per domain). I've implemented format A for now, but since we would like to support HDD macs, I may switch to format B in another patch. Format C needs significant testing before we can consider using it. Feel free to send me an email/meet me in-person if you would like to discuss in more detail. > Given that list seems to be immutable (at least from WebCore's perspective), something like a persistent on-disk trie might be more efficient here. I'm not very familiar with on-disk tries. Are you thinking of something like what is described in "M. Mosharaf Kabir Chowdhury, N & Mostofa Akbar, Md & Kaykobad, Mohammad. (2007). DiskTrie: An Efficient Data Structure using Flash Memory for Mobile Devices." ? It seems like DiskTrie works best on disks with fast random access like SSDs. I'm not sure how well it would perform on HDD macs or Apple Watch. If you know of a open-source implementation I might be able to try it out. Even if its better, I think we should consider switching to it in a future patch. Comment on attachment 357640 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357640&action=review > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:44 > +static constexpr auto httpsUpgradeCheckerSQLQuery = "SELECT host FROM hosts WHERE host = ?;"_s; Not convinced with need this global, we could just inline it. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:46 > +static const String& NetworkHTTPSUpgradeCheckerDatabasePath() This should not start with an upper case latter. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:60 > + , m_database(makeUniqueRef<WebCore::SQLiteDatabase>()) You're likely missing a call to SQLiteDatabase::disableThreadingChecks() since you're using a worker queue and it means the db may be used from different thread (even though sequentially). I expect you'd see assertions getting hit in debug builds. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:87 > + ASSERT(isMainThread()); We normally use ASSERT(RunLoop::isMain()); in WK2 code. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:89 > + m_workQueue->dispatch([this, host = WTFMove(host), callback = WTFMove(callback)] () mutable { host = host.isolatedCopy() for thread safety. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:103 > + WTF::RunLoop::main().dispatch([foundHost, callback = WTFMove(callback)] () mutable { You should not need WTF:: prefix. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:111 > + // FIXME: how should I implement this? Definitely not OK, pass m_sessionID to the query() method then rely on sessionID.isAlwaysOnLoggingAllowed(). > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.h:57 > + std::atomic<bool> m_didSetupCompleteSuccessfully { false }; We live to put boolean members last. > Source/WebKit/NetworkProcess/NetworkLoadChecker.cpp:218 > + httpsUpgradeChecker.query(url.host().toString(), [request = WTFMove(request), handler = WTFMove(handler)] (bool foundHost) mutable { You'll likely need to pass m_sessionID to query() here. > Source/WebKit/NetworkProcess/NetworkProcess.h:33 > +#if ENABLE(HTTPS_UPGRADE) Not needed, you can include unconditionally. > Source/WebKit/NetworkProcess/NetworkProcess.h:408 > + Unnecessary new line. > Source/WebKit/NetworkProcess/NetworkProcess.h:409 > + Unnecessary new line. > You're likely missing a call to SQLiteDatabase::disableThreadingChecks() since you're using a worker queue and it means the db may be used from different thread (even though sequentially). I expect you'd see assertions getting hit in debug builds.
I have not seen assert in debug builds yet, but maybe I have just been lucky so far. I'll make the change.
Created attachment 357726 [details]
Patch
Created attachment 357727 [details]
Patch
Comment on attachment 357727 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357727&action=review r=me with suggestion. > Source/WebKit/NetworkProcess/NetworkHTTPSUpgradeChecker.cpp:68 > + ASSERT(isDatabaseOpen); I think we should return early if !isDatabaseOpen in release so that m_didSetupCompleteSuccessfully never gets set to true. Created attachment 357730 [details]
Patch
Comment on attachment 357730 [details]
Patch
Chris approved my last patch with a suggestion. Moving to commit-queue+
Comment on attachment 357730 [details]
Patch
BTW, why all the explicit this-> calls in this patch? We normally prefer the omit them unless one of the compiler complains.
(In reply to Vivek Seth from comment #24) > Comment on attachment 357730 [details] > Patch > > Chris approved my last patch with a suggestion. Moving to commit-queue+ Can we see if we can get rid of all the this-> calls before committing? It would make the code look much nicer. iirc, gcc only has trouble with nested lambdas which your patch does not use. Created attachment 357739 [details]
Patch
(In reply to Chris Dumez from comment #25) > Comment on attachment 357730 [details] > Patch > > BTW, why all the explicit this-> calls in this patch? We normally prefer the > omit them unless one of the compiler complains. I did it to preemptively fix GCC bugs. Didn't realize the bug only happened with nested lambdas. Removed the "this->"s and submitted another patch. Comment on attachment 357739 [details]
Patch
All bots have built my patch successfully, but I just realized they are building my patch with the feature flag disabled. I built WebKit locally with the feature flag enabled and it seems to be okay to omit the "this->".
Moving to commit-queue+
(In reply to Vivek Seth from comment #17) > > Were any other formats for the database considered other than sqlite? > > I considered 3 formats so far: > > A. simple sqlite table 1 column with index on it (what is implemented now) > B. sqlite table with 2 columns: domain and 4 byte hash. The hash column is > indexed. > C. a custom implementation of "Hierarchies of Indexes" (Kärkkäinen J., > Srinivasa Rao S. (2003) Full-Text Indexes in External Memory). Basically the > format reduces disk seeks by using a small binary search tree to index into > other binary search trees. > > All the options meet our performance threshold for devices with fast SSDs > (iPhone, iPad, SSD Macs). They can all search a list of up to 10M domains in > <1ms. Format A is the fastest, then B, then C. > > For devices with slower disks (SSD macs and Apple Watch) format C is the > fastest, then B, then A. > > In terms of disk-space overhead format C is the best (6 bytes per domain), > then B (28 bytes per domain), then A (40 bytes per domain). > > I've implemented format A for now, but since we would like to support HDD > macs, I may switch to format B in another patch. Format C needs significant > testing before we can consider using it. > > Feel free to send me an email/meet me in-person if you would like to discuss > in more detail. > Thanks for filling in these details! This would be great information to include in your ChangeLog. Comment on attachment 357739 [details] Patch Clearing flags on attachment: 357739 Committed r239423: <https://trac.webkit.org/changeset/239423> All reviewed patches have been landed. Closing bug. |