Bug 223808

Summary: Service Worker scripts use too much memory in the network process
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: Service WorkersAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, benjamin, cgarcia, cmarcelo, darin, drousso, ews-watchlist, ggaren, joepeck, koivisto, mkwst, simon.fraser, webkit-bug-importer, youennf, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=224015
https://bugs.webkit.org/show_bug.cgi?id=224059
https://bugs.webkit.org/show_bug.cgi?id=224175
Attachments:
Description Flags
WIP Patch
none
WIP Patch
ews-feeder: commit-queue-
WIP Patch
ews-feeder: commit-queue-
WIP Patch
none
WIP Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
ews-feeder: commit-queue-
Patch none

Description Chris Dumez 2021-03-26 11:32:14 PDT
Service Worker scripts use to much memory in the network process.
Comment 1 Chris Dumez 2021-03-26 11:32:29 PDT
<rdar://75637093>
Comment 2 Chris Dumez 2021-03-26 11:33:24 PDT
Created attachment 424382 [details]
WIP Patch
Comment 3 Chris Dumez 2021-03-26 11:52:48 PDT
Created attachment 424387 [details]
WIP Patch
Comment 4 Chris Dumez 2021-03-26 13:09:36 PDT
Created attachment 424395 [details]
WIP Patch
Comment 5 Chris Dumez 2021-03-26 13:20:38 PDT
Created attachment 424396 [details]
WIP Patch
Comment 6 Chris Dumez 2021-03-26 14:03:03 PDT
Created attachment 424402 [details]
WIP Patch
Comment 7 Chris Dumez 2021-03-26 15:51:22 PDT
Created attachment 424411 [details]
Patch
Comment 8 Chris Dumez 2021-03-26 17:09:48 PDT
Created attachment 424421 [details]
Patch
Comment 9 Chris Dumez 2021-03-26 19:14:00 PDT
Created attachment 424431 [details]
Patch
Comment 10 Chris Dumez 2021-03-26 21:13:47 PDT
Created attachment 424440 [details]
Patch
Comment 11 Chris Dumez 2021-03-29 14:18:29 PDT
Created attachment 424577 [details]
Patch
Comment 12 Chris Dumez 2021-03-29 15:07:59 PDT
Created attachment 424589 [details]
Patch
Comment 13 Chris Dumez 2021-03-29 15:51:26 PDT
Created attachment 424597 [details]
Patch
Comment 14 Chris Dumez 2021-03-29 17:05:52 PDT
Created attachment 424604 [details]
Patch
Comment 15 Chris Dumez 2021-03-29 17:27:22 PDT
Created attachment 424605 [details]
Patch
Comment 16 Chris Dumez 2021-03-30 11:19:04 PDT
Created attachment 424663 [details]
Patch
Comment 17 Chris Dumez 2021-03-30 11:20:15 PDT
I did manual memory testing with a very large service worker that had one ~100MB main
script, importing another ~100MB script.

Here are the memory results without my change:
- Networking (Dirty): 238MB (cold) / 331MB (warm)
- WebContent (Dirty): 441MB (cold) / 857MB (warm)

Here are the memory results with my change:
- Networking (Dirty): 22MB (cold) / 14MB (warm)
- WebContent (Dirty): 440MB (cold) / 438MB (warm)
Comment 18 Geoffrey Garen 2021-03-30 12:29:12 PDT
Comment on attachment 424663 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424663&action=review

Looks good. A few comments. I'm not going to say r+ yet because I think the salt issue might change the on-disk representation a smidge, and I'd hate to have to bump the version again just for that.

> Source/WebCore/ChangeLog:15
> +        and greatly reduce memory usage.

reduce [ dirty ] memory usage

> Source/WebCore/ChangeLog:36
> +        - WebContent (Dirty): 440MB (cold) / 438MB (warm)

I wonder why the warm case used to use 1.5X - 2X more memory than the cold case. Maybe this won't matter once we're done optimizing; but it almost suggests that we have two copies of the script somewhere in both networking and WebContent.

> Source/WebCore/workers/service/server/RegistrationDatabase.cpp:53
> -static const uint64_t schemaVersion = 5;
> +static const uint64_t schemaVersion = 6;

Is this schema bump sufficient to delete the old database, or do we need to do something explicit? (We want to avoid leaving old data behind, including in the case of a restore from a backup from a previous version of WebKit.)

> Source/WebCore/workers/service/server/SWScriptStorage.cpp:45
> +    auto crypto = PAL::CryptoDigest::create(PAL::CryptoDigest::Algorithm::SHA_256);
> +    auto inputUtf8 = input.utf8();
> +    crypto->addBytes(inputUtf8.data(), inputUtf8.length());
> +    auto hash = crypto->computeHash();

Is there anything in this code that will add a salt to the hash? If not, I think we need to add a salt -- otherwise the hash does not confer much secrecy. (Specifically, anyone searching for a specific URL can just compute the hash for themselves and then compare. And perhaps there are other attacks; I'm not an expert, but I know that salt is best practice and also what we use in the HTTP cache.)

> Source/WebCore/workers/service/server/SWScriptStorage.cpp:94
> +    // If mmap() fails, fall back to less efficient way.
> +    auto fileHandle = FileSystem::openFile(scriptPath, FileSystem::FileOpenMode::Write);
> +    if (!FileSystem::isHandleValid(fileHandle)) {
> +        RELEASE_LOG_ERROR(ServiceWorker, "SWScriptStorage::store: Failure to store %s, FileSystem::openFile() failed", scriptPath.utf8().data());
> +        return nullptr;
> +    }
> +    for (auto it = script.begin(); it != script.end(); ++it)
> +        FileSystem::writeToFile(fileHandle, it->segment->data(), it->segment->size());
> +
> +    FileSystem::closeFile(fileHandle);
> +    return script.copy();

Do we expect mmap to fail in any reasonable case? If not, I think we should just fail the operation instead of falling back. I don't love having fallback behavior that we can't test.
Comment 19 Chris Dumez 2021-03-30 12:38:54 PDT
Comment on attachment 424663 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424663&action=review

>> Source/WebCore/ChangeLog:15
>> +        and greatly reduce memory usage.
> 
> reduce [ dirty ] memory usage

OK

>> Source/WebCore/ChangeLog:36
>> +        - WebContent (Dirty): 440MB (cold) / 438MB (warm)
> 
> I wonder why the warm case used to use 1.5X - 2X more memory than the cold case. Maybe this won't matter once we're done optimizing; but it almost suggests that we have two copies of the script somewhere in both networking and WebContent.

Yes, I think this deserves a separate investigation. I could not believe the 857MB at first so I took the measurement twice...

>> Source/WebCore/workers/service/server/RegistrationDatabase.cpp:53
>> +static const uint64_t schemaVersion = 6;
> 
> Is this schema bump sufficient to delete the old database, or do we need to do something explicit? (We want to avoid leaving old data behind, including in the case of a restore from a backup from a previous version of WebKit.)

This is sufficient because the following will get called and remove old databases:
```
static inline void cleanOldDatabases(const String& databaseDirectory)
{
    for (uint64_t version = 1; version < schemaVersion; ++version)
        SQLiteFileSystem::deleteDatabaseFile(FileSystem::pathByAppendingComponent(databaseDirectory, databaseFilenameFromVersion(version)));
}
```

>> Source/WebCore/workers/service/server/SWScriptStorage.cpp:45
>> +    auto hash = crypto->computeHash();
> 
> Is there anything in this code that will add a salt to the hash? If not, I think we need to add a salt -- otherwise the hash does not confer much secrecy. (Specifically, anyone searching for a specific URL can just compute the hash for themselves and then compare. And perhaps there are other attacks; I'm not an expert, but I know that salt is best practice and also what we use in the HTTP cache.)

Ok, I will look into adding a Salt. I did not realized we were concerned about this sort of attack.

>> Source/WebCore/workers/service/server/SWScriptStorage.cpp:94
>> +    return script.copy();
> 
> Do we expect mmap to fail in any reasonable case? If not, I think we should just fail the operation instead of falling back. I don't love having fallback behavior that we can't test.

Ok, so there is definitely one case:
1. size is 0 (I found out because some of our API test use empty files). FileSystem::mapToFile() definitely fails with an empty file.

This code path is definitely tested by API tests.

Then there is a possible second case I thought of:
2. I am assuming the system has some sort of limits on how many files you can have mmap'd() at the same time. If so, once we reach that limit, it is nicer to fall back to basic writing.
Comment 20 Chris Dumez 2021-03-30 12:42:50 PDT
Comment on attachment 424663 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424663&action=review

>>> Source/WebCore/workers/service/server/SWScriptStorage.cpp:94
>>> +    return script.copy();
>> 
>> Do we expect mmap to fail in any reasonable case? If not, I think we should just fail the operation instead of falling back. I don't love having fallback behavior that we can't test.
> 
> Ok, so there is definitely one case:
> 1. size is 0 (I found out because some of our API test use empty files). FileSystem::mapToFile() definitely fails with an empty file.
> 
> This code path is definitely tested by API tests.
> 
> Then there is a possible second case I thought of:
> 2. I am assuming the system has some sort of limits on how many files you can have mmap'd() at the same time. If so, once we reach that limit, it is nicer to fall back to basic writing.

Also note that the existing SharedBuffer::createWithContentsOfFile() does:
1. Try to mmap() file and return that if successful
2. Read file into memory

So my storing behavior is consistent with our retrieval behavior.
Comment 21 Geoffrey Garen 2021-03-30 12:53:29 PDT
> > Do we expect mmap to fail in any reasonable case? If not, I think we should just fail the operation instead of falling back. I don't love having fallback behavior that we can't test.
> 
> Ok, so there is definitely one case:
> 1. size is 0 (I found out because some of our API test use empty files).
> FileSystem::mapToFile() definitely fails with an empty file.

This behavior in mapToFile() seems like a bug? Zero-byte files are definitely a valid thing:

~> touch test
~> ls -al test
-rw-r--r--  1 ggaren  staff  0 Mar 30 12:49 test

> Then there is a possible second case I thought of:
> 2. I am assuming the system has some sort of limits on how many files you
> can have mmap'd() at the same time. If so, once we reach that limit, it is
> nicer to fall back to basic writing.

There *is* a limit (per process, per user, and per device) on number of open files, but it applies equally to mmap and other file descriptors. So, if you hit that limit, mapToFile() will fail, but so will openFile()!

So, like, if we had a test for this condition, it would be failing.
Comment 22 Chris Dumez 2021-03-30 14:09:08 PDT
Created attachment 424693 [details]
Patch
Comment 23 Chris Dumez 2021-03-30 14:10:31 PDT
(In reply to Chris Dumez from comment #22)
> Created attachment 424693 [details]
> Patch

Made the following suggested changes:
1. Added a Salt when computing the hashes. I moved the salt logic from NetworkCacheData to FileSystem so that I could reuse it.
2. Made mapToFile() return a valid mapped file, even if the file size is 0.
3. Drop logic to try writing manually to file when mapToFile() fails.
Comment 24 Darin Adler 2021-03-30 14:16:31 PDT
What do we do to prevent ourselves from mapping an arbitrarily large number of files? I assume we can’t do that safely without having a bad effect on the rest of the system.
Comment 25 Chris Dumez 2021-03-30 14:19:30 PDT
Comment on attachment 424693 [details]
Patch

Just noticed something that probably needs fixing.
Comment 26 Chris Dumez 2021-03-30 14:48:42 PDT
(In reply to Darin Adler from comment #24)
> What do we do to prevent ourselves from mapping an arbitrarily large number
> of files? I assume we can’t do that safely without having a bad effect on
> the rest of the system.

That is a good question. I am not sure how we deal with this on the network cache side (where we do the same mmap'ing). I will try and find out.

I currently do not have a limit in my patch.
Comment 27 Chris Dumez 2021-03-30 15:07:19 PDT
(In reply to Chris Dumez from comment #26)
> (In reply to Darin Adler from comment #24)
> > What do we do to prevent ourselves from mapping an arbitrarily large number
> > of files? I assume we can’t do that safely without having a bad effect on
> > the rest of the system.
> 
> That is a good question. I am not sure how we deal with this on the network
> cache side (where we do the same mmap'ing). I will try and find out.
> 
> I currently do not have a limit in my patch.

I am having trouble finding out how we're dealing with this in the network cache code. I am open to suggestions here. It seems that we could avoid using mmap() if the size is smallish to avoid mmap() overuse. We may also want to have a limit on the total number of mmap'd files the SWScriptStorage can have (and fallback to reading file from disk)?
Comment 28 Chris Dumez 2021-03-30 15:50:26 PDT
Created attachment 424708 [details]
Patch
Comment 29 Chris Dumez 2021-03-30 15:51:34 PDT
(In reply to Chris Dumez from comment #28)
> Created attachment 424708 [details]
> Patch

1. Fixed regression in my patch where NetworkCacheData::mapToFile() was no longer calling NetworkCacheData::apply() to write the data.
2. Do not use mmap() for files < PAGE_SIZE, similarly to what the network cache is doing.
Comment 30 Chris Dumez 2021-03-30 15:53:02 PDT
Created attachment 424709 [details]
Patch
Comment 31 Geoffrey Garen 2021-03-30 15:55:47 PDT
Comment on attachment 424709 [details]
Patch

r=me
Comment 32 Chris Dumez 2021-03-30 19:53:45 PDT
Created attachment 424729 [details]
Patch
Comment 33 Chris Dumez 2021-03-30 19:58:07 PDT
Created attachment 424730 [details]
Patch
Comment 34 Chris Dumez 2021-03-30 20:56:30 PDT
I will follow-up to restrict the number of open file descriptors in the network process since the solution likely needs to cover both the network cache and this service worker storage (and seems more likely to be an issue in the network cache case).
Comment 35 EWS 2021-03-30 21:25:05 PDT
Committed r275267: <https://commits.webkit.org/r275267>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 424730 [details].