Bug 201734 - results.webkit.org: Shard result archives
Summary: results.webkit.org: Shard result archives
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-09-12 13:00 PDT by Jonathan Bedard
Modified: 2019-09-16 17:15 PDT
CC List: 9 users

See Also:


Attachments
Patch (12.64 KB, patch), 2019-09-12 13:32 PDT, Jonathan Bedard
Patch (12.76 KB, patch), 2019-09-13 17:05 PDT, Jonathan Bedard
Patch (12.84 KB, patch), 2019-09-16 15:51 PDT, Jonathan Bedard
Patch for landing (12.87 KB, patch), 2019-09-16 15:59 PDT, Jonathan Bedard
Patch for landing (12.87 KB, patch), 2019-09-16 16:11 PDT, Jonathan Bedard

Description Jonathan Bedard 2019-09-12 13:00:34 PDT
We start having issues as our result archives approach 50 MB.

Note that with this change, we will lose existing result archives, but we're still storing them on Buildbot, so that's not a big deal.
Comment 1 Jonathan Bedard 2019-09-12 13:32:48 PDT
Created attachment 378672 [details]
Patch
Comment 2 Aakash Jain 2019-09-12 14:37:24 PDT
(In reply to Jonathan Bedard from comment #0)
> We start having issues as our result archives approach 50 MB.
What kind of issues? Can you elaborate on the problem and the fix?
Comment 3 Jonathan Bedard 2019-09-12 15:39:43 PDT
(In reply to Aakash Jain from comment #2)
> (In reply to Jonathan Bedard from comment #0)
> > We start having issues as our result archives approach 50 MB.
> What kind of issues? Can you elaborate on the problem and the fix?

'Issues' meaning the writes would fail. The solution I've implemented here is basically the one suggested in https://cwiki.apache.org/confluence/display/CASSANDRA2/CassandraLimitations. Instead of placing the entirety of the archive in a single row, we split the archive into multiple rows, each with a blob of no more than 10 MB.

As an added bonus, if we want to change the size of the blob shards in the future, we can just leave the existing data as-is.
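
For illustration, a minimal sketch of the sharding described above, assuming the archive is an in-memory byte string; the function and names here are hypothetical stand-ins, not the patch's actual code:

```python
import hashlib

SHARD_SIZE = 10 * 1000 * 1000  # keep each blob comfortably inside Cassandra's limits

def shard_archive(archive_bytes):
    # The whole archive's digest serves as the partition key for its shards.
    digest = hashlib.md5(archive_bytes).hexdigest()
    # Split the archive into consecutive shards of at most SHARD_SIZE bytes.
    shards = [
        (index, archive_bytes[offset:offset + SHARD_SIZE])
        for index, offset in enumerate(range(0, len(archive_bytes), SHARD_SIZE))
    ]
    return digest, len(archive_bytes), shards
```

Each (index, bytes) pair becomes one row; the metadata row records the digest and total size so the shards can be located and reassembled later.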
Comment 4 Alexey Proskuryakov 2019-09-12 17:38:14 PDT
Have you considered storing these in S3?
Comment 5 Jonathan Bedard 2019-09-13 07:32:54 PDT
(In reply to Alexey Proskuryakov from comment #4)
> Have you considered storing these in S3?

I have not.

It's probably the case that S3 is better suited for this sort of thing, but the results database doesn't have an S3 dependency at the moment, so I'm a bit reluctant to add one. If we were storing these in S3, I think we would also have to solve the key-rotation problem for test archives originating from Apple's internal infrastructure, which we know is not a trivial problem to solve.
Comment 6 Aakash Jain 2019-09-13 12:21:29 PDT
(In reply to Jonathan Bedard from comment #5)
> If we were storing these in S3, I think we would also have to solve the key-rotation problem for test archives originating from Apple's internal infrastructure, which we know is not a trivial problem to solve.
I believe we have already solved that problem with EWS and build.webkit.org. The archive is uploaded by the bots to the build-master, and the build-master uploads it to S3. Only the build-master has the required authentication to upload to S3.
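
For context, a rough sketch of that flow on the build-master side, assuming boto3; the bucket and key names are placeholders, and this is not the actual EWS/build.webkit.org code:

```python
import boto3  # assumed dependency; only the build-master holds S3 credentials

def upload_archive(archive_path, bucket='example-archives', key='archive.zip'):
    # Workers hand the archive to the build-master; the master alone authenticates.
    s3 = boto3.client('s3')  # credentials come from the master's environment or IAM role
    with open(archive_path, 'rb') as archive:
        s3.upload_fileobj(archive, bucket, key)
```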
Comment 7 Jonathan Bedard 2019-09-13 12:31:50 PDT
(In reply to Aakash Jain from comment #6)
> (In reply to Jonathan Bedard from comment #5)
> > If we were storing these in S3, I think we would also have to solve the key-rotation problem for test archives originating from Apple's internal infrastructure, which we know is not a trivial problem to solve.
> I believe we have already solved that problem with EWS and build.webkit.org.
> The archive is uploaded by the bots to the build-master, and build-master
> uploads to S3. Only build-master has the required authentication to upload
> to S3.

That solves the authentication problem; we would also need in-place encryption with rotating keys.
Comment 8 Alexey Proskuryakov 2019-09-13 13:44:48 PDT
Why do we need encryption?
Comment 9 Jonathan Bedard 2019-09-13 13:51:00 PDT
(In reply to Alexey Proskuryakov from comment #8)
> Why do we need encryption?

We would need in-place encryption to use S3 for storing Apple Internal test results, at least according to the guidance we have recently been given for storing similar archives.
Comment 10 Aakash Jain 2019-09-13 15:20:50 PDT
Comment on attachment 378672 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378672&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:51
> +    SHARD_SIZE = 10 * 1000 * 1000  # Shards of 10 MB

Would be good to elaborate in the comment here that this is because of a Cassandra limitation.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:52
> +    MEMORY_CAP = 2 * 1000 * 1000 * 1000  # Don't allow more than 2 gigs of archives in memory at one time

MAX_MEMORY or MEMORY_LIMIT might be clearer.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:54
> +    class ArchiveMetaByCommit(ClusteredByConfiguration):

please rename 'meta' to 'metadata' everywhere.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:61
> +        meta = columns.Text(required=True)

Ditto, 'meta' => 'metadata'.

I am not sure it's efficient to combine two columns (hash and size) into one (as JSON) here. Two separate columns (hash and size) would probably be clearer and more efficient.
For a relational database this would violate first normal form (https://en.wikipedia.org/wiki/First_normal_form); that probably doesn't matter much in Cassandra, though.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:72
> +    class ArchiveShardByCommit(Model):

What do you think about naming it ArchiveDataByCommit? I'm not sure the word 'shard' is immediately obvious to everyone. Or how about 'chunk' instead of 'shard'?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:74
> +        digest = columns.Text(partition_key=True, required=True)

please rename 'digest' to 'hash' everywhere.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:76
> +        shard = columns.Blob(required=True)

Consider renaming 'shard' to 'chunk' or 'data'.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:124
> +                index = 0

Would be nice to add a comment noting that we are sharding the archive here.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:126
> +                digest = hashlib.md5()

'digest' => 'hash'

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:132
> +                        digest=digest.hexdigest(), index=index,

please do .hexdigest() above in line 127.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:179
> +                            raise RuntimeError('Archive metadata does not contain a key')

key or hash?
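
To make the naming discussion concrete, a rough sketch of the chunk table with the suggested renames applied, written as a cassandra-cqlengine model; the layout is inferred from the quoted patch lines, and the metadata table (which derives from a resultsdbpy base class) is omitted:

```python
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class ArchiveChunkByCommit(Model):
    __table_name__ = 'archive_chunks_by_commit'
    # Digest of the whole archive; all of an archive's chunks share one partition.
    hash = columns.Text(partition_key=True, required=True)
    # Position of the chunk within the archive, used to reassemble it in order.
    index = columns.Integer(primary_key=True, clustering_order='ASC')
    # At most CHUNK_SIZE (10 MB) of archive data.
    chunk = columns.Blob(required=True)
```

With `index` as a clustering column, Cassandra returns a partition's rows already sorted by it.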
Comment 11 Aakash Jain 2019-09-13 15:42:42 PDT
I am ok with this fix. However, I would encourage you to explore the S3 approach as well. The solution we use here could also be used for EWS and build.webkit.org test archives.
Comment 12 Jonathan Bedard 2019-09-13 17:05:57 PDT
Created attachment 378764 [details]
Patch
Comment 13 dewei_zhu 2019-09-16 10:39:47 PDT
Comment on attachment 378764 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378764&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:127
> +                index = 0

Is 'index' the primary key within each partition?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:129
> +                digest = hashlib.md5(archive.getvalue()).hexdigest()

Is MD5 good enough?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:180
> +                            raise RuntimeError(MEMORY_ERROR)

Can we sum the sizes to ensure we won't hit the memory limit before doing any query against ArchiveChunks? Maybe we can do all the sanity checks (including the digest existence check) ahead of time?
Also, do you think it's necessary to add a unit test that the memory limit gets hit, by temporarily lowering the limit?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:193
> +                        rows = sorted(rows, key=lambda row: row.index)

Can we get the rows ordered when we query them?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:208
> +                        if not result.get(config):
> +                            result[config] = []

Maybe cleaner if we use setdefault here?

> Tools/resultsdbpy/resultsdbpy/model/archive_context_unittest.py:106
> +        print(len(buff.getvalue()))

Do we need `print` here, or is this just for debugging purposes?
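
For illustration, a sketch of the restructuring these comments suggest, with all names as stand-ins: do the cheap checks and the size budgeting before querying any chunks, rely on the clustering order rather than a client-side sort, and use setdefault:

```python
MEMORY_LIMIT = 2 * 1000 * 1000 * 1000  # cap on archive bytes held in memory at once

def fetch_archives(metadata_rows, fetch_chunks):
    # metadata_rows: iterable of (config, archive_hash, size) tuples.
    # fetch_chunks(archive_hash): returns chunk rows already ordered by the
    # clustering key, so no client-side sort is needed.
    rows = [row for row in metadata_rows if row[1]]  # drop entries missing a hash (defensive)

    # Budget the total size up front, before any blob is queried.
    if sum(size for _, _, size in rows) > MEMORY_LIMIT:
        raise RuntimeError('Requested archives exceed the in-memory limit')

    result = {}
    for config, archive_hash, _ in rows:
        archive = b''.join(row.chunk for row in fetch_chunks(archive_hash))
        result.setdefault(config, []).append(archive)  # replaces the get()-then-assign pattern
    return result
```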
Comment 14 Jonathan Bedard 2019-09-16 13:15:36 PDT
(In reply to dewei_zhu from comment #13)
> Comment on attachment 378764 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=378764&action=review
> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:127
> > +                index = 0
> 
> Is 'index' the primary key within each partition?

Yes. It's basically the shard's order in the archive.

> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:129
> > +                digest = hashlib.md5(archive.getvalue()).hexdigest()
> 
> Is MD5 good enough?

Yes, Aakash and I talked about this. We could change it to a secure hash, but engineering a hash collision doesn't get you anything, and you would need the API key to upload an archive anyway. Engineering a hash collision would just replace the existing archive with that hash.

> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:180
> > +                            raise RuntimeError(MEMORY_ERROR)
> 
> Can we sum the sizes to ensure we won't hit the memory limit before doing
> any query against ArchiveChunks? Maybe we can do all the sanity checks
> (including the digest existence check) ahead of time?
> Also, do you think it's necessary to add a unit test that the memory limit
> gets hit, by temporarily lowering the limit?
> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:193
> > +                        rows = sorted(rows, key=lambda row: row.index)
> 
> Can we get the rows ordered when we query them?

I need to verify, but they are probably already sorted, so we might not need that.

> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:208
> > +                        if not result.get(config):
> > +                            result[config] = []
> 
> Maybe cleaner if we use setdefault here?
> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context_unittest.py:106
> > +        print(len(buff.getvalue()))
> 
> Do we need `print` here, or is this just for debugging purposes?
Comment 15 Jonathan Bedard 2019-09-16 15:51:52 PDT
Created attachment 378905 [details]
Patch
Comment 16 dewei_zhu 2019-09-16 15:55:10 PDT
Comment on attachment 378905 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378905&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:181
> +                    if not value.get('digest'):
> +                        raise RuntimeError('Archive metadata does not contain a hash')

We can also move this above the for loop so that we fail early with a lightweight check.
Comment 17 Jonathan Bedard 2019-09-16 15:59:17 PDT
(In reply to dewei_zhu from comment #16)
> Comment on attachment 378905 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=378905&action=review
> 
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:181
> > +                    if not value.get('digest'):
> > +                        raise RuntimeError('Archive metadata does not contain a hash')
> 
> We can also move this above the for loop so that we fail early with a
> lightweight check.

Actually, we should just ignore it if we don't have a digest. (Note: that shouldn't be possible anyway; this is just being defensive.)
Comment 18 Jonathan Bedard 2019-09-16 15:59:44 PDT
Created attachment 378907 [details]
Patch for landing
Comment 19 Russell Epstein 2019-09-16 16:10:36 PDT
Comment on attachment 378907 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=378907&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:51
> +    CHUNK_SIZE = 10 * 1000 * 1000  # Cassandra doesn't do well with data blobs of more than 10 MB

Shouldn't this be 1024 instead?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:53
>  

Ditto
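
For reference, the distinction Russell is raising is decimal megabytes versus binary mebibytes; either definition keeps chunks within the limit discussed above:

```python
CHUNK_SIZE_DECIMAL = 10 * 1000 * 1000  # 10 MB  = 10,000,000 bytes
CHUNK_SIZE_BINARY = 10 * 1024 * 1024   # 10 MiB = 10,485,760 bytes
```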
Comment 20 Jonathan Bedard 2019-09-16 16:11:10 PDT
Created attachment 378908 [details]
Patch for landing
Comment 21 WebKit Commit Bot 2019-09-16 17:13:22 PDT
Comment on attachment 378908 [details]
Patch for landing

Clearing flags on attachment: 378908

Committed r249932: <https://trac.webkit.org/changeset/249932>
Comment 22 WebKit Commit Bot 2019-09-16 17:13:24 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Radar WebKit Bug Importer 2019-09-16 17:15:27 PDT
<rdar://problem/55421685>