WebKit Bugzilla
Bug 201734: results.webkit.org: Shard result archives (RESOLVED FIXED)
https://bugs.webkit.org/show_bug.cgi?id=201734
Jonathan Bedard
Reported
2019-09-12 13:00:34 PDT
We start having issues as our result archives approach 50 MB. Note that with this change, we will lose existing result archives, but we're still storing them on Buildbot, so that's not a big deal.
Attachments
Patch (12.64 KB, patch), 2019-09-12 13:32 PDT, Jonathan Bedard, no flags
Patch (12.76 KB, patch), 2019-09-13 17:05 PDT, Jonathan Bedard, no flags
Patch (12.84 KB, patch), 2019-09-16 15:51 PDT, Jonathan Bedard, no flags
Patch for landing (12.87 KB, patch), 2019-09-16 15:59 PDT, Jonathan Bedard, no flags
Patch for landing (12.87 KB, patch), 2019-09-16 16:11 PDT, Jonathan Bedard, no flags
Jonathan Bedard
Comment 1
2019-09-12 13:32:48 PDT
Created attachment 378672 [details]
Patch
Aakash Jain
Comment 2
2019-09-12 14:37:24 PDT
(In reply to Jonathan Bedard from comment #0)
> We start having issues as our result archives approach 50 MB.

What kind of issues? Can you elaborate more on the problem and the fix?
Jonathan Bedard
Comment 3
2019-09-12 15:39:43 PDT
(In reply to Aakash Jain from comment #2)
> > We start having issues as our result archives approach 50 MB.
> What kind of issues? Can you elaborate more on the problem and the fix?
'Issues' meaning the writes would fail. The solution I've implemented here is basically the one suggested in https://cwiki.apache.org/confluence/display/CASSANDRA2/CassandraLimitations. Instead of placing the entirety of the archive in a single row, we split the archive into multiple rows, each with a blob of no more than 10 MB. As an added bonus, if we want to change the size of the blob shards in the future, we can just leave the existing data as is.
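The sharding approach described here can be sketched in a few lines of standalone Python. This is an illustrative sketch, not the actual resultsdbpy code: `shard_archive` is a hypothetical helper, and `SHARD_SIZE` mirrors the constant discussed later in review.

```python
import hashlib
import io

SHARD_SIZE = 10 * 1000 * 1000  # one row per 10 MB blob, per the Cassandra guidance


def shard_archive(archive):
    """Split an archive into (digest, index, blob) tuples, one tuple per row."""
    data = archive.getvalue()
    digest = hashlib.md5(data).hexdigest()  # identifies the whole archive
    shards = [
        (digest, offset // SHARD_SIZE, data[offset:offset + SHARD_SIZE])
        for offset in range(0, len(data), SHARD_SIZE)
    ]
    return digest, shards


# A 25 MB archive splits into two full 10 MB shards plus one 5 MB remainder.
digest, shards = shard_archive(io.BytesIO(b'x' * (25 * 1000 * 1000)))
```

Because every row carries the archive's digest and its own index, changing the shard size later only affects newly written archives; existing rows remain readable as-is, which is the "added bonus" noted above.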
Alexey Proskuryakov
Comment 4
2019-09-12 17:38:14 PDT
Have you considered storing these in S3?
Jonathan Bedard
Comment 5
2019-09-13 07:32:54 PDT
(In reply to Alexey Proskuryakov from comment #4)
> Have you considered storing these in S3?
I have not. It's probably the case that S3 is better suited for this sort of thing, but the results database doesn't have an S3 dependency at the moment, so I'm a bit reluctant to add one. If we were storing these in S3, I think we would also have to solve the key-rotation problem for test archives originating from Apple's internal infrastructure, which we know is not a trivial problem to solve.
Aakash Jain
Comment 6
2019-09-13 12:21:29 PDT
(In reply to Jonathan Bedard from comment #5)
> If we were storing these in S3, I think we would also have to solve the key-rotation problem for test archives originating from Apple's internal infrastructure, which we know is not a trivial problem to solve.
I believe we have already solved that problem with EWS and build.webkit.org. The archive is uploaded by the bots to the build-master, and build-master uploads to S3. Only build-master has the required authentication to upload to S3.
Jonathan Bedard
Comment 7
2019-09-13 12:31:50 PDT
(In reply to Aakash Jain from comment #6)
> I believe we have already solved that problem with EWS and build.webkit.org.
> The archive is uploaded by the bots to the build-master, and build-master
> uploads to S3. Only build-master has the required authentication to upload
> to S3.

That addresses the authentication problem; we would also need in-place encryption with rotating keys.
Alexey Proskuryakov
Comment 8
2019-09-13 13:44:48 PDT
Why do we need encryption?
Jonathan Bedard
Comment 9
2019-09-13 13:51:00 PDT
(In reply to Alexey Proskuryakov from comment #8)
> Why do we need encryption?
We would need in-place encryption to use S3 for storing Apple Internal test results, at least according to the guidance we have recently been given for storing similar archives.
Aakash Jain
Comment 10
2019-09-13 15:20:50 PDT
Comment on attachment 378672 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378672&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:51
> + SHARD_SIZE = 10 * 1000 * 1000 # Shards of 10 MB

It would be good to elaborate in the comment here that this is because of a Cassandra limitation.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:52
> + MEMORY_CAP = 2 * 1000 * 1000 * 1000 # Don't allow more than 2 gigs of archives in memory at one time

MAX_MEMORY or MEMORY_LIMIT might be clearer.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:54
> + class ArchiveMetaByCommit(ClusteredByConfiguration):

Please rename 'meta' to 'metadata' everywhere.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:61
> + meta = columns.Text(required=True)

Ditto, 'meta' => 'metadata'. Also, I am not sure it's efficient to combine two columns (hash and size) into one (as JSON) here. Two separate columns (hash and size) would probably be clearer and more efficient. For a relational database this would violate first normal form (https://en.wikipedia.org/wiki/First_normal_form); it probably doesn't matter much in Cassandra, though.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:72
> + class ArchiveShardByCommit(Model):

What do you think about naming it ArchiveDataByCommit? I'm not sure the word 'shard' is immediately obvious to everyone. Or how about 'chunk' instead of 'shard'?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:74
> + digest = columns.Text(partition_key=True, required=True)

Please rename 'digest' to 'hash' everywhere.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:76
> + shard = columns.Blob(required=True)

Consider renaming 'shard' to 'chunk' or 'data'.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:124
> + index = 0

Would be nice to add a comment here noting that this is where we shard the archive.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:126
> + digest = hashlib.md5()

'digest' => 'hash'

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:132
> + digest=digest.hexdigest(), index=index,

Please do the .hexdigest() above, in line 127.

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:179
> + raise RuntimeError('Archive metadata does not contain a key')

Key or hash?
Aakash Jain
Comment 11
2019-09-13 15:42:42 PDT
I am ok with this fix. However, I would encourage you to explore the S3 approach as well. The solution we use here could also be used for EWS and build.webkit.org test archives.
Jonathan Bedard
Comment 12
2019-09-13 17:05:57 PDT
Created attachment 378764 [details]
Patch
dewei_zhu
Comment 13
2019-09-16 10:39:47 PDT
Comment on attachment 378764 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378764&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:127
> + index = 0

Is 'index' the primary key per partition?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:129
> + digest = hashlib.md5(archive.getvalue()).hexdigest()

Is MD5 good enough?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:180
> + raise RuntimeError(MEMORY_ERROR)

Can we sum the sizes to ensure we won't hit the memory limit before doing any queries from ArchiveChunks? Maybe we can do all the sanity checks (including the digest existence check) ahead of time? Also, do you think it's necessary to add a unit test that the memory limit gets hit, by temporarily lowering the limit?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:193
> + rows = sorted(rows, key=lambda row: row.index)

Can we get the rows already ordered when we query them?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:208
> + if not result.get(config):
> + result[config] = []

Maybe it would be cleaner to use setdefault here?

> Tools/resultsdbpy/resultsdbpy/model/archive_context_unittest.py:106
> + print(len(buff.getvalue()))

Do we need `print` here, or is this just for debugging purposes?
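The setdefault suggestion above amounts to the following (a tiny illustrative sketch; the dictionary contents and key names are made up, not taken from the patch):

```python
config = 'mac-release'

# The pattern under review: check, then insert a default, then use it.
result = {}
if not result.get(config):
    result[config] = []
result[config].append('archive-1')

# The suggested equivalent: dict.setdefault inserts the default (if the key
# is missing) and returns the value in a single step.
result2 = {}
result2.setdefault(config, []).append('archive-1')
```

Both forms produce the same dictionary; setdefault just collapses the check-then-insert into one expression.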
Jonathan Bedard
Comment 14
2019-09-16 13:15:36 PDT
(In reply to dewei_zhu from comment #13)
> Comment on attachment 378764 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=378764&action=review
>
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:127
> > + index = 0
> Is 'index' the primary key per partition?

Yes... it's basically a shard's order within the archive.

> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:129
> > + digest = hashlib.md5(archive.getvalue()).hexdigest()
> Is MD5 good enough?

Yes, Aakash and I talked about this. We could change it to a secure hash, but engineering a hash collision doesn't get you anything: you would have to have the API key to upload an archive anyways, and a collision would just replace the existing archive with that hash.

> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:180
> > + raise RuntimeError(MEMORY_ERROR)
> Can we sum the sizes to ensure we won't hit the memory limit before doing
> any queries from ArchiveChunks? Maybe we can do all the sanity checks
> (including the digest existence check) ahead of time? Also, do you think
> it's necessary to add a unit test that the memory limit gets hit, by
> temporarily lowering the limit?
>
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:193
> > + rows = sorted(rows, key=lambda row: row.index)
> Can we get the rows already ordered when we query them?

I need to verify, but they probably already are sorted, so we might not need that.

> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:208
> > + if not result.get(config):
> > + result[config] = []
> Maybe it would be cleaner to use setdefault here?
>
> > Tools/resultsdbpy/resultsdbpy/model/archive_context_unittest.py:106
> > + print(len(buff.getvalue()))
> Do we need `print` here, or is this just for debugging purposes?
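The ordering question discussed here can be illustrated with a defensive reassembly sketch. This is not the resultsdbpy code: `Row`, `reassemble`, and the sample rows are stand-ins for the actual Cassandra query results.

```python
from collections import namedtuple

# Stand-in for a row from the shard table: the index column records the
# shard's position within the archive.
Row = namedtuple('Row', ['index', 'shard'])


def reassemble(rows):
    """Concatenate shard blobs in index order to recover the original archive."""
    # Defensive sort: Cassandra clustering order would normally return rows
    # already ordered, but sorting makes the read path independent of the
    # query's ordering guarantees.
    rows = sorted(rows, key=lambda row: row.index)
    return b''.join(row.shard for row in rows)


# Rows arriving out of order are still reassembled correctly.
archive = reassemble([Row(1, b'world'), Row(0, b'hello ')])
```

If the query is confirmed to return rows in clustering order, the sort becomes a no-op and could be dropped, which is the optimization the review is asking about.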
Jonathan Bedard
Comment 15
2019-09-16 15:51:52 PDT
Created attachment 378905 [details]
Patch
dewei_zhu
Comment 16
2019-09-16 15:55:10 PDT
Comment on attachment 378905 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=378905&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:181
> + if not value.get('digest'):
> + raise RuntimeError('Archive metadata does not contain a hash')

We could also move this above the for loop so that we fail early, before doing the heavier work.
Jonathan Bedard
Comment 17
2019-09-16 15:59:17 PDT
(In reply to dewei_zhu from comment #16)
> Comment on attachment 378905 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=378905&action=review
>
> > Tools/resultsdbpy/resultsdbpy/model/archive_context.py:181
> > + if not value.get('digest'):
> > + raise RuntimeError('Archive metadata does not contain a hash')
> We could also move this above the for loop so that we fail early, before
> doing the heavier work.

Actually, we should just ignore it if we don't have a digest (note: that shouldn't be possible anyways, just being defensive).
Jonathan Bedard
Comment 18
2019-09-16 15:59:44 PDT
Created attachment 378907 [details]
Patch for landing
Russell Epstein
Comment 19
2019-09-16 16:10:36 PDT
Comment on attachment 378907 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=378907&action=review

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:51
> + CHUNK_SIZE = 10 * 1000 * 1000 # Cassandra doesn't do well with data blobs of more than 10 MB

Shouldn't this be 1024 instead?

> Tools/resultsdbpy/resultsdbpy/model/archive_context.py:53

Ditto
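For reference, the question above is about decimal megabytes (10^6 bytes, as the patch writes it) versus binary mebibytes (2^20 bytes). At this scale the two differ by roughly 5%, so either convention leaves a comfortable margin; a quick sketch:

```python
MB = 1000 * 1000    # decimal megabyte, as written in the patch
MiB = 1024 * 1024   # binary mebibyte, as the review question suggests

# 10 MiB is 485,760 bytes (about 4.9%) larger than 10 MB, so the choice
# between the two conventions is largely stylistic here.
difference = 10 * MiB - 10 * MB
```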
Jonathan Bedard
Comment 20
2019-09-16 16:11:10 PDT
Created attachment 378908 [details]
Patch for landing
WebKit Commit Bot
Comment 21
2019-09-16 17:13:22 PDT
Comment on attachment 378908 [details]
Patch for landing

Clearing flags on attachment: 378908

Committed r249932: <https://trac.webkit.org/changeset/249932>
WebKit Commit Bot
Comment 22
2019-09-16 17:13:24 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 23
2019-09-16 17:15:27 PDT
<rdar://problem/55421685>