Bug 210659 - [EME][CDMProxy] Sort key status array lexicographically by key IDs
Summary: [EME][CDMProxy] Sort key status array lexicographically by key IDs
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Charlie Turner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-17 09:46 PDT by Charlie Turner
Modified: 2020-04-23 07:50 PDT (History)
9 users (show)

See Also:


Attachments
Patch (4.22 KB, patch)
2020-04-17 10:57 PDT, Charlie Turner
no flags Details | Formatted Diff | Diff
Patch (3.93 KB, patch)
2020-04-21 08:23 PDT, Charlie Turner
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Charlie Turner 2020-04-17 09:46:44 PDT
[EME][CDMProxy] Sort key status array lexicographically by key IDs
Comment 1 Charlie Turner 2020-04-17 10:57:34 PDT
Created attachment 396773 [details]
Patch
Comment 2 Darin Adler 2020-04-17 11:18:56 PDT
Comment on attachment 396773 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396773&action=review

> Source/WebCore/ChangeLog:4
> +        [EME][CDMProxy] Sort key status array lexicographically by key IDs
> +        https://bugs.webkit.org/show_bug.cgi?id=210659

Re-sorting each time we insert something gives us an almost pathologically inefficient data structure. We might get away with it if it is always very small.

I think a reasonable data structure here would be a vector that we keep sorted, using binary search for both searching for elements and to find where to insert a new element in the vector.

The standard library offers std::binary_search, which I believe can be used for both of these purposes, and Vector offers Vector::insert which can be used to place new elements in the correct place in the vector after using binary_search to find where to insert.

> Source/WebCore/ChangeLog:15
> +        (WebCore::KeyStore::add): We could use a HashSet here and keep it
> +        sorted by design, but we also have to keep references to keys from

HashSet is not an ordered collection. You may be thinking of non-hashed collections that are ordered, like std::set.

> Source/WebCore/ChangeLog:19
> +        not clear how HashSet<RefPtr<T>> could be made to compare with
> +        operator<(T) rather than operator<(RefPtr<T>) (pointer
> +        comparison). In addition to that, it's often required to search

HashSet doesn't use operator< at all. It uses hashing and equality.

> Source/WebCore/ChangeLog:23
> +        for a key with a Vector<uint8_t> directly. The HashSet API doesn't
> +        provide a way to search via predicate (like Vector), which means I
> +        can not feasibly lookup a Vector<uint8_t> if I'm essentially
> +        storing RefPtr<Vector<uint8_t>> in the set. Instead, a compromise

It’s straightforward to search a HashSet via predicate like you would with a Vector. It would be a loop like this:

    for (auto it = set.begin(), end = set.end(); it != end; ++it) {
        if (predicate(*it))
            return it;
    }

The reason this isn’t offered is that it’s an O(n) operation, and if doing operations like that, then we’d typically use a Vector instead. The set offers O(log n) membership checking, which is usually the reason we use it.
Comment 3 Charlie Turner 2020-04-19 15:54:23 PDT
Comment on attachment 396773 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396773&action=review

Thanks for the review, I feel I can keep the naive insert and resort in this context, but I should clarify why in the ChangeLog.

>> Source/WebCore/ChangeLog:4
>> +        https://bugs.webkit.org/show_bug.cgi?id=210659
> 
> Re-sorting each time we insert something gives us an almost pathologically inefficient data structure. We might get away with it if it is always very small.
> 
> I think a reasonable data structure here would be a vector that we keep sorted, using binary search for both searching for elements and to find where to insert a new element in the vector.
> 
> The standard library offers std::binary_search, which I believe can be used for both of these purposes, and Vector offers Vector::insert which can be used to place new elements in the correct place in the vector after using binary_search to find where to insert.

> We might get away with it if it is always very small.

I should have made it clearer that the collection in this context is typically very small. The common case in practice is for this collection to be of size 1 (one encryption key for all tracks). It's also less common for the collection to be size 2 (one key for audio and one for video) but cases where N > 2 are pathological in themselves, and so in practice the pathology wouldn't be seen. For N=1, the sort call is trivial, and for N=2 it's just a swap. I did consider keeping it sorted, but because of this context the extra complexity didn't seem worth it.

>> Source/WebCore/ChangeLog:15
>> +        sorted by design, but we also have to keep references to keys from
> 
> HashSet is not an ordered collection. You may be thinking of non-hashed collections that are ordered, like std::set.

I'll correct that mistake in the ChangeLog, I am thinking of the ordered variety.

>> Source/WebCore/ChangeLog:23
>> +        storing RefPtr<Vector<uint8_t>> in the set. Instead, a compromise
> 
> It’s straightforward to search a HashSet via predicate like you would with a Vector. It would be a loop like this:
> 
>     for (auto it = set.begin(), end = set.end(); it != end; ++it) {
>         if (predicate(*it))
>             return it;
>     }
> 
> The reason this isn’t offered is that it’s an O(n) operation, and if doing operations like that, then we’d typically use a Vector instead. The set offers O(log n) membership checking, which is usually the reason we use it.

Understood, I shouldn't have been so strong here in my explanation, but loops like that are definitely a warning sign I've picked the wrong data structure :)
Comment 4 Charlie Turner 2020-04-19 15:54:28 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=396773&action=review

Thanks for the review, I feel I can keep the naive insert and resort in this context, but I should clarify why in the ChangeLog.

>> Source/WebCore/ChangeLog:4
>> +        https://bugs.webkit.org/show_bug.cgi?id=210659
> 
> Re-sorting each time we insert something gives us an almost pathologically inefficient data structure. We might get away with it if it is always very small.
> 
> I think a reasonable data structure here would be a vector that we keep sorted, using binary search for both searching for elements and to find where to insert a new element in the vector.
> 
> The standard library offers std::binary_search, which I believe can be used for both of these purposes, and Vector offers Vector::insert which can be used to place new elements in the correct place in the vector after using binary_search to find where to insert.

> We might get away with it if it is always very small.

I should have made it clearer that the collection in this context is typically very small. The common case in practice is for this collection to be of size 1 (one encryption key for all tracks). It's also less common for the collection to be size 2 (one key for audio and one for video) but cases where N > 2 are pathological in themselves, and so in practice the pathology wouldn't be seen. For N=1, the sort call is trivial, and for N=2 it's just a swap. I did consider keeping it sorted, but because of this context the extra complexity didn't seem worth it.

>> Source/WebCore/ChangeLog:15
>> +        sorted by design, but we also have to keep references to keys from
> 
> HashSet is not an ordered collection. You may be thinking of non-hashed collections that are ordered, like std::set.

I'll correct that mistake in the ChangeLog, I am thinking of the ordered variety.

>> Source/WebCore/ChangeLog:23
>> +        storing RefPtr<Vector<uint8_t>> in the set. Instead, a compromise
> 
> It’s straightforward to search a HashSet via predicate like you would with a Vector. It would be a loop like this:
> 
>     for (auto it = set.begin(), end = set.end(); it != end; ++it) {
>         if (predicate(*it))
>             return it;
>     }
> 
> The reason this isn’t offered is that it’s an O(n) operation, and if doing operations like that, then we’d typically use a Vector instead. The set offers O(log n) membership checking, which is usually the reason we use it.

Understood, I shouldn't have been so strong here in my explanation, but loops like that are definitely a warning sign I've picked the wrong data structure :)
Comment 5 Charlie Turner 2020-04-21 08:23:58 PDT
Created attachment 397084 [details]
Patch
Comment 6 EWS 2020-04-23 07:50:52 PDT
Committed r260568: <https://trac.webkit.org/changeset/260568>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 397084 [details].