Bug 235070

Summary: Make ServiceWorkerClient.id a UUID instead of a string derived from a ScriptExecutionContextIdentifier
Product: WebKit Reporter: youenn fablet <youennf>
Component: Service WorkersAssignee: youenn fablet <youennf>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, benjamin, cdumez, cmarcelo, darin, esprehn+autocc, ews-watchlist, falken, japhet, kangil.han, kondapallykalyan, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 235107    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing
none
Patch
ews-feeder: commit-queue-
Patch
ews-feeder: commit-queue-
Patch none

Description youenn fablet 2022-01-11 08:35:50 PST
Make ServiceWorkerClient.id a UUID instead of a string derived from a ScriptExecutionContextIdentifier
Comment 1 youenn fablet 2022-01-11 08:45:40 PST
Created attachment 448850 [details]
Patch
Comment 2 Chris Dumez 2022-01-12 08:20:30 PST
Comment on attachment 448850 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=448850&action=review

> Source/WebCore/ChangeLog:8
> +        Make SWServer store the maping between internal ScriptExecutinoContextIdentifiers and visible identifiers.

Typos: maping and ScriptExecutinoContextIdentifiers.

The fact that we have to store a mapping is sad. It could for example get out of date.

> Source/WebCore/ChangeLog:9
> +        Visible identifiers are now generated as UUIDs, which is mapping what Chrome and Firefox are doing.

Why though? The spec [1] merely says "An opaque string that uniquely identifies this environment." so I think we should use whatever is convenient.

Note that there should not even be a concept of a ServiceWorkerClientIdentifier, per spec it has nothing specific to service workers and it is just an identifier for the environment (aka ScriptExecutionContext). As a matter of fact, the identifiers returned for the clients in the service worker API should match the identifiers returned for the clients in the Web Lock API.

[1] https://html.spec.whatwg.org/multipage/webappapis.html#concept-environment-id

> Source/WebCore/workers/service/server/SWServer.cpp:946
> +    auto visibleId = createCanonicalUUIDString();

As said earlier, client ids returned both by the service worker API and the Web Lock API (and may others) need to match. This is because it is just an environment identifier (defined in the HTML spec as an opaque string), nothing specific to service workers. Therefore, when I see new logic in Service worker code to generate client identifiers, it makes me worry.
Comment 3 Chris Dumez 2022-01-12 08:26:13 PST
Note that if we really want to use a UUID for some reason, an alternative would also be to use a UUID for ScriptExecutionContextIdentifier instead of a ProcessIdentified<ObjectIdentifier>.

I just don't think we should have logic specific to service workers here. I also worry of the fact that having to maintain a mapping between internal identifiers and visible identifiers could be error-prone and lead to new bugs.
Comment 4 Chris Dumez 2022-01-12 08:28:44 PST
(In reply to Chris Dumez from comment #3)
> Note that if we really want to use a UUID for some reason, an alternative
> would also be to use a UUID for ScriptExecutionContextIdentifier instead of
> a ProcessIdentified<ObjectIdentifier>.

Thinking about this more though, I think this would likely mean keeping some kind of mapping somewhere to match a given UUID to a particular process. Using a ProcessIdentified<ObjectIdentifier> makes it very convenient to find out which process it comes from. That's why I'd like to understand better the benefits of using a UUID compared to what we have now (which seems less complex / error-prone).

> 
> I just don't think we should have logic specific to service workers here. I
> also worry of the fact that having to maintain a mapping between internal
> identifiers and visible identifiers could be error-prone and lead to new
> bugs.
Comment 5 youenn fablet 2022-01-12 09:00:07 PST
(In reply to Chris Dumez from comment #4)
> (In reply to Chris Dumez from comment #3)
> > Note that if we really want to use a UUID for some reason, an alternative
> > would also be to use a UUID for ScriptExecutionContextIdentifier instead of
> > a ProcessIdentified<ObjectIdentifier>.
> 
> Thinking about this more though, I think this would likely mean keeping some
> kind of mapping somewhere to match a given UUID to a particular process.
> Using a ProcessIdentified<ObjectIdentifier> makes it very convenient to find
> out which process it comes from. That's why I'd like to understand better
> the benefits of using a UUID compared to what we have now (which seems less
> complex / error-prone).

I can see a few reasons:
- It makes it easier to handle the case of exposing an ID for a Document that is not yet created (see follow-up patch to implement resultingClientId).
- The current IDs are exposing to JavaScript internal implementation details that could be leveraged. For instance whether two contexts are living in the same process. This may be used to correlate third-party iframes more easily for instance. If a service worker gets corrupted, it also allows a service worker to more easily post message to iframes that are not clients in other processes.
- The spec says it should be an "opaque string", I do not think our current ID is matching the 'opaque' part of it. Firefox and Chrome are using some form of UUIDs.

I did not know that other specs were using those IDs.
I can rewrite the patch to externalise the map so that it can be used by other APIs.
Hopefully, these APIs can use a network process based map.
Comment 6 Chris Dumez 2022-01-12 09:03:21 PST
(In reply to youenn fablet from comment #5)
> (In reply to Chris Dumez from comment #4)
> > (In reply to Chris Dumez from comment #3)
> > > Note that if we really want to use a UUID for some reason, an alternative
> > > would also be to use a UUID for ScriptExecutionContextIdentifier instead of
> > > a ProcessIdentified<ObjectIdentifier>.
> > 
> > Thinking about this more though, I think this would likely mean keeping some
> > kind of mapping somewhere to match a given UUID to a particular process.
> > Using a ProcessIdentified<ObjectIdentifier> makes it very convenient to find
> > out which process it comes from. That's why I'd like to understand better
> > the benefits of using a UUID compared to what we have now (which seems less
> > complex / error-prone).
> 
> I can see a few reasons:
> - It makes it easier to handle the case of exposing an ID for a Document
> that is not yet created (see follow-up patch to implement resultingClientId).
> - The current IDs are exposing to JavaScript internal implementation details
> that could be leveraged. For instance whether two contexts are living in the
> same process. This may be used to correlate third-party iframes more easily
> for instance. If a service worker gets corrupted, it also allows a service
> worker to more easily post message to iframes that are not clients in other
> processes.
> - The spec says it should be an "opaque string", I do not think our current
> ID is matching the 'opaque' part of it. Firefox and Chrome are using some
> form of UUIDs.
> 
> I did not know that other specs were using those IDs.
> I can rewrite the patch to externalise the map so that it can be used by
> other APIs.
> Hopefully, these APIs can use a network process based map.

LayoutTests/imported/w3c/web-platform-tests/web-locks/clientids.tentative.https.html is supposed to test that ids returned by the service worker API and the Web Lock API are consistent. Your patch doesn't seem to introduce new test failures though, not sure why that is.
Comment 7 youenn fablet 2022-01-13 02:54:08 PST
Created attachment 449039 [details]
Patch
Comment 8 youenn fablet 2022-01-13 04:38:04 PST
Created attachment 449046 [details]
Patch
Comment 9 youenn fablet 2022-01-13 09:28:35 PST
Created attachment 449071 [details]
Patch
Comment 10 youenn fablet 2022-01-14 01:34:41 PST
Created attachment 449147 [details]
Patch
Comment 11 youenn fablet 2022-01-14 04:33:11 PST
Created attachment 449161 [details]
Patch
Comment 12 Chris Dumez 2022-01-14 08:11:39 PST
Comment on attachment 449161 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=449161&action=review

> Source/WebKit/ChangeLog:8
> +        Change findByClientIdentifier into findByVisibleClientIdentifieran and make it asn async reply.

typo: asn

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:35
> +class ProcessQualified<UUID> {

I will note that I find it a bit weird to have a ProcessQualified<UUID> and a UUID is already globally unique (and not process local).

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:98
> +    String toVisibleString() const { return m_object.toString(); }

It is related to the comment above but it is a bit confusing that we have 2 types of Strings now. It seems the Visible string (UUID) should be sufficient everywhere.

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:121
> +using ScriptExecutionContextIdentifier = ProcessQualified<UUID>;

Again, I don't really understand why the ScriptExecutionContextIdentifier needs to be ProcessQualified<> at all after your changes to use a UUID.

> Source/WebCore/testing/Internals.cpp:2767
> +    return document.identifier().object().toString();

document.identifier().toVisibleString() ?
Comment 13 youenn fablet 2022-01-14 08:41:36 PST
> I will note that I find it a bit weird to have a ProcessQualified<UUID> and
> a UUID is already globally unique (and not process local).

Internally, we often need to know the WebProcess ID where lives the corresponding context. We could go with a simple UUID (or maybe a typed UUID) but we would probably need each time to query a UUID -> ProcessID map either in UIProcess and/or NetworkProcess, potentially on various threads.
Comment 14 youenn fablet 2022-01-14 08:42:44 PST
> It is related to the comment above but it is a bit confusing that we have 2
> types of Strings now. It seems the Visible string (UUID) should be
> sufficient everywhere.

The other string is for debug purposes where it might be useful to know the process ID of a given context.
Comment 15 Chris Dumez 2022-01-14 08:43:42 PST
(In reply to youenn fablet from comment #13)
> > I will note that I find it a bit weird to have a ProcessQualified<UUID> and
> > a UUID is already globally unique (and not process local).
> 
> Internally, we often need to know the WebProcess ID where lives the
> corresponding context. We could go with a simple UUID (or maybe a typed
> UUID) but we would probably need each time to query a UUID -> ProcessID map
> either in UIProcess and/or NetworkProcess, potentially on various threads.

Yes, it is true. Give me a few minutes to take another at the patch.
Comment 16 Chris Dumez 2022-01-14 08:48:51 PST
(In reply to youenn fablet from comment #13)
> > I will note that I find it a bit weird to have a ProcessQualified<UUID> and
> > a UUID is already globally unique (and not process local).
> 
> Internally, we often need to know the WebProcess ID where lives the
> corresponding context. We could go with a simple UUID (or maybe a typed
> UUID) but we would probably need each time to query a UUID -> ProcessID map
> either in UIProcess and/or NetworkProcess, potentially on various threads.

I was hoping we wouldn't need such a map because both the network process and the UIProcess know which process the UUID came from (because they get it from a particular IPC connection).

As a matter of fact, both the UIProcess and NetworkProcess cannot really trust the ProcessID sent by the WebProcess as part of this ScriptExecutionIdentifier and have to validate it already (since IPC from the WebProcess cannot be trusted).

That said, my bet is that once the network process or UIProcess may need to know later on (not right when receiving the IPC) where the UUID came from and at that point the information would indeed be lost.
Comment 17 Chris Dumez 2022-01-14 09:24:21 PST
Comment on attachment 449161 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=449161&action=review

r=me with some simplification ideas. I think the patch adds a bit more complexity than we really need.

> Source/WebCore/platform/ProcessQualified.h:72
> +    const T& objectHash() const

Shouldn't be needed, see comment below.

> Source/WebCore/platform/ProcessQualified.h:134
> +    add(hasher, processQualified.objectHash());

I don't think we should need to add this objectHash(), we should probably just add this to UUID.h:
inline void add(Hasher& hasher, const UUID& uuid)
{
    add(hasher, uuid());
}

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:70
> +    unsigned objectHash() const

Shouldn't be needed, see comment above.

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:97
> +    String toDebugString() const { return makeString(m_processIdentifier.toUInt64(), '-', m_object.toString()); }

I personally don't think it is worth the extra mental complexity to have 2 different kinds of strings, just to print the process identifier in some cases. I think having only a single string (that is the "visible" string) would be better.

> Source/WebCore/testing/Internals.idl:936
> +    DOMString serviceWorkerClientInternalIdentifier(Document document);

Why do we need to expose "internal" identifiers to the "Web" (I understand it is just tests). Why cannot tests rely on regular client identifiers (UUIDs)?
Comment 18 youenn fablet 2022-01-16 23:14:58 PST
(In reply to Chris Dumez from comment #17)
> Comment on attachment 449161 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=449161&action=review
> 
> r=me with some simplification ideas. I think the patch adds a bit more
> complexity than we really need.
> 
> > Source/WebCore/platform/ProcessQualified.h:72
> > +    const T& objectHash() const
> 
> Shouldn't be needed, see comment below.
> 
> > Source/WebCore/platform/ProcessQualified.h:134
> > +    add(hasher, processQualified.objectHash());
> 
> I don't think we should need to add this objectHash(), we should probably
> just add this to UUID.h:
> inline void add(Hasher& hasher, const UUID& uuid)
> {
>     add(hasher, uuid());
> }
> 
> > Source/WebCore/platform/ScriptExecutionContextIdentifier.h:70
> > +    unsigned objectHash() const
> 
> Shouldn't be needed, see comment above.

Will simplify this.

> > Source/WebCore/platform/ScriptExecutionContextIdentifier.h:97
> > +    String toDebugString() const { return makeString(m_processIdentifier.toUInt64(), '-', m_object.toString()); }
> 
> I personally don't think it is worth the extra mental complexity to have 2
> different kinds of strings, just to print the process identifier in some
> cases. I think having only a single string (that is the "visible" string)
> would be better.

toDebugString is a ProcessQualified concept.
I can see whether we can remove it but in general I prefer we keep methods defined in template.
Comment 19 youenn fablet 2022-01-16 23:42:43 PST
Created attachment 449311 [details]
Patch for landing
Comment 20 EWS 2022-01-17 02:50:49 PST
Committed r288093 (246107@main): <https://commits.webkit.org/246107@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 449311 [details].
Comment 21 Radar WebKit Bug Importer 2022-01-17 02:51:20 PST
<rdar://problem/87673540>
Comment 22 Darin Adler 2022-01-17 09:48:05 PST
Comment on attachment 449311 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=449311&action=review

I spotted a mistake and some ares we could improve coding style. I’d be willing to do this refinement if you are busy on other things. Just let me know if you prefer that

> Source/WTF/wtf/UUID.cpp:55
> +    // We sanitize the value so as not loosing any data when going to Version 4 UUIDs.

"loosing" -> "losing", but also the grammar here is wrong. Grammatical version would be:

    // We sanitize the value so we do not lose any data when converting this to a Version 4 UUID.

But also, the comment doesn’t seem right. Is this being converted to a Version 4 UUID? What exactly *is* this code doing? Changing it from a random 16 byte value into a valid random UUID? Why is clearing these 6 bits sufficient?

> Source/WTF/wtf/UUID.cpp:102
> +    unsigned y = value[19];
> +    if (y == '8' || y == '9')
> +        y = y - '8';
> +    else if (y == 'a' || y == 'A')
> +        y = 2;
> +    else if (y == 'b' || y == 'B')
> +        y = 3;
> +    else
> +        return { };

I think it’s a little strange to use "y" both for the character and for the number. Why do this operation "in place"?

But also, why do we need "y" at all? Why not just parse the hex numbers, and do any masking or validity checks on the values after they are parsed, instead of working directly on the characters?

> Source/WTF/wtf/UUID.cpp:104
> +    auto firstValue = parseInteger<unsigned>(value.substring(0, 8), 16);

This use, and the other 5 calls below are all wrong, because they will allow leading "+". Simplest solution is to write a wrapper that checks for the "+" and fails if it’s present. Fancier solution would be to add a variant of parseInteger that does not allow the "+", as mentioned in the FIXME In StringToIntegerConversion.h. Trickiest part is probably just deciding what to name such a function.

> Source/WTF/wtf/UUID.cpp:134
> +    auto* data = reinterpret_cast<unsigned*>(&uuidValue);
> +    data[0] = *firstValue;
> +    data[1] = (*secondValue << 16) | *thirdValue;
> +    data[2] = ((*fourthValue << 16) | *fifthValue) + (y << 30);
> +    data[3] = *sixthValue;

This should be rewritten without the reinterpret_cast; just generally a bad practice to do it if not necessary. There’s no reason we can’t use uint64_t and shifting instead. Could just make a function that takes four uint32_t and assembles them into a UInt128 if the idiom seems too ugly written out in line.

> Source/WTF/wtf/UUID.cpp:136
> +    return UUID(WTFMove(uuidValue));

No reason to use WTFMove on a scalar.

> Source/WTF/wtf/UUID.h:131
> +    add(hasher, uuid.hash());

It’s not good to hash, and then hash the hash. Instead this function should add m_data to the Hasher. Some details will need to be worked out, but that’s the general best approach.

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:40
> +        : m_object(WTF::UInt128(0))

Would be better style to do this with a default value below and use "= default". Also, why do we need to use UInt128 directly to get a zeroed UUID? Can we change the UUID class so that’s easier to do?

> Source/WebCore/platform/ScriptExecutionContextIdentifier.h:48
> +    ProcessQualified(UUID&& object, ProcessIdentifier processIdentifier)
> +        : m_object(WTFMove(object))
> +        , m_processIdentifier(processIdentifier)
> +    {
> +    }

This isn’t valuable. Since a UUID is a scalar, there’s no need to move it. Should just drop this.

> Tools/TestWebKitAPI/Tests/WTF/UUID.cpp:43
> +    value = UUID::parse("12345678-9abc-5de0-89AB-0123456789ab");

We can add test cases here that show the "+" bug.
Comment 23 youenn fablet 2022-01-18 03:45:29 PST
Reopening to attach new patch.
Comment 24 youenn fablet 2022-01-18 03:45:33 PST
Created attachment 449374 [details]
Patch
Comment 25 youenn fablet 2022-01-18 03:51:45 PST
(In reply to Darin Adler from comment #22)
> Comment on attachment 449311 [details]
> Patch for landing

Thanks for the review, I updated the patch accordingly.
Also, I made sure we cannot parse specific values like empty or deleted.
We should probably add these same checks for the decoder routine as well.

>     // We sanitize the value so we do not lose any data when converting this
> to a Version 4 UUID.
> 
> But also, the comment doesn’t seem right. Is this being converted to a
> Version 4 UUID? What exactly *is* this code doing? Changing it from a random
> 16 byte value into a valid random UUID? Why is clearing these 6 bits
> sufficient?

When stringifying a UUID a, we loose the values of these 6 bits.
When parsing a's stringified representation into a new UUID b, we want a == b.
Either we update == to ignore these 6 bits or we make sure that these 6 bits have fixed values.
This code is doing the latter.
Comment 26 youenn fablet 2022-01-18 03:55:50 PST
Created attachment 449376 [details]
Patch
Comment 27 youenn fablet 2022-01-18 04:30:47 PST
Created attachment 449379 [details]
Patch
Comment 28 EWS 2022-01-18 08:17:48 PST
Committed r288116 (246130@main): <https://commits.webkit.org/246130@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 449379 [details].
Comment 29 Darin Adler 2022-01-18 13:29:57 PST
(In reply to youenn fablet from comment #25)
> When stringifying a UUID a, we loose the values of these 6 bits.
> When parsing a's stringified representation into a new UUID b, we want a == b.
> Either we update == to ignore these 6 bits or we make sure that these 6 bits
> have fixed values.
> This code is doing the latter.

But is that enough to make a valid UUID. I had the impression there were some special requirements to just turn random bits into a valid UUID, rather than just clearing a few of the bits. Like a pattern that says "this is a random UUID" to guarantee no conflicts with non-random UUIDs.
Comment 30 youenn fablet 2022-01-18 14:02:44 PST
(In reply to Darin Adler from comment #29)
> (In reply to youenn fablet from comment #25)
> > When stringifying a UUID a, we loose the values of these 6 bits.
> > When parsing a's stringified representation into a new UUID b, we want a == b.
> > Either we update == to ignore these 6 bits or we make sure that these 6 bits
> > have fixed values.
> > This code is doing the latter.
> 
> But is that enough to make a valid UUID. I had the impression there were
> some special requirements to just turn random bits into a valid UUID, rather
> than just clearing a few of the bits. Like a pattern that says "this is a
> random UUID" to guarantee no conflicts with non-random UUIDs.

You are probably right, I'll look at the RFC tomorrow.
Comment 31 youenn fablet 2022-01-19 00:52:54 PST
I looked at https://datatracker.ietf.org/doc/html/rfc4122#section-4.4 and what we are doing is okayish: we are setting randomly the whole UUID but 4+2 bits.

The 4 bits generate the '4' and the two bits the '8', '9', 'a', or 'b'.
We could change the sanitisation to actually set these 6 bits to 0100 and 10, instead of using 0000 and 00, which would simplify the stringification routine.
The internal bit representation would perfectly match the stringified version and the spec.

We could also make it clear UUID::create and UUID::parse are dealing with UUIDs v4 only.
Comment 32 Darin Adler 2022-01-19 09:55:48 PST
(In reply to youenn fablet from comment #31)
> We could change the sanitisation to actually set these 6 bits to 0100 and
> 10, instead of using 0000 and 00, which would simplify the stringification
> routine.

Yes, great idea. I think that would just be two more lines of code in UUID::create.

> We could also make it clear UUID::create and UUID::parse are dealing with
> UUIDs v4 only.

Yes, I think we should rename them to be more specific.

Renaming create to make it clear what kind of UUID it creates seems good. We can just call it "v4" or whatever term makes it clear that it’s the cryptographically-random type of UUID and it creates a new unique one.

For UUID::parse, not sure why we wrote one that only works with v4 UUIDs. We might want a UUID::parse that can parse any valid UUID, but which will still refuse to parse ones that conflict with our empty and deleted values for hash tables, which I hope we carefully selected so they are not valid UUIDs.

Or maybe we do want to reject non-v4 UUIDs. That’s OK with me, but I’m not sure why we would would want to do that.
Comment 33 youenn fablet 2022-01-19 10:02:55 PST
Some specs require UUID v4, like the mDNS ICE candidate spec so it is good to support parsing and creation of V4
Comment 34 Darin Adler 2022-01-19 10:05:53 PST
Another way to do this is we could use a different class to hold UUIDs that we assert are v4. I don’t think a class named UUID should have that invariant, though.
Comment 35 youenn fablet 2022-01-21 00:03:16 PST
Follow-up work at https://bugs.webkit.org/show_bug.cgi?id=235430 to clarify v4 use
Comment 36 Sam Sneddon [:gsnedders] 2022-06-06 06:38:32 PDT
*** Bug 190313 has been marked as a duplicate of this bug. ***