Bug 212486 - Use correct encoding when converting a WTF::URL to CFURLRef
Summary: Use correct encoding when converting a WTF::URL to CFURLRef
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-05-28 13:30 PDT by Alex Christensen
Modified: 2020-06-24 10:06 PDT (History)
6 users (show)

See Also:


Attachments
Patch (8.72 KB, patch)
2020-05-28 13:32 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (8.17 KB, patch)
2020-05-28 17:45 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (8.56 KB, patch)
2020-05-29 10:35 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (8.34 KB, patch)
2020-05-29 23:14 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2020-05-28 13:30:14 PDT
Use correct encoding when converting a WTF::URL to CFURLRef
Comment 1 Alex Christensen 2020-05-28 13:32:47 PDT
Created attachment 400500 [details]
Patch
Comment 2 Darin Adler 2020-05-28 13:41:10 PDT
Comment on attachment 400500 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400500&action=review

> Source/WTF/wtf/cf/CFURLExtras.cpp:34
> +RetainPtr<CFURLRef> createCFURLFromBuffer(const char* data, size_t size, URLBufferEncoding encoding, CFURLRef baseURL)

Why do we need a shared function for this at all? Why not call CFURLCreateAbsoluteURLWithBytes directly at call sites?

> Source/WTF/wtf/cf/CFURLExtras.cpp:45
> +        ASSERT(StringView(data, size).isAllASCII());

The one place this is used is decoding URLs in ArgumentCoders. So it seems the assertion is "trusting" data that came cross process. Would we normally assert something if the guarantee comes from the other side of an XPC boundary?

> Source/WTF/wtf/cf/CFURLExtras.h:38
> +enum class URLBufferEncoding : uint8_t { Latin1, UTF8, ASCII };
> +WTF_EXPORT_PRIVATE RetainPtr<CFURLRef> createCFURLFromBuffer(const char*, size_t, URLBufferEncoding, CFURLRef baseURL = nullptr);

Maybe 3 separate functions instead of an enum? Or no functions at all? Just move the code to there call sites.

Note that we never use UTF-8 or Latin-1 with a base URL.

> Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221
> +    std::array<LChar, 3> latin1 { 0xC2, 0xB6, 0x00 };
> +    WTF::URL url4(URL(), String(latin1.data()));
> +    EXPECT_FALSE(url4.isValid());
> +    EXPECT_TRUE(url4.string().is8Bit());
> +    EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6");

This is a behavior change. While the consistency with the 16-bit form of the same string is a plus, I am concerned about whether any code relied on the old "if it looks like UTF-8 treat it like UTF-8" behavior. I hope not but not sure how to verify.
Comment 3 Alex Christensen 2020-05-28 17:22:04 PDT
This is such an edge case, I'm not worried about the intentional functional change.  This is only for non-ASCII URLs that are invalid according to WTF::URL but valid according to NSURL.  The only record we have of this code path ever having been hit is from a security researcher trying to hit new code paths, and we have no tests so it must've been unimportant in the past.
I'll try your suggestions.
Comment 4 Alex Christensen 2020-05-28 17:45:52 PDT
Created attachment 400529 [details]
Patch
Comment 5 Darin Adler 2020-05-29 10:11:59 PDT
Comment on attachment 400529 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400529&action=review

> Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642
> +    result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true));

Should we add a check that none of the bytes have their high bit set, and return false if so?

> Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221
> +    EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6");

Looks like this test is failing on the bot; please fix before landing.
Comment 6 Alex Christensen 2020-05-29 10:29:05 PDT
(In reply to Darin Adler from comment #5)
> Comment on attachment 400529 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=400529&action=review
> 
> > Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642
> > +    result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true));
> 
> Should we add a check that none of the bytes have their high bit set, and
> return false if so?
I think that would be an unnecessary slowdown.  If a compromised process wants to send a different URL through IPC, they can do so directly without playing any encoding games.

> > Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221
> > +    EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6");
> 
> Looks like this test is failing on the bot; please fix before landing.

It looks like the behavior of CFURLCreateAbsoluteURLWithBytes with kCFStringEncodingISOLatin1 changed between Mojave and Catalina, which I verified with this program:

#include <array>
#include <CoreFoundation/CFURL.h>
#include <Foundation/Foundation.h>
int main()
{
    std::array<UInt8, 3> bytes { 0xc2, 0xb6, 0x00 };
    CFURLRef url = CFURLCreateAbsoluteURLWithBytes(nullptr, bytes.data(), 2, kCFStringEncodingISOLatin1, nullptr, true);
    NSLog(@"%@", url);
    CFRelease(url);
}

On Mojave it prints %C2%B6 and on Catalina is prints %C3%82%C2%B6
So I guess that behavior is expected on Mojave.  I'll make platform-specific expectations on the test.
Comment 7 Darin Adler 2020-05-29 10:32:49 PDT
(In reply to Alex Christensen from comment #6)
> (In reply to Darin Adler from comment #5)
> > Should we add a check that none of the bytes have their high bit set, and
> > return false if so?
> I think that would be an unnecessary slowdown.

OK.

I think we should better document our strategy on what to check in IPC. Until now I thought that generally consistency checks were valuable even if they weren’t obviously directly exploitable; instead it seems that an accurate explanation of our philosophy is that certain consistency checks are important, but others gratuitous and unimportant. You’ve pointed that out to both Kilzer about enumerations and me about these URL characters. This makes sense to me, but I would like this to be crystal clear for contributors.

> It looks like the behavior of CFURLCreateAbsoluteURLWithBytes with
> kCFStringEncodingISOLatin1 changed between Mojave and Catalina

And, just to confirm, you are saying it’s OK that WebKit will change behavior because this is an unimportant edge case and either behavior is OK on Mojave?
Comment 8 Alex Christensen 2020-05-29 10:35:06 PDT
(In reply to Darin Adler from comment #7)
> And, just to confirm, you are saying it’s OK that WebKit will change
> behavior because this is an unimportant edge case and either behavior is OK
> on Mojave?
Correct.
Comment 9 Alex Christensen 2020-05-29 10:35:35 PDT
Created attachment 400599 [details]
Patch
Comment 10 Darin Adler 2020-05-29 10:36:17 PDT
Comment on attachment 400599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400599&action=review

> Source/WTF/wtf/cocoa/URLCocoa.mm:78
> +        WTFLogAlways("BEGIN");
> +        for (size_t i = 0; i < utf8.length(); i++)
> +            WTFLogAlways("BYTE %x", utf8.data()[i]);

Should re-upload without this.
Comment 11 Alex Christensen 2020-05-29 23:14:57 PDT
Created attachment 400651 [details]
Patch
Comment 12 EWS 2020-05-29 23:59:25 PDT
Committed r262341: <https://trac.webkit.org/changeset/262341>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 400651 [details].
Comment 13 Radar WebKit Bug Importer 2020-05-30 00:00:26 PDT
<rdar://problem/63785975>
Comment 14 Alex Christensen 2020-06-24 10:06:32 PDT
Comment on attachment 400651 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400651&action=review

> Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642
> +    result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true));

This wasn't so great after all.  See https://bugs.webkit.org/show_bug.cgi?id=213565