RESOLVED FIXED 212486
Use correct encoding when converting a WTF::URL to CFURLRef
https://bugs.webkit.org/show_bug.cgi?id=212486
Summary Use correct encoding when converting a WTF::URL to CFURLRef
Alex Christensen
Reported 2020-05-28 13:30:14 PDT
Use correct encoding when converting a WTF::URL to CFURLRef
Attachments
Patch (8.72 KB, patch)
2020-05-28 13:32 PDT, Alex Christensen
no flags
Patch (8.17 KB, patch)
2020-05-28 17:45 PDT, Alex Christensen
no flags
Patch (8.56 KB, patch)
2020-05-29 10:35 PDT, Alex Christensen
no flags
Patch (8.34 KB, patch)
2020-05-29 23:14 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2020-05-28 13:32:47 PDT
Darin Adler
Comment 2 2020-05-28 13:41:10 PDT
Comment on attachment 400500 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=400500&action=review > Source/WTF/wtf/cf/CFURLExtras.cpp:34 > +RetainPtr<CFURLRef> createCFURLFromBuffer(const char* data, size_t size, URLBufferEncoding encoding, CFURLRef baseURL) Why do we need a shared function for this at all? Why not call CFURLCreateAbsoluteURLWithBytes directly at call sites? > Source/WTF/wtf/cf/CFURLExtras.cpp:45 > + ASSERT(StringView(data, size).isAllASCII()); The one place this is used is decoding URLs in ArgumentCoders. So it seems the assertion is "trusting" data that came cross process. Would we normally assert something if the guarantee comes from the other side of an XPC boundary? > Source/WTF/wtf/cf/CFURLExtras.h:38 > +enum class URLBufferEncoding : uint8_t { Latin1, UTF8, ASCII }; > +WTF_EXPORT_PRIVATE RetainPtr<CFURLRef> createCFURLFromBuffer(const char*, size_t, URLBufferEncoding, CFURLRef baseURL = nullptr); Maybe 3 separate functions instead of an enum? Or no functions at all? Just move the code to there call sites. Note that we never use UTF-8 or Latin-1 with a base URL. > Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221 > + std::array<LChar, 3> latin1 { 0xC2, 0xB6, 0x00 }; > + WTF::URL url4(URL(), String(latin1.data())); > + EXPECT_FALSE(url4.isValid()); > + EXPECT_TRUE(url4.string().is8Bit()); > + EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6"); This is a behavior change. While the consistency with the 16-bit form of the same string is a plus, I am concerned about whether any code relied on the old "if it looks like UTF-8 treat it like UTF-8" behavior. I hope not but not sure how to verify.
Alex Christensen
Comment 3 2020-05-28 17:22:04 PDT
This is such an edge case, I'm not worried about the intentional functional change. This is only for non-ASCII URLs that are invalid according to WTF::URL but valid according to NSURL. The only record we have of this code path ever having been hit is from a security researcher trying to hit new code paths, and we have no tests so it must've been unimportant in the past. I'll try your suggestions.
Alex Christensen
Comment 4 2020-05-28 17:45:52 PDT
Darin Adler
Comment 5 2020-05-29 10:11:59 PDT
Comment on attachment 400529 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=400529&action=review > Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642 > + result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true)); Should we add a check that none of the bytes have their high bit set, and return false if so? > Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221 > + EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6"); Looks like this test is failing on the bot; please fix before landing.
Alex Christensen
Comment 6 2020-05-29 10:29:05 PDT
(In reply to Darin Adler from comment #5) > Comment on attachment 400529 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=400529&action=review > > > Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642 > > + result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true)); > > Should we add a check that none of the bytes have their high bit set, and > return false if so? I think that would be an unnecessary slowdown. If a compromised process wants to send a different URL through IPC, they can do so directly without playing any encoding games. > > Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:221 > > + EXPECT_STREQ([[url4 absoluteString] UTF8String], "%C3%82%C2%B6"); > > Looks like this test is failing on the bot; please fix before landing. It looks like the behavior of CFURLCreateAbsoluteURLWithBytes with kCFStringEncodingISOLatin1 changed between Mojave and Catalina, which I verified with this program: #include <array> #include <CoreFoundation/CFURL.h> #include <Foundation/Foundation.h> int main() { std::array<UInt8, 3> bytes { 0xc2, 0xb6, 0x00 }; CFURLRef url = CFURLCreateAbsoluteURLWithBytes(nullptr, bytes.data(), 2, kCFStringEncodingISOLatin1, nullptr, true); NSLog(@"%@", url); CFRelease(url); } On Mojave it prints %C2%B6 and on Catalina is prints %C3%82%C2%B6 So I guess that behavior is expected on Mojave. I'll make platform-specific expectations on the test.
Darin Adler
Comment 7 2020-05-29 10:32:49 PDT
(In reply to Alex Christensen from comment #6) > (In reply to Darin Adler from comment #5) > > Should we add a check that none of the bytes have their high bit set, and > > return false if so? > I think that would be an unnecessary slowdown. OK. I think we should better document our strategy on what to check in IPC. Until now I thought that generally consistency checks were valuable even if they weren’t obviously directly exploitable; instead it seems that an accurate explanation of our philosophy is that certain consistency checks are important, but others gratuitous and unimportant. You’ve pointed that out to both Kilzer about enumerations and me about these URL characters. This makes sense to me, but I would like this to be crystal clear for contributors. > It looks like the behavior of CFURLCreateAbsoluteURLWithBytes with > kCFStringEncodingISOLatin1 changed between Mojave and Catalina And, just to confirm, you are saying it’s OK that WebKit will change behavior because this is an unimportant edge case and either behavior is OK on Mojave?
Alex Christensen
Comment 8 2020-05-29 10:35:06 PDT
(In reply to Darin Adler from comment #7) > And, just to confirm, you are saying it’s OK that WebKit will change > behavior because this is an unimportant edge case and either behavior is OK > on Mojave? Correct.
Alex Christensen
Comment 9 2020-05-29 10:35:35 PDT
Darin Adler
Comment 10 2020-05-29 10:36:17 PDT
Comment on attachment 400599 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=400599&action=review > Source/WTF/wtf/cocoa/URLCocoa.mm:78 > + WTFLogAlways("BEGIN"); > + for (size_t i = 0; i < utf8.length(); i++) > + WTFLogAlways("BYTE %x", utf8.data()[i]); Should re-upload without this.
Alex Christensen
Comment 11 2020-05-29 23:14:57 PDT
EWS
Comment 12 2020-05-29 23:59:25 PDT
Committed r262341: <https://trac.webkit.org/changeset/262341> All reviewed patches have been landed. Closing bug and clearing flags on attachment 400651 [details].
Radar WebKit Bug Importer
Comment 13 2020-05-30 00:00:26 PDT
Alex Christensen
Comment 14 2020-06-24 10:06:32 PDT
Comment on attachment 400651 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=400651&action=review > Source/WebKit/Shared/cf/ArgumentCodersCF.cpp:642 > + result = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(urlBytes.data()), urlBytes.size(), kCFStringEncodingASCII, baseURL.get(), true)); This wasn't so great after all. See https://bugs.webkit.org/show_bug.cgi?id=213565
Note You need to log in before you can comment on or make changes to this bug.