Bug 214314 - REGRESSION(r262341) URL::createCFURL should produce a CFURL that uses UTF-8 to decode its percent-encoded sequences
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-07-14 12:32 PDT by Alex Christensen
Modified: 2021-02-19 14:23 PST

See Also:


Attachments
Patch (5.24 KB, patch)
2020-07-14 13:05 PDT, Alex Christensen
Patch (6.87 KB, patch)
2020-07-14 14:00 PDT, Alex Christensen
Patch (6.61 KB, patch)
2020-07-14 15:58 PDT, Alex Christensen

Description Alex Christensen 2020-07-14 12:32:24 PDT
REGRESSION(r262341) URL::createCFURL should produce a CFURL that uses UTF-8 to decode its percent-encoded sequences
Comment 1 Alex Christensen 2020-07-14 13:05:06 PDT
Created attachment 404268 [details]
Patch
Comment 2 Alex Christensen 2020-07-14 14:00:39 PDT
Created attachment 404281 [details]
Patch
Comment 3 Darin Adler 2020-07-14 14:46:21 PDT
Comment on attachment 404281 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=404281&action=review

> Source/WTF/wtf/cf/URLCF.cpp:56
> +struct PartialCFURL {
> +    uintptr_t unused1;
> +    uintptr_t unused2;
> +    uint32_t unused3;
> +    CFStringEncoding encoding;
> +};

I suppose this is OK for the short term, but for the future we need CFURL API or at *least* SPI. Please make *sure* you get the ball rolling on that. This seems like an accident waiting to happen! I’d prefer that this technique be put into its own separate named function rather than done inline in the middle of URL::createCFURL. If we knew what the API/SPI was going to be, maybe we could name our function based on that.

But I have a suggestion that allows us to avoid this whole mess, with a small additional performance cost.

> Source/WTF/wtf/cf/URLCF.cpp:62
> +    if (LIKELY(m_string.is8Bit())) {
>          cfURL = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(m_string.characters8()), m_string.length(), kCFStringEncodingISOLatin1, nullptr, true));

Instead of the change here, I propose we instead do this:

    if (LIKELY(m_string.is8Bit() && m_string.isAllASCII()))
        cfURL = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(m_string.characters8()), m_string.length(), kCFStringEncodingUTF8, nullptr, true));
    else
        ...

Later if we get some API or SPI, we can optimize further using that, but we don’t need to do this change just to get correct behavior.
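
To illustrate why the all-ASCII check makes it safe to label the bytes as UTF-8: for bytes below 0x80, ISO Latin-1 and UTF-8 are byte-for-byte identical, so only non-ASCII strings need the slower path. The following is a minimal standalone sketch, not the WTF code; isAllASCII and latin1ToUTF8 here are hypothetical helpers written for illustration and operate on std::string rather than WTF::String.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte is in the ASCII range (< 0x80).
static bool isAllASCII(const std::string& bytes)
{
    for (unsigned char c : bytes) {
        if (c >= 0x80)
            return false;
    }
    return true;
}

// Hypothetical helper: re-encode ISO Latin-1 bytes as UTF-8.
// ASCII bytes are copied unchanged; bytes 0x80..0xFF become two bytes.
static std::string latin1ToUTF8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80)
            utf8.push_back(static_cast<char>(c));
        else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

For an all-ASCII URL string such as "http://example.com/%C3%A9", latin1ToUTF8 returns the input unchanged, which is why the same buffer can be handed over with a UTF-8 label. A string containing the byte 0xE9 would change under re-encoding, so it has to take the other branch.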
Comment 4 Alex Christensen 2020-07-14 15:57:40 PDT
Comment on attachment 404281 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=404281&action=review

>> Source/WTF/wtf/cf/URLCF.cpp:62
>>          cfURL = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(m_string.characters8()), m_string.length(), kCFStringEncodingISOLatin1, nullptr, true));
> 
> Instead of the change here, I propose we instead do this:
> 
>     if (LIKELY(m_string.is8Bit() && m_string.isAllASCII()))
>         cfURL = adoptCF(CFURLCreateAbsoluteURLWithBytes(nullptr, reinterpret_cast<const UInt8*>(m_string.characters8()), m_string.length(), kCFStringEncodingUTF8, nullptr, true));
>     else
>         ...
> 
> Later if we get some API or SPI, we can optimize further using that, but we don’t need to do this change just to get correct behavior.

That works great, and doesn't do horrible things like my original patch does.  ASCII checks are considerably faster than parsing URLs, so I don't anticipate the perf hit to be too bad.
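
As a rough illustration of why an ASCII check can be cheap, here is a sketch that tests eight bytes at a time by masking the high bit of each byte in a 64-bit word. This is an assumption about how such a check might be written, not the actual WTF implementation; isAllASCIIWordwise is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: a single AND per 8 bytes detects any byte >= 0x80.
static bool isAllASCIIWordwise(const unsigned char* data, size_t length)
{
    constexpr uint64_t highBits = 0x8080808080808080ULL;
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= length; i += sizeof(uint64_t)) {
        uint64_t word;
        std::memcpy(&word, data + i, sizeof(word)); // avoids unaligned reads
        if (word & highBits)
            return false;
    }
    // Check the remaining tail bytes one at a time.
    for (; i < length; ++i) {
        if (data[i] & 0x80)
            return false;
    }
    return true;
}
```

The loop is a single linear pass with no branching on URL structure, which fits the expectation that its cost is small next to parsing the URL.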
Comment 5 Alex Christensen 2020-07-14 15:58:13 PDT
Created attachment 404299 [details]
Patch
Comment 6 EWS 2020-07-14 16:37:03 PDT
Committed r264382: <https://trac.webkit.org/changeset/264382>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 404299 [details].
Comment 7 Radar WebKit Bug Importer 2020-07-14 16:38:14 PDT
<rdar://problem/65572000>