Bug 194213 - [GLib] Stop URI-escaping file system representations
Summary: [GLib] Stop URI-escaping file system representations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-02-03 23:00 PST by Zan Dobersek
Modified: 2019-02-05 02:50 PST (History)
9 users (show)

See Also:


Attachments
Patch (16.31 KB, patch)
2019-02-03 23:18 PST, Zan Dobersek
no flags Details | Formatted Diff | Diff
Patch for landing (16.66 KB, patch)
2019-02-05 02:48 PST, Zan Dobersek
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zan Dobersek 2019-02-03 23:00:07 PST
URI-escaping when creating strings from filesystem representations (and unescaping when creating FS representations from strings) is currently breaking the IndexedDB database size calculation. It's also different behavior than what is done in POSIX and CF implementations of FileSystem::stringFromFileSystemRepresentation() and FileSystem::fileSystemRepresentation().
Comment 1 Zan Dobersek 2019-02-03 23:18:40 PST
Created attachment 361041 [details]
Patch
Comment 2 Carlos Garcia Campos 2019-02-04 00:26:47 PST
Comment on attachment 361041 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361041&action=review

> Source/WTF/wtf/glib/FileSystemGlib.cpp:66
> +    GUniquePtr<gchar> utf8(g_convert(representation, representationLength,
> +        "UTF-8", filenameCharsets[0], nullptr, &utf8Length, nullptr));
> +    if (!utf8)
> +        return { };

An error here would be difficult to catch, I think, I would get the error and show a message with WTFLogAlways.

> Source/WTF/wtf/glib/FileSystemGlib.cpp:88
> +    GUniquePtr<gchar> representation(g_convert(utf8.data(), utf8.length(),
> +        filenameCharsets[0], "UTF-8", nullptr, &representationLength, nullptr));
> +    if (!representation)
> +        return { };

Ditto.

> Source/WTF/wtf/glib/FileSystemGlib.cpp:96
> +    auto* data = representation.data();
> +    return !!data && data[0] != '\0';

I think CString should have an isEmpty() method, but anyway, why don't we just ensure we return a null CString in case of invalid representation? Then we can simply use isNull() instead of this
Comment 3 Zan Dobersek 2019-02-04 05:37:12 PST
Comment on attachment 361041 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361041&action=review

>> Source/WTF/wtf/glib/FileSystemGlib.cpp:66
>> +        return { };
> 
> An error here would be difficult to catch, I think, I would get the error and show a message with WTFLogAlways.

OK.

>> Source/WTF/wtf/glib/FileSystemGlib.cpp:88
>> +        return { };
> 
> Ditto.

OK.

>> Source/WTF/wtf/glib/FileSystemGlib.cpp:96
>> +    return !!data && data[0] != '\0';
> 
> I think CString should have an isEmpty() method, but anyway, why don't we just ensure we return a null CString in case of invalid representation? Then we can simply use isNull() instead of this

It's simpler here to check whether a CString is acceptable for passing to system APIs (i.e. it contains data and the first byte is not zero, meaning it's not a zero-length string) than to test what fileSystemRepresentation() might have produced.
Comment 4 Michael Catanzaro 2019-02-04 07:32:15 PST
Comment on attachment 361041 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361041&action=review

Good!

> Source/WTF/wtf/glib/FileSystemGlib.cpp:59
> +    // If the returned charset is UTF-8, go directly to String creation.
> +    const gchar** filenameCharsets = nullptr;
> +    if (g_get_filename_charsets(&filenameCharsets))
> +        return String::fromUTF8(representation, representationLength);

This is right, but it looks like a bug, because a reasonable programmer would not expect g_get_filename_charsets() to return TRUE if it's UTF-8. You might adjust the comment to indicate that the function is weird so somebody doesn't waste time adding a check for UTF-8 here in the future.

> Source/WTF/wtf/glib/FileSystemGlib.cpp:65
> +    ASSERT(filenameCharsets);
> +    size_t utf8Length = 0;
> +    GUniquePtr<gchar> utf8(g_convert(representation, representationLength,
> +        "UTF-8", filenameCharsets[0], nullptr, &utf8Length, nullptr));
> +    if (!utf8)

Hm, this looks harder and no better than using g_filename_to_utf8(), which could of course be the easiest solution. I thought you were doing all this extra effort to have a fast path, to convert straight from filesystem representation to UTF-16. It doesn't make much sense to manually convert it to UTF-8 and then immediately convert to UTF-16, right? I think you should directly convert to UTF-16 instead, then the string can be constructed directly instead of using fromUTF8(). Otherwise, as long as you're doing the extra step through UTF-8, might as well just use simpler g_filename_utf8().

> Source/WTF/wtf/glib/FileSystemGlib.cpp:86
> +    GUniquePtr<gchar> representation(g_convert(utf8.data(), utf8.length(),
> +        filenameCharsets[0], "UTF-8", nullptr, &representationLength, nullptr));

Though I guess in this case, we already have the UTF-8 for the fast path above, so converting from UTF-16 here wouldn't save any effort. But it would parallel my suggested changes to the function above. So both ways kinda make sense here. Anyway, up to you. Think about it.
Comment 5 Zan Dobersek 2019-02-04 23:58:18 PST
Comment on attachment 361041 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361041&action=review

>> Source/WTF/wtf/glib/FileSystemGlib.cpp:65
>> +    if (!utf8)
> 
> Hm, this looks harder and no better than using g_filename_to_utf8(), which could of course be the easiest solution. I thought you were doing all this extra effort to have a fast path, to convert straight from filesystem representation to UTF-16. It doesn't make much sense to manually convert it to UTF-8 and then immediately convert to UTF-16, right? I think you should directly convert to UTF-16 instead, then the string can be constructed directly instead of using fromUTF8(). Otherwise, as long as you're doing the extra step through UTF-8, might as well just use simpler g_filename_utf8().

Using g_filename_to_utf8() would unnecessarily g_strdup() the string contents when the filename charset is UTF-8. That's why that info is retrieved manually.

WTF::String doesn't provide a way to construct objects from UTF-16 data that's contained in sequential array of 8-bit values, and I can't currently commit more time to finding out whether g_convert() can, when converting from/to UTF-16, properly consume/produce UTF-16 data in sequential 8-bit representation that can be just reinterpret_casted to UChar on our side.
Comment 6 Zan Dobersek 2019-02-05 02:48:11 PST
Created attachment 361179 [details]
Patch for landing
Comment 7 Zan Dobersek 2019-02-05 02:49:14 PST
Comment on attachment 361179 [details]
Patch for landing

Clearing flags on attachment: 361179

Committed r240969: <https://trac.webkit.org/changeset/240969>
Comment 8 Zan Dobersek 2019-02-05 02:49:18 PST
All reviewed patches have been landed.  Closing bug.
Comment 9 Radar WebKit Bug Importer 2019-02-05 02:50:29 PST
<rdar://problem/47813632>