WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
194213
[GLib] Stop URI-escaping file system representations
https://bugs.webkit.org/show_bug.cgi?id=194213
Summary
[GLib] Stop URI-escaping file system representations
Zan Dobersek
Reported
2019-02-03 23:00:07 PST
URI-escaping when creating strings from filesystem representations (and unescaping when creating FS representations from strings) is currently breaking the IndexedDB database size calculation. It's also different behavior than what is done in POSIX and CF implementations of FileSystem::stringFromFileSystemRepresentation() and FileSystem::fileSystemRepresentation().
Attachments
Patch
(16.31 KB, patch)
2019-02-03 23:18 PST
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Patch for landing
(16.66 KB, patch)
2019-02-05 02:48 PST
,
Zan Dobersek
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Zan Dobersek
Comment 1
2019-02-03 23:18:40 PST
Created
attachment 361041
[details]
Patch
Carlos Garcia Campos
Comment 2
2019-02-04 00:26:47 PST
Comment on
attachment 361041
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=361041&action=review
> Source/WTF/wtf/glib/FileSystemGlib.cpp:66 > + GUniquePtr<gchar> utf8(g_convert(representation, representationLength, > + "UTF-8", filenameCharsets[0], nullptr, &utf8Length, nullptr)); > + if (!utf8) > + return { };
An error here would be difficult to catch, I think, I would get the error and show a message with WTFLogAlways.
> Source/WTF/wtf/glib/FileSystemGlib.cpp:88 > + GUniquePtr<gchar> representation(g_convert(utf8.data(), utf8.length(), > + filenameCharsets[0], "UTF-8", nullptr, &representationLength, nullptr)); > + if (!representation) > + return { };
Ditto.
> Source/WTF/wtf/glib/FileSystemGlib.cpp:96 > + auto* data = representation.data(); > + return !!data && data[0] != '\0';
I think CString should have an isEmpty() method, but anyway, why don't we just ensure we return a null CString in case of invalid representation? Then we can simply use isNull() instead of this
Zan Dobersek
Comment 3
2019-02-04 05:37:12 PST
Comment on
attachment 361041
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=361041&action=review
>> Source/WTF/wtf/glib/FileSystemGlib.cpp:66 >> + return { }; > > An error here would be difficult to catch, I think, I would get the error and show a message with WTFLogAlways.
OK.
>> Source/WTF/wtf/glib/FileSystemGlib.cpp:88 >> + return { }; > > Ditto.
OK.
>> Source/WTF/wtf/glib/FileSystemGlib.cpp:96 >> + return !!data && data[0] != '\0'; > > I think CString should have an isEmpty() method, but anyway, why don't we just ensure we return a null CString in case of invalid representation? Then we can simply use isNull() instead of this
It's simpler here to check whether a CString is acceptable for passing to system APIs (i.e. it contains data and the first byte is not zero, meaning it's not a zero-length string) than to test what fileSystemRepresentation() might have produced.
Michael Catanzaro
Comment 4
2019-02-04 07:32:15 PST
Comment on
attachment 361041
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=361041&action=review
Good!
> Source/WTF/wtf/glib/FileSystemGlib.cpp:59 > + // If the returned charset is UTF-8, go directly to String creation. > + const gchar** filenameCharsets = nullptr; > + if (g_get_filename_charsets(&filenameCharsets)) > + return String::fromUTF8(representation, representationLength);
This is right, but it looks like a bug, because a reasonable programmer would not expect g_get_filename_charsets() to return TRUE if it's UTF-8. You might adjust the comment to indicate that the function is weird so somebody doesn't waste time adding a check for UTF-8 here in the future.
> Source/WTF/wtf/glib/FileSystemGlib.cpp:65 > + ASSERT(filenameCharsets); > + size_t utf8Length = 0; > + GUniquePtr<gchar> utf8(g_convert(representation, representationLength, > + "UTF-8", filenameCharsets[0], nullptr, &utf8Length, nullptr)); > + if (!utf8)
Hm, this looks harder and no better than using g_filename_to_utf8(), which could of course be the easiest solution. I thought you were doing all this extra effort to have a fast path, to convert straight from filesystem representation to UTF-16. It doesn't make much sense to manually convert it to UTF-8 and then immediately convert to UTF-16, right? I think you should directly convert to UTF-16 instead, then the string can be constructed directly instead of using fromUTF8(). Otherwise, as long as you're doing the extra step through UTF-8, might as well just use simpler g_filename_utf8().
> Source/WTF/wtf/glib/FileSystemGlib.cpp:86 > + GUniquePtr<gchar> representation(g_convert(utf8.data(), utf8.length(), > + filenameCharsets[0], "UTF-8", nullptr, &representationLength, nullptr));
Though I guess in this case, we already have the UTF-8 for the fast path above, so converting from UTF-16 here wouldn't save any effort. But it would parallel my suggested changes to the function above. So both ways kinda make sense here. Anyway, up to you. Think about it.
Zan Dobersek
Comment 5
2019-02-04 23:58:18 PST
Comment on
attachment 361041
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=361041&action=review
>> Source/WTF/wtf/glib/FileSystemGlib.cpp:65 >> + if (!utf8) > > Hm, this looks harder and no better than using g_filename_to_utf8(), which could of course be the easiest solution. I thought you were doing all this extra effort to have a fast path, to convert straight from filesystem representation to UTF-16. It doesn't make much sense to manually convert it to UTF-8 and then immediately convert to UTF-16, right? I think you should directly convert to UTF-16 instead, then the string can be constructed directly instead of using fromUTF8(). Otherwise, as long as you're doing the extra step through UTF-8, might as well just use simpler g_filename_utf8().
Using g_filename_to_utf8() would unnecessarily g_strdup() the string contents when the filename charset is UTF-8. That's why that info is retrieved manually. WTF::String doesn't provide a way to construct objects from UTF-16 data that's contained in sequential array of 8-bit values, and I can't currently commit more time to finding out whether g_convert() can, when converting from/to UTF-16, properly consume/produce UTF-16 data in sequential 8-bit representation that can be just reinterpret_casted to UChar on our side.
Zan Dobersek
Comment 6
2019-02-05 02:48:11 PST
Created
attachment 361179
[details]
Patch for landing
Zan Dobersek
Comment 7
2019-02-05 02:49:14 PST
Comment on
attachment 361179
[details]
Patch for landing Clearing flags on attachment: 361179 Committed
r240969
: <
https://trac.webkit.org/changeset/240969
>
Zan Dobersek
Comment 8
2019-02-05 02:49:18 PST
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 9
2019-02-05 02:50:29 PST
<
rdar://problem/47813632
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug