RESOLVED FIXED 20664
[GTK] File names are not always encodable in UTF-8
https://bugs.webkit.org/show_bug.cgi?id=20664
Summary [GTK] File names are not always encodable in UTF-8
Marco Barisione
Reported 2008-09-05 07:14:15 PDT
On linux file names should be considered just raw strings as it's possible to have just junk non-encodable in UTF-8 inside them. This means that file names cannot be just stored as-is inside a String. Note that using g_filename_to_utf8 is not a possible solution as it's only meant to convert file names for displaying them in the UI. A possible solution would be to switch from String to CString for file names but this would mean losing the ability to use all the nice String methods on them and making the generic code and the code for other ports more complex. Another possible solution is to escape invalid characters before storing them in a String and unescaping them before passing the string to functions expecting a raw file name.
Attachments
Add String::toFilename and String::fromFilename (8.13 KB, patch)
2008-09-05 07:25 PDT, Marco Barisione
darin: review-
Modified version of attachment #23170 to the String::to/fromFilename (5.94 KB, patch)
2008-09-05 07:30 PDT, Marco Barisione
no flags
Test (1.83 KB, patch)
2008-09-05 07:38 PDT, Marco Barisione
no flags
Add filenameToString and filenameFromString to FileSystemGtk.cpp (7.19 KB, patch)
2008-09-11 08:49 PDT, Marco Barisione
alp: review-
Add filenameToString and filenameFromString to FileSystemGtk.cpp (10.18 KB, patch)
2008-09-15 08:10 PDT, Marco Barisione
eric: review-
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp (15.97 KB, patch)
2008-10-02 09:46 PDT, Marco Barisione
no flags
Marco Barisione
Comment 1 2008-09-05 07:25:52 PDT
Created attachment 23191 [details] Add String::toFilename and String::fromFilename 1. Add String::fromFilename static method (only for the GTK port) that creates an escape string from a file name. 2. Add String::toFilename to unescape a string create with String::fromFilename. 3. Add some debugging code to try to catch uses of String::toFilename on Strings not created through String::fromFilename. 4. Fix the functions in FileSystemGtk.cpp.
Marco Barisione
Comment 2 2008-09-05 07:30:58 PDT
Created attachment 23192 [details] Modified version of attachment #23170 [details] to the String::to/fromFilename
Marco Barisione
Comment 3 2008-09-05 07:38:40 PDT
Created attachment 23193 [details] Test This patch creates 2 CSS files the first named test-à.css (with the à encoded in latin-1, i.e. "\xe0") and test-[?].css (where with [?] I mean the Unicode replacement character, i.e. "\xef\xbf\xbd"), and then it tries to load test-à.css as the user-specified stylesheet. Try this with https://bugs.webkit.org/attachment.cgi?id=23170 and WebKit will load test-[?].css because of the raw chars -> utf-8 transformation, so you will se a red background. Try this with https://bugs.webkit.org/attachment.cgi?id=23192 and WebKit will load test-à.css, so you will se a green background.
Darin Adler
Comment 4 2008-09-05 11:31:26 PDT
Comment on attachment 23191 [details] Add String::toFilename and String::fromFilename Adding this capability to the String class seems clearly wrong. It's not correct layering -- we don't want the String class to know anything about the file system. We also want to limit platform-specific functions in the String class to the ones for converting to and from platform String types. You can accomplish this by building functions that operate on String objects rather than adding member functions to the String class itself. I believe this can be done entirely within the FileSystemGtk.cpp file without touching PlatformString.h or String.cpp.
Darin Adler
Comment 5 2008-09-05 11:33:32 PDT
One other question: This seems fine for cases where the filenames come from the functions in FileSystemGtk. But what effect will this have on all the code paths where the filename comes from content on the web?
Marco Barisione
Comment 6 2008-09-11 08:49:18 PDT
Created attachment 23350 [details] Add filenameToString and filenameFromString to FileSystemGtk.cpp (In reply to comment #5) > One other question: This seems fine for cases where the filenames come from the > functions in FileSystemGtk. But what effect will this have on all the code > paths where the filename comes from content on the web? Like? I cannot think to any case where we could get filenames causing problems, but probably I'm missing something.
Alp Toker
Comment 7 2008-09-13 09:18:59 PDT
Comment on attachment 23350 [details] Add filenameToString and filenameFromString to FileSystemGtk.cpp +#if PLATFORM(WIN) should be WIN_OS. g_uri_escape_string() ^ only available in GLib >= 2.16 For this change to make sense, we need to decide exactly what we expect the developer to pass to user-stylesheet-uri: a local filename (byte array) or a well formatted URI?
Marco Barisione
Comment 8 2008-09-15 08:10:29 PDT
Created attachment 23441 [details] Add filenameToString and filenameFromString to FileSystemGtk.cpp (In reply to comment #7) > +#if PLATFORM(WIN) > > should be WIN_OS. Fixed. > g_uri_escape_string() > > ^ only available in GLib >= 2.16 I included a copy of that function so we can use with glib < 2.16. Is it the right way to do that? > For this change to make sense, we need to decide exactly what we expect the > developer to pass to user-stylesheet-uri: a local filename (byte array) or a > well formatted URI? WebCore expects a URI and considering the name I think that it should be clear that it's a URI and not a filename.
Holger Freyther
Comment 9 2008-09-21 19:10:34 PDT
(In reply to comment #8) > Created an attachment (id=23441) [edit] > > > g_uri_escape_string() > > > > ^ only available in GLib >= 2.16 > > I included a copy of that function so we can use with glib < 2.16. Is it the > right way to do that? It might make sense to create a new file (maybe even only a header file) for such things? Besides that question it seems to be done. If we manage to land bug #18987 first you might need to upload a new diff though.
Eric Seidel (no email)
Comment 10 2008-09-23 04:20:36 PDT
Comment on attachment 23441 [details] Add filenameToString and filenameFromString to FileSystemGtk.cpp 87 gchar* systemFilename = filenameFromString(m_filename.utf8().data()); That can just be: 87 gchar* systemFilename = filenameFromString(m_filename); or? I think all the imported code from GLIB should live in a separate header/.cpp pair, that way it can have its own special indentation for that file. Although I'm not sure that's really enough code (at least for is_valid) to really justify needing to be copy/pasted from glib.
Marco Barisione
Comment 11 2008-10-02 09:46:32 PDT
Created attachment 24021 [details] Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp (In reply to comment #10) > (From update of attachment 23441 [details] [edit]) > 87 gchar* systemFilename = > filenameFromString(m_filename.utf8().data()); > > That can just be: > 87 gchar* systemFilename = filenameFromString(m_filename); > or? Yeah, I didn't see that. Fixed. > I think all the imported code from GLIB should live in a separate header/.cpp > pair, that way it can have its own special indentation for that file. Although > I'm not sure that's really enough code (at least for is_valid) to really > justify needing to be copy/pasted from glib. Done. is_valid is there only because it's used by another function copied from glib, I don't use it directly. > such things? Besides that question it seems to be done. If we manage to land > bug #18987 first you might need to upload a new diff though. Updated. For this chance I also added filenameForDisplay that converts an encoded string to a UTF-8 string for display, useful for error messages for instance.
Holger Freyther
Comment 12 2008-10-13 10:51:32 PDT
Looks sane. One minor coding style issue with the following but it can be fixed when landing. I need another second to have a second look. + gchar *filename = filenameFromString(string); + gchar *display = g_filename_to_utf8(filename, 0, 0, 0, 0);
Holger Freyther
Comment 13 2008-10-13 11:23:27 PDT
Comment on attachment 24021 [details] Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp >+ gchar* systemFilename = filenameFromString(m_filename); >+ gchar* systemBasename = g_path_get_basename(systemFilename); >+ g_free(systemFilename); >+ stringByAdoptingFileSystemRepresentation(systemBasename, string); vs. >+ char* baseName = g_path_get_basename(tmpFilename); > String fileName = String::fromUTF8(baseName); > g_free(baseName); >+ g_free(tmpFilename); should one use the above stringByAdopting... function as well? The next thing is to remove the various if (fileName) {} statements as filenameFromString is not returning a NULL String? We at least have inconsistent null handling with filenameFromString after this patch. I would be willing to land this as the first part if cleanups follow? Do you agree?
Marco Barisione
Comment 14 2008-10-14 08:57:24 PDT
(In reply to comment #13) > (From update of attachment 24021 [details] [edit]) > >+ gchar* systemFilename = filenameFromString(m_filename); > >+ gchar* systemBasename = g_path_get_basename(systemFilename); > >+ g_free(systemFilename); > >+ stringByAdoptingFileSystemRepresentation(systemBasename, string); > > vs. > > >+ char* baseName = g_path_get_basename(tmpFilename); > > String fileName = String::fromUTF8(baseName); > > g_free(baseName); > >+ g_free(tmpFilename); I will fix it using GOwnPtr now that it landed. I can fix this and the return value checks in a separate patch+bug report if it's ok for you, so we avoid to keep changing the same patch.
Holger Freyther
Comment 15 2008-10-15 11:23:43 PDT
Comment on attachment 24021 [details] Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp We are aware that more work needs to be done but this will be done in another patch.
Holger Freyther
Comment 16 2008-10-15 13:34:28 PDT
Comment on attachment 24021 [details] Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp Clearing the review field as the patch landed. guriescape had to be converted to spaces. Thanks for the patch and the next thing would be to move things to GOwnPtr?
Sung-jun, Kim
Comment 17 2008-12-02 23:00:56 PST
This patch was landed. How about change status to FIXED?
Gustavo Noronha (kov)
Comment 18 2009-01-12 15:33:53 PST
Setting to fixed since the patch was landed, please reopen if needed.
Note You need to log in before you can comment on or make changes to this bug.