Bug 20664

Summary: [GTK] File names are not always encodable in UTF-8
Product: WebKit Reporter: Marco Barisione <marco.barisione>
Component: WebKitGTKAssignee: Marco Barisione <marco.barisione>
Status: RESOLVED FIXED    
Severity: Normal CC: ap
Priority: P2 Keywords: Gtk
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Add String::toFilename and String::fromFilename
darin: review-
Modified version of attachment #23170 to the String::to/fromFilename
none
Test
none
Add filenameToString and filenameFromString to FileSystemGtk.cpp
alp: review-
Add filenameToString and filenameFromString to FileSystemGtk.cpp
eric: review-
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp none

Description Marco Barisione 2008-09-05 07:14:15 PDT
On linux file names should be considered just raw strings as it's possible to have just junk non-encodable in UTF-8 inside them. This means that file names cannot be just stored as-is inside a String. Note that using g_filename_to_utf8 is not a possible solution as it's only meant to convert file names for displaying them in the UI.

A possible solution would be to switch from String to CString for file names but this would mean losing the ability to use all the nice String methods on them and making the generic code and the code for other ports more complex.

Another possible solution is to escape invalid characters before storing them in a String and unescaping them before passing the string to functions expecting a raw file name.
Comment 1 Marco Barisione 2008-09-05 07:25:52 PDT
Created attachment 23191 [details]
Add String::toFilename and String::fromFilename

1. Add String::fromFilename static method (only for the GTK port) that creates an escape string from a file name.
2. Add String::toFilename to unescape a string create with String::fromFilename.
3. Add some debugging code to try to catch uses of String::toFilename on Strings not created through String::fromFilename.
4. Fix the functions in FileSystemGtk.cpp.
Comment 2 Marco Barisione 2008-09-05 07:30:58 PDT
Created attachment 23192 [details]
Modified version of attachment #23170 [details] to the String::to/fromFilename
Comment 3 Marco Barisione 2008-09-05 07:38:40 PDT
Created attachment 23193 [details]
Test

This patch creates 2 CSS files the first named test-à.css (with the à encoded in latin-1, i.e. "\xe0") and test-[?].css (where with [?] I mean the Unicode replacement character, i.e. "\xef\xbf\xbd"), and then it tries to load test-à.css as the user-specified stylesheet.

Try this with https://bugs.webkit.org/attachment.cgi?id=23170 and WebKit will load test-[?].css because of the raw chars -> utf-8 transformation, so you will se a red background.

Try this with https://bugs.webkit.org/attachment.cgi?id=23192 and WebKit will load test-à.css, so you will se a green background.
Comment 4 Darin Adler 2008-09-05 11:31:26 PDT
Comment on attachment 23191 [details]
Add String::toFilename and String::fromFilename

Adding this capability to the String class seems clearly wrong. It's not correct layering -- we don't want the String class to know anything about the file system. We also want to limit platform-specific functions in the String class to the ones for converting to and from platform String types.

You can accomplish this by building functions that operate on String objects rather than adding member functions to the String class itself.

I believe this can be done entirely within the FileSystemGtk.cpp file without touching PlatformString.h or String.cpp.
Comment 5 Darin Adler 2008-09-05 11:33:32 PDT
One other question: This seems fine for cases where the filenames come from the functions in FileSystemGtk. But what effect will this have on all the code paths where the filename comes from content on the web?
Comment 6 Marco Barisione 2008-09-11 08:49:18 PDT
Created attachment 23350 [details]
Add filenameToString and filenameFromString to FileSystemGtk.cpp

(In reply to comment #5)
> One other question: This seems fine for cases where the filenames come from the
> functions in FileSystemGtk. But what effect will this have on all the code
> paths where the filename comes from content on the web?

Like? I cannot think to any case where we could get filenames causing problems, but probably I'm missing something.
Comment 7 Alp Toker 2008-09-13 09:18:59 PDT
Comment on attachment 23350 [details]
Add filenameToString and filenameFromString to FileSystemGtk.cpp

+#if PLATFORM(WIN)

should be WIN_OS.

g_uri_escape_string()

^ only available in GLib >= 2.16


For this change to make sense, we need to decide exactly what we expect the developer to pass to user-stylesheet-uri: a local filename (byte array) or a well formatted URI?
Comment 8 Marco Barisione 2008-09-15 08:10:29 PDT
Created attachment 23441 [details]
Add filenameToString and filenameFromString to FileSystemGtk.cpp

(In reply to comment #7)
> +#if PLATFORM(WIN)
> 
> should be WIN_OS.

Fixed.

> g_uri_escape_string()
> 
> ^ only available in GLib >= 2.16

I included a copy of that function so we can use with glib < 2.16. Is it the right way to do that?
 
> For this change to make sense, we need to decide exactly what we expect the
> developer to pass to user-stylesheet-uri: a local filename (byte array) or a
> well formatted URI?

WebCore expects a URI and considering the name I think that it should be clear that it's a URI and not a filename.
Comment 9 Holger Freyther 2008-09-21 19:10:34 PDT
(In reply to comment #8)
> Created an attachment (id=23441) [edit]

> 
> > g_uri_escape_string()
> > 
> > ^ only available in GLib >= 2.16
> 
> I included a copy of that function so we can use with glib < 2.16. Is it the
> right way to do that?

It might make sense to create a new file (maybe even only a header file) for such things? Besides that question it seems to be done. If we manage to land bug #18987 first you might need to upload a new diff though.
Comment 10 Eric Seidel (no email) 2008-09-23 04:20:36 PDT
Comment on attachment 23441 [details]
Add filenameToString and filenameFromString to FileSystemGtk.cpp

 87         gchar* systemFilename = filenameFromString(m_filename.utf8().data());

That can just be:
 87         gchar* systemFilename = filenameFromString(m_filename);
or?

I think all the imported code from GLIB should live in a separate header/.cpp pair, that way it can have its own special indentation for that file.  Although I'm not sure that's really enough code (at least for is_valid) to really justify needing to be copy/pasted from glib.
Comment 11 Marco Barisione 2008-10-02 09:46:32 PDT
Created attachment 24021 [details]
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp

(In reply to comment #10)
> (From update of attachment 23441 [details] [edit])
>  87         gchar* systemFilename =
> filenameFromString(m_filename.utf8().data());
> 
> That can just be:
>  87         gchar* systemFilename = filenameFromString(m_filename);
> or?

Yeah, I didn't see that. Fixed.

> I think all the imported code from GLIB should live in a separate header/.cpp
> pair, that way it can have its own special indentation for that file.  Although
> I'm not sure that's really enough code (at least for is_valid) to really
> justify needing to be copy/pasted from glib.

Done. is_valid is there only because it's used by another function copied from glib, I don't use it directly.

> such things? Besides that question it seems to be done. If we manage to land
> bug #18987 first you might need to upload a new diff though.

Updated. For this chance I also added filenameForDisplay that converts an encoded string to a UTF-8 string for display, useful for error messages for instance.
Comment 12 Holger Freyther 2008-10-13 10:51:32 PDT
Looks sane. One minor coding style issue with the following but it can be fixed when landing. I need another second to have a second look.

+    gchar *filename = filenameFromString(string);
+    gchar *display = g_filename_to_utf8(filename, 0, 0, 0, 0);
Comment 13 Holger Freyther 2008-10-13 11:23:27 PDT
Comment on attachment 24021 [details]
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp


>+        gchar* systemFilename = filenameFromString(m_filename);
>+        gchar* systemBasename = g_path_get_basename(systemFilename);
>+        g_free(systemFilename);
>+        stringByAdoptingFileSystemRepresentation(systemBasename, string);


vs.


>+    char* baseName = g_path_get_basename(tmpFilename);
>     String fileName = String::fromUTF8(baseName);
>     g_free(baseName);
>+    g_free(tmpFilename);

should one use the above stringByAdopting... function as well?


The next thing is to remove the various if (fileName) {} statements as filenameFromString is not returning a NULL String? We at least have inconsistent null handling with filenameFromString after this patch. I would be willing to land this as the first part if cleanups follow? Do you agree?
Comment 14 Marco Barisione 2008-10-14 08:57:24 PDT
(In reply to comment #13)
> (From update of attachment 24021 [details] [edit])
> >+        gchar* systemFilename = filenameFromString(m_filename);
> >+        gchar* systemBasename = g_path_get_basename(systemFilename);
> >+        g_free(systemFilename);
> >+        stringByAdoptingFileSystemRepresentation(systemBasename, string);
> 
> vs.
> 
> >+    char* baseName = g_path_get_basename(tmpFilename);
> >     String fileName = String::fromUTF8(baseName);
> >     g_free(baseName);
> >+    g_free(tmpFilename);

I will fix it using GOwnPtr now that it landed. I can fix this and the return value checks in a separate patch+bug report if it's ok for you, so we avoid to keep changing the same patch.
Comment 15 Holger Freyther 2008-10-15 11:23:43 PDT
Comment on attachment 24021 [details]
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp

We are aware that more work needs to be done but this will be done in another patch.
Comment 16 Holger Freyther 2008-10-15 13:34:28 PDT
Comment on attachment 24021 [details]
Add filenameToString, filenameFromString and filenameForDisplay to FileSystemGtk.cpp

Clearing the review field as the patch landed. guriescape had to be converted to spaces.

Thanks for the patch and the next thing would be to move things to GOwnPtr?
Comment 17 Sung-jun, Kim 2008-12-02 23:00:56 PST
This patch was landed. How about change status to FIXED?
Comment 18 Gustavo Noronha (kov) 2009-01-12 15:33:53 PST
Setting to fixed since the patch was landed, please reopen if needed.