Bug 14635

Summary: Uploading file with non-ASCII character in path fails
Product: WebKit Reporter: 808caaa4.8ce9.9cd6c799e9f6
Component: FormsAssignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Major CC: ap, ddkilzer
Priority: P2 Keywords: InRadar
Version: 523.x (Safari 3)   
Hardware: PC   
OS: Windows XP   
Attachments:
Description Flags
proposed fix
aroben: review-
updated patch darin: review+

808caaa4.8ce9.9cd6c799e9f6
Reported 2007-07-17 01:32:35 PDT
With this site(webkit-buglizza), I cannot upload file from my Desktop. After moving file to somewhere like c:\__sendtemp\, uploading succeeded. 'filename' is full path, and contains UTF8(nonascii) chars. With some environment directory name "desktop" is localized into nonascii strings. I don't know who has responsiblility, or whether we should always send fullpath. ; =============== ; posts ; =============== POST http://bugs.webkit.org/attachment.cgi HTTP/1.1 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja) AppleWebKit/522.14.1+ (KHTML, like Gecko) Version/3.0.2 Safari/522.13.1 Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryAAsdAAJVAApEAA5j Transfer-Encoding: Chunked ..... ------WebKitFormBoundaryAAsdAAJVAApEAA5j Content-Disposition: form-data; name="data"; filename="C:\Documents and Settings\*****\ε㑦€»εβ―εγ°ε₯γεγΝ½test.html.png" Content-Type: image/png ; =============== ; response ; =============== HTTP/1.1 411 Length Required Server: Apache/1.3.33 (Darwin) PHP/4.4.1 mod_ssl/2.8.24 OpenSSL/0.9.7i DAV/1.0.3 Transfer-Encoding: chunked Content-Type: text/html; charset=iso-8859-1
Attachments
proposed fix (2.88 KB, patch)
2007-08-11 12:50 PDT, Alexey Proskuryakov
aroben: review-
updated patch (2.87 KB, patch)
2007-08-11 13:03 PDT, Alexey Proskuryakov
darin: review+
808caaa4.8ce9.9cd6c799e9f6
Comment 1 2007-07-17 02:07:46 PDT
Oh, HTMLFormElement::formData() converts path to current encoding. At the time above I certainly settled DefaultEncoding as UTF8. Now I retried with reloaded attachment.cgi while Encoding->Europian. Path is converted as below and still uploading failed. Content-Disposition: form-data; name="data"; filename="C:\Documents and Settings\********\&#12487;&#12473;&#12463;&#12488;&#12483;&#12503;\test.txt"
Alexey Proskuryakov
Comment 2 2007-07-17 04:16:48 PDT
I might well be wrong, but I think that a full path is not really needed (and is sometimes harmful) in Content-Disposition, so just a file name can be sent. Would you be willing to submit such a patch for WebCore, as described in <http://webkit.org/coding/contributing.html>?
David Kilzer (:ddkilzer)
Comment 3 2007-07-17 08:23:03 PDT
What does Firefox or MSIE (or Opera) when uploading this file? Do they encode the path differently?
David Kilzer (:ddkilzer)
Comment 4 2007-07-17 08:24:01 PDT
(In reply to comment #2) > I might well be wrong, but I think that a full path is not really needed (and > is sometimes harmful) in Content-Disposition, so just a file name can be sent. But if the actual file contained UTF8 (non-ASCII) characters, this doesn't really fix the issue.
David Kilzer (:ddkilzer)
Comment 5 2007-07-17 08:24:35 PDT
Alexey Proskuryakov
Comment 6 2007-07-17 12:44:56 PDT
(In reply to comment #4) > But if the actual file contained UTF8 (non-ASCII) characters, this doesn't > really fix the issue. That's true, but we don't have to fix everything at once - there isn't any 100% correct solution for non-ASCII characters in HTTP headers AFAIK, so it will take a lot of tweaking. Also, most experienced users know about the perils of non-ASCII file names, while it's really hard to guess that a problem with uploading is caused by a directory name somewhere in the path.
David Kilzer (:ddkilzer)
Comment 7 2007-07-18 08:34:49 PDT
(In reply to comment #6) > (In reply to comment #4) > > But if the actual file contained UTF8 (non-ASCII) characters, this doesn't > > really fix the issue. > > That's true, but we don't have to fix everything at once - there isn't any 100% > correct solution for non-ASCII characters in HTTP headers AFAIK, so it will > take a lot of tweaking. > > Also, most experienced users know about the perils of non-ASCII file names, > while it's really hard to guess that a problem with uploading is caused by a > directory name somewhere in the path. My main concern is that this would make Safari on Windows behave differently from every other browser with respect to uploading files.
David Kilzer (:ddkilzer)
Comment 8 2007-07-18 15:26:25 PDT
Why can't we use "encoded-word" encoding for the path as defined by RFC 2047? It's perfect for this application--encoding non-US-ASCII characters in a US-ASCII-only field! We could either encode the path on a per-directory basis (encode each set of characters between directory separators), or encode the whole string. (I would recommend encoding text between each directory separator since server-side software would have to be updated to understand how to decode such fields.) http://tools.ietf.org/html/rfc2047 Updates to RFC 2047: http://tools.ietf.org/html/rfc2184 http://tools.ietf.org/html/rfc2231 Confirming bug based on empirical evidence.
Alexey Proskuryakov
Comment 9 2007-07-18 21:30:52 PDT
(In reply to comment #8) > Why can't we use "encoded-word" encoding for the path as defined by RFC 2047? Let's track this as a separate bug, please. The fact that a full path is sent from Windows is just a porting bug AFAICT - it's not sent from the Mac.
808caaa4.8ce9.9cd6c799e9f6
Comment 10 2007-07-19 03:56:54 PDT
/* bug#14667 for the encoding of filename part discussion */
Alexey Proskuryakov
Comment 11 2007-08-08 22:21:49 PDT
The problem is in HTMLFormElement::formData(): int start = path.reverseFind('/') + 1; We should be using PathFindFileName() API on Windows, and possibly basename(3) where available. I'm trying to think of a good place to put a helper function...
Alexey Proskuryakov
Comment 12 2007-08-11 12:50:38 PDT
Created attachment 15933 [details] proposed fix Not tested, as I don't have a Windows build environment set up. AFAICT, path operations aren't needed anywhere else in WebCore, so a slightly buggy static helper looks like an fine solution :-)
Adam Roben (:aroben)
Comment 13 2007-08-11 12:54:17 PDT
Comment on attachment 15933 [details] proposed fix +static String pathGetFileName(const String& path) +{ +#if PLATFORM(WIN_OS) + return String(PathFindFileName(path.charactersWithNullTermination())); This won't compile, since charactersWithNullTermination is a non-const instance method.
Alexey Proskuryakov
Comment 14 2007-08-11 13:03:26 PDT
Created attachment 15934 [details] updated patch Oops - I wonder why it is non-const if it doesn't really modify the string.
Darin Adler
Comment 15 2007-08-13 11:39:27 PDT
Comment on attachment 15934 [details] updated patch + return String(PathFindFileName(path.charactersWithNullTermination())); Do you need the explicit String() here? Doesn't the conversion happen automatically? Also, we often use "filename" as a single word; so I think the variables and functions could just be all lowercase. r=me
Alexey Proskuryakov
Comment 16 2007-08-13 12:12:25 PDT
(In reply to comment #15) > Do you need the explicit String() here? Doesn't the conversion happen > automatically? I wondered the same when I saw such a construct in FileChooserWin.cpp, so I preserved it for safety (I don't have a Windows build environment set up at the moment).
Alexey Proskuryakov
Comment 17 2007-08-13 22:52:38 PDT
Committed revision 25067.
Note You need to log in before you can comment on or make changes to this bug.