Bug 14635 - Uploading file with non-ASCII character in path fails
Summary: Uploading file with non-ASCII character in path fails
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Forms (show other bugs)
Version: 523.x (Safari 3)
Hardware: PC Windows XP
: P2 Major
Assignee: Alexey Proskuryakov
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2007-07-17 01:32 PDT by 808caaa4.8ce9.9cd6c799e9f6
Modified: 2007-08-13 22:52 PDT (History)
2 users (show)

See Also:


Attachments
proposed fix (2.88 KB, patch)
2007-08-11 12:50 PDT, Alexey Proskuryakov
aroben: review-
Details | Formatted Diff | Diff
updated patch (2.87 KB, patch)
2007-08-11 13:03 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description 808caaa4.8ce9.9cd6c799e9f6 2007-07-17 01:32:35 PDT
With this site(webkit-buglizza), I cannot upload file from my Desktop.
After moving file to somewhere like c:\__sendtemp\, uploading succeeded.

'filename' is full path, and contains UTF8(nonascii) chars.
With some environment directory name "desktop" is localized into nonascii strings.

I don't know who has responsiblility, or whether we should always send fullpath.

; ===============
; posts
; ===============
POST http://bugs.webkit.org/attachment.cgi HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja) AppleWebKit/522.14.1+ (KHTML, like Gecko) Version/3.0.2 Safari/522.13.1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryAAsdAAJVAApEAA5j
Transfer-Encoding: Chunked
.....

------WebKitFormBoundaryAAsdAAJVAApEAA5j
Content-Disposition: form-data; name="data"; filename="C:\Documents and Settings\*****\ε㑦€»εβ―εγ°ε₯γεγΝ½test.html.png"
Content-Type: image/png


; ===============
; response
; ===============
HTTP/1.1 411 Length Required
Server: Apache/1.3.33 (Darwin) PHP/4.4.1 mod_ssl/2.8.24 OpenSSL/0.9.7i DAV/1.0.3
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Comment 1 808caaa4.8ce9.9cd6c799e9f6 2007-07-17 02:07:46 PDT
Oh, HTMLFormElement::formData() converts path to current encoding.

At the time above I certainly settled DefaultEncoding as UTF8.
Now I retried with reloaded attachment.cgi while Encoding->Europian.
Path is converted as below and still uploading failed.

Content-Disposition: form-data; name="data"; filename="C:\Documents and Settings\********\デスクトップ\test.txt"

Comment 2 Alexey Proskuryakov 2007-07-17 04:16:48 PDT
I might well be wrong, but I think that a full path is not really needed (and is sometimes harmful) in Content-Disposition, so just a file name can be sent.

Would you be willing to submit such a patch for WebCore, as described in <http://webkit.org/coding/contributing.html>?
Comment 3 David Kilzer (:ddkilzer) 2007-07-17 08:23:03 PDT
What does Firefox or MSIE (or Opera) when uploading this file?  Do they encode the path differently?

Comment 4 David Kilzer (:ddkilzer) 2007-07-17 08:24:01 PDT
(In reply to comment #2)
> I might well be wrong, but I think that a full path is not really needed (and
> is sometimes harmful) in Content-Disposition, so just a file name can be sent.

But if the actual file contained UTF8 (non-ASCII) characters, this doesn't really fix the issue.

Comment 5 David Kilzer (:ddkilzer) 2007-07-17 08:24:35 PDT
<rdar://problem/5340188>
Comment 6 Alexey Proskuryakov 2007-07-17 12:44:56 PDT
(In reply to comment #4)
> But if the actual file contained UTF8 (non-ASCII) characters, this doesn't
> really fix the issue.

That's true, but we don't have to fix everything at once - there isn't any 100% correct solution for non-ASCII characters in HTTP headers AFAIK, so it will take a lot of tweaking. 

Also, most experienced users know about the perils of non-ASCII file names, while it's really hard to guess that a problem with uploading is caused by a directory name somewhere in the path.
Comment 7 David Kilzer (:ddkilzer) 2007-07-18 08:34:49 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > But if the actual file contained UTF8 (non-ASCII) characters, this doesn't
> > really fix the issue.
> 
> That's true, but we don't have to fix everything at once - there isn't any 100%
> correct solution for non-ASCII characters in HTTP headers AFAIK, so it will
> take a lot of tweaking. 
> 
> Also, most experienced users know about the perils of non-ASCII file names,
> while it's really hard to guess that a problem with uploading is caused by a
> directory name somewhere in the path.

My main concern is that this would make Safari on Windows behave differently from every other browser with respect to uploading files.

Comment 8 David Kilzer (:ddkilzer) 2007-07-18 15:26:25 PDT
Why can't we use "encoded-word" encoding for the path as defined by RFC 2047?  It's perfect for this application--encoding non-US-ASCII characters in a US-ASCII-only field!  We could either encode the path on a per-directory basis (encode each set of characters between directory separators), or encode the whole string.  (I would recommend encoding text between each directory separator since server-side software would have to be updated to understand how to decode such fields.)

http://tools.ietf.org/html/rfc2047

Updates to RFC 2047:

http://tools.ietf.org/html/rfc2184
http://tools.ietf.org/html/rfc2231

Confirming bug based on empirical evidence.

Comment 9 Alexey Proskuryakov 2007-07-18 21:30:52 PDT
(In reply to comment #8)
> Why can't we use "encoded-word" encoding for the path as defined by RFC 2047? 

Let's track this as a separate bug, please. The fact that a full path is sent from Windows is just a porting bug AFAICT - it's not sent from the Mac.
Comment 10 808caaa4.8ce9.9cd6c799e9f6 2007-07-19 03:56:54 PDT
/* bug#14667 for the encoding of filename part discussion */
Comment 11 Alexey Proskuryakov 2007-08-08 22:21:49 PDT
The problem is in HTMLFormElement::formData():
    int start = path.reverseFind('/') + 1;

We should be using PathFindFileName() API on Windows, and possibly basename(3) where available. I'm trying to think of a good place to put a helper function...
Comment 12 Alexey Proskuryakov 2007-08-11 12:50:38 PDT
Created attachment 15933 [details]
proposed fix

Not tested, as I don't have a Windows build environment set up.

AFAICT, path operations aren't needed anywhere else in WebCore, so a slightly buggy static helper looks like an fine solution :-)
Comment 13 Adam Roben (:aroben) 2007-08-11 12:54:17 PDT
Comment on attachment 15933 [details]
proposed fix

+static String pathGetFileName(const String& path)
+{
+#if PLATFORM(WIN_OS)
+    return String(PathFindFileName(path.charactersWithNullTermination()));

This won't compile, since charactersWithNullTermination is a non-const instance method.
Comment 14 Alexey Proskuryakov 2007-08-11 13:03:26 PDT
Created attachment 15934 [details]
updated patch

Oops - I wonder why it is non-const if it doesn't really modify the string.
Comment 15 Darin Adler 2007-08-13 11:39:27 PDT
Comment on attachment 15934 [details]
updated patch

+    return String(PathFindFileName(path.charactersWithNullTermination()));

Do you need the explicit String() here? Doesn't the conversion happen automatically?

Also, we often use "filename" as a single word; so I think the variables and functions could just be all lowercase.

r=me
Comment 16 Alexey Proskuryakov 2007-08-13 12:12:25 PDT
(In reply to comment #15)
> Do you need the explicit String() here? Doesn't the conversion happen
> automatically?

I wondered the same when I saw such a construct in FileChooserWin.cpp, so I preserved it for safety (I don't have a Windows build environment set up at the moment).
Comment 17 Alexey Proskuryakov 2007-08-13 22:52:38 PDT
Committed revision 25067.