WebKit Bugzilla
NEW
91817
Shouldn't normalise file names on submission
https://bugs.webkit.org/show_bug.cgi?id=91817
Ian 'Hixie' Hickson
Reported
2012-07-19 21:30:59 PDT
The names of uploaded files are normalised to NFC. This means that if you upload two files that have names that are different on the filesystem, it's possible that the server will, even if it faithfully round-trips the filenames, return them with the same filename. Quoting NARUSE, Yui from W3C bug 14526, comment 18 (https://www.w3.org/Bugs/Public/show_bug.cgi?id=14526#c18):

> Imagine following situation, a directory has two file, U+795E.txt and
> U+FA19.txt.
> And the user want to upload them. As you can notice, DOM and uploaded server
> can't distinguish them. Normalization considered harmful.
> [...]
> Yes, current WebKit normalizes those Kanjis, and it is considered breakage.
> You can see the breakage by uploading U+FA19.txt.
> After uploading, it become U+795E.txt and you can find the left part of the
> Kanji is changed.
> These kanjis have the same meaning "god", and specified as compatibility
> character thorough some political reason, but people don't want to
> normalize them other than the true normalization situation.
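The collision described in the quote can be reproduced with a short sketch. It uses Python's standard `unicodedata` module as a stand-in for the browser's normalisation step, which is an assumption made for illustration, not WebKit's actual code path. The two `.txt` names come from the quoted comment; the variable names are illustrative only.

```python
import unicodedata

# Two distinct file names, as in the quoted example.
compat_name = "\uFA19.txt"   # CJK COMPATIBILITY IDEOGRAPH-FA19
unified_name = "\u795E.txt"  # CJK unified ideograph U+795E

# Before normalisation the two names are different strings.
names_differ_before = compat_name != unified_name

# U+FA19 has a singleton canonical decomposition to U+795E,
# so NFC maps both names to the same string.
nfc_compat = unicodedata.normalize("NFC", compat_name)
nfc_unified = unicodedata.normalize("NFC", unified_name)
names_collide_after = nfc_compat == nfc_unified

print(names_differ_before, names_collide_after)
```

Once both names have been normalised, a server that faithfully round-trips what it receives has no way to tell the two files apart.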
Alexey Proskuryakov
Comment 1
2012-07-20 09:40:54 PDT
I think that what we're doing is right. If we didn't normalize file names, we'd send decomposed form to servers from Mac, while every Windows browser always sends precomposed form. It is very likely that sites would have trouble with that (either break on any decomposed Unicode because they were only tested with Windows clients, or get confused when a file is touched by multiple platforms). Indeed, imagine a file that's uploaded from Windows, then edited on Mac and uploaded again. Chances are that the server would show two copies if the name were in a different form when re-uploaded. This is a much more practical situation than the one presented in bug description.
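The re-upload scenario can be illustrated with a small sketch, assuming a hypothetical server that keys stored files on the exact name it receives. The `store` helper and the `résumé.txt` name are invented for illustration, and Python's `unicodedata` module stands in for whatever normalisation the client applies.

```python
import unicodedata

# The same visible name in two forms: precomposed (NFC) and decomposed (NFD).
original = "r\u00e9sum\u00e9.txt"                    # each é is U+00E9
decomposed = unicodedata.normalize("NFD", original)  # each é is e + U+0301


def store(names, normalise):
    """Return the distinct keys a hypothetical server would end up with."""
    stored = set()
    for name in names:
        # Optionally apply NFC, as a normalising client would do before sending.
        key = unicodedata.normalize("NFC", name) if normalise else name
        stored.add(key)
    return stored


uploads = [original, decomposed]
without_normalisation = store(uploads, normalise=False)
with_normalisation = store(uploads, normalise=True)

print(len(without_normalisation), len(with_normalisation))
```

Without normalisation the two uploads are stored under different keys even though they look identical to the user; with NFC applied they collapse to a single entry.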
Alexey Proskuryakov
Comment 2
2013-07-31 09:45:09 PDT
If the Unicode spec disagrees with what people want, this is something to bring up with the Unicode committee. It makes no sense for implementations to preserve normalization forms.
Julian Reschke
Comment 3
2013-10-04 09:30:44 PDT
(In reply to comment #1)
> I think that what we're doing is right. If we didn't normalize file names, we'd send decomposed form to servers from Mac, while every Windows browser always sends precomposed form.
Not entirely true; Firefox (and probably IE too) sends whatever the FS layer gave it, and that *can* be decomposed as well (yes, tested).
Alexey Proskuryakov
Comment 4
2013-10-04 09:56:12 PDT
OK. I don't think that this factoid changes anything though - manually adjusted file names at the FS level are not a common scenario, so the fact that Windows browsers don't normalize these is not a practical consideration. It's still true that this never happens, with the exception of your testing.