RESOLVED LATER 113233
filenameFromHTTPContentDisposition() doesn't comply with RFC 6266
https://bugs.webkit.org/show_bug.cgi?id=113233
Summary filenameFromHTTPContentDisposition() doesn't comply with RFC 6266
Chris Dumez
Reported 2013-03-25 12:26:09 PDT
filenameFromHTTPContentDisposition() doesn't comply with RFC 6266. It has several issues: - ';' in filenames are not supported - escape characters in the filename are not handled - "filename" is compared case-sensitively but shouldn't - file path separators are not replaced Additionally "filename*" is not supported but this should probably be handled in a separate bug. Useful test cases can be found at: http://greenbytes.de/tech/tc2231/ This HTTPParser function is used by the following ports: EFL, GTK, Qt, BlackBerry.
Attachments
Patch (15.21 KB, patch)
2013-03-25 13:56 PDT, Chris Dumez
webkit-ews: commit-queue-
Patch (15.90 KB, patch)
2013-03-25 14:25 PDT, Chris Dumez
webkit-ews: commit-queue-
Patch (15.96 KB, patch)
2013-03-25 14:41 PDT, Chris Dumez
webkit-ews: commit-queue-
Patch (15.67 KB, patch)
2013-03-25 15:00 PDT, Chris Dumez
webkit-ews: commit-queue-
Patch (14.37 KB, patch)
2013-03-26 01:57 PDT, Chris Dumez
no flags
Patch (15.27 KB, patch)
2013-03-27 02:03 PDT, Chris Dumez
abarth: review-
Patch (17.77 KB, patch)
2013-03-31 07:25 PDT, Chris Dumez
darin: review-
Patch (17.55 KB, patch)
2013-04-14 02:17 PDT, Chris Dumez
no flags
Patch (16.77 KB, patch)
2013-05-13 00:57 PDT, Chris Dumez
gtk-ews: commit-queue-
Patch (17.15 KB, patch)
2013-05-13 03:35 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2013-03-25 13:56:12 PDT
Early Warning System Bot
Comment 2 2013-03-25 14:14:35 PDT
Chris Dumez
Comment 3 2013-03-25 14:25:31 PDT
Created attachment 194926 [details] Patch Attempt to fix Qt build.
Early Warning System Bot
Comment 4 2013-03-25 14:35:47 PDT
Chris Dumez
Comment 5 2013-03-25 14:41:29 PDT
Early Warning System Bot
Comment 6 2013-03-25 14:56:16 PDT
Chris Dumez
Comment 7 2013-03-25 15:00:00 PDT
Early Warning System Bot
Comment 8 2013-03-25 15:06:10 PDT
Chris Dumez
Comment 9 2013-03-25 16:12:05 PDT
Anyone from Qt port can tell me how I can get the new api test binary to link against webcore library? I added "WEBKIT += webcore" to TestWebKitAPI.pri. This added the include paths, but not the libs.
Chris Dumez
Comment 10 2013-03-26 01:57:26 PDT
Created attachment 195032 [details] Patch No longer add the test for Qt port as the needed WebCore symbols are not exported and I don't know how to.
Build Bot
Comment 11 2013-03-26 05:00:25 PDT
Andras Becsi
Comment 12 2013-03-26 09:37:10 PDT
(In reply to comment #9) > Anyone from Qt port can tell me how I can get the new api test binary to link against webcore library? > > I added "WEBKIT += webcore" to TestWebKitAPI.pri. This added the include paths, but not the libs. Linking against the intermediate static libraries is only enabled by "WEBKIT += webcore" if the app does not link against the final shared library, and would increase the binary size significantly since stripping is disabled in the implicit static linking configuration. Maybe using WEBCORE_TESTING to mark the function to be exported could fix the linking issue?
Chris Dumez
Comment 13 2013-03-26 10:16:23 PDT
(In reply to comment #12) > (In reply to comment #9) > > Anyone from Qt port can tell me how I can get the new api test binary to link against webcore library? > > > > I added "WEBKIT += webcore" to TestWebKitAPI.pri. This added the include paths, but not the libs. > > Linking against the intermediate static libraries is only enabled by "WEBKIT += webcore" if the app does not link against the final shared library, and would increase the binary size significantly since stripping is disabled in the implicit static linking configuration. > > Maybe using WEBCORE_TESTING to mark the function to be exported could fix the linking issue? Thanks Andras, it seems like a good idea. Since this is a bit more work. I would propose to enable the WebCore testing for Qt port in a separate patch, if that's OK.
Chris Dumez
Comment 14 2013-03-27 02:03:15 PDT
Adam Barth
Comment 15 2013-03-30 11:22:49 PDT
Comment on attachment 195249 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=195249&action=review We shouldn't use regular expressions when parsing HTTP headers. > Source/WebCore/platform/network/HTTPParsers.cpp:293 > + RegularExpression regexp(";\\s*filename\\s*=\\s*", TextCaseInsensitive); > + int matchOffset = regexp.match(value); This isn't correct. For example, it doesn't work on: Content-Disposition: attachment; foo="; filename="; filename=bar.txt
Chris Dumez
Comment 16 2013-03-31 07:25:53 PDT
Created attachment 195895 [details] Patch Take Adam's feedback into consideration. It is more robust now and I'm no longer using regular expressions.
Chris Dumez
Comment 17 2013-04-08 02:00:48 PDT
ping review?
Darin Adler
Comment 18 2013-04-08 10:05:03 PDT
Comment on attachment 195895 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=195895&action=review Looks generally quite good. review- because of the unnecessarily inefficient loop to change slashes into underscores. I’d like to see at least one test case with trailing backslash without a following character. > Source/WebCore/platform/network/HTTPParsers.cpp:236 > + return (c == '/' || c == '\\'); Extra parentheses not needed here. > Source/WebCore/platform/network/HTTPParsers.cpp:246 > + // Replace path separators with '_'. > + size_t index = 0; > + while ((index = unsafeFileName.find(isPathSeparator, index)) != notFound) { > + unsafeFileName.replace(index, 1, "_"); > + ++index; > + } This is an unnecessarily inefficient way to replace path separators with "_" characters. For every path separator in the string, it creates a constant string with the "_" character in it to pass to the replace function, and then goes on to make a new copy of the entire string each time. That can have pathologically slow behavior. This simpler code is much more efficient: unsafeFileName = unsafeFileName.replace('/', '_').replace('\\', '_'); This still makes two copies of the string if it contains both forward and reverse slashes, but that’s OK; it won’t ever make more than two copies. > Source/WebCore/platform/network/HTTPParsers.cpp:255 > + if (index > 0) > + unsafeFileName = unsafeFileName.substring(index); The if statement here is not needed. The substring function already optimizes this case, so we can simply call substring unconditionally. > Source/WebCore/platform/network/HTTPParsers.cpp:259 > +static String parseQuotedString(const String& str, unsigned pos) In the WebKit project we try to avoid abbreviations like “str“ and “pos”. I know the existing functions in this file use them, but it’s not the preferred style. I think the name quotedString and position would be good alternatives. > Source/WebCore/platform/network/HTTPParsers.cpp:264 > + bool wasEscaped = false; There’s a cleaner way of writing this without a boolean. > Source/WebCore/platform/network/HTTPParsers.cpp:265 > + while (pos < length) { I think this would read much better as a for loop rather than a while. > Source/WebCore/platform/network/HTTPParsers.cpp:276 > + } else > + wasEscaped = true; Here you can just write the following code: else { if (position <= length) parsedString.append(quotedString[++position]); } Then you don’t need the wasEscaped boolean at all. But also, this function strips a trailing backslash if it’s the last character of the string. Is that correct behavior? It also will return the string even if there is no double quote mark in it. Is that also correct behavior? Is that covered by test cases below? > Source/WebCore/platform/network/HTTPParsers.cpp:284 > +// Note that we deliberately don't comply RFC 6266 here to match the Typo: "comply with RFC 6266". > Source/WebCore/platform/network/HTTPParsers.cpp:286 > +// but we are way more tolerant. Our layout tests even rely on this. Instead of “Our layout tests even rely on this” please say “WebKit regression tests cover this”. > Source/WebCore/platform/network/HTTPParsers.cpp:287 > +static String parseUnquotedFileName(const String& str, unsigned pos) Again, please don’t use the abbreviations str and pos here. > Source/WebCore/platform/network/HTTPParsers.cpp:297 > + while (endIndex < length) { > + UChar c = str[endIndex]; > + if (c == ';') > + break; > + ++endIndex; > + } This is a copy of the String::find function. Please use that instead. > Source/WebCore/platform/network/HTTPParsers.cpp:303 > +static bool skipToNextSeparator(const String& value, UChar separator, unsigned& pos) Please use a word, position, instead of an abbreviation, pos. > Source/WebCore/platform/network/HTTPParsers.cpp:307 > + bool insideQuotes = false; > + bool escaped = false; This function can be written better without the escaped boolean. The escape logic can simply consume an extra character like the one above. > Source/WebCore/platform/network/HTTPParsers.cpp:308 > + while (pos < length) { This would read better as a for loop; it can be one without an initializer. > Source/WebCore/platform/network/HTTPParsers.cpp:312 > + if (c == '\\') { > + escaped = !escaped; > + } else if (c == '"') { In WebKit coding style we don’t use braces around a single line if body like this. > Source/WebCore/platform/network/HTTPParsers.cpp:367 > + return fileName.isEmpty() ? String() : fileName; Why is it important to return the null string instead of an empty string? That seems a bit peculiar.
Chris Dumez
Comment 19 2013-04-14 02:17:53 PDT
Created attachment 197979 [details] Patch Thanks for reviewing Darin. I updated the patch according to your feedback.
Chris Dumez
Comment 20 2013-04-21 03:44:56 PDT
ping review?
Chris Dumez
Comment 21 2013-05-12 11:44:38 PDT
Darin, would you mind taking another look please? I updated the patch as per your feedback a while ago.
Darin Adler
Comment 22 2013-05-12 22:03:44 PDT
Comment on attachment 197979 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=197979&action=review I am OK with this patch as is, but I still have many suggestions for improvement. > Source/WebCore/ChangeLog:19 > + (WebCore): Please remove bogus lines like this from the change log. > Source/WebCore/platform/network/HTTPParsers.cpp:234 > +static void replaceUnsafeCharactersInFileName(String& unsafeFileName) We should figure out whether it’s “filename” or “file name”. If it’s “filename”, then we should not capitalize the “N”. Elsewhere in the file we use that unsafeFilename, so lets do that consistently. > Source/WebCore/platform/network/HTTPParsers.cpp:243 > + unsigned length = unsafeFileName.length(); > + unsigned index = 0; > + while (index < length && unsafeFileName[index] == '.') > + ++index; While we prefer words over single letters, the word “index” for a loop index is an exceptoin. Since String does a range check on indices passed in, this can be written more simply like this: unsigned i; for (i = 0; unsafeFileName[i] == '.'; ++i) ; Or as a while loop if you prefer, but without the length variable and length check. > Source/WebCore/platform/network/HTTPParsers.cpp:258 > + if (c == '\\' && position < (length - 1)) { The parentheses around length - 1 are not idiomatic for our coding style. > Source/WebCore/platform/network/HTTPParsers.cpp:275 > + return unquotedFilename.substring(position, endIndex == notFound ? -1 : endIndex - position); The use of -1 here is wrong since the argument is unsigned; while it works it’s not suggested style. In fact, the value for notFound was chosen to make expressions like this work without special cases. This can be written like this: unquotedFilename.substring(position, unquotedFilename.find(';', position) - position); Or you can use the local variable endIndex, but no special case is needed. > Source/WebCore/platform/network/HTTPParsers.cpp:285 > + if (c == '\\' && position < (length -1)) { We write this as "length - 1" not "(length -1)", please leave off the parentheses. > Source/WebCore/platform/network/HTTPParsers.cpp:290 > + else if ((c == separator) && !insideQuotes) We don’t parenthesize expressions like "(c == separator)" in contexts like this. > Source/WebCore/platform/network/HTTPParsers.cpp:291 > + break; // Found separator. Comment doesn’t seem helpful since previous line says "c == separator". > Source/WebCore/platform/network/HTTPParsers.cpp:297 > +// Returns true if Filename start was found and there is still more. The word “Filename” should not be capitalized here. > Source/WebCore/platform/network/HTTPParsers.cpp:298 > +static bool skipToFileNameStart(const String& value, unsigned& position) The word “FileName” should be “Filename” here. > Source/WebCore/platform/network/HTTPParsers.cpp:321 > + unsigned pos = 0; How about position instead of “pos”? Or maybe filenameStartPosition? > Source/WebCore/platform/network/HTTPParsers.cpp:325 > + String fileName; Lets use “filename” instead of “fileName”, unless you want to argue we should go the other way. > Tools/TestWebKitAPI/Tests/WebCore/HTTPParsers.cpp:43 > + EXPECT_EQ(String("foo.html"), filenameFromHTTPContentDisposition("inline; filename=\"foo.html\"")); I think it would be nice to expose this on the internals object so we can test it from JavaScript. Doesn’t seem necessary to test it with a C++ unit test.
Alexey Proskuryakov
Comment 23 2013-05-12 22:21:29 PDT
How is this function used? I suspect that it may be testable with regular tests like those in http/tests/download.
Chris Dumez
Comment 24 2013-05-13 00:45:56 PDT
(In reply to comment #23) > How is this function used? I suspect that it may be testable with regular tests like those in http/tests/download. Yes, in theory this is partly covered by tests under http/tests/download and those tests could be extended. The reason I chose to write an API test is that filenameFromHTTPContentDisposition() is used by EFL, GTK and Qt ports only. However, tests under http/tests/download are skipped for all these ports at the moment. Also note the following entry in WK2 generic TestExpectations: # https://bugs.webkit.org/show_bug.cgi?id=115203 # Downloaded file name tests fail because of incorrect custom policy delegate implementation http/tests/download/default-encoding.html http/tests/download/inherited-encoding-form-submission-result.html http/tests/download/inherited-encoding.html
Chris Dumez
Comment 25 2013-05-13 00:57:56 PDT
Created attachment 201538 [details] Patch Is it OK to land with API testing?
kov's GTK+ EWS bot
Comment 26 2013-05-13 01:03:20 PDT
Chris Dumez
Comment 27 2013-05-13 03:35:45 PDT
Created attachment 201543 [details] Patch Should fix GTK build.
Alexey Proskuryakov
Comment 28 2013-05-13 08:44:49 PDT
> Yes, in theory this is partly covered by tests under http/tests/download and those tests could be extended. The reason I chose to write an API test is that filenameFromHTTPContentDisposition() is used by EFL, GTK and Qt ports only. However, tests under http/tests/download are skipped for all these ports at the moment. The issue with these particular API tests is that they don't test behavior for other ports. You are changing this function to match an RFC, which is a good starting point, but consistency across WebKit ports is also something to consider. It's more difficult to achieve consistency if we are not even running the same tests.
Csaba Osztrogonác
Comment 29 2014-02-13 04:13:43 PST
Comment on attachment 197979 [details] Patch Cleared Darin Adler's review+ from obsolete attachment 197979 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Note You need to log in before you can comment on or make changes to this bug.