Steps to Reproduce: 1. Launch QtLauncher. 2. Load the webpage attached to the bug report(from the zip file). 3. Click the Button to execute the test. 4. Content-Type Header received from the server is displayed in the green box above the button. Expected Result: =============== - Should display content-type and charset as "application/xml; charset=UTF-8". Actual Result: ============== - content-type and charset displayed as "application/xml; charset=KOI8-R". Additional Info: ================ 1. Test Assertion : Invoke Send with in parameter Data as Document - If Content-Type header is set using setRequestHeader() and its value is not malformed, set the charset parameter of the header, by either changing the charset parameter(if one is present) or appending one, to the encoding used to encode the document 2. The encoding used to encode the document is UTF-8 3. W3c spec link - http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?rev=1.209&content-type=text/html;%20charset=iso-8859-1 4. Pls refer : Section 4.6.3 - Alogorithm 3 under "data is a Document"
Created attachment 48243 [details] Test files
> - content-type and charset displayed as "application/xml; charset=KOI8-R". The attached test doesn't mention KOI8-R anywhere. Is it the right test?
Please follow the QtWebKit bug reporting guidelines when reporting bugs. See http://trac.webkit.org/wiki/QtWebKitBugs Specifically: - The 'QtWebKit' component should only be used for bugs/features in the public QtWebKit API layer, not to signify that the bug is specific to the Qt port of WebKit http://trac.webkit.org/wiki/QtWebKitBugs#Component - Add the keyword 'Qt' to signal that it's a Qt-related bug http://trac.webkit.org/wiki/QtWebKitBugs#Keywords
Created attachment 50520 [details] fix patch
(In reply to comment #2) > > - content-type and charset displayed as "application/xml; charset=KOI8-R". > > The attached test doesn't mention KOI8-R anywhere. Is it the right test? KIO8-R can be found in the attached htm file around line 54: client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R");
Attachment 50520 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:435: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 50520 [details] fix patch > KIO8-R can be found in the attached htm file around line 54: >client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R"); Sorry, I was probably looking at some other test. This bug has all Qt flags set on it, but the change is in cross-platform code. Please correct the bug's component and keywords. Whether we want to make this change depends on what Firefox and IE do, not just on what the spec says. Please research their behavior, and post the results in the bug. + } + else { The should be on the same line. + contentType = contentType.stripWhiteSpace(); + int start = contentType.find("charset", 0); HTTP header modification should abstracted out into a separate function. We don't have a great place to put it currently - HTTPParsers.{h,cpp} is the closest, but it's only about parsing. Maybe you should just make it a static funciton in XMLHttpRequest.cpp for now. - setRequestHeaderInternal("Content-Type", "application/xml"); + setRequestHeaderInternal("Content-Type", "text/plain;charset=UTF-8"); This is a change that is not asked for in this bug, and one that is not explained in ChangeLog. Is there a reason you are making it in the same patch? It will need to be discussed on its own. + No new tests have been created. The orginal test case, + http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm, + requires ALL-HTTP echoing back from an asp server. However, cgi does not support this. It is certainly possible to access a request Content-Type from a cgi, see e.g. LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi.
> This bug has all Qt flags set on it, but the change is in cross-platform code. > Please correct the bug's component and keywords. I see that you did this while I was reviewing. Also changing platform now.
> > + contentType = contentType.stripWhiteSpace(); > + int start = contentType.find("charset", 0); > > HTTP header modification should abstracted out into a separate function. We > don't have a great place to put it currently - HTTPParsers.{h,cpp} is the > closest, but it's only about parsing. Maybe you should just make it a static > funciton in XMLHttpRequest.cpp for now. Sure. will do. > > - setRequestHeaderInternal("Content-Type", "application/xml"); > + setRequestHeaderInternal("Content-Type", > "text/plain;charset=UTF-8"); > > This is a change that is not asked for in this bug, and one that is not > explained in ChangeLog. Is there a reason you are making it in the same patch? > It will need to be discussed on its own. Right. I planned to take this line out but forgot to do it while submitting the patch. I understand this is addressed in another bug https://bugs.webkit.org/show_bug.cgi?id=11049 > > + No new tests have been created. The orginal test case, > + > http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm, > + requires ALL-HTTP echoing back from an asp server. However, cgi does > not support this. > > It is certainly possible to access a request Content-Type from a cgi, see e.g. > LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi. wow, I will create a new test case then. thanks!
Created attachment 50538 [details] fix patch 2
Comment on attachment 50538 [details] fix patch 2 + m_requestHeaders.remove("Content-Type"); + m_requestHeaders.set("Content-Type", contentType); Is there actually necessary to remove before setting? + req.open("POST", "print-content-type.cgi", false); This can't work - the cgi script is in resources/ subdirectory. Please fix and make the test resilient against such errors. This can't go in without an answer to my question about other browsers' behavior.
> + req.open("POST", "print-content-type.cgi", false); > > This can't work - the cgi script is in resources/ subdirectory. Please fix and > make the test resilient against such errors. For some reason, this cgi script is under xmlhttprequest, along with a couple of others. Most other cgi scripts are under xmlhttprequest/resources indeed. > > This can't go in without an answer to my question about other browsers' > behavior. I tried on Firefox and IE. Firefox shows UTF-8, which is as specified, but IE shows KOI8-R.
> For some reason, this cgi script is under xmlhttprequest, along with a couple Indeed it is!
OK. Siding with Firefox seems fine here. Does it match the behavior of this patch in all these cases? - setRequestHeader wasn't called; - setRequestHeader was called, and Content-Type without a charset was passed; - setRequestHeader was called, and Content-Type with a different charset was passed.
Thanks for the comments! Please see my findings below: > Does it match the behavior of this patch in all these cases? > > - setRequestHeader wasn't called; This is not covered by this patch. The current webkit behavior is showing "application/xml". But Firefox shows "text/plain; charset=UTF-8". > - setRequestHeader was called, and Content-Type without a charset was passed; > - setRequestHeader was called, and Content-Type with a different charset was > passed. Yes, after my patch, webkit shows "text/plain; charset=UTF-8" in both cases and so does firefox. I will go ahead submit the new patch after removing the "remove" line. Thanks!
Created attachment 50598 [details] fix patch 3
Comment on attachment 50598 [details] fix patch 3 1. We usually don't fix coding style for code that isn't touched by a change, like isValidToken here. But it's fine to keep these, I'm just commenting in general. 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the capitalization, and matching the naming of existing functions. 3. This function doesn't do what it says. Instead of setting charset, it re-creates the media type string, potentially losing lots of other parameters. #2 must be fixed before landing. But I can't decide if #3 is important enough to be fixed right away. This part regresses behavior and doesn't match the spec, but it may be harmless in practice.
(In reply to comment #17) > (From update of attachment 50598 [details]) > 1. We usually don't fix coding style for code that isn't touched by a change, > like isValidToken here. But it's fine to keep these, I'm just commenting in > general. I was running check-webkit-style against the cpp file. So looks like I should run against the patch file. > > 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the > capitalization, and matching the naming of existing functions. Sure. > 3. This function doesn't do what it says. Instead of setting charset, it > re-creates the media type string, potentially losing lots of other parameters. Yeah, I assumed the media type is composed of two parts. Just wanted to simplified the code, otherwise I have to almost copy the whole extractcharsetfrommediatype over. > > #2 must be fixed before landing. But I can't decide if #3 is important enough > to be fixed right away. This part regresses behavior and doesn't match the > spec, but it may be harmless in practice. Is it ok I don't touch #3 in the next patch?
> Is it ok I don't touch #3 in the next patch? I don't understand the question. But anyway, I'd like someone who is more familiar with real world HTTP usage (Darin Fisher?) to make this call.
(In reply to comment #19) > > Is it ok I don't touch #3 in the next patch? > > I don't understand the question. But anyway, I'd like someone who is more > familiar with real world HTTP usage (Darin Fisher?) to make this call. I asked if I really needed to re-implement the function. I will wait for Darin Fisher' comments then. Thanks,
Yeah, completely overriding the Content-Type header seems risky to me. That said, I don't know of any real world examples where someone is using custom modifiers in the Content-Type header, but there is nothing to prevent someone from doing so today. Does Firefox also set the charset on the request header? How do they handle this case?
(In reply to comment #21) > Yeah, completely overriding the Content-Type header seems risky to me. That > said, I don't know of any real world examples where someone is using custom > modifiers in the Content-Type header, but there is nothing to prevent someone > from doing so today. > > Does Firefox also set the charset on the request header? How do they handle > this case? Thanks for the comment, Darin. I have no problem to reimplement the setcharset function. I guess we should always favor a solid implementation over a simplified one. :) From the test result, I can see Firefox always set the charset on the request header. I also read the code: https://hg.mozilla.org/mozilla-central/file/249428fdc4a8/content/base/src/nsXMLHttpRequest.cpp and believe they modified the header based on certain logic. However, since I didn't debug through the code, I wasn't 100% sure if we match their implementation exactly.
Created attachment 51076 [details] fix patch 4
Attachment 51076 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:136: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] WebCore/xml/XMLHttpRequest.cpp:141: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 2 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 51079 [details] fix patch 4
Note that XHR spec explicitly says that all occurrences of charset in media type string should be replaced.
Comment on attachment 51079 [details] fix patch 4 Sorry for the super slow turnaround on this code review. > Index: WebCore/platform/network/HTTPParsers.cpp ... > +void findCharsetInMediaType(const String& mediaType, int& charsetStart, int& charsetLen) > +{ > + charsetStart = 0; > + charsetLen = 0; > + > int pos = 0; > int length = (int)mediaType.length(); > > while (pos < length) { > pos = mediaType.find("charset", pos, false); > + if (pos <= 0) { > + charsetLen = 0; > + return; > + } > > // is what we found a beginning of a word? > if (mediaType[pos-1] > ' ' && mediaType[pos-1] != ';') { > @@ -214,10 +226,12 @@ String extractCharsetFromMediaType(const > int endpos = pos; > while (pos != length && mediaType[endpos] > ' ' && mediaType[endpos] != '"' && mediaType[endpos] != '\'' && mediaType[endpos] != ';') > ++endpos; > + > + charsetStart = pos; > + charsetLen = endpos - pos; > + return; > } > + > + charsetLen = 0; > } ^^^ it looks likes it is unnecessary to assign 0 here since contentLen is initialized to 0 at the top of the function. > Index: WebCore/xml/XMLHttpRequest.cpp > =================================================================== > --- WebCore/xml/XMLHttpRequest.cpp (revision 56125) > +++ WebCore/xml/XMLHttpRequest.cpp (working copy) > @@ -128,6 +128,31 @@ static bool isSetCookieHeader(const Atom > return equalIgnoringCase(name, "set-cookie") || equalIgnoringCase(name, "set-cookie2"); > } > > +static void setCharsetInMediaType(String& mediaType, const String& charsetValue) > +{ > + int start, len; > + findCharsetInMediaType(mediaType, start, len); > + > + if (!len) { > + // When no charset found, append new charset. > + mediaType.stripWhiteSpace(); > + if (mediaType[mediaType.length() - 1] != ';') > + mediaType.append(";"); > + mediaType.append(" charset="); > + mediaType.append(charsetValue); > + } else { > + // Found existing charset, replace with new charset. > + String newMediaType = mediaType.substring(0, start); > + newMediaType.append(charsetValue); > + > + if (start + len < (int)mediaType.length()) { ^^^ nit: please use static_cast<int>(...) instead of the C-style cast. > + newMediaType.append("; "); > + newMediaType.append(mediaType.substring(start + len)); maybe it would be better to use the String::replace method? another approach here might be to strip out all occurrences of a charset attribute and then append the desired charset attribute. > Index: LayoutTests/http/tests/xmlhttprequest/request-encoding2.html ... > + function test1() > + { > + req = new XMLHttpRequest; > + req.open("POST", "print-content-type.cgi", false); > + > + req.send(""); > + > + document.getElementById("result1").firstChild.data = "Test1 (setRequestHeader was not called):"; > + if (req.responseText == "application/xml\n") Can you explain why this case does not have an implicit charset?
Please also include tests for the case where there are other Content-Type attributes bracketing the charset attribute since the code attempts to deal with such.
(In reply to comment #28) > Please also include tests for the case where there are other Content-Type > attributes bracketing the charset attribute since the code attempts to deal > with such. Thanks for the review, Darin. I will submit a new patch based on your comments. For the above comment, I am not sure how to come up with a real-life case with attributes following charset. Can you give me an example, or I just use an imaginary content string, like content="text/html; charset=iso-8859-1; future-extension"?
>> + if (req.responseText == "application/xml\n") >Can you explain why this case does not have an implicit charset? Perhaps because application/xml defaults to UTF-8 anyway, and since browsers didn't send charset before, it's safer not to send one. I don't know what Firefox does now though.
(In reply to comment #30) > >> + if (req.responseText == "application/xml\n") > >Can you explain why this case does not have an implicit charset? > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > didn't send charset before, it's safer not to send one. I don't know what > Firefox does now though. Well, I noticed that with this patch if the user sets a Content-Type header without an explicit charset, we will add a charset attribute, but if they do not set a Content-Type header at all, then we will not add a charset attribute. Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in these two cases? I agree that excluding UTF-8 shouldn't really matter though.
(In reply to comment #29) ... > attributes following charset. Can you give me an example, or I just use an > imaginary content string, like content="text/html; charset=iso-8859-1; > future-extension"? Yeah, making up a fake attribute was my intent. That way you can test some of the cases the code is attempting to handle.
Created attachment 52579 [details] fix patch 5 This patch implements multiple charset and extra fields handling.
(In reply to comment #31) > (In reply to comment #30) > > >> + if (req.responseText == "application/xml\n") > > >Can you explain why this case does not have an implicit charset? > > > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > > didn't send charset before, it's safer not to send one. I don't know what > > Firefox does now though. > > Well, I noticed that with this patch if the user sets a Content-Type header > without an explicit charset, we will add a charset attribute, but if they do > not set a Content-Type header at all, then we will not add a charset attribute. > Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in > these two cases? I agree that excluding UTF-8 shouldn't really matter though. This can probably be addressed in this bug: https://bugs.webkit.org/show_bug.cgi?id=11049
Comment on attachment 52579 [details] fix patch 5 Rejecting patch 52579 from commit-queue. Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Darin Fisher', '--force']" exit_code: 1 Last 500 characters of output: form/network/HTTPParsers.h Hunk #1 FAILED at 48. 1 out of 1 hunk FAILED -- saving rejects to file WebCore/platform/network/HTTPParsers.h.rej patching file WebCore/xml/XMLHttpRequest.cpp Hunk #1 succeeded at 129 (offset 1 line). Hunk #2 succeeded at 456 (offset 1 line). patching file LayoutTests/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2-expected.txt patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2.html Full output: http://webkit-commit-queue.appspot.com/results/1584466
The commit was rejected due to conflict on merging. I manually landed the patch after resolving conflict. Committed revision: 57544
This broke tests on the build bots, such as <http://build.webkit.org/results/Tiger%20Intel%20Release/r57547%20(10780)/http/tests/xmlhttprequest/access-control-basic-whitelist-request-headers-pretty-diff.html>.
(In reply to comment #37) > This broke tests on the build bots, such as > <http://build.webkit.org/results/Tiger%20Intel%20Release/r57547%20(10780)/http/tests/xmlhttprequest/access-control-basic-whitelist-request-headers-pretty-diff.html>. Hi, Mark, I see you have updated the test results for me. Thanks!
This has caused bug 40947.