Bug 34653

Summary: Charset parameter set by setRequestHeader as "KOI8-R" is NOT changed to UTF-8
Product: WebKit Reporter: sangeetha.sugavanam
Component: XMLAssignee: Chang Shu <cshu>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, commit-queue, cshu, fishd, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Test files
none
fix patch
ap: review-
fix patch 2
ap: review-
fix patch 3
ap: commit-queue-
fix patch 4
none
fix patch 4
fishd: review-
fix patch 5 fishd: review+, commit-queue: commit-queue-

sangeetha.sugavanam
Reported 2010-02-05 10:51:20 PST
Steps to Reproduce: 1. Launch QtLauncher. 2. Load the webpage attached to the bug report(from the zip file). 3. Click the Button to execute the test. 4. Content-Type Header received from the server is displayed in the green box above the button. Expected Result: =============== - Should display content-type and charset as "application/xml; charset=UTF-8". Actual Result: ============== - content-type and charset displayed as "application/xml; charset=KOI8-R". Additional Info: ================ 1. Test Assertion : Invoke Send with in parameter Data as Document - If Content-Type header is set using setRequestHeader() and its value is not malformed, set the charset parameter of the header, by either changing the charset parameter(if one is present) or appending one, to the encoding used to encode the document 2. The encoding used to encode the document is UTF-8 3. W3c spec link - http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?rev=1.209&content-type=text/html;%20charset=iso-8859-1 4. Pls refer : Section 4.6.3 - Alogorithm 3 under "data is a Document"
Attachments
Test files (1.55 KB, application/x-zip-compressed)
2010-02-05 11:01 PST, sangeetha.sugavanam
no flags
fix patch (2.28 KB, patch)
2010-03-11 11:33 PST, Chang Shu
ap: review-
fix patch 2 (5.41 KB, patch)
2010-03-11 14:11 PST, Chang Shu
ap: review-
fix patch 3 (5.45 KB, patch)
2010-03-12 07:42 PST, Chang Shu
ap: commit-queue-
fix patch 4 (8.59 KB, patch)
2010-03-18 12:54 PDT, Chang Shu
no flags
fix patch 4 (8.57 KB, patch)
2010-03-18 12:58 PDT, Chang Shu
fishd: review-
fix patch 5 (9.80 KB, patch)
2010-04-05 15:17 PDT, Chang Shu
fishd: review+
commit-queue: commit-queue-
sangeetha.sugavanam
Comment 1 2010-02-05 11:01:21 PST
Created attachment 48243 [details] Test files
Alexey Proskuryakov
Comment 2 2010-02-05 21:52:13 PST
> - content-type and charset displayed as "application/xml; charset=KOI8-R". The attached test doesn't mention KOI8-R anywhere. Is it the right test?
Tor Arne Vestbø
Comment 3 2010-03-10 06:30:38 PST
Please follow the QtWebKit bug reporting guidelines when reporting bugs. See http://trac.webkit.org/wiki/QtWebKitBugs Specifically: - The 'QtWebKit' component should only be used for bugs/features in the public QtWebKit API layer, not to signify that the bug is specific to the Qt port of WebKit http://trac.webkit.org/wiki/QtWebKitBugs#Component - Add the keyword 'Qt' to signal that it's a Qt-related bug http://trac.webkit.org/wiki/QtWebKitBugs#Keywords
Chang Shu
Comment 4 2010-03-11 11:33:49 PST
Created attachment 50520 [details] fix patch
Chang Shu
Comment 5 2010-03-11 11:35:23 PST
(In reply to comment #2) > > - content-type and charset displayed as "application/xml; charset=KOI8-R". > > The attached test doesn't mention KOI8-R anywhere. Is it the right test? KIO8-R can be found in the attached htm file around line 54: client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R");
WebKit Review Bot
Comment 6 2010-03-11 11:36:30 PST
Attachment 50520 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:435: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alexey Proskuryakov
Comment 7 2010-03-11 11:58:43 PST
Comment on attachment 50520 [details] fix patch > KIO8-R can be found in the attached htm file around line 54: >client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R"); Sorry, I was probably looking at some other test. This bug has all Qt flags set on it, but the change is in cross-platform code. Please correct the bug's component and keywords. Whether we want to make this change depends on what Firefox and IE do, not just on what the spec says. Please research their behavior, and post the results in the bug. + } + else { The should be on the same line. + contentType = contentType.stripWhiteSpace(); + int start = contentType.find("charset", 0); HTTP header modification should abstracted out into a separate function. We don't have a great place to put it currently - HTTPParsers.{h,cpp} is the closest, but it's only about parsing. Maybe you should just make it a static funciton in XMLHttpRequest.cpp for now. - setRequestHeaderInternal("Content-Type", "application/xml"); + setRequestHeaderInternal("Content-Type", "text/plain;charset=UTF-8"); This is a change that is not asked for in this bug, and one that is not explained in ChangeLog. Is there a reason you are making it in the same patch? It will need to be discussed on its own. + No new tests have been created. The orginal test case, + http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm, + requires ALL-HTTP echoing back from an asp server. However, cgi does not support this. It is certainly possible to access a request Content-Type from a cgi, see e.g. LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi.
Alexey Proskuryakov
Comment 8 2010-03-11 12:00:30 PST
> This bug has all Qt flags set on it, but the change is in cross-platform code. > Please correct the bug's component and keywords. I see that you did this while I was reviewing. Also changing platform now.
Chang Shu
Comment 9 2010-03-11 12:05:06 PST
> > + contentType = contentType.stripWhiteSpace(); > + int start = contentType.find("charset", 0); > > HTTP header modification should abstracted out into a separate function. We > don't have a great place to put it currently - HTTPParsers.{h,cpp} is the > closest, but it's only about parsing. Maybe you should just make it a static > funciton in XMLHttpRequest.cpp for now. Sure. will do. > > - setRequestHeaderInternal("Content-Type", "application/xml"); > + setRequestHeaderInternal("Content-Type", > "text/plain;charset=UTF-8"); > > This is a change that is not asked for in this bug, and one that is not > explained in ChangeLog. Is there a reason you are making it in the same patch? > It will need to be discussed on its own. Right. I planned to take this line out but forgot to do it while submitting the patch. I understand this is addressed in another bug https://bugs.webkit.org/show_bug.cgi?id=11049 > > + No new tests have been created. The orginal test case, > + > http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm, > + requires ALL-HTTP echoing back from an asp server. However, cgi does > not support this. > > It is certainly possible to access a request Content-Type from a cgi, see e.g. > LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi. wow, I will create a new test case then. thanks!
Chang Shu
Comment 10 2010-03-11 14:11:24 PST
Created attachment 50538 [details] fix patch 2
Alexey Proskuryakov
Comment 11 2010-03-11 14:40:46 PST
Comment on attachment 50538 [details] fix patch 2 + m_requestHeaders.remove("Content-Type"); + m_requestHeaders.set("Content-Type", contentType); Is there actually necessary to remove before setting? + req.open("POST", "print-content-type.cgi", false); This can't work - the cgi script is in resources/ subdirectory. Please fix and make the test resilient against such errors. This can't go in without an answer to my question about other browsers' behavior.
Chang Shu
Comment 12 2010-03-11 19:03:32 PST
> + req.open("POST", "print-content-type.cgi", false); > > This can't work - the cgi script is in resources/ subdirectory. Please fix and > make the test resilient against such errors. For some reason, this cgi script is under xmlhttprequest, along with a couple of others. Most other cgi scripts are under xmlhttprequest/resources indeed. > > This can't go in without an answer to my question about other browsers' > behavior. I tried on Firefox and IE. Firefox shows UTF-8, which is as specified, but IE shows KOI8-R.
Alexey Proskuryakov
Comment 13 2010-03-11 19:58:03 PST
> For some reason, this cgi script is under xmlhttprequest, along with a couple Indeed it is!
Alexey Proskuryakov
Comment 14 2010-03-11 20:00:39 PST
OK. Siding with Firefox seems fine here. Does it match the behavior of this patch in all these cases? - setRequestHeader wasn't called; - setRequestHeader was called, and Content-Type without a charset was passed; - setRequestHeader was called, and Content-Type with a different charset was passed.
Chang Shu
Comment 15 2010-03-12 06:49:52 PST
Thanks for the comments! Please see my findings below: > Does it match the behavior of this patch in all these cases? > > - setRequestHeader wasn't called; This is not covered by this patch. The current webkit behavior is showing "application/xml". But Firefox shows "text/plain; charset=UTF-8". > - setRequestHeader was called, and Content-Type without a charset was passed; > - setRequestHeader was called, and Content-Type with a different charset was > passed. Yes, after my patch, webkit shows "text/plain; charset=UTF-8" in both cases and so does firefox. I will go ahead submit the new patch after removing the "remove" line. Thanks!
Chang Shu
Comment 16 2010-03-12 07:42:26 PST
Created attachment 50598 [details] fix patch 3
Alexey Proskuryakov
Comment 17 2010-03-12 13:42:38 PST
Comment on attachment 50598 [details] fix patch 3 1. We usually don't fix coding style for code that isn't touched by a change, like isValidToken here. But it's fine to keep these, I'm just commenting in general. 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the capitalization, and matching the naming of existing functions. 3. This function doesn't do what it says. Instead of setting charset, it re-creates the media type string, potentially losing lots of other parameters. #2 must be fixed before landing. But I can't decide if #3 is important enough to be fixed right away. This part regresses behavior and doesn't match the spec, but it may be harmless in practice.
Chang Shu
Comment 18 2010-03-12 13:55:40 PST
(In reply to comment #17) > (From update of attachment 50598 [details]) > 1. We usually don't fix coding style for code that isn't touched by a change, > like isValidToken here. But it's fine to keep these, I'm just commenting in > general. I was running check-webkit-style against the cpp file. So looks like I should run against the patch file. > > 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the > capitalization, and matching the naming of existing functions. Sure. > 3. This function doesn't do what it says. Instead of setting charset, it > re-creates the media type string, potentially losing lots of other parameters. Yeah, I assumed the media type is composed of two parts. Just wanted to simplified the code, otherwise I have to almost copy the whole extractcharsetfrommediatype over. > > #2 must be fixed before landing. But I can't decide if #3 is important enough > to be fixed right away. This part regresses behavior and doesn't match the > spec, but it may be harmless in practice. Is it ok I don't touch #3 in the next patch?
Alexey Proskuryakov
Comment 19 2010-03-12 14:14:22 PST
> Is it ok I don't touch #3 in the next patch? I don't understand the question. But anyway, I'd like someone who is more familiar with real world HTTP usage (Darin Fisher?) to make this call.
Chang Shu
Comment 20 2010-03-12 14:18:44 PST
(In reply to comment #19) > > Is it ok I don't touch #3 in the next patch? > > I don't understand the question. But anyway, I'd like someone who is more > familiar with real world HTTP usage (Darin Fisher?) to make this call. I asked if I really needed to re-implement the function. I will wait for Darin Fisher' comments then. Thanks,
Darin Fisher (:fishd, Google)
Comment 21 2010-03-12 19:42:48 PST
Yeah, completely overriding the Content-Type header seems risky to me. That said, I don't know of any real world examples where someone is using custom modifiers in the Content-Type header, but there is nothing to prevent someone from doing so today. Does Firefox also set the charset on the request header? How do they handle this case?
Chang Shu
Comment 22 2010-03-15 08:14:29 PDT
(In reply to comment #21) > Yeah, completely overriding the Content-Type header seems risky to me. That > said, I don't know of any real world examples where someone is using custom > modifiers in the Content-Type header, but there is nothing to prevent someone > from doing so today. > > Does Firefox also set the charset on the request header? How do they handle > this case? Thanks for the comment, Darin. I have no problem to reimplement the setcharset function. I guess we should always favor a solid implementation over a simplified one. :) From the test result, I can see Firefox always set the charset on the request header. I also read the code: https://hg.mozilla.org/mozilla-central/file/249428fdc4a8/content/base/src/nsXMLHttpRequest.cpp and believe they modified the header based on certain logic. However, since I didn't debug through the code, I wasn't 100% sure if we match their implementation exactly.
Chang Shu
Comment 23 2010-03-18 12:54:53 PDT
Created attachment 51076 [details] fix patch 4
WebKit Review Bot
Comment 24 2010-03-18 12:55:42 PDT
Attachment 51076 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:136: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] WebCore/xml/XMLHttpRequest.cpp:141: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 2 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chang Shu
Comment 25 2010-03-18 12:58:19 PDT
Created attachment 51079 [details] fix patch 4
Alexey Proskuryakov
Comment 26 2010-03-18 13:00:04 PDT
Note that XHR spec explicitly says that all occurrences of charset in media type string should be replaced.
Darin Fisher (:fishd, Google)
Comment 27 2010-04-01 23:53:23 PDT
Comment on attachment 51079 [details] fix patch 4 Sorry for the super slow turnaround on this code review. > Index: WebCore/platform/network/HTTPParsers.cpp ... > +void findCharsetInMediaType(const String& mediaType, int& charsetStart, int& charsetLen) > +{ > + charsetStart = 0; > + charsetLen = 0; > + > int pos = 0; > int length = (int)mediaType.length(); > > while (pos < length) { > pos = mediaType.find("charset", pos, false); > + if (pos <= 0) { > + charsetLen = 0; > + return; > + } > > // is what we found a beginning of a word? > if (mediaType[pos-1] > ' ' && mediaType[pos-1] != ';') { > @@ -214,10 +226,12 @@ String extractCharsetFromMediaType(const > int endpos = pos; > while (pos != length && mediaType[endpos] > ' ' && mediaType[endpos] != '"' && mediaType[endpos] != '\'' && mediaType[endpos] != ';') > ++endpos; > + > + charsetStart = pos; > + charsetLen = endpos - pos; > + return; > } > + > + charsetLen = 0; > } ^^^ it looks likes it is unnecessary to assign 0 here since contentLen is initialized to 0 at the top of the function. > Index: WebCore/xml/XMLHttpRequest.cpp > =================================================================== > --- WebCore/xml/XMLHttpRequest.cpp (revision 56125) > +++ WebCore/xml/XMLHttpRequest.cpp (working copy) > @@ -128,6 +128,31 @@ static bool isSetCookieHeader(const Atom > return equalIgnoringCase(name, "set-cookie") || equalIgnoringCase(name, "set-cookie2"); > } > > +static void setCharsetInMediaType(String& mediaType, const String& charsetValue) > +{ > + int start, len; > + findCharsetInMediaType(mediaType, start, len); > + > + if (!len) { > + // When no charset found, append new charset. > + mediaType.stripWhiteSpace(); > + if (mediaType[mediaType.length() - 1] != ';') > + mediaType.append(";"); > + mediaType.append(" charset="); > + mediaType.append(charsetValue); > + } else { > + // Found existing charset, replace with new charset. > + String newMediaType = mediaType.substring(0, start); > + newMediaType.append(charsetValue); > + > + if (start + len < (int)mediaType.length()) { ^^^ nit: please use static_cast<int>(...) instead of the C-style cast. > + newMediaType.append("; "); > + newMediaType.append(mediaType.substring(start + len)); maybe it would be better to use the String::replace method? another approach here might be to strip out all occurrences of a charset attribute and then append the desired charset attribute. > Index: LayoutTests/http/tests/xmlhttprequest/request-encoding2.html ... > + function test1() > + { > + req = new XMLHttpRequest; > + req.open("POST", "print-content-type.cgi", false); > + > + req.send(""); > + > + document.getElementById("result1").firstChild.data = "Test1 (setRequestHeader was not called):"; > + if (req.responseText == "application/xml\n") Can you explain why this case does not have an implicit charset?
Darin Fisher (:fishd, Google)
Comment 28 2010-04-01 23:54:22 PDT
Please also include tests for the case where there are other Content-Type attributes bracketing the charset attribute since the code attempts to deal with such.
Chang Shu
Comment 29 2010-04-02 08:36:46 PDT
(In reply to comment #28) > Please also include tests for the case where there are other Content-Type > attributes bracketing the charset attribute since the code attempts to deal > with such. Thanks for the review, Darin. I will submit a new patch based on your comments. For the above comment, I am not sure how to come up with a real-life case with attributes following charset. Can you give me an example, or I just use an imaginary content string, like content="text/html; charset=iso-8859-1; future-extension"?
Alexey Proskuryakov
Comment 30 2010-04-02 08:47:38 PDT
>> + if (req.responseText == "application/xml\n") >Can you explain why this case does not have an implicit charset? Perhaps because application/xml defaults to UTF-8 anyway, and since browsers didn't send charset before, it's safer not to send one. I don't know what Firefox does now though.
Darin Fisher (:fishd, Google)
Comment 31 2010-04-02 08:57:15 PDT
(In reply to comment #30) > >> + if (req.responseText == "application/xml\n") > >Can you explain why this case does not have an implicit charset? > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > didn't send charset before, it's safer not to send one. I don't know what > Firefox does now though. Well, I noticed that with this patch if the user sets a Content-Type header without an explicit charset, we will add a charset attribute, but if they do not set a Content-Type header at all, then we will not add a charset attribute. Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in these two cases? I agree that excluding UTF-8 shouldn't really matter though.
Darin Fisher (:fishd, Google)
Comment 32 2010-04-02 08:58:11 PDT
(In reply to comment #29) ... > attributes following charset. Can you give me an example, or I just use an > imaginary content string, like content="text/html; charset=iso-8859-1; > future-extension"? Yeah, making up a fake attribute was my intent. That way you can test some of the cases the code is attempting to handle.
Chang Shu
Comment 33 2010-04-05 15:17:17 PDT
Created attachment 52579 [details] fix patch 5 This patch implements multiple charset and extra fields handling.
Chang Shu
Comment 34 2010-04-05 15:18:47 PDT
(In reply to comment #31) > (In reply to comment #30) > > >> + if (req.responseText == "application/xml\n") > > >Can you explain why this case does not have an implicit charset? > > > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > > didn't send charset before, it's safer not to send one. I don't know what > > Firefox does now though. > > Well, I noticed that with this patch if the user sets a Content-Type header > without an explicit charset, we will add a charset attribute, but if they do > not set a Content-Type header at all, then we will not add a charset attribute. > Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in > these two cases? I agree that excluding UTF-8 shouldn't really matter though. This can probably be addressed in this bug: https://bugs.webkit.org/show_bug.cgi?id=11049
WebKit Commit Bot
Comment 35 2010-04-13 14:39:22 PDT
Comment on attachment 52579 [details] fix patch 5 Rejecting patch 52579 from commit-queue. Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Darin Fisher', '--force']" exit_code: 1 Last 500 characters of output: form/network/HTTPParsers.h Hunk #1 FAILED at 48. 1 out of 1 hunk FAILED -- saving rejects to file WebCore/platform/network/HTTPParsers.h.rej patching file WebCore/xml/XMLHttpRequest.cpp Hunk #1 succeeded at 129 (offset 1 line). Hunk #2 succeeded at 456 (offset 1 line). patching file LayoutTests/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2-expected.txt patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2.html Full output: http://webkit-commit-queue.appspot.com/results/1584466
Chang Shu
Comment 36 2010-04-13 16:08:41 PDT
The commit was rejected due to conflict on merging. I manually landed the patch after resolving conflict. Committed revision: 57544
Chang Shu
Comment 38 2010-04-13 20:27:22 PDT
(In reply to comment #37) > This broke tests on the build bots, such as > <http://build.webkit.org/results/Tiger%20Intel%20Release/r57547%20(10780)/http/tests/xmlhttprequest/access-control-basic-whitelist-request-headers-pretty-diff.html>. Hi, Mark, I see you have updated the test results for me. Thanks!
Alexey Proskuryakov
Comment 39 2010-08-31 17:36:35 PDT
This has caused bug 40947.
Note You need to log in before you can comment on or make changes to this bug.