WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 34653
Charset parameter set by setRequestHeader as "KOI8-R" is NOT changed to UTF-8
https://bugs.webkit.org/show_bug.cgi?id=34653
Summary
Charset parameter set by setRequestHeader as "KOI8-R" is NOT changed to UTF-8
sangeetha.sugavanam
Reported
2010-02-05 10:51:20 PST
Steps to Reproduce: 1. Launch QtLauncher. 2. Load the webpage attached to the bug report(from the zip file). 3. Click the Button to execute the test. 4. Content-Type Header received from the server is displayed in the green box above the button. Expected Result: =============== - Should display content-type and charset as "application/xml; charset=UTF-8". Actual Result: ============== - content-type and charset displayed as "application/xml; charset=KOI8-R". Additional Info: ================ 1. Test Assertion : Invoke Send with in parameter Data as Document - If Content-Type header is set using setRequestHeader() and its value is not malformed, set the charset parameter of the header, by either changing the charset parameter(if one is present) or appending one, to the encoding used to encode the document 2. The encoding used to encode the document is UTF-8 3. W3c spec link -
http://dev.w3.org/cvsweb/~checkout~/2006/webapi/XMLHttpRequest/Overview.html?rev=1.209&content-type=text/html;%20charset=iso-8859-1
4. Pls refer : Section 4.6.3 - Alogorithm 3 under "data is a Document"
Attachments
Test files
(1.55 KB, application/x-zip-compressed)
2010-02-05 11:01 PST
,
sangeetha.sugavanam
no flags
Details
fix patch
(2.28 KB, patch)
2010-03-11 11:33 PST
,
Chang Shu
ap
: review-
Details
Formatted Diff
Diff
fix patch 2
(5.41 KB, patch)
2010-03-11 14:11 PST
,
Chang Shu
ap
: review-
Details
Formatted Diff
Diff
fix patch 3
(5.45 KB, patch)
2010-03-12 07:42 PST
,
Chang Shu
ap
: commit-queue-
Details
Formatted Diff
Diff
fix patch 4
(8.59 KB, patch)
2010-03-18 12:54 PDT
,
Chang Shu
no flags
Details
Formatted Diff
Diff
fix patch 4
(8.57 KB, patch)
2010-03-18 12:58 PDT
,
Chang Shu
fishd
: review-
Details
Formatted Diff
Diff
fix patch 5
(9.80 KB, patch)
2010-04-05 15:17 PDT
,
Chang Shu
fishd
: review+
commit-queue
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
sangeetha.sugavanam
Comment 1
2010-02-05 11:01:21 PST
Created
attachment 48243
[details]
Test files
Alexey Proskuryakov
Comment 2
2010-02-05 21:52:13 PST
> - content-type and charset displayed as "application/xml; charset=KOI8-R".
The attached test doesn't mention KOI8-R anywhere. Is it the right test?
Tor Arne Vestbø
Comment 3
2010-03-10 06:30:38 PST
Please follow the QtWebKit bug reporting guidelines when reporting bugs. See
http://trac.webkit.org/wiki/QtWebKitBugs
Specifically: - The 'QtWebKit' component should only be used for bugs/features in the public QtWebKit API layer, not to signify that the bug is specific to the Qt port of WebKit
http://trac.webkit.org/wiki/QtWebKitBugs#Component
- Add the keyword 'Qt' to signal that it's a Qt-related bug
http://trac.webkit.org/wiki/QtWebKitBugs#Keywords
Chang Shu
Comment 4
2010-03-11 11:33:49 PST
Created
attachment 50520
[details]
fix patch
Chang Shu
Comment 5
2010-03-11 11:35:23 PST
(In reply to
comment #2
)
> > - content-type and charset displayed as "application/xml; charset=KOI8-R". > > The attached test doesn't mention KOI8-R anywhere. Is it the right test?
KIO8-R can be found in the attached htm file around line 54: client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R");
WebKit Review Bot
Comment 6
2010-03-11 11:36:30 PST
Attachment 50520
[details]
did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:435: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alexey Proskuryakov
Comment 7
2010-03-11 11:58:43 PST
Comment on
attachment 50520
[details]
fix patch
> KIO8-R can be found in the attached htm file around line 54: >client.setRequestHeader("Content-Type", "text/plain;charset=KOI8-R");
Sorry, I was probably looking at some other test. This bug has all Qt flags set on it, but the change is in cross-platform code. Please correct the bug's component and keywords. Whether we want to make this change depends on what Firefox and IE do, not just on what the spec says. Please research their behavior, and post the results in the bug. + } + else { The should be on the same line. + contentType = contentType.stripWhiteSpace(); + int start = contentType.find("charset", 0); HTTP header modification should abstracted out into a separate function. We don't have a great place to put it currently - HTTPParsers.{h,cpp} is the closest, but it's only about parsing. Maybe you should just make it a static funciton in XMLHttpRequest.cpp for now. - setRequestHeaderInternal("Content-Type", "application/xml"); + setRequestHeaderInternal("Content-Type", "text/plain;charset=UTF-8"); This is a change that is not asked for in this bug, and one that is not explained in ChangeLog. Is there a reason you are making it in the same patch? It will need to be discussed on its own. + No new tests have been created. The orginal test case, +
http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm
, + requires ALL-HTTP echoing back from an asp server. However, cgi does not support this. It is certainly possible to access a request Content-Type from a cgi, see e.g. LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi.
Alexey Proskuryakov
Comment 8
2010-03-11 12:00:30 PST
> This bug has all Qt flags set on it, but the change is in cross-platform code. > Please correct the bug's component and keywords.
I see that you did this while I was reviewing. Also changing platform now.
Chang Shu
Comment 9
2010-03-11 12:05:06 PST
> > + contentType = contentType.stripWhiteSpace(); > + int start = contentType.find("charset", 0); > > HTTP header modification should abstracted out into a separate function. We > don't have a great place to put it currently - HTTPParsers.{h,cpp} is the > closest, but it's only about parsing. Maybe you should just make it a static > funciton in XMLHttpRequest.cpp for now.
Sure. will do.
> > - setRequestHeaderInternal("Content-Type", "application/xml"); > + setRequestHeaderInternal("Content-Type", > "text/plain;charset=UTF-8"); > > This is a change that is not asked for in this bug, and one that is not > explained in ChangeLog. Is there a reason you are making it in the same patch? > It will need to be discussed on its own.
Right. I planned to take this line out but forgot to do it while submitting the patch. I understand this is addressed in another bug
https://bugs.webkit.org/show_bug.cgi?id=11049
> > + No new tests have been created. The orginal test case, > + >
http://waplabdc.nokia-boston.com/browser/users/akancher/sf_XHR/Send/Send_strData_Charset_Not_UTF8.htm
, > + requires ALL-HTTP echoing back from an asp server. However, cgi does > not support this. > > It is certainly possible to access a request Content-Type from a cgi, see e.g. > LayoutTests/http/tests/xmlhttprequest/print-content-type.cgi.
wow, I will create a new test case then. thanks!
Chang Shu
Comment 10
2010-03-11 14:11:24 PST
Created
attachment 50538
[details]
fix patch 2
Alexey Proskuryakov
Comment 11
2010-03-11 14:40:46 PST
Comment on
attachment 50538
[details]
fix patch 2 + m_requestHeaders.remove("Content-Type"); + m_requestHeaders.set("Content-Type", contentType); Is there actually necessary to remove before setting? + req.open("POST", "print-content-type.cgi", false); This can't work - the cgi script is in resources/ subdirectory. Please fix and make the test resilient against such errors. This can't go in without an answer to my question about other browsers' behavior.
Chang Shu
Comment 12
2010-03-11 19:03:32 PST
> + req.open("POST", "print-content-type.cgi", false); > > This can't work - the cgi script is in resources/ subdirectory. Please fix and > make the test resilient against such errors.
For some reason, this cgi script is under xmlhttprequest, along with a couple of others. Most other cgi scripts are under xmlhttprequest/resources indeed.
> > This can't go in without an answer to my question about other browsers' > behavior.
I tried on Firefox and IE. Firefox shows UTF-8, which is as specified, but IE shows KOI8-R.
Alexey Proskuryakov
Comment 13
2010-03-11 19:58:03 PST
> For some reason, this cgi script is under xmlhttprequest, along with a couple
Indeed it is!
Alexey Proskuryakov
Comment 14
2010-03-11 20:00:39 PST
OK. Siding with Firefox seems fine here. Does it match the behavior of this patch in all these cases? - setRequestHeader wasn't called; - setRequestHeader was called, and Content-Type without a charset was passed; - setRequestHeader was called, and Content-Type with a different charset was passed.
Chang Shu
Comment 15
2010-03-12 06:49:52 PST
Thanks for the comments! Please see my findings below:
> Does it match the behavior of this patch in all these cases? > > - setRequestHeader wasn't called;
This is not covered by this patch. The current webkit behavior is showing "application/xml". But Firefox shows "text/plain; charset=UTF-8".
> - setRequestHeader was called, and Content-Type without a charset was passed; > - setRequestHeader was called, and Content-Type with a different charset was > passed.
Yes, after my patch, webkit shows "text/plain; charset=UTF-8" in both cases and so does firefox. I will go ahead submit the new patch after removing the "remove" line. Thanks!
Chang Shu
Comment 16
2010-03-12 07:42:26 PST
Created
attachment 50598
[details]
fix patch 3
Alexey Proskuryakov
Comment 17
2010-03-12 13:42:38 PST
Comment on
attachment 50598
[details]
fix patch 3 1. We usually don't fix coding style for code that isn't touched by a change, like isValidToken here. But it's fine to keep these, I'm just commenting in general. 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the capitalization, and matching the naming of existing functions. 3. This function doesn't do what it says. Instead of setting charset, it re-creates the media type string, potentially losing lots of other parameters. #2 must be fixed before landing. But I can't decide if #3 is important enough to be fixed right away. This part regresses behavior and doesn't match the spec, but it may be harmless in practice.
Chang Shu
Comment 18
2010-03-12 13:55:40 PST
(In reply to
comment #17
)
> (From update of
attachment 50598
[details]
) > 1. We usually don't fix coding style for code that isn't touched by a change, > like isValidToken here. But it's fine to keep these, I'm just commenting in > general.
I was running check-webkit-style against the cpp file. So looks like I should run against the patch file.
> > 2. "setCharsetinContentType" should be "setCharsetInMediaType". Note the > capitalization, and matching the naming of existing functions.
Sure.
> 3. This function doesn't do what it says. Instead of setting charset, it > re-creates the media type string, potentially losing lots of other parameters.
Yeah, I assumed the media type is composed of two parts. Just wanted to simplified the code, otherwise I have to almost copy the whole extractcharsetfrommediatype over.
> > #2 must be fixed before landing. But I can't decide if #3 is important enough > to be fixed right away. This part regresses behavior and doesn't match the > spec, but it may be harmless in practice.
Is it ok I don't touch #3 in the next patch?
Alexey Proskuryakov
Comment 19
2010-03-12 14:14:22 PST
> Is it ok I don't touch #3 in the next patch?
I don't understand the question. But anyway, I'd like someone who is more familiar with real world HTTP usage (Darin Fisher?) to make this call.
Chang Shu
Comment 20
2010-03-12 14:18:44 PST
(In reply to
comment #19
)
> > Is it ok I don't touch #3 in the next patch? > > I don't understand the question. But anyway, I'd like someone who is more > familiar with real world HTTP usage (Darin Fisher?) to make this call.
I asked if I really needed to re-implement the function. I will wait for Darin Fisher' comments then. Thanks,
Darin Fisher (:fishd, Google)
Comment 21
2010-03-12 19:42:48 PST
Yeah, completely overriding the Content-Type header seems risky to me. That said, I don't know of any real world examples where someone is using custom modifiers in the Content-Type header, but there is nothing to prevent someone from doing so today. Does Firefox also set the charset on the request header? How do they handle this case?
Chang Shu
Comment 22
2010-03-15 08:14:29 PDT
(In reply to
comment #21
)
> Yeah, completely overriding the Content-Type header seems risky to me. That > said, I don't know of any real world examples where someone is using custom > modifiers in the Content-Type header, but there is nothing to prevent someone > from doing so today. > > Does Firefox also set the charset on the request header? How do they handle > this case?
Thanks for the comment, Darin. I have no problem to reimplement the setcharset function. I guess we should always favor a solid implementation over a simplified one. :) From the test result, I can see Firefox always set the charset on the request header. I also read the code:
https://hg.mozilla.org/mozilla-central/file/249428fdc4a8/content/base/src/nsXMLHttpRequest.cpp
and believe they modified the header based on certain logic. However, since I didn't debug through the code, I wasn't 100% sure if we match their implementation exactly.
Chang Shu
Comment 23
2010-03-18 12:54:53 PDT
Created
attachment 51076
[details]
fix patch 4
WebKit Review Bot
Comment 24
2010-03-18 12:55:42 PDT
Attachment 51076
[details]
did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/xml/XMLHttpRequest.cpp:136: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] WebCore/xml/XMLHttpRequest.cpp:141: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 2 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chang Shu
Comment 25
2010-03-18 12:58:19 PDT
Created
attachment 51079
[details]
fix patch 4
Alexey Proskuryakov
Comment 26
2010-03-18 13:00:04 PDT
Note that XHR spec explicitly says that all occurrences of charset in media type string should be replaced.
Darin Fisher (:fishd, Google)
Comment 27
2010-04-01 23:53:23 PDT
Comment on
attachment 51079
[details]
fix patch 4 Sorry for the super slow turnaround on this code review.
> Index: WebCore/platform/network/HTTPParsers.cpp
...
> +void findCharsetInMediaType(const String& mediaType, int& charsetStart, int& charsetLen) > +{ > + charsetStart = 0; > + charsetLen = 0; > + > int pos = 0; > int length = (int)mediaType.length(); > > while (pos < length) { > pos = mediaType.find("charset", pos, false); > + if (pos <= 0) { > + charsetLen = 0; > + return; > + } > > // is what we found a beginning of a word? > if (mediaType[pos-1] > ' ' && mediaType[pos-1] != ';') { > @@ -214,10 +226,12 @@ String extractCharsetFromMediaType(const > int endpos = pos; > while (pos != length && mediaType[endpos] > ' ' && mediaType[endpos] != '"' && mediaType[endpos] != '\'' && mediaType[endpos] != ';') > ++endpos; > + > + charsetStart = pos; > + charsetLen = endpos - pos; > + return; > } > + > + charsetLen = 0; > }
^^^ it looks likes it is unnecessary to assign 0 here since contentLen is initialized to 0 at the top of the function.
> Index: WebCore/xml/XMLHttpRequest.cpp > =================================================================== > --- WebCore/xml/XMLHttpRequest.cpp (revision 56125) > +++ WebCore/xml/XMLHttpRequest.cpp (working copy) > @@ -128,6 +128,31 @@ static bool isSetCookieHeader(const Atom > return equalIgnoringCase(name, "set-cookie") || equalIgnoringCase(name, "set-cookie2"); > } > > +static void setCharsetInMediaType(String& mediaType, const String& charsetValue) > +{ > + int start, len; > + findCharsetInMediaType(mediaType, start, len); > + > + if (!len) { > + // When no charset found, append new charset. > + mediaType.stripWhiteSpace(); > + if (mediaType[mediaType.length() - 1] != ';') > + mediaType.append(";"); > + mediaType.append(" charset="); > + mediaType.append(charsetValue); > + } else { > + // Found existing charset, replace with new charset. > + String newMediaType = mediaType.substring(0, start); > + newMediaType.append(charsetValue); > + > + if (start + len < (int)mediaType.length()) {
^^^ nit: please use static_cast<int>(...) instead of the C-style cast.
> + newMediaType.append("; "); > + newMediaType.append(mediaType.substring(start + len));
maybe it would be better to use the String::replace method? another approach here might be to strip out all occurrences of a charset attribute and then append the desired charset attribute.
> Index: LayoutTests/http/tests/xmlhttprequest/request-encoding2.html
...
> + function test1() > + { > + req = new XMLHttpRequest; > + req.open("POST", "print-content-type.cgi", false); > + > + req.send(""); > + > + document.getElementById("result1").firstChild.data = "Test1 (setRequestHeader was not called):"; > + if (req.responseText == "application/xml\n")
Can you explain why this case does not have an implicit charset?
Darin Fisher (:fishd, Google)
Comment 28
2010-04-01 23:54:22 PDT
Please also include tests for the case where there are other Content-Type attributes bracketing the charset attribute since the code attempts to deal with such.
Chang Shu
Comment 29
2010-04-02 08:36:46 PDT
(In reply to
comment #28
)
> Please also include tests for the case where there are other Content-Type > attributes bracketing the charset attribute since the code attempts to deal > with such.
Thanks for the review, Darin. I will submit a new patch based on your comments. For the above comment, I am not sure how to come up with a real-life case with attributes following charset. Can you give me an example, or I just use an imaginary content string, like content="text/html; charset=iso-8859-1; future-extension"?
Alexey Proskuryakov
Comment 30
2010-04-02 08:47:38 PDT
>> + if (req.responseText == "application/xml\n") >Can you explain why this case does not have an implicit charset?
Perhaps because application/xml defaults to UTF-8 anyway, and since browsers didn't send charset before, it's safer not to send one. I don't know what Firefox does now though.
Darin Fisher (:fishd, Google)
Comment 31
2010-04-02 08:57:15 PDT
(In reply to
comment #30
)
> >> + if (req.responseText == "application/xml\n") > >Can you explain why this case does not have an implicit charset? > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > didn't send charset before, it's safer not to send one. I don't know what > Firefox does now though.
Well, I noticed that with this patch if the user sets a Content-Type header without an explicit charset, we will add a charset attribute, but if they do not set a Content-Type header at all, then we will not add a charset attribute. Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in these two cases? I agree that excluding UTF-8 shouldn't really matter though.
Darin Fisher (:fishd, Google)
Comment 32
2010-04-02 08:58:11 PDT
(In reply to
comment #29
) ...
> attributes following charset. Can you give me an example, or I just use an > imaginary content string, like content="text/html; charset=iso-8859-1; > future-extension"?
Yeah, making up a fake attribute was my intent. That way you can test some of the cases the code is attempting to handle.
Chang Shu
Comment 33
2010-04-05 15:17:17 PDT
Created
attachment 52579
[details]
fix patch 5 This patch implements multiple charset and extra fields handling.
Chang Shu
Comment 34
2010-04-05 15:18:47 PDT
(In reply to
comment #31
)
> (In reply to
comment #30
) > > >> + if (req.responseText == "application/xml\n") > > >Can you explain why this case does not have an implicit charset? > > > > Perhaps because application/xml defaults to UTF-8 anyway, and since browsers > > didn't send charset before, it's safer not to send one. I don't know what > > Firefox does now though. > > Well, I noticed that with this patch if the user sets a Content-Type header > without an explicit charset, we will add a charset attribute, but if they do > not set a Content-Type header at all, then we will not add a charset attribute. > Is that how Firefox behaves? Doesn't it seem a bit odd to be inconsistent in > these two cases? I agree that excluding UTF-8 shouldn't really matter though.
This can probably be addressed in this bug:
https://bugs.webkit.org/show_bug.cgi?id=11049
WebKit Commit Bot
Comment 35
2010-04-13 14:39:22 PDT
Comment on
attachment 52579
[details]
fix patch 5 Rejecting patch 52579 from commit-queue. Failed to run "['/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/svn-apply', '--reviewer', 'Darin Fisher', '--force']" exit_code: 1 Last 500 characters of output: form/network/HTTPParsers.h Hunk #1 FAILED at 48. 1 out of 1 hunk FAILED -- saving rejects to file WebCore/platform/network/HTTPParsers.h.rej patching file WebCore/xml/XMLHttpRequest.cpp Hunk #1 succeeded at 129 (offset 1 line). Hunk #2 succeeded at 456 (offset 1 line). patching file LayoutTests/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2-expected.txt patching file LayoutTests/http/tests/xmlhttprequest/request-encoding2.html Full output:
http://webkit-commit-queue.appspot.com/results/1584466
Chang Shu
Comment 36
2010-04-13 16:08:41 PDT
The commit was rejected due to conflict on merging. I manually landed the patch after resolving conflict. Committed revision: 57544
Mark Rowe (bdash)
Comment 37
2010-04-13 17:02:54 PDT
This broke tests on the build bots, such as <
http://build.webkit.org/results/Tiger%20Intel%20Release/r57547%20(10780)/http/tests/xmlhttprequest/access-control-basic-whitelist-request-headers-pretty-diff.html
>.
Chang Shu
Comment 38
2010-04-13 20:27:22 PDT
(In reply to
comment #37
)
> This broke tests on the build bots, such as > <
http://build.webkit.org/results/Tiger%20Intel%20Release/r57547%20(10780)/http/tests/xmlhttprequest/access-control-basic-whitelist-request-headers-pretty-diff.html
>.
Hi, Mark, I see you have updated the test results for me. Thanks!
Alexey Proskuryakov
Comment 39
2010-08-31 17:36:35 PDT
This has caused
bug 40947
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug