Summary: | XMLHttpRequest does not apply page encoding after assigning via innerHtml | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Michael Kaye <mk> | ||||||||||||||||||||
Component: | WebKit Misc. | Assignee: | Alexey Proskuryakov <ap> | ||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||
Severity: | Major | CC: | mitz | ||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||
Version: | 420+ | ||||||||||||||||||||||
Hardware: | Mac | ||||||||||||||||||||||
OS: | OS X 10.4 | ||||||||||||||||||||||
URL: | http://idptest.bl.uk | ||||||||||||||||||||||
Attachments: |
|
Description
Michael Kaye
2005-11-14 17:37:12 PST
that should be http://idptest.bl.uk Does Firefox actually use the document encoding as a default for XMLHttpRequest encoding, or maybe it just defaults to UTF-8? Yes, it looks like Firefox defaults to UTF-8 for responses, while Safari defaults to us-ascii. I'm going to investigate this, as well as overriding encoding with overrideMimeType(), and also how req.responseXML is decoded. Test case: <http://nypop.com/~ap/webkit/xmlhttprequestenc/xmlhttprequestenc.html>. Works in Firefox; both Safari and Opera fail pretty spectacularly. Created attachment 4885 [details]
proposed patch
Fixes a bunch of issues related to applying the correct encoding to
XMLHTTPRequest responses, including a complete inability to parse a response if
overrideMimeType('text/xml; charset=<some_charset>') was used.
There are still some FIXMEs concerning request encoding; I'm going to
investigate these separately.
Comment on attachment 4885 [details]
proposed patch
One style violation:
+ if (encoding.isNull() && job) {
+ encoding = job->queryMetaData("charset");
+ }
(no {} around one line ifs)
Are you sure you want
mimeType.isNull()
instead of:
mimeType.isEmpty()
?
pos += 7;
A comment such as // strlen("charset") might be helpful here.
Curious, did you adapt getCharset from somewhere? It seems like functionality
which has to already exist somewhere else.
(In reply to comment #6) > One style violation: > (no {} around one line ifs) There's a sequence of ifs, some single-line, some not... To me, it's easier to read if they all folow a single style, even if it violates the guideline. Are you sure it's not a legitimate exception? > Are you sure you want > mimeType.isNull() > instead of: > mimeType.isEmpty() > ? No, I'm not sure... The existing code uses both, and they appear equivalent here, but I may misunderstand something. > A comment such as // strlen("charset") might be helpful here. Agreed. I'm not updating the patch just for this, but will add the comment after the more substantial concerns are resolved. > Curious, did you adapt getCharset from somewhere? It seems like functionality > which has to already exist somewhere else. Yes, this worries me, too. I started from code in Decoder::decode(), but it doesn't look like it can be easily factored out and directly reused here. Another implementation is in Foundation ([NSURLResponse textEncodingName]), but I couldn't find a good way to reuse that, as well. Created attachment 4911 [details]
proposed patch
- Updated to cleanly apply to ToT.
- Replaced isNull() with isEmpty(), for better or worse :).
- Removed an unrelated change that sneaked into the previous version.
- Addressed stylistic comments.
Comment on attachment 4911 [details]
proposed patch
I suggest you should change getResponseHeader to return a QString or DOMString,
get rid of the JSLocks and change the getResponseHeader callers that need a
JavaScript string to call jsString explicitly.
Otherwise, this patch looks great.
Created attachment 4920 [details]
updated patch
Changed getResponseHeader to:
bool getResponseHeader(const QString& name, QString& value) const;
Comment on attachment 4920 [details]
updated patch
alexey and I talked about this extensively on IRC. Mostly about ways to clean
up the getCharset() function and make it simpler. I'm going to r- this patch
for now, and he'll post a better one tomorrow.
Created attachment 4925 [details]
updated patch
Slightly streamlined getCharset().
Comment on attachment 4925 [details]
updated patch
We have discussed code reuse in getCharset() with MacDome on IRC. Similar code
is available in khtml::Decoder and as an SPI in Foundation.
However, the Foundation version is much less permissive (doesn't even seem to
handle quoted values), and the Decoder one has somewhat different logic that
makes code reuse quite problematic IMO.
So, I'm proposing this version - while a longer-term solution may be to fix and
leverage that Foundation SPI. And/or to split out the Decoder logic to be
reusable.
Comment on attachment 4925 [details]
updated patch
The algorithm in getCharset isn't quite right. It's not correct to look for
"charset" and then look for a "=" after it -- there might be the letter "x"
just before the string "charset", and this code would allow that. You should
fix that aspect of the getCharset function.
For getResponseHeader, I do think a separate bool is nice and clear, but there
is the distinct "null" value in QString that you could have used instead. I
think I'd prefer that because it makes the JavaScript version more consistent
with the C++ version.
Future direction: To fit in with the rest of the DOM bindings, the way this
class should work some day is that there is an XMLHttpRequest class that is not
part of the JavaScript bindings, but rather part of the "DOM" implementation.
Functions like getResponseHeader which are exported should both take and return
DOM strings, just like the functions in classes like DOM::NodeImpl and
DOM::DocumentImpl.
Created attachment 4937 [details]
updated patch
Addressed the comments (except for future direction).
Comment on attachment 4937 [details]
updated patch
Still looks wrong to me. This looks for charset, and if it's not at the
beginning of a word, just returns from the entire function. That doesn't make
sense to me. I think maybe you want to do "continue" in that case to look for
the next occurence of the string "charset".
I also think the test for word boundary seems strange. Why include all control
characters, semicolon, and quote marks? This would parse this:
ignoreString="charset=UTF-8"
as a charset option, and there's no reason to parse in a way that allows that.
The skip whitespace code actually skips all characters <= ' ' which includes
things other than whitespace.
There's no reason to make this function strange in these respects. It could be
coded to match RFC 2616 instead.
Created attachment 4948 [details]
updated patch
Sorry, stupid bugs (thought about writing "continue", but then forgot).
As for testing for <= ' ' - well, that's what khtml::Decoder does, and it
sounds like a good idea to make the logic similar. Actually, Decoder also
permits "xcharset"...
Comment on attachment 4948 [details]
updated patch
If this function is going to share quirks with the decoder, then I think it
should be moved to be a static member function of Decoder so we can fix both at
the same time. I know you were discussing where it should go, and I think that
would be the right thing to do.
In general I'm not happy with these getMIMEType and getCharset functions.
getMIMEType splits the entire string into a list and then just uses the first
piece -- that's a bit inefficient since you really just want to search for the
first ";" and get the whitespace-stripped sequence up until that.
But then getCharset goes the other way and instead of parsing the
semicolon-separated string into pieces and looking at each one for charset, it
scans the entire string for the substring "charset" and then tries to figure
things out from there.
In both cases we could write something that was both correct and relatively
efficient.
But for this patch, I think we're OK. This is good enough ot fix the bug. Lets
land it. r=me
Filed Content-Type parsing cleanup as bug 5954. Created attachment 5118 [details]
updated patch
Same as 4948, but updated for the recent ValueImp->JSValue renaming.
Is there some kind of automated test we can do? Perhaps using XMLHttpRequest to load a local file? (In reply to comment #21) Obviously, only some aspects of the behavior could be tested with local files... And I'm not sure if expecting XMLHttpRequest to work with local files is reasonable (my quick test worked in ToT WebKit, but not in Firefox or Opera). re testing, you can test against the now live site that originally had the problem. goto to the following site : 1) http://idp.bl.uk 2) Choose Show Catalogues from left menu. 3) Select "Enoki_1962" from catalogue list and click submit button. 4) The page will load and make note of the text in the bottom right area entitled Catalogue Entry. 5) Click "C1..." from the list on the left (Catalogue List) and check whether the data is still encoded correctly Created attachment 5155 [details]
ChangeLog entry
Created attachment 5156 [details]
archived test suite
To be served by Apache; don't overlook the .htaccess file.
Hi I'm the original reporter of this bug. Although it was marked as fixed a while ago, I note the problem still remains with the latest version of Safari (2.0.4 (419.3)), Webkit and Mac OS X (10.4.7). Should this be the case of have the fixes not been rolled into the latest builds. Yes, the fix is not in latest released MacOS yet. If I were to guess (Apple doesn't give details on such decisions), it might be partially because a workaround exists - as seen at <http://nypop.com/~ap/webkit/xmlhttprequestenc/xmlhttprequestenc.html>, some cases do work. You may try asking for a fix on Apple's release branch via <http://bugreport.apple.com>. The fix should be present in OmniWeb 5.5 beta (again, no idea about its release date). Alexey, Thanks for you comments. I have registered the bug with apple (hopefully they will action it). Regards, Michael. |