Summary: | Settings::setDefaultTextEncodingName() allows to set unsupported encoding | |
---|---|---|---
Product: | WebKit | Reporter: | Grzegorz Czajkowski <g.czajkowski>
Component: | Page Loading | Assignee: | Nobody <webkit-unassigned>
Status: | RESOLVED WONTFIX | |
Severity: | Normal | CC: | ap, darin
Priority: | P2 | |
Version: | 528+ (Nightly build) | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Description
Grzegorz Czajkowski 2014-04-28 07:21:44 PDT
Before sending any patches, could you possibly express your opinion on this?

Comment 2

What is the proposed behavior change? Is it only that the WKPreferencesCopyDefaultTextEncodingName() getter will not return the unsupported name? Seems OK to me. Could probably use WTFLogAlways to complain early.

Comment 3

The default encoding name is only used in three places in WebCore. Before we change the rules about setting it, we need to figure out what behavior we want. For the WebKit APIs there seem to be a few separate questions:

- Does an unsupported encoding name actually cause any real problems in decoding what gets set in headers? Some uses of defaultTextEncodingName *do* handle unsupported values in a good way: TextResourceDecoder treats unknown names as Windows Latin-1. The two other call sites don't, and that might be worth fixing. Even for supported encodings it is strange to use the encoding name string directly from settings; that skips the canonicalization that the TextEncoding class does. For example, TextEncoding will return "UTF-8", not "utf8".

- What actual decoding and encoding is done based on the unsupported encoding name? It seems clear that Windows Latin-1 is used, which seems fine to me. Maybe UTF-8 would be better in theory, but on Mac and iOS at least we have many years of the current behavior and might have apps depending on it by accident.

- Does setting the encoding do anything at all if the encoding is unsupported? Specifically, if someone calls a get function, does the caller get back the unsupported encoding name? Today we support a round trip on any string, even if it's not a supported encoding.

- Can a caller tell if a given encoding is supported? Is there some way to either directly or indirectly find out whether an encoding is supported?

I would like some more concrete examples of real scenarios today (on any platform) where this makes a difference: what the current behavior is and what the improved behavior would be.
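The behavior described in comment #3 can be sketched as follows. This is a hypothetical illustration, not WebKit code: `canonicalEncodingName()` and its tiny alias tables stand in for WebCore's real encoding registry, and `effectiveDecodingEncoding()` models TextResourceDecoder's fallback to Windows Latin-1 for unknown names.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the canonical name for a supported encoding ("utf8" -> "UTF-8"),
// or "" if unsupported, mirroring how an invalid WebCore::TextEncoding behaves.
// The alias sets here are tiny stand-ins for the real registry.
static std::string canonicalEncodingName(const std::string& name)
{
    static const std::set<std::string> utf8Aliases { "utf-8", "utf8" };
    static const std::set<std::string> latin1Aliases { "iso-8859-1", "latin1", "windows-1252" };
    const std::string lower = toLowerAscii(name);
    if (utf8Aliases.count(lower))
        return "UTF-8";
    if (latin1Aliases.count(lower))
        return "windows-1252";
    return "";
}

// Decoder-side fallback: an unknown default encoding behaves like Windows Latin-1.
static std::string effectiveDecodingEncoding(const std::string& settingsName)
{
    const std::string canonical = canonicalEncodingName(settingsName);
    return canonical.empty() ? std::string("windows-1252") : canonical;
}
```

Note how the canonicalization step is exactly what reading the raw settings string skips: a caller comparing against `"UTF-8"` would not match a stored `"utf8"`.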
(In reply to comment #2)

> What is the proposed behavior change, is it only that WKPreferencesCopyDefaultTextEncodingName() getter will not return the unsupported name?

Yes, this is what I had in mind. Additionally, "document.defaultCharset" won't return the unsupported name either.

> Could probably use WTFLogAlways to complain early.

I don't mind adding it. Thanks for the hint.

(In reply to comment #3)

> Does an unsupported encoding name actually cause any real problems in decoding what gets set in headers?

As far as I know, it doesn't. WebKit behaves as you said: Latin-1 encoding is used when the client sets an unsupported encoding. This happens in TextResourceDecoder's constructor.

> The two other call sites don't, and that might be worth fixing.

IMHO, if we prevent setting the unsupported encoding, then the mentioned methods will behave correctly. For example, Document::defaultCharset() will return the empty string if no encoding name has been set, or else the previously set correct one. FYI, all WK2 platforms use ISO-8859-1 as the default encoding (Shared/WebPreferencesStore.h) unless they override it in their port-specific files.

> Even for supported encodings it is strange to use the encoding name string directly from settings; that skips the canonicalization that the TextEncoding class does. For example, TextEncoding will return "UTF-8", not "utf8".

I'm afraid of a difference between what the client sets and what they get back. I think we might easily break API tests.

> What actual decoding and encoding is done based on the unsupported encoding name? It seems clear that Windows Latin-1 is used, which seems fine to me. Maybe UTF-8 would be better in theory but on Mac and iOS at least we have the many years of the current behavior and might have apps depending on it by accident.

I don't mind changing it. However, I think it should be done in a separate bug.

> Does setting the encoding do anything at all if the encoding is unsupported? Specifically, if someone calls a get function does the caller get back the unsupported encoding name?

Exactly; that is mostly what inspired me to file this bug. The get function can be called both from the JS side and through the native platform API.

> Today we support a round trip on any string, even if it's not a supported encoding.

Right. However, the getter will return what has been set (the unsupported name).

> Can a caller tell if a given encoding is supported? Is there some way to either directly or indirectly find out if an encoding is supported?

I guess WebKit doesn't expose any API for that. Inside the WebKit code, I was thinking of using WebCore::TextEncoding::isValid().

> I would like some more concrete examples of real scenarios today (on any platform) where this makes a difference. What the current behavior is and what the improved behavior would be.

I believe the comments above show the difference between the current and the improved behaviour. Note that any modification to Settings::setDefaultTextEncodingName will affect all WebKit ports; as far as I know, all of them wrap it as API exposed to the client.

I don't think we should make this change unless there is a significant benefit. So far there does not seem to be one.

Thank you guys for the comprehensive comments. I suggest closing this as WONTFIX.
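For reference, the guard that was proposed (and ultimately not adopted) could look roughly like this. This is a hypothetical sketch, not an actual patch: `isSupportedEncodingName()` stands in for a validity check like WebCore::TextEncoding::isValid(), and the supported-name set is a tiny placeholder.

```cpp
#include <set>
#include <string>

// Placeholder for a real validity check such as WebCore::TextEncoding::isValid().
static bool isSupportedEncodingName(const std::string& name)
{
    static const std::set<std::string> supported { "UTF-8", "ISO-8859-1", "windows-1252" };
    return supported.count(name) > 0;
}

class Settings {
public:
    // Proposed behavior: keep the previous (valid) value when an unsupported
    // name is passed, so getters such as WKPreferencesCopyDefaultTextEncodingName()
    // and document.defaultCharset never round-trip an unsupported name.
    // A real patch would also call WTFLogAlways() here to complain early.
    void setDefaultTextEncodingName(const std::string& name)
    {
        if (!isSupportedEncodingName(name))
            return;
        m_defaultTextEncodingName = name;
    }

    const std::string& defaultTextEncodingName() const { return m_defaultTextEncodingName; }

private:
    std::string m_defaultTextEncodingName;
};
```

The trade-off discussed above is visible here: rejecting the name breaks today's round-trip guarantee (set any string, get the same string back), which is why the change was judged not worth making without a concrete benefit.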