Bug 132270 - Settings::setDefaultTextEncodingName() allows to set unsupported encoding
Summary: Settings::setDefaultTextEncodingName() allows to set unsupported encoding
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-28 07:21 PDT by Grzegorz Czajkowski
Modified: 2014-05-02 01:45 PDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Grzegorz Czajkowski 2014-04-28 07:21:44 PDT
JavaScript interface does not allow to set the encoding which is not supported by WebKit, for example:

document.charset = "FooBar"; // won't take effect

However, all WebKit ports expose an additional API to set default text encoding. This API mostly uses Settings::setDefaultTextEncodingName() which doesn't ensure correctness of the given encoding name.
What's more, it seems that all methods which use the defaultTextEncodingName() do not take it into consideration if it's wrong, for example:

1) request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName())
2) TextResourceDecoder::TextResourceDecoder(const String& mimeType, const TextEncoding& specifiedDefaultEncoding, ....) {
   ...
   m_encoding(defaultEncoding(m_contentType, specifiedDefaultEncoding)) -> will be converted to Latin1 if (!specifiedDefaultEncoding.isValid()

Due to this change will affect all WebKit ports I wonder if there are objections to prevent setting the encoding which is not supported by WebKit.
Comment 1 Grzegorz Czajkowski 2014-04-28 07:39:19 PDT
Before sending any patches could you possibly express your opinion on that?
Comment 2 Alexey Proskuryakov 2014-04-28 09:16:03 PDT
What is the proposed behavior change, is it only that WKPreferencesCopyDefaultTextEncodingName() getter will not return the unsupported name? Seems OK to me.

Could probably use WTFLogAlways to complain early.
Comment 3 Darin Adler 2014-04-28 09:34:30 PDT
The default encoding name is only used in three places in WebCore. Before we change the rules about setting it we need to figure out what behavior we want.

For WebKit APIs there seem to be a few separate questions:

- Does an unsupported decoding name actually cause any real problems in decoding of what gets set in headers? Some uses of defaultTextEncodingName *do* handle unsupported values in a good way. It seems that TextResourceDecoder handles unsupported values, by treating unknown names as Windows Latin-1. The two other call sites don't, and that might be worth fixing. Even for supported encodings it is strange to use the encoding name string directly from settings; that skips the canonicalization that the TextEncoding class does. For example, TextEncoding will return "UTF-8", not "utf8".

- What actual decoding and encoding is done based on the unsupported decoding name? It seems clear that Windows Latin-1 is used, which seems fine to me. Maybe UTF-8 would be better in theory but on Mac and iOS at least we have the many years of the current behavior and might have apps depending on it by accident.

- Does setting the encoding do anything at all if the encoding is unsupported? Specifically, if someone calls a get function does the caller get back the unsupported encoding name? Today we support a round trip on any string, even if it's not a supported encoding.

- Can a caller tell if a given encoding is supported? Is there some way to either directly or indirectly find out if an encoding is supported?

I would like some more concrete examples of real scenarios today (on any platform) where this makes a difference. What the current behavior is and what the improved behavior would be.
Comment 4 Grzegorz Czajkowski 2014-04-29 08:02:13 PDT
(In reply to comment #2)
> What is the proposed behavior change, is it only that WKPreferencesCopyDefaultTextEncodingName() getter will not return the unsupported name?
 
Yes, this is what I had in mind. Additionally, "document.defaultCharset" won't return the unsupported name as well.

> Could probably use WTFLogAlways to complain early.

I don't mind adding it. Thanks for hint.

(In reply to comment #3)
> - Does an unsupported decoding name actually cause any real problems in decoding of what gets set in headers? 

As far as I know, it doesn't. WebKit behaves as you said. Latin1 encoding will be used when client sets the unsupported encoding. It happens in TextResourceDecoder's constructor.

>
> - The two other call sites don't, and that might be worth fixing. 

IMHO, if we prevent setting the unsupported encoding then the mentioned methods will behave correctly, for example, Document::defaultCharset() will return the empty string if encoding name is not set or the previously set the correct one.
FYI, all WK2 platforms uses ISO-8859-1 as default encoding (Shared/WebPreferencesStore.h) unless they override it in the port specific files. 

>
> - Even for supported encodings it is strange to use the encoding name string directly from settings; that skips the canonicalization that the TextEncoding class does. For example, TextEncoding will return "UTF-8", not "utf8".

I'm afraid of difference between what the client set and what he could get. We might easily break API tests I think. 

> 
> - What actual decoding and encoding is done based on the unsupported decoding name? It seems clear that Windows Latin-1 is used, which seems fine to me. Maybe UTF-8 would be better in theory but on Mac and iOS at least we have the many years of the current behavior and might have apps depending on it by accident.

I don't mind changing it. However, I think is should be done in a separate bug.

> 
> - Does setting the encoding do anything at all if the encoding is unsupported? Specifically, if someone calls a get function does the caller get back the unsupported encoding name? 

Exactly, that mostly inspired me to create a bug. Get function can be called on both JS side and native platform API. 

>
> - Today we support a round trip on any string, even if it's not a supported encoding.

Right. However, getter will return what has been set (the unsupported name).

> 
> - Can a caller tell if a given encoding is supported? Is there some way to either directly or indirectly find out if an encoding is supported?
> 

I guess WebKit doesn't expose any API for that. Inside the WebKie code, I was thinking of using WebCore::TextEncoding::isValid().

>
> I would like some more concrete examples of real scenarios today (on any platform) where this makes a difference. What the current behavior is and what the improved behavior would be.

I believe that above comments showed the difference between current and improved behaviour. Note that, any modification in Settings::setDefaultTextEncodingName will affect all WebKit ports. As far as I know, all of them wrap it as API which is exposed to the client.
Comment 5 Darin Adler 2014-04-29 08:57:58 PDT
I don’t think we should make this change unless there is a significant benefit. So far there does not seem to be one.
Comment 6 Grzegorz Czajkowski 2014-05-02 01:45:27 PDT
Thank you guys for comprehensive comments. I suggest closing it as won't fix.