Bug 66106 - The encoding menu should be grayed out for XML files + for HTML files with the BOM
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Major
Assignee: Nobody
URL: http://malform.no/testing/html5/bom/x...
Keywords:
Depends on: 66055 66056
Blocks:
Reported: 2011-08-11 15:32 PDT by Leif Halvard Silli
Modified: 2011-08-13 03:26 PDT

Description Leif Halvard Silli 2011-08-11 15:32:10 PDT
ISSUE: 

   2 facts:
       1) XML 1.0 does not permit users to override the document's declared or default encoding.
       2) For HTML pages (and XML pages!) that include the BOM, Webkit already prevents the user from overriding the encoding. However, it does so without graying out the encoding menu.
       

BACKGROUND:

   According to section 4.3.3 of the XML 1.0 spec, it is a FATAL ERROR if the page is in an encoding other than the one that is declared internally (explicitly or implicitly/by default), or other than the external encoding that accompanies it (via the file system or the protocol, typically in the HTTP Content-Type: header):

]]
   In the absence of information provided by an external transport protocol 
   (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding
   declaration to be presented to the XML processor in an encoding other 
   than that named in the declaration, or for an entity which begins with 
   neither a Byte Order Mark nor an encoding declaration to use an encoding
   other than UTF-8.
[[
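
The rule quoted above can be sketched as a small decision function. This is only an illustration of how I read section 4.3.3, not Webkit's or any other parser's actual code: the names `sniff_bom` and `xml_effective_encoding` are made up for this example, external (HTTP) info is assumed to take priority, and conflict detection between BOM and declaration is left out. Note that the user's menu choice is not an input at all.

```python
import codecs
import re

# BOM signatures; UTF-32 comes first because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified match for an ASCII-compatible XML declaration at the very start.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def xml_effective_encoding(data, http_charset=None):
    """Pick the encoding for an XML entity (hypothetical helper).

    Assumed order: accompanying external info, then BOM, then the
    internal encoding declaration, then the UTF-8 default.
    """
    if http_charset:
        return http_charset.lower()
    bom = sniff_bom(data)
    if bom:
        return bom
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

Anything other than the value returned here would, per the quoted text, be a fatal error rather than a legitimate alternative rendering.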

THUS: It ought to be impossible (aka a "FATAL ERROR") to interpret an XML page with any encoding other than the internally declared encoding (explicit or implicit/default) or the externally accompanying encoding (typically the charset parameter of the HTTP Content-Type: header), because a user-chosen encoding is not something that the page is "accompanied" by. (When XML 1.0 speaks about external encoding info that might override the internally declared encoding, it has in mind *accompanying* encoding info - that is, info that is directly linked to the file via the protocol or file system. As appendix F.2 of XML 1.0 says, with my emphasis: «when the XML entity is **accompanied** by encoding information, as in some file systems and some network protocols».) For more explanation, see comment 3 here: https://bugzilla.gnome.org/show_bug.cgi?id=331266#c3

For that reason, Webkit should gray out the encoding menu, so as to *not* allow the user to override the encoding *and also not* give the user the impression that he/she can override the encoding.

Since Webkit already prevents users from changing the encoding whenever there is a BOM (for both XML and HTML files), the same grayed out menu behaviour should also be implemented for HTML pages with the BOM.

WAYS TO REPRODUCE THIS BUG:

 -- variant 1 --

1. In a browser in the Webkit family (including nightly build), go to the "Text Encodings" submenu of the "View" menu and select "Western (Macintosh)". NOTE: This step changes - for the current window or tab - the default encoding from "Default/Automatic" to the encoding that you selected.

2. Now, within the same window or tab, visit one of these XHTML (application/xhtml+xml) pages:
2.1. http://malform.no/testing/html5/bom/cyrillic-encoding-declaration
2.2. http://malform.no/testing/html5/bom/cyrillic-http-charset
       NOTE:
Page 2.1. includes an internal XML encoding declaration: <?xml version="1.0" encoding="KOI8-R" ?>
Page 2.2. is served with the charset=KOI8-R in the HTTP Content-Type: header
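
To illustrate what a user sees when the override does take effect, here is a minimal sketch that decodes KOI8-R bytes with the wrong encoding. It assumes that the "Western (Macintosh)" menu item corresponds to Mac Roman (Python's `mac_roman` codec); the sample word and the helper name `decode_as` are mine and are not taken from the test pages.

```python
def decode_as(raw, encoding):
    """Decode raw bytes with the given encoding (hypothetical helper)."""
    return raw.decode(encoding)


text = "Привет"                           # sample Cyrillic text, not from the test pages
raw = text.encode("koi8_r")               # bytes as a KOI8-R page would send them
garbled = decode_as(raw, "mac_roman")     # what a forced override would display
recovered = garbled.encode("mac_roman").decode("koi8_r")

print(garbled)
print(recovered)
```

The bytes are unchanged in both cases; only the interpretation differs, which is why a forced override on an XML page amounts to parsing it in an encoding other than the declared one.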

 -- variant 2 -- (the opposite way)

1. With the encoding set to "Default/Automatic", visit one of these XHTML (application/xhtml+xml) pages:
1.1. http://malform.no/testing/html5/bom/cyrillic-encoding-declaration
1.2. http://malform.no/testing/html5/bom/cyrillic-http-charset

2. Now, manually choose the encoding "Western (Macintosh)" from the encoding menu 

 -- variant 3 -- XML file without any encoding info

1. Visit http://malform.no/testing/html5/bom/normal-XML-BOMless-HTTPcharsetLESS
2. Now, manually choose the encoding "Western (Macintosh)" from the encoding menu 

 -- variant 4 -- XML file without any encoding info - opposite variant -

1. Manually choose the encoding "Western (Macintosh)" from the encoding menu 
2. Now, visit http://malform.no/testing/html5/bom/normal-XML-BOMless-HTTPcharsetLESS


 -- variant 5 -- XML file with BOM

1. Visit http://malform.no/testing/html5/bom/normal-XML.html
2. Now, manually choose the encoding "Western (Macintosh)" from the encoding menu 

 -- variant 6 -- XML file with BOM - opposite variant -

1. Manually choose the encoding "Western (Macintosh)" from the encoding menu 
2. Now, visit http://malform.no/testing/html5/bom/normal-XML.html


 -- variant 7 -- HTML file with BOM

1. Visit http://malform.no/testing/html5/bom/normal-HTML.html
2. Now, manually choose the encoding "Western (Macintosh)" from the encoding menu 

 -- variant 8 -- HTML file with BOM - opposite variant -

1. Manually choose the encoding "Western (Macintosh)" from the encoding menu 
2. Now, visit http://malform.no/testing/html5/bom/normal-HTML.html




EXPECTED RESULTS - FOR ALL VARIANTS ABOVE:  when the page is loaded, Webkit should do two things:
  1) it should ignore the user's encoding setting, and only obey the default or declared encoding of the page.
  2) it should gray out the encoding menu, to prevent the user from changing the encoding.
  For XML files, both 1) and 2) are justified by section 4.3.3 of XML 1.0, to prevent the page from being parsed in anything but the declared encoding.
  For HTML files (as well as XML files) with a BOM, it is (also) justified by the current behaviour in Webkit and in IE6 to IE9, where the BOM prevents the encoding from being overridden (however, Webkit does not currently gray out the menu).
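
The two expected behaviours can be expressed as one shared policy. The following is a minimal sketch under my own assumptions: `Page`, `override_allowed`, `encoding_menu_enabled` and `effective_encoding` are hypothetical names for illustration and do not correspond to any existing Webkit API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    is_xml: bool         # served/parsed as XML (e.g. application/xhtml+xml)
    has_bom: bool        # document starts with a Byte Order Mark
    page_encoding: str   # declared or default encoding of the page


def override_allowed(page: Page) -> bool:
    """A user override is only meaningful for BOM-less HTML pages."""
    return not (page.is_xml or page.has_bom)


def encoding_menu_enabled(page: Page) -> bool:
    """Expected result 2): gray out the menu whenever it would have no effect."""
    return override_allowed(page)


def effective_encoding(page: Page, user_choice: Optional[str]) -> str:
    """Expected result 1): ignore the user's choice unless override is allowed."""
    if user_choice and override_allowed(page):
        return user_choice
    return page.page_encoding
```

Deriving both the menu state and the decoding decision from the same predicate keeps the menu from ever promising an override that the parser would ignore.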
  
ACTUAL RESULTS:  
   For the XML and HTML pages which include a BOM, Webkit behaves as expected, except that it does not gray out the encoding menu.
   For the XML pages which do not include the BOM, Webkit fails to ignore the user's encoding choice and also fails to gray out the menu.
  
   PS about HTML pages without the BOM:  Webkit - and all other parsers - do allow encoding override for HTML pages in general. Web pages with the BOM are the single exception - in Webkit and IE. However, I would have *liked* it if Webkit did not permit the encoding to be overridden even when there is no BOM but the page is declared - internally or in the HTTP Content-Type: header - to be UTF-8.  So feel free to make it so ... However, this bug does not formally make such a request ...


COMMENTS ABOUT OTHER PARSERS:

 * Firefox does not allow users to override the encoding of XML pages. (However, it fails to gray out the menu to signal that overriding is impossible.)  Firefox does not have any special behaviour for HTML pages with the BOM, however - it does allow encoding override, even though it puts the page in quirks mode.

 * IE6 to IE9 do not permit the encoding to be overridden whenever the page has the BOM (however, they fail to gray out the menu). Also, I have not tested IE9 in XML mode.

 * libxml2 does not permit users to override the encoding. (See https://bugzilla.gnome.org/show_bug.cgi?id=331266) [It does, however, allow the page to be transcoded - but that is not really related.]

 * Opera does allow the encoding of XML pages to be overridden, even when the page includes the BOM. And when the page includes the BOM, overriding the encoding causes a FATAL ERROR - and Opera offers to parse the page as HTML [which would not help much, as the page would then go into quirks mode ....  Opera could, instead of offering to parse the page as HTML, possibly have offered to read the page in another encoding ... OK - this is a sidetrack ... But it serves to make a point.]

# Webkit would be the first of its kind if it starts to gray out the encoding menu whenever it is not supposed to have any effect.
# However, ignoring the encoding menu in some situations is not really something new - neither in Webkit nor in IE or Firefox.
Comment 1 Leif Halvard Silli 2011-08-12 08:03:16 PDT
Chrome and Safari 5.1 (but not Safari 5.0) already gray out the encoding menu for images, because it has no effect there.