Bug 53398

Summary: Use Document::encoding() instead of DocumentWriter::encoding()
Product: WebKit Reporter: Patrick R. Gansterer <paroga>
Component: DOMAssignee: Patrick R. Gansterer <paroga>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, buildbot, dglazkov, eric, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on: 53406    
Bug Blocks: 52036    
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Patrick R. Gansterer 2011-01-30 09:05:10 PST
Use Document::encoding() instead of DocumentWriter::encoding()
Comment 1 Patrick R. Gansterer 2011-01-30 09:11:39 PST
Created attachment 80593 [details]
Patch
Comment 2 WebKit Review Bot 2011-01-30 09:14:36 PST
Attachment 80593 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7584818
Comment 3 Patrick R. Gansterer 2011-01-30 09:24:47 PST
Created attachment 80594 [details]
Patch
Comment 4 Build Bot 2011-01-30 09:28:32 PST
Attachment 80593 [details] did not build on win:
Build output: http://queues.webkit.org/results/7679413
Comment 5 Adam Barth 2011-01-30 10:34:31 PST
Comment on attachment 80594 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=80594&action=review

R- for using a virtual method when a non-virtual one does the same job.  Also, one question:

> Source/WebCore/ChangeLog:6
> +        Use Document::encoding() instead of DocumentWriter::encoding()
> +        https://bugs.webkit.org/show_bug.cgi?id=53398

Do you mean Document::charset?  Document.h says: "Use Document::encoding() to avoid virtual dispatch" but you're not doing that...  Are these two always the same?
Comment 6 Alexey Proskuryakov 2011-01-30 16:29:20 PST
Note that Document::encoding() can be affected by assigning to document.charset. It's not clear what exactly we want to change in that case - it's particularly questionable that we change document.inputEncoding.
Comment 7 Patrick R. Gansterer 2011-02-03 13:49:06 PST
Comment on attachment 80594 [details]
Patch

Document::encoding() and DocumentWriter::encoding() are always the same.

Document::setDecoder() will be called at
http://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentWriter.cpp?rev=76872#L133
http://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentWriter.cpp?rev=76872#L177
where Document::encoding will be set to DocumentWriter::encoding

There is one place where this happens in the other direction:
http://trac.webkit.org/browser/trunk/Source/WebCore/loader/FrameLoader.cpp?rev=76872#L2166

And there is one call where we set Document::encoding and don't have a DocumentWriter:
http://trac.webkit.org/browser/trunk/Source/WebCore/xml/XSLTProcessor.cpp?rev=76872#L92
Comment 8 Patrick R. Gansterer 2011-02-04 13:52:52 PST
(In reply to comment #6)
> Note that Document::encoding() can be affected by assigning to document.charset. It's not clear what exactly we want to change in that case - it's particularly questionable that we change document.inputEncoding.
Document::setCharset will set encoding of Document::m_decoder. The decoder of Document and DocumentWriter are the same: see comment #7.

AFAICS this patch does not change behavior of the existing code.
Comment 9 Alexey Proskuryakov 2011-02-04 14:08:20 PST
Comment 7 lists all places where encoding is set. Could you please describe how DocumentWriter encoding becomes equal to document encoding in the particular case of document.charset being set from JS, for ease of comprehension?
Comment 10 Patrick R. Gansterer 2011-02-04 14:20:14 PST
(In reply to comment #6)
> Note that Document::encoding() can be affected by assigning to document.charset. It's not clear what exactly we want to change in that case - it's particularly questionable that we change document.inputEncoding.
charset and inputEncoding are always the same. http://trac.webkit.org/browser/trunk/LayoutTests/fast/dom/Document/document-charset.html?rev=13706 tests for this.


(In reply to comment #9)
> Comment 7 lists all places where encoding is set. Could you please describe how DocumentWriter encoding becomes equal to document encoding in the particular case of document.charset being set from JS, for ease of comprehension?
DocumentWriter and Document share the same TextResourceDecoder (m_decoder).


But two other cases:
a) User selects an ecoding: We generate a new Document, where the TextResourceDecoder has the encoding selected by the user. So decoder() returns the correct encoding.
b) When we fall back to the defaultEncoding we return null string in Document::encoding(), but we will use the defaultEncoding when creating a TextResourceDecoder with no valid encoding.
Comment 11 Alexey Proskuryakov 2011-02-04 14:37:59 PST
> charset and inputEncoding are always the same. http://trac.webkit.org/browser/trunk/LayoutTests/fast/dom/Document/document-charset.html?rev=13706 tests for this.

Yes, that's current behavior, that was chosen in absence of any spec or compatibility info. Now that IE9 has inputEncoding, we should test what it does.
Comment 12 Patrick R. Gansterer 2011-02-04 14:52:31 PST
(In reply to comment #11)
> > charset and inputEncoding are always the same. http://trac.webkit.org/browser/trunk/LayoutTests/fast/dom/Document/document-charset.html?rev=13706 tests for this.
> 
> Yes, that's current behavior, that was chosen in absence of any spec or compatibility info. Now that IE9 has inputEncoding, we should test what it does.
IMHO that's a different task, since this patch does not change the behavior.

Matching the IE9 behavior is a separate task we can do after this is landed too. But if you think we need to ensure IE9 compatibility _before_ I need a IE9, which I don't have and can't install on XP :-(.
Comment 13 Alexey Proskuryakov 2011-02-04 15:10:05 PST
There is a chance of being counter-productive if we clean up implementation before deciding what the desired behavior is. That's all I'm concerned about.
Comment 14 Patrick R. Gansterer 2011-02-05 01:52:46 PST
(In reply to comment #13)
> There is a chance of being counter-productive if we clean up implementation before deciding what the desired behavior is. That's all I'm concerned about.
I agree with you!

Opening LayoutTests/fast/dom/Document/document-charset.html with IE9beta on Windows Server 2008 R2 gives the following result:

Initial
 document.charset: koi8-r
 document.defaultCharset: defined
 document.characterSet: undefined
 document.inputEncoding: undefined

Setting charset to UTF-8...
 document.charset: utf-8
 document.defaultCharset: defined
 document.characterSet: undefined
 document.inputEncoding: undefined

Setting characterSet to KOI8-R (expected to fail, matching Firefox)...
 document.charset: utf-8
 document.defaultCharset: defined
 document.characterSet: KOI8-R
 document.inputEncoding: undefined

It seams inputEncoding isn't working in IE9! Matching the current behavior of IE9 does not seam to go in the right direction. :-)
Comment 15 Patrick R. Gansterer 2011-02-05 01:57:32 PST
comment #14 was with Quirks mode :-/

With IE9 standards mode:

Initial
 document.charset: utf-8
 document.defaultCharset: defined
 document.characterSet: utf-8
 document.inputEncoding: UTF-8

Setting charset to UTF-8...
 document.charset: utf-8
 document.defaultCharset: defined
 document.characterSet: utf-8
 document.inputEncoding: UTF-8

Setting characterSet to KOI8-R (expected to fail, matching Firefox)...
Failure (expected)
Comment 16 Patrick R. Gansterer 2011-02-05 07:51:58 PST
Created attachment 81360 [details]
Patch

After document.charset is set, the given encoding will be used in IE9. AFAIKS we match the IE9 behavior.
Comment 17 Alexey Proskuryakov 2011-02-05 11:36:10 PST
Comment on attachment 80594 [details]
Patch

OK. Please consider adding a doctype to document-charset.html, to make the test pass in IE9 out of the box.
Comment 18 Alexey Proskuryakov 2011-02-05 11:40:41 PST
Comment on attachment 81360 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=81360&action=review

Another thing to test is whether the main document decoding starts using the new encoding after assignment to charset. That's obviously tricky, as decoding happens ahead of execution - but as far as I can tell, that's completely broken in WebKit now. Changing TextResourceDecoder charset on the fly resets it, dropping buffered characters, so assigning to document.charset means that a decoding error will occur at some later point.

> LayoutTests/fast/encoding/external-script-charset-expected.txt:4
>  PASS
> +PASS

Would be nice to explain what these subtests are, like it's done in other tests.
Comment 19 Patrick R. Gansterer 2011-02-05 12:30:50 PST
Committed r77749: <http://trac.webkit.org/changeset/77749>
Comment 20 Patrick R. Gansterer 2011-02-05 12:34:42 PST
Comment on attachment 80594 [details]
Patch

Clearing flags on attachment: 80594

Committed r77750: <http://trac.webkit.org/changeset/77750>