RESOLVED FIXED 148810
Document.characterSet should return "UTF-8" instead of null by default
https://bugs.webkit.org/show_bug.cgi?id=148810
Summary Document.characterSet should return "UTF-8" instead of null by default
dewei_zhu
Reported 2015-09-04 12:15:13 PDT
Set default encoding of document to be 'UTF-8'.
Attachments
Patch (39.90 KB, patch)
2015-09-04 12:20 PDT, dewei_zhu
no flags
Patch (40.63 KB, patch)
2015-09-04 13:47 PDT, dewei_zhu
no flags
Patch (41.91 KB, patch)
2015-09-04 14:29 PDT, dewei_zhu
no flags
Patch (46.87 KB, patch)
2015-09-04 16:10 PDT, dewei_zhu
no flags
Patch (46.15 KB, patch)
2015-09-04 17:11 PDT, dewei_zhu
no flags
Patch (47.30 KB, patch)
2015-09-09 16:52 PDT, dewei_zhu
no flags
Patch (47.48 KB, patch)
2015-09-09 17:34 PDT, dewei_zhu
no flags
Patch (47.36 KB, patch)
2015-09-09 18:19 PDT, dewei_zhu
no flags
dewei_zhu
Comment 1 2015-09-04 12:20:31 PDT
dewei_zhu
Comment 2 2015-09-04 12:22:12 PDT
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:14 > + Is it ok to have 2 separate change logs?
Chris Dumez
Comment 3 2015-09-04 12:33:49 PDT
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:11 > + Please indicate in the Change Log that this behavior is consistent with Firefox. > Source/WebCore/dom/Document.cpp:-1255 > - return String(); We should also update Document.idl to drop the [TreatReturnedNullStringAs=Null] IDL extended attribute from the following attributes: inputEncoding, charset, characterSet. The spec says that these attributes are not nullable and we no longer return a null string. > Source/WebCore/dom/Document.cpp:1255 > + return String(UTF8Encoding().domName()); I don't believe we need the String() > LayoutTests/ChangeLog:3 > + Update the tests which test the default encoding of document. Please use bug title here: "Set 'UTF-8' to be the default of document encoding." and move "Update the tests which test the default encoding of document." to the description under the Reviewed By line. > LayoutTests/fast/dom/document-attribute-js-null-expected.txt:5 > +TEST FAILED: The value should have been undefined but was the string 'UTF-8'. [tested Document.charset] You need to update this test instead of simply rebaselining it. This is not an imported test.
Chris Dumez
Comment 4 2015-09-04 12:34:49 PDT
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:5 > + <rdar://problem/22549736> Wrong radar? > Source/WebCore/ChangeLog:10 > + https://dom.spec.whatwg.org/#concept-document-url. Wrong URL?
Chris Dumez
Comment 5 2015-09-04 12:37:04 PDT
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:3 > + Set 'UTF-8' to be the default of document encoding. I would prefer "document.characterSet should return "UTF-8" instead of null by default"
Chris Dumez
Comment 6 2015-09-04 12:37:18 PDT
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review >> Source/WebCore/ChangeLog:14 >> + > > Is it ok to have 2 separate change logs? Yes.
dewei_zhu
Comment 7 2015-09-04 13:47:11 PDT
Chris Dumez
Comment 8 2015-09-04 13:55:12 PDT
Comment on attachment 260611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260611&action=review > Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName(); What about the IDL changes I suggested?
dewei_zhu
Comment 9 2015-09-04 14:29:01 PDT
dewei_zhu
Comment 10 2015-09-04 14:30:06 PDT
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/dom/Document.idl:178 > + [TreatNullAs=NullString] attribute DOMString charset; Can we also drop [TreatNullAs=NullString] as well?
Chris Dumez
Comment 11 2015-09-04 14:39:28 PDT
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/ChangeLog:11 > + is consistent with Firefox. Chrome also follows the spec. >> Source/WebCore/dom/Document.idl:178 >> + [TreatNullAs=NullString] attribute DOMString charset; > > Can we also drop [TreatNullAs=NullString] as well? It is non-standard so I wouldn't touch it.
Chris Dumez
Comment 12 2015-09-04 14:40:58 PDT
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName(); As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code.
Chris Dumez
Comment 13 2015-09-04 14:43:48 PDT
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review >> Source/WebCore/dom/Document.cpp:1255 >> + return UTF8Encoding().domName(); > > As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. I see one call site in FrameLoader::addExtraFieldsToRequest(): // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine.
Chris Dumez
Comment 14 2015-09-04 14:45:38 PDT
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review >>> Source/WebCore/dom/Document.cpp:1255 >>> + return UTF8Encoding().domName(); >> >> As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. > > I see one call site in FrameLoader::addExtraFieldsToRequest(): > // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. > request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); > > This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine. charset() / inputEncoding() are used a lot more internally and they all call Document::encoding() so we need to double check those.
dewei_zhu
Comment 15 2015-09-04 16:10:04 PDT
Chris Dumez
Comment 16 2015-09-04 16:16:05 PDT
Comment on attachment 260639 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260639&action=review > Source/WebCore/dom/Document.cpp:1258 > +String Document::characterSet() const Please do not duplicate code. Have this call document::encoding(). > Source/WebCore/dom/Document.h:403 > String charset() const { return Document::encoding(); } Please add a comment explaining this may return a null String. > Source/WebCore/dom/Document.h:404 > + String characterSet() const; Please add a comment explaining this returns "UTF-8" by default. > LayoutTests/ChangeLog:9 > + Update the tests which test the default encoding of document. Please add a blank line after this. > LayoutTests/ChangeLog:11 > + * fast/dom/document-attribute-js-null-expected.txt: Obsolete test. Not obsolete. > LayoutTests/fast/dom/document-attribute-js-null-expected.txt:-5 > -TEST SUCCEEDED: The value was undefined. [tested Document.charset] Do not remove. > LayoutTests/fast/dom/document-attribute-js-null.html:-69 > - {name: 'charset', expectedNull: undefined}, Do not remove
dewei_zhu
Comment 17 2015-09-04 17:11:20 PDT
WebKit Commit Bot
Comment 18 2015-09-04 17:14:14 PDT
Attachment 260653 [details] did not pass style-queue: ERROR: Source/WebCore/dom/Document.cpp:1261: Missing space before ( in if( [whitespace/parens] [5] Total errors found: 1 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chris Dumez
Comment 19 2015-09-04 17:30:13 PDT
Comment on attachment 260653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260653&action=review r=me with changes. > Source/WebCore/dom/Document.cpp:1255 > return String(); nullAtom > Source/WebCore/dom/Document.cpp:1260 > + String name = encoding(); AtomicString > Source/WebCore/dom/Document.cpp:1261 > + if(!name.isEmpty()) missing space between if and ( isNull() not isEmpty() > Source/WebCore/dom/Document.h:403 > + String charset() const { return Document::encoding(); } // may return a null String Comments need to end with a dot and start with a capital letter. > Source/WebCore/dom/Document.h:404 > + String characterSet() const; // returns "UTF-8" by default Ditto > Source/WebCore/dom/InlineStyleSheetOwner.cpp:144 > + m_sheet = CSSStyleSheet::createInline(element, URL(), m_startTextPosition, document.charset()); Let's use encoding() for these as they wanted an encoding. Ditto below.
dewei_zhu
Comment 20 2015-09-09 16:52:28 PDT
Chris Dumez
Comment 21 2015-09-09 16:57:00 PDT
Comment on attachment 260889 [details] Patch r=me
Ryosuke Niwa
Comment 22 2015-09-09 16:57:07 PDT
Comment on attachment 260889 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260889&action=review > Source/WebCore/ChangeLog:15 > + * dom/Document.cpp: > + (WebCore::Document::encoding): > + (WebCore::Document::characterSet): You should add comments as to what you're modifying in each function. > Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSet() const; // Returns "UTF-8" by default. We don't normally add comments like this. Instead, we rename functions to reflect these semantics. Can we call the latter bindingCharacterSet instead?
Chris Dumez
Comment 23 2015-09-09 17:01:18 PDT
Comment on attachment 260889 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260889&action=review >> Source/WebCore/dom/Document.h:404 >> + String characterSet() const; // Returns "UTF-8" by default. > > We don't normally add comments like this. Instead, we rename functions to reflect these semantics. > Can we call the latter bindingCharacterSet instead? Hmm, OK, how about characterSetForBindings() then. I think we usually use such naming. > Source/WebCore/dom/Document.idl:67 > + [ImplementedAs=characterSet] readonly attribute DOMString inputEncoding; characterSetForBindings > Source/WebCore/dom/Document.idl:191 > + readonly attribute DOMString characterSet; [ImplementedAs=characterSetForBindings]
dewei_zhu
Comment 24 2015-09-09 17:34:30 PDT
Ryosuke Niwa
Comment 25 2015-09-09 17:45:08 PDT
Comment on attachment 260893 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260893&action=review > Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSetForBindings() const; // Returns "UTF-8" by default. I don't think we want these comments.
Ryosuke Niwa
Comment 26 2015-09-09 17:45:51 PDT
These are "what" comments that repeat the code. We only want "why" comments.
dewei_zhu
Comment 27 2015-09-09 18:19:19 PDT
WebKit Commit Bot
Comment 28 2015-09-09 19:04:13 PDT
Comment on attachment 260896 [details] Patch Clearing flags on attachment: 260896 Committed r189564: <http://trac.webkit.org/changeset/189564>
WebKit Commit Bot
Comment 29 2015-09-09 19:04:20 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 30 2015-09-10 09:54:51 PDT
This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on Windows. Could you please update the results?
dewei_zhu
Comment 31 2015-09-10 09:56:42 PDT
(In reply to comment #30) > This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results? Of course. I'll do it right now.
dewei_zhu
Comment 32 2015-09-10 10:13:36 PDT
(In reply to comment #30) > This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results? Updated. https://bugs.webkit.org/show_bug.cgi?id=149038
Darin Adler
Comment 33 2015-09-11 10:25:32 PDT
Comment on attachment 260896 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260896&action=review I think it’s worthwhile to return at some point and do additional cleanup. > Source/WebCore/dom/Document.h:404 > String charset() const { return Document::encoding(); } > - String characterSet() const { return Document::encoding(); } > + String characterSetForBindings() const; I know we made an effort at clarity here, but I still find the naming confusing. There’s charset, used for the DOM binding attribute "charset", and also characterSetForBindings, used for two other DOM binding attributes "inputEncoding" and "characterSet". Now how do those names make it clear why we wouldn’t want to use characterSetForBindings for "charset"? There’s a reason, presumably charset is supposed to return null rather than "UTF-8", but I think that reason needs to be expressed in these function names. > Source/WebCore/loader/DocumentWriter.cpp:192 > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame); This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding? > Source/WebCore/loader/FormSubmission.cpp:154 > + return TextEncoding(document.encoding()); This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding?
dewei_zhu
Comment 34 2015-09-14 18:26:50 PDT
Polished in bug 149131. (In reply to comment #33) > Comment on attachment 260896 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=260896&action=review > > I think it’s worthwhile to return at some point and do additional cleanup. > > > Source/WebCore/dom/Document.h:404 > > String charset() const { return Document::encoding(); } > > - String characterSet() const { return Document::encoding(); } > > + String characterSetForBindings() const; > > I know we made an effort at clarity here, but I still find the naming > confusing. There’s charset, used for the DOM binding attribute "charset", > and also characterSetForBindings, used for two other DOM binding attributes > "inputEncoding" and "characterSet". > > Now how do those names make it clear why we wouldn’t want to use > characterSetForBindings for "charset"? There’s a reason, presumably charset > is supposed to return null rather than "UTF-8", but I think that reason > needs to be expressed in these function names. > > > Source/WebCore/loader/DocumentWriter.cpp:192 > > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding? > > > Source/WebCore/loader/FormSubmission.cpp:154 > > + return TextEncoding(document.encoding()); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding?
Note You need to log in before you can comment on or make changes to this bug.