| Summary: | Document.characterSet should return "UTF-8" instead of null by default | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | WebKit | Reporter: | dewei_zhu | ||||||||||||||||||
| Component: | New Bugs | Assignee: | dewei_zhu | ||||||||||||||||||
| Status: | RESOLVED FIXED | ||||||||||||||||||||
| Severity: | Normal | CC: | ap, cdumez, commit-queue, dewei_zhu, esprehn+autocc, kangil.han, rniwa | ||||||||||||||||||
| Priority: | P2 | ||||||||||||||||||||
| Version: | WebKit Nightly Build | ||||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||||
| URL: | https://dom.spec.whatwg.org/#concept-document-encoding | ||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||
|
Description
dewei_zhu
2015-09-04 12:15:13 PDT
Created attachment 260607 [details]
Patch
Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:14 > + Is it ok to have 2 separate change logs? Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:11 > + Please indicate in the Change Log that this behavior is consistent with Firefox. > Source/WebCore/dom/Document.cpp:-1255 > - return String(); We should also update Document.idl to drop the [TreatReturnedNullStringAs=Null] IDL extended attribute from the following attributes: inputEncoding, charset, characterSet. The spec says that these attributes are not nullable and we no longer return a null string. > Source/WebCore/dom/Document.cpp:1255 > + return String(UTF8Encoding().domName()); I don't believe we need the String() > LayoutTests/ChangeLog:3 > + Update the tests which test the default encoding of document. Please use bug title here: "Set 'UTF-8' to be the default of document encoding." and move "Update the tests which test the default encoding of document." to the description under the Reviewed By line. > LayoutTests/fast/dom/document-attribute-js-null-expected.txt:5 > +TEST FAILED: The value should have been undefined but was the string 'UTF-8'. [tested Document.charset] You need to update this test instead of simply rebaselining it. This is not an imported test. Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:5 > + <rdar://problem/22549736> Wrong radar? > Source/WebCore/ChangeLog:10 > + https://dom.spec.whatwg.org/#concept-document-url. Wrong URL? Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review > Source/WebCore/ChangeLog:3 > + Set 'UTF-8' to be the default of document encoding. I would prefer "document.characterSet should return "UTF-8" instead of null by default" Comment on attachment 260607 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260607&action=review >> Source/WebCore/ChangeLog:14 >> + > > Is it ok to have 2 separate change logs? Yes. Created attachment 260611 [details]
Patch
Comment on attachment 260611 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260611&action=review > Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName(); What about the IDL changes I suggested? Created attachment 260620 [details]
Patch
Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/dom/Document.idl:178 > + [TreatNullAs=NullString] attribute DOMString charset; Can we also drop [TreatNullAs=NullString] as well? Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/ChangeLog:11 > + is consistent with Firefox. Chrome also follows the spec. >> Source/WebCore/dom/Document.idl:178 >> + [TreatNullAs=NullString] attribute DOMString charset; > > Can we also drop [TreatNullAs=NullString] as well? It is non-standard so I wouldn't touch it. Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review > Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName(); As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review >> Source/WebCore/dom/Document.cpp:1255 >> + return UTF8Encoding().domName(); > > As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. I see one call site in FrameLoader::addExtraFieldsToRequest(): // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine. Comment on attachment 260620 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260620&action=review >>> Source/WebCore/dom/Document.cpp:1255 >>> + return UTF8Encoding().domName(); >> >> As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. > > I see one call site in FrameLoader::addExtraFieldsToRequest(): > // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. > request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); > > This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine. charset() / inputEncoding() are used a lot more internally and they all call Document::encoding() so we need to double check those. Created attachment 260639 [details]
Patch
Comment on attachment 260639 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260639&action=review > Source/WebCore/dom/Document.cpp:1258 > +String Document::characterSet() const Please do not duplicate code. Have this call document::encoding(). > Source/WebCore/dom/Document.h:403 > String charset() const { return Document::encoding(); } Please add a comment explaining this may return a null String. > Source/WebCore/dom/Document.h:404 > + String characterSet() const; Please add a comment explaining this returns "UTF-8" by default. > LayoutTests/ChangeLog:9 > + Update the tests which test the default encoding of document. Please add a blank line after this. > LayoutTests/ChangeLog:11 > + * fast/dom/document-attribute-js-null-expected.txt: Obsolete test. Not obsolete. > LayoutTests/fast/dom/document-attribute-js-null-expected.txt:-5 > -TEST SUCCEEDED: The value was undefined. [tested Document.charset] Do not remove. > LayoutTests/fast/dom/document-attribute-js-null.html:-69 > - {name: 'charset', expectedNull: undefined}, Do not remove Created attachment 260653 [details]
Patch
Attachment 260653 [details] did not pass style-queue:
ERROR: Source/WebCore/dom/Document.cpp:1261: Missing space before ( in if( [whitespace/parens] [5]
Total errors found: 1 in 15 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 260653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260653&action=review r=me with changes. > Source/WebCore/dom/Document.cpp:1255 > return String(); nullAtom > Source/WebCore/dom/Document.cpp:1260 > + String name = encoding(); AtomicString > Source/WebCore/dom/Document.cpp:1261 > + if(!name.isEmpty()) missing space between if and ( isNull() not isEmpty() > Source/WebCore/dom/Document.h:403 > + String charset() const { return Document::encoding(); } // may return a null String Comments need to end with a dot and start with a capital letter. > Source/WebCore/dom/Document.h:404 > + String characterSet() const; // returns "UTF-8" by default Ditto > Source/WebCore/dom/InlineStyleSheetOwner.cpp:144 > + m_sheet = CSSStyleSheet::createInline(element, URL(), m_startTextPosition, document.charset()); Let's use encoding() for these as they wanted an encoding. Ditto below. Created attachment 260889 [details]
Patch
Comment on attachment 260889 [details]
Patch
r=me
Comment on attachment 260889 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260889&action=review > Source/WebCore/ChangeLog:15 > + * dom/Document.cpp: > + (WebCore::Document::encoding): > + (WebCore::Document::characterSet): You should add comments as to what you're modifying in each function. > Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSet() const; // Returns "UTF-8" by default. We don't normally add comments like this. Instead, we rename functions to reflect these semantics. Can we call the latter bindingCharacterSet instead? Comment on attachment 260889 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260889&action=review >> Source/WebCore/dom/Document.h:404 >> + String characterSet() const; // Returns "UTF-8" by default. > > We don't normally add comments like this. Instead, we rename functions to reflect these semantics. > Can we call the latter bindingCharacterSet instead? Hmm, OK, how about characterSetForBindings() then. I think we usually use such naming. > Source/WebCore/dom/Document.idl:67 > + [ImplementedAs=characterSet] readonly attribute DOMString inputEncoding; characterSetForBindings > Source/WebCore/dom/Document.idl:191 > + readonly attribute DOMString characterSet; [ImplementedAs=characterSetForBindings] Created attachment 260893 [details]
Patch
Comment on attachment 260893 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260893&action=review > Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSetForBindings() const; // Returns "UTF-8" by default. I don't think we want these comments. These are "what" comments that repeat the code. We only want "why" comments. Created attachment 260896 [details]
Patch
Comment on attachment 260896 [details] Patch Clearing flags on attachment: 260896 Committed r189564: <http://trac.webkit.org/changeset/189564> All reviewed patches have been landed. Closing bug. This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on Windows. Could you please update the results? (In reply to comment #30) > This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results? Of course. I'll do it right now. (In reply to comment #30) > This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results? Updated. https://bugs.webkit.org/show_bug.cgi?id=149038 Comment on attachment 260896 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=260896&action=review I think it’s worthwhile to return at some point and do additional cleanup. > Source/WebCore/dom/Document.h:404 > String charset() const { return Document::encoding(); } > - String characterSet() const { return Document::encoding(); } > + String characterSetForBindings() const; I know we made an effort at clarity here, but I still find the naming confusing. There’s charset, used for the DOM binding attribute "charset", and also characterSetForBindings, used for two other DOM binding attributes "inputEncoding" and "characterSet". Now how do those names make it clear why we wouldn’t want to use characterSetForBindings for "charset"? There’s a reason, presumably charset is supposed to return null rather than "UTF-8", but I think that reason needs to be expressed in these function names. > Source/WebCore/loader/DocumentWriter.cpp:192 > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame); This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding? > Source/WebCore/loader/FormSubmission.cpp:154 > + return TextEncoding(document.encoding()); This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding? Polished in bug 149131. (In reply to comment #33) > Comment on attachment 260896 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=260896&action=review > > I think it’s worthwhile to return at some point and do additional cleanup. > > > Source/WebCore/dom/Document.h:404 > > String charset() const { return Document::encoding(); } > > - String characterSet() const { return Document::encoding(); } > > + String characterSetForBindings() const; > > I know we made an effort at clarity here, but I still find the naming > confusing. There’s charset, used for the DOM binding attribute "charset", > and also characterSetForBindings, used for two other DOM binding attributes > "inputEncoding" and "characterSet". > > Now how do those names make it clear why we wouldn’t want to use > characterSetForBindings for "charset"? There’s a reason, presumably charset > is supposed to return null rather than "UTF-8", but I think that reason > needs to be expressed in these function names. > > > Source/WebCore/loader/DocumentWriter.cpp:192 > > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding? > > > Source/WebCore/loader/FormSubmission.cpp:154 > > + return TextEncoding(document.encoding()); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding? |