WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
148810
Document.characterSet should return "UTF-8" instead of null by default
https://bugs.webkit.org/show_bug.cgi?id=148810
Summary
Document.characterSet should return "UTF-8" instead of null by default
dewei_zhu
Reported
2015-09-04 12:15:13 PDT
Set default encoding of document to be 'UTF-8'.
Attachments
Patch
(39.90 KB, patch)
2015-09-04 12:20 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(40.63 KB, patch)
2015-09-04 13:47 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(41.91 KB, patch)
2015-09-04 14:29 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(46.87 KB, patch)
2015-09-04 16:10 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(46.15 KB, patch)
2015-09-04 17:11 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(47.30 KB, patch)
2015-09-09 16:52 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(47.48 KB, patch)
2015-09-09 17:34 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Patch
(47.36 KB, patch)
2015-09-09 18:19 PDT
,
dewei_zhu
no flags
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
dewei_zhu
Comment 1
2015-09-04 12:20:31 PDT
Created
attachment 260607
[details]
Patch
dewei_zhu
Comment 2
2015-09-04 12:22:12 PDT
Comment on
attachment 260607
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260607&action=review
> Source/WebCore/ChangeLog:14 > +
Is it ok to have 2 separate change logs?
Chris Dumez
Comment 3
2015-09-04 12:33:49 PDT
Comment on
attachment 260607
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260607&action=review
> Source/WebCore/ChangeLog:11 > +
Please indicate in the Change Log that this behavior is consistent with Firefox.
> Source/WebCore/dom/Document.cpp:-1255 > - return String();
We should also update Document.idl to drop the [TreatReturnedNullStringAs=Null] IDL extended attribute from the following attributes: inputEncoding, charset, characterSet. The spec says that these attributes are not nullable and we no longer return a null string.
> Source/WebCore/dom/Document.cpp:1255 > + return String(UTF8Encoding().domName());
I don't believe we need the String()
> LayoutTests/ChangeLog:3 > + Update the tests which test the default encoding of document.
Please use bug title here: "Set 'UTF-8' to be the default of document encoding." and move "Update the tests which test the default encoding of document." to the description under the Reviewed By line.
> LayoutTests/fast/dom/document-attribute-js-null-expected.txt:5 > +TEST FAILED: The value should have been undefined but was the string 'UTF-8'. [tested Document.charset]
You need to update this test instead of simply rebaselining it. This is not an imported test.
Chris Dumez
Comment 4
2015-09-04 12:34:49 PDT
Comment on
attachment 260607
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260607&action=review
> Source/WebCore/ChangeLog:5 > + <
rdar://problem/22549736
>
Wrong radar?
> Source/WebCore/ChangeLog:10 > +
https://dom.spec.whatwg.org/#concept-document-url
.
Wrong URL?
Chris Dumez
Comment 5
2015-09-04 12:37:04 PDT
Comment on
attachment 260607
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260607&action=review
> Source/WebCore/ChangeLog:3 > + Set 'UTF-8' to be the default of document encoding.
I would prefer "document.characterSet should return "UTF-8" instead of null by default"
Chris Dumez
Comment 6
2015-09-04 12:37:18 PDT
Comment on
attachment 260607
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260607&action=review
>> Source/WebCore/ChangeLog:14 >> + > > Is it ok to have 2 separate change logs?
Yes.
dewei_zhu
Comment 7
2015-09-04 13:47:11 PDT
Created
attachment 260611
[details]
Patch
Chris Dumez
Comment 8
2015-09-04 13:55:12 PDT
Comment on
attachment 260611
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260611&action=review
> Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName();
What about the IDL changes I suggested?
dewei_zhu
Comment 9
2015-09-04 14:29:01 PDT
Created
attachment 260620
[details]
Patch
dewei_zhu
Comment 10
2015-09-04 14:30:06 PDT
Comment on
attachment 260620
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260620&action=review
> Source/WebCore/dom/Document.idl:178 > + [TreatNullAs=NullString] attribute DOMString charset;
Can we also drop [TreatNullAs=NullString] as well?
Chris Dumez
Comment 11
2015-09-04 14:39:28 PDT
Comment on
attachment 260620
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260620&action=review
> Source/WebCore/ChangeLog:11 > + is consistent with Firefox.
Chrome also follows the spec.
>> Source/WebCore/dom/Document.idl:178 >> + [TreatNullAs=NullString] attribute DOMString charset; > > Can we also drop [TreatNullAs=NullString] as well?
It is non-standard so I wouldn't touch it.
Chris Dumez
Comment 12
2015-09-04 14:40:58 PDT
Comment on
attachment 260620
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260620&action=review
> Source/WebCore/dom/Document.cpp:1255 > + return UTF8Encoding().domName();
As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code.
Chris Dumez
Comment 13
2015-09-04 14:43:48 PDT
Comment on
attachment 260620
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260620&action=review
>> Source/WebCore/dom/Document.cpp:1255 >> + return UTF8Encoding().domName(); > > As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code.
I see one call site in FrameLoader::addExtraFieldsToRequest(): // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine.
Chris Dumez
Comment 14
2015-09-04 14:45:38 PDT
Comment on
attachment 260620
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260620&action=review
>>> Source/WebCore/dom/Document.cpp:1255 >>> + return UTF8Encoding().domName(); >> >> As per Alexey's feedback, let's check that this doesn't impact Document::encoding() call sites in native code. > > I see one call site in FrameLoader::addExtraFieldsToRequest(): > // Always try UTF-8. If that fails, try frame encoding (if any) and then the default. > request.setResponseContentDispositionEncodingFallbackArray("UTF-8", m_frame.document()->encoding(), m_frame.settings().defaultTextEncodingName()); > > This already tries UTF-8 first and the implementation ignores null-string encodings so I think this is fine.
charset() / inputEncoding() are used a lot more internally and they all call Document::encoding() so we need to double check those.
dewei_zhu
Comment 15
2015-09-04 16:10:04 PDT
Created
attachment 260639
[details]
Patch
Chris Dumez
Comment 16
2015-09-04 16:16:05 PDT
Comment on
attachment 260639
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260639&action=review
> Source/WebCore/dom/Document.cpp:1258 > +String Document::characterSet() const
Please do not duplicate code. Have this call document::encoding().
> Source/WebCore/dom/Document.h:403 > String charset() const { return Document::encoding(); }
Please add a comment explaining this may return a null String.
> Source/WebCore/dom/Document.h:404 > + String characterSet() const;
Please add a comment explaining this returns "UTF-8" by default.
> LayoutTests/ChangeLog:9 > + Update the tests which test the default encoding of document.
Please add a blank line after this.
> LayoutTests/ChangeLog:11 > + * fast/dom/document-attribute-js-null-expected.txt: Obsolete test.
Not obsolete.
> LayoutTests/fast/dom/document-attribute-js-null-expected.txt:-5 > -TEST SUCCEEDED: The value was undefined. [tested Document.charset]
Do not remove.
> LayoutTests/fast/dom/document-attribute-js-null.html:-69 > - {name: 'charset', expectedNull: undefined},
Do not remove
dewei_zhu
Comment 17
2015-09-04 17:11:20 PDT
Created
attachment 260653
[details]
Patch
WebKit Commit Bot
Comment 18
2015-09-04 17:14:14 PDT
Attachment 260653
[details]
did not pass style-queue: ERROR: Source/WebCore/dom/Document.cpp:1261: Missing space before ( in if( [whitespace/parens] [5] Total errors found: 1 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chris Dumez
Comment 19
2015-09-04 17:30:13 PDT
Comment on
attachment 260653
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260653&action=review
r=me with changes.
> Source/WebCore/dom/Document.cpp:1255 > return String();
nullAtom
> Source/WebCore/dom/Document.cpp:1260 > + String name = encoding();
AtomicString
> Source/WebCore/dom/Document.cpp:1261 > + if(!name.isEmpty())
missing space between if and ( isNull() not isEmpty()
> Source/WebCore/dom/Document.h:403 > + String charset() const { return Document::encoding(); } // may return a null String
Comments need to end with a dot and start with a capital letter.
> Source/WebCore/dom/Document.h:404 > + String characterSet() const; // returns "UTF-8" by default
Ditto
> Source/WebCore/dom/InlineStyleSheetOwner.cpp:144 > + m_sheet = CSSStyleSheet::createInline(element, URL(), m_startTextPosition, document.charset());
Let's use encoding() for these as they wanted an encoding. Ditto below.
dewei_zhu
Comment 20
2015-09-09 16:52:28 PDT
Created
attachment 260889
[details]
Patch
Chris Dumez
Comment 21
2015-09-09 16:57:00 PDT
Comment on
attachment 260889
[details]
Patch r=me
Ryosuke Niwa
Comment 22
2015-09-09 16:57:07 PDT
Comment on
attachment 260889
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260889&action=review
> Source/WebCore/ChangeLog:15 > + * dom/Document.cpp: > + (WebCore::Document::encoding): > + (WebCore::Document::characterSet):
You should add comments as to what you're modifying in each function.
> Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSet() const; // Returns "UTF-8" by default.
We don't normally add comments like this. Instead, we rename functions to reflect these semantics. Can we call the latter bindingCharacterSet instead?
Chris Dumez
Comment 23
2015-09-09 17:01:18 PDT
Comment on
attachment 260889
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260889&action=review
>> Source/WebCore/dom/Document.h:404 >> + String characterSet() const; // Returns "UTF-8" by default. > > We don't normally add comments like this. Instead, we rename functions to reflect these semantics. > Can we call the latter bindingCharacterSet instead?
Hmm, OK, how about characterSetForBindings() then. I think we usually use such naming.
> Source/WebCore/dom/Document.idl:67 > + [ImplementedAs=characterSet] readonly attribute DOMString inputEncoding;
characterSetForBindings
> Source/WebCore/dom/Document.idl:191 > + readonly attribute DOMString characterSet;
[ImplementedAs=characterSetForBindings]
dewei_zhu
Comment 24
2015-09-09 17:34:30 PDT
Created
attachment 260893
[details]
Patch
Ryosuke Niwa
Comment 25
2015-09-09 17:45:08 PDT
Comment on
attachment 260893
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260893&action=review
> Source/WebCore/dom/Document.h:404 > + String charset() const { return Document::encoding(); } // May return a null String. > + String characterSetForBindings() const; // Returns "UTF-8" by default.
I don't think we want these comments.
Ryosuke Niwa
Comment 26
2015-09-09 17:45:51 PDT
These are "what" comments that repeat the code. We only want "why" comments.
dewei_zhu
Comment 27
2015-09-09 18:19:19 PDT
Created
attachment 260896
[details]
Patch
WebKit Commit Bot
Comment 28
2015-09-09 19:04:13 PDT
Comment on
attachment 260896
[details]
Patch Clearing flags on attachment: 260896 Committed
r189564
: <
http://trac.webkit.org/changeset/189564
>
WebKit Commit Bot
Comment 29
2015-09-09 19:04:20 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 30
2015-09-10 09:54:51 PDT
This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on Windows. Could you please update the results?
dewei_zhu
Comment 31
2015-09-10 09:56:42 PDT
(In reply to
comment #30
)
> This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results?
Of course. I'll do it right now.
dewei_zhu
Comment 32
2015-09-10 10:13:36 PDT
(In reply to
comment #30
)
> This broke imported/w3c/web-platform-tests/html/dom/interfaces.html on > Windows. Could you please update the results?
Updated.
https://bugs.webkit.org/show_bug.cgi?id=149038
Darin Adler
Comment 33
2015-09-11 10:25:32 PDT
Comment on
attachment 260896
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=260896&action=review
I think it’s worthwhile to return at some point and do additional cleanup.
> Source/WebCore/dom/Document.h:404 > String charset() const { return Document::encoding(); } > - String characterSet() const { return Document::encoding(); } > + String characterSetForBindings() const;
I know we made an effort at clarity here, but I still find the naming confusing. There’s charset, used for the DOM binding attribute "charset", and also characterSetForBindings, used for two other DOM binding attributes "inputEncoding" and "characterSet". Now how do those names make it clear why we wouldn’t want to use characterSetForBindings for "charset"? There’s a reason, presumably charset is supposed to return null rather than "UTF-8", but I think that reason needs to be expressed in these function names.
> Source/WebCore/loader/DocumentWriter.cpp:192 > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame);
This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding?
> Source/WebCore/loader/FormSubmission.cpp:154 > + return TextEncoding(document.encoding());
This seems inefficient. Why convert from a TextEncoding to a string and then back to a TextEncoding?
dewei_zhu
Comment 34
2015-09-14 18:26:50 PDT
Polished in
bug 149131
. (In reply to
comment #33
)
> Comment on
attachment 260896
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=260896&action=review
> > I think it’s worthwhile to return at some point and do additional cleanup. > > > Source/WebCore/dom/Document.h:404 > > String charset() const { return Document::encoding(); } > > - String characterSet() const { return Document::encoding(); } > > + String characterSetForBindings() const; > > I know we made an effort at clarity here, but I still find the naming > confusing. There’s charset, used for the DOM binding attribute "charset", > and also characterSetForBindings, used for two other DOM binding attributes > "inputEncoding" and "characterSet". > > Now how do those names make it clear why we wouldn’t want to use > characterSetForBindings for "charset"? There’s a reason, presumably charset > is supposed to return null rather than "UTF-8", but I think that reason > needs to be expressed in these function names. > > > Source/WebCore/loader/DocumentWriter.cpp:192 > > + m_decoder->setEncoding(TextEncoding(parentFrame->document()->encoding()), TextResourceDecoder::EncodingFromParentFrame); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding? > > > Source/WebCore/loader/FormSubmission.cpp:154 > > + return TextEncoding(document.encoding()); > > This seems inefficient. Why convert from a TextEncoding to a string and then > back to a TextEncoding?
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug