WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
24150
Add virtual ScriptExecutionContext::encoding()
https://bugs.webkit.org/show_bug.cgi?id=24150
Summary
Add virtual ScriptExecutionContext::encoding()
Dmitry Titov
Reported
2009-02-24 18:31:31 PST
This also moved inputEncoding() to the SEC. The idea is to reflect actual encoding of loaded JavaScript (initially initialized from the parent context and then optionally changed by http response headers). This will allow to unvirtualize completeURL() and also properly "inherit" encoding in case of nested workers.
Attachments
Proposed patch
(7.47 KB, patch)
2009-02-24 18:34 PST
,
Dmitry Titov
no flags
Details
Formatted Diff
Diff
Updated patch.
(14.96 KB, patch)
2009-02-25 17:55 PST
,
Dmitry Titov
ap
: review-
Details
Formatted Diff
Diff
updated patch.
(24.50 KB, patch)
2009-03-04 17:51 PST
,
Dmitry Titov
no flags
Details
Formatted Diff
Diff
Updated the test so it runs in FF too.
(24.79 KB, patch)
2009-03-05 20:09 PST
,
Dmitry Titov
ap
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Dmitry Titov
Comment 1
2009-02-24 18:34:07 PST
Created
attachment 27950
[details]
Proposed patch
David Levin
Comment 2
2009-02-24 18:54:16 PST
Minor comment:
> String displayStringModifiedByEncoding(const String& str) const {
I know you only moved the methods, but it would be nice to fix the { placement on the methods to follow WebKit style while you are doing this.
Alexey Proskuryakov
Comment 3
2009-02-25 04:03:52 PST
Comment on
attachment 27950
[details]
Proposed patch Why move displayStringModifiedByEncoding() to ScriptExecutionContext? It isn't needed there. Keeping a full-blown TextResourceDecoder in SEC while it only needs TextEncoding doesn't look very good either. Document::m_decoder was the decoder used for document source - but worker script source is decoded elsewhere.
Dmitry Titov
Comment 4
2009-02-25 17:22:10 PST
Agreed. Changing the title of the bug and attaching new patch. Instead of moving the decoder, have a virtual method that returns the encoding as a String. Document does what it did with decoder, while WorkerContext initially inherits the encoding from the parent context and then updates it after script loading. Also update WorkerContext::completeURL to use the encoding.
Dmitry Titov
Comment 5
2009-02-25 17:55:45 PST
Created
attachment 27999
[details]
Updated patch. No more moving decoder :-) It feels that amount of wiring the encoding string needs to propagate from Worker::notifyFinished to the WorkerContext is a bit excessive but I'm not sure how to make it cleaner.
Alexey Proskuryakov
Comment 6
2009-02-26 04:52:10 PST
Comment on
attachment 27999
[details]
Updated patch.
> + String inputEncoding() const { return encoding(); } > + String charset() const { return encoding(); } > + String characterSet() const { return encoding(); }
You need to use "return Document::encoding();" syntax to avoid virtual dispatch. Please also consider making encoding() private (that would improve safety, but decrease clarity somewhat, as the functions won't be near each other any more).
> + // FIXME: nested workers need loading support. Consider adopting ThreadableLoader here.
Capital "N" is needed in "nested".
> - // FIXME: does this need to provide a charset, like Document::completeURL does? > - return KURL(m_location->url(), url); > + > + if (m_encoding.isNull()) > + return KURL(m_location->url(), url); > + > + return KURL(m_location->url(), url, TextEncoding(m_encoding)); > }
Have you tested that this is what Firefox does? I wouldn't be surprised if it always used UTF-8, and didn't inherit the encoding from the document at all. This change needs an automated test.
> + String m_encoding;
Why is it better to store it as String, not as TextEncoding?
> - WorkerThread(const KURL&, const String& userAgent, const String& sourceCode, WorkerObjectProxy*); > + WorkerThread(const KURL&, const String& userAgent, const String& encoding, const String& sourceCode, WorkerObjectProxy*);
Two spaces here. r-, because this needs an automated test (XMLHttpRequest URL completion for now, but it's worth testing how Firefox decodes nested worker scripts and importScripts() content).
Dmitry Titov
Comment 7
2009-03-02 18:22:32 PST
Tested FF 3.1b2 Interestingly enough, they inherit encoding from the document - with a new worker on any nesting level starting from the encoding of the document rather then the parent script. For XHR it's different - it always starts from utf8. Will match the behavior and add a test.
Dmitry Titov
Comment 8
2009-03-04 17:51:32 PST
Created
attachment 28295
[details]
updated patch. Updated patch, now with test.
Dmitry Titov
Comment 9
2009-03-04 18:01:57 PST
(In reply to
comment #6
)
> (From update of
attachment 27999
[details]
[review]) > > + String inputEncoding() const { return encoding(); } > > + String charset() const { return encoding(); } > > + String characterSet() const { return encoding(); } > > You need to use "return Document::encoding();" syntax to avoid virtual > dispatch.
Done.
> Please also consider making encoding() private (that would improve > safety, but decrease clarity somewhat, as the functions won't be near each > other any more).
Moved the function in .h file (to another similar one, userAgent()) but not sure if making it private (and virtual) will help - because on WorkerContext we'll need it public...
> > + // FIXME: nested workers need loading support. Consider adopting ThreadableLoader here. > Capital "N" is needed in "nested".
Done.
> > - // FIXME: does this need to provide a charset, like Document::completeURL does? > > - return KURL(m_location->url(), url); > > + > > + if (m_encoding.isNull()) > > + return KURL(m_location->url(), url); > > + > > + return KURL(m_location->url(), url, TextEncoding(m_encoding)); > > } > > Have you tested that this is what Firefox does? I wouldn't be surprised if it > always used UTF-8, and didn't inherit the encoding from the document at all. > > This change needs an automated test.
That happened to be a very good suggestion! I've tested FF and realized I'm adding wrong behavior :-). So I've changed the code and I'm adding the test that verifies the behavior is same as in FF (and also pretty logical).
> > + String m_encoding; > Why is it better to store it as String, not as TextEncoding?
It is only used as a string - to pass into script loader as initial value (now that we don't need to create TextEncoding in completeURL). So storing String is now simpler and perhaps faster.
> > + WorkerThread(const KURL&, const String& userAgent, const String& encoding, const String& sourceCode, WorkerObjectProxy*); > Two spaces here.
Done.
Alexey Proskuryakov
Comment 10
2009-03-05 00:41:55 PST
Comment on
attachment 28295
[details]
updated patch. That's very interesting, thanks! So, it turns out that the charset is inherited as a default for nested worker script decoding, correct?
> + // Note Document::completeURL provides a charset here (to encode query portion of url when > + // submitting charset-encoded forms data). Workers always use UTF-8 which is default for KURL. > return KURL(m_location->url(), url);
How does URL completion work in Firefox for Worker source and importScript() URLs? If I remember correctly, Firefox doesn't use document encoding for completing XHR URLs even on main thread, and we don't currently match this quirk. If 'new Worker(relativeUrl)' syntax uses UTF-8 even on main thread in Firefox, then we should consider moving XMLHttpRequest and Worker off completeURL entirely (but we'll need to test IE behavior for XHR URL completion then). Any changes to XHR and Worker behavior on main thread will need a separate bug of course, but I'd like to have a full understanding of correct expected behavior before going forward with this change.
Dmitry Titov
Comment 11
2009-03-05 20:09:41 PST
Created
attachment 28345
[details]
Updated the test so it runs in FF too.
Dmitry Titov
Comment 12
2009-03-05 20:27:23 PST
Here are results of more testing. I think the nested case, which we don't implement yet, may be still 'in the works', especially wrt base url... - tested version is FF 3.1b2 - 'XYZ' below is a non-latin string - 'document' is the main html page - 'first worker' is a worker created by document - 'nested worker' is a worker created by the first worker - (WK+) means behavior matched WebKit In the document: new Worker('foo.php?query=XYZ'); - query part of URL will be encoded using document's encoding (WK+) - relative URL will be completed using base URL of the document (WK+) - loaded script will be decoded using document's encoding or http header if any (WK+) XHR('foo.php?query=XYZ'); - query part of URL will be encoded using UTF8 (in WK, document encoding) - relative URL will be completed using base URL of the document (WK+) - loaded content will be decoded using UTF8 or http header if any (WK+) <iframe src='foo.php?query=XYZ'> - query part of URL will be encoded using document's encoding (WK+) - relative URL will be completed using base URL of the document (WK+) - loaded content will be decoded using document's encoding or http header if any (WK+) inside first worker: new Worker('foo.php?query=XYZ'); importScripts('foo.php?query=XYZ'); - query part of URL will be encoded using document's encoding, even if first worker had different http header (WK not impl) - relative URL will be completed using base URL of the document, not the one of the first worker (WK not impl) - loaded script will be decoded using document's encoding or http header if any, even if first worker had different http header (WK not impl) XHR("foo.php?XXX") - query part of URL will be encoded using UTF8 (in WK, document encoding) - relative URL will be completed using base URL of the document (in WK, base url of the first worker) - loaded content will be decoded using UTF8 or http header if any (WK+)
Alexey Proskuryakov
Comment 13
2009-03-06 01:58:46 PST
Comment on
attachment 28345
[details]
Updated the test so it runs in FF too. Thanks, that's very useful information! Much of the Gecko behavior looks like it may be unintentional indeed - would you be willing to file bugs with Mozilla? It could also be worth discussing this with WHATWG. - // FIXME: does this need to provide a charset, like Document::completeURL does? + // Note Document::completeURL provides a charset here (to encode query portion of url when + // submitting charset-encoded forms data). Workers always use UTF-8 which is default for KURL. return KURL(m_location->url(), url); I think that the word "Note" is extraneous here - it's a comment, after all. Per your research, it is not clear that we'll want this behavior forever, given that Firefox uses document encoding for Worker and importScripts() URL completion. So, the FIXME still seems appropriate, perhaps amended with results of your research. r=me
Dmitry Titov
Comment 14
2009-03-06 13:37:33 PST
Fixed the comment to be FIXME and landed in
http://trac.webkit.org/changeset/41499
I'll follow up separately with FF on what is the right encodings and base URLs to be used there.
Dmitry Titov
Comment 15
2009-03-06 18:20:41 PST
Retested with latest FF nightly (Minefield, 03/04/09) There is progress - nested workers and importScripts now resolves relative URLs using the parent worker script' base url (previously was using top-level document base URL) :-) inside workers, non-latin portions of query in XHR URLs now are encoded using top document encoding (was UTF-8 before) - strange, in document they still use UTF-8
Dmitry Titov
Comment 16
2009-03-19 16:54:03 PDT
More info: Discussion on WHATWG IRC and mailing list show that the spec will very soon capture the follwoing; 1. All URLs in Workers are encoded using UTF-8 2. 'Default' encoding for scripts themselves is always UTF-8 unless overridden by Con tent-Type http header. I've filed bugs for FF and will file a new one for WebKit, to remove encoding() from ScriptExecutionContext().
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug