WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
24285
Text resource loading checks for BOM twice
https://bugs.webkit.org/show_bug.cgi?id=24285
Summary
Text resource loading checks for BOM twice
Alexey Proskuryakov
Reported
2009-03-02 02:28:14 PST
We check for BOM both in TextResourceDecoder and in TextDecoder, which makes no sense. Patch forthcoming.
Attachments
proposed patch
(33.28 KB, patch)
2009-03-02 02:54 PST
,
Alexey Proskuryakov
no flags
Details
Formatted Diff
Diff
proposed patch
(33.41 KB, patch)
2009-03-02 03:02 PST
,
Alexey Proskuryakov
no flags
Details
Formatted Diff
Diff
proposed patch
(33.36 KB, patch)
2009-03-02 03:42 PST
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2009-03-02 02:54:15 PST
Created
attachment 28161
[details]
proposed patch
Alexey Proskuryakov
Comment 2
2009-03-02 03:02:47 PST
Created
attachment 28162
[details]
proposed patch Added a check for null m_codec in TextResourceDecoder::flush().
Alexey Proskuryakov
Comment 3
2009-03-02 03:42:22 PST
Created
attachment 28163
[details]
proposed patch This time for sure!
Darin Adler
Comment 4
2009-03-10 05:38:57 PDT
Comment on
attachment 28163
[details]
proposed patch
> + * loader/CachedScript.cpp: > + (WebCore::CachedScript::CachedScript): > + (WebCore::CachedScript::setEncoding): > + (WebCore::CachedScript::encoding): > + (WebCore::CachedScript::script): > + * loader/CachedScript.h: > + * loader/appcache/ManifestParser.cpp: > + (WebCore::parseManifest): > + Use TextResourceDecoder, as TextEncoding::decode() no longer checks for BOM.
Isn't this a bit more expensive? Are there really no differences in behavior other than the BOM check by using the TextResourceDecoder?
> if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding && encoding().isJapanese()) > detectJapaneseEncoding(data, len); > > - ASSERT(encoding().isValid()); > + ASSERT(m_encoding.isValid());
There's a mix here of m_encoding and encoding(). We should consistently use m_encoding, I think. I'll say r=me for now. I'm thinking that maybe we should remove the decode functions from TextEncoding, since that's just a convenience for newTextCodec. What callers really need that and are they better served by having that helper rather than calling newTextCodec directly?
Alexey Proskuryakov
Comment 5
2009-03-10 05:57:14 PDT
(In reply to
comment #4
)
> Isn't this a bit more expensive? Are there really no differences in behavior > other than the BOM check by using the TextResourceDecoder?
For file types that don't have charset declarations internally, the only difference I can think of is enabling of Japanese encoding auto-detection. I do not expect to see any measurable difference in performance, although it's possible that I underestimate some book-keeping work that is done by TextResourceDecoder.
> I'm thinking that maybe we should remove the decode > functions from TextEncoding, since that's just a convenience for newTextCodec. > What callers really need that and are they better served by having that helper > rather than calling newTextCodec directly?
The primary reason I didn't look into removing TextEncoding::decode() was that I wasn't sure what to do with TextEncoding:: encode().
Darin Adler
Comment 6
2009-03-10 06:00:18 PDT
(In reply to
comment #5
)
> For file types that don't have charset declarations internally, the only > difference I can think of is enabling of Japanese encoding auto-detection.
We don't want that auto-detection for scripts, do we? That's the behavior change from this patch, right?
> The primary reason I didn't look into removing TextEncoding::decode() was that > I wasn't sure what to do with TextEncoding:: encode().
I think we can leave encode there even if we do remove decode. There was already asymmetry between the two due to the presence of the TextDecoder object. And some day we might find some great thing to do with encode.
Alexey Proskuryakov
Comment 7
2009-03-10 06:19:02 PDT
(In reply to
comment #6
)
> We don't want that auto-detection for scripts, do we? That's the behavior > change from this patch, right?
Yes, it's a somewhat hidden change indeed. I can't say for sure whether we want it or not - there doesn't seem to be a reason to handle JS and other text subresources differently in this regard. There are certainly security concerns related to script decoding, but they apply to inline scripts as well as to external ones. We don't have any tests for Japanese encoding auto-detection, and I'd be reluctant to add any tests that aren't based on actual Web content.
> I think we can leave encode there even if we do remove decode.
OK. I think that can can be done in a separate patch though (which I don't promise to make right away).
Darin Adler
Comment 8
2009-03-10 06:36:29 PDT
(In reply to
comment #7
)
> > I think we can leave encode there even if we do remove decode. > > OK. I think that can can be done in a separate patch though (which I don't > promise to make right away).
Absolutely. No hurry on this.
Alexey Proskuryakov
Comment 9
2009-03-10 07:11:38 PDT
Committed revision 41551. I added a note about text encoding detection to the ChangeLog.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug