24285 – Text resource loading checks for BOM twice

RESOLVED FIXED 24285

Text resource loading checks for BOM twice

https://bugs.webkit.org/show_bug.cgi?id=24285

Summary Text resource loading checks for BOM twice

Alexey Proskuryakov

Reported 2009-03-02 02:28:14 PST

We check for BOM both in TextResourceDecoder and in TextDecoder, which makes no sense. Patch forthcoming.

Attachments
proposed patch (33.28 KB, patch) 2009-03-02 02:54 PST, Alexey Proskuryakov	no flags	Details Formatted Diff Diff
proposed patch (33.41 KB, patch) 2009-03-02 03:02 PST, Alexey Proskuryakov	no flags	Details Formatted Diff Diff
proposed patch (33.36 KB, patch) 2009-03-02 03:42 PST, Alexey Proskuryakov	darin: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Alexey Proskuryakov

Comment 1 2009-03-02 02:54:15 PST

Created attachment 28161 [details] proposed patch

Alexey Proskuryakov

Comment 2 2009-03-02 03:02:47 PST

Created attachment 28162 [details] proposed patch Added a check for null m_codec in TextResourceDecoder::flush().

Alexey Proskuryakov

Comment 3 2009-03-02 03:42:22 PST

Created attachment 28163 [details] proposed patch This time for sure!

Darin Adler

Comment 4 2009-03-10 05:38:57 PDT

Comment on attachment 28163 [details] proposed patch > + * loader/CachedScript.cpp: > + (WebCore::CachedScript::CachedScript): > + (WebCore::CachedScript::setEncoding): > + (WebCore::CachedScript::encoding): > + (WebCore::CachedScript::script): > + * loader/CachedScript.h: > + * loader/appcache/ManifestParser.cpp: > + (WebCore::parseManifest): > + Use TextResourceDecoder, as TextEncoding::decode() no longer checks for BOM. Isn't this a bit more expensive? Are there really no differences in behavior other than the BOM check by using the TextResourceDecoder? > if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding && encoding().isJapanese()) > detectJapaneseEncoding(data, len); > > - ASSERT(encoding().isValid()); > + ASSERT(m_encoding.isValid()); There's a mix here of m_encoding and encoding(). We should consistently use m_encoding, I think. I'll say r=me for now. I'm thinking that maybe we should remove the decode functions from TextEncoding, since that's just a convenience for newTextCodec. What callers really need that and are they better served by having that helper rather than calling newTextCodec directly?

Alexey Proskuryakov

Comment 5 2009-03-10 05:57:14 PDT

(In reply to comment #4) > Isn't this a bit more expensive? Are there really no differences in behavior > other than the BOM check by using the TextResourceDecoder? For file types that don't have charset declarations internally, the only difference I can think of is enabling of Japanese encoding auto-detection. I do not expect to see any measurable difference in performance, although it's possible that I underestimate some book-keeping work that is done by TextResourceDecoder. > I'm thinking that maybe we should remove the decode > functions from TextEncoding, since that's just a convenience for newTextCodec. > What callers really need that and are they better served by having that helper > rather than calling newTextCodec directly? The primary reason I didn't look into removing TextEncoding::decode() was that I wasn't sure what to do with TextEncoding:: encode().

Darin Adler

Comment 6 2009-03-10 06:00:18 PDT

(In reply to comment #5) > For file types that don't have charset declarations internally, the only > difference I can think of is enabling of Japanese encoding auto-detection. We don't want that auto-detection for scripts, do we? That's the behavior change from this patch, right? > The primary reason I didn't look into removing TextEncoding::decode() was that > I wasn't sure what to do with TextEncoding:: encode(). I think we can leave encode there even if we do remove decode. There was already asymmetry between the two due to the presence of the TextDecoder object. And some day we might find some great thing to do with encode.

Alexey Proskuryakov

Comment 7 2009-03-10 06:19:02 PDT

(In reply to comment #6) > We don't want that auto-detection for scripts, do we? That's the behavior > change from this patch, right? Yes, it's a somewhat hidden change indeed. I can't say for sure whether we want it or not - there doesn't seem to be a reason to handle JS and other text subresources differently in this regard. There are certainly security concerns related to script decoding, but they apply to inline scripts as well as to external ones. We don't have any tests for Japanese encoding auto-detection, and I'd be reluctant to add any tests that aren't based on actual Web content. > I think we can leave encode there even if we do remove decode. OK. I think that can can be done in a separate patch though (which I don't promise to make right away).

Darin Adler

Comment 8 2009-03-10 06:36:29 PDT

(In reply to comment #7) > > I think we can leave encode there even if we do remove decode. > > OK. I think that can can be done in a separate patch though (which I don't > promise to make right away). Absolutely. No hurry on this.

Alexey Proskuryakov

Comment 9 2009-03-10 07:11:38 PDT

Committed revision 41551. I added a note about text encoding detection to the ChangeLog.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Page Loading

Assignee

Alexey Proskuryakov

Reported

2009-03-02 02:28 PST

Modified

2009-03-10 07:11 PDT History

CC List

1 user Show

URL

Keywords

Depends on

Blocks