39131 – Refactor text encoding detection logic in FileReader

RESOLVED FIXED 39131

Refactor text encoding detection logic in FileReader

https://bugs.webkit.org/show_bug.cgi?id=39131

Summary Refactor text encoding detection logic in FileReader

Jian Li

Reported 2010-05-14 13:09:58 PDT

This is the cleanup for the FileReader patch (https://bugs.webkit.org/show_bug.cgi?id=38157) per the comment from Adam Barth: WebCore/html/FileReader.cpp:256 + if (m_rawData.size() >= 2 && m_rawData[0] == '\xFE' && m_rawData[1] == '\xFF') { Yuck. Is this in the spec? We should move this to a more generic location.

Attachments
Proposed Patch (3.68 KB, patch) 2010-05-14 13:11 PDT, Jian Li	ap: review- jianli: commit-queue-	Details Formatted Diff Diff
Proposed Patch (10.01 KB, patch) 2010-05-19 15:51 PDT, Jian Li	jianli: commit-queue-	Details Formatted Diff Diff
Proposed Patch (9.37 KB, patch) 2010-05-19 16:40 PDT, Jian Li	ap: review+ jianli: commit-queue-	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Jian Li

Comment 1 2010-05-14 13:11:21 PDT

Created attachment 56101 [details] Proposed Patch

Alexey Proskuryakov

Comment 2 2010-05-14 16:49:06 PDT

Comment on attachment 56101 [details] Proposed Patch Encoding sniffing is definitely not something the TextEncoding class should responsible for. And I think that we already have all the code for this - can't you just feed the data to TextResourceDecoder as text/plain?

Jian Li

Comment 3 2010-05-19 15:51:26 PDT

Created attachment 56525 [details] Proposed Patch Changed to call TextResourceDecoder::decode.

Alexey Proskuryakov

Comment 4 2010-05-19 16:26:23 PDT

+ // Per the File API spec, we SHOULD use the supplied encoding and do not check for BOM if the encoding is provided and valid. Since this is SHOULD, and not a MUST, we could also ignore this requirement, like we do for Web content (it's also debatable whether it's right to always have the BOM override every other indication of character set, but that's what we do). I don't have a strong opinion, just wanted to point out that it might be good to be consistent within WebKit.

Jian Li

Comment 5 2010-05-19 16:40:34 PDT

Created attachment 56531 [details] Proposed Patch Your suggestion makes sense. I changed to ignore this requirement in order to be consistent with other part of WebKit.

Alexey Proskuryakov

Comment 6 2010-05-19 16:54:29 PDT

Comment on attachment 56531 [details] Proposed Patch + m_result = m_decoder->decode(&m_rawData.at(0), m_rawData.size()); One also needs to call flush() to make sure that e.g. incomplete UTF-8 sequences at the end get converted to U+FFFD REPLACEMENT CHARACTER, and to be generally compatible with TextResourceDecoder design. r=me if you call flush() and append the result.

Jian Li

Comment 7 2010-05-19 17:25:15 PDT

Fixed as suggested and committed as http://trac.webkit.org/changeset/59797.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component WebCore JavaScript

Assignee

Jian Li

Reported

2010-05-14 13:09 PDT

Modified

2010-05-19 17:25 PDT History

CC List

4 users Show

URL

Keywords

Depends on

Blocks