Bug 39131

Summary: Refactor text encoding detection logic in FileReader
Product: WebKit Reporter: Jian Li <jianli>
Component: WebCore JavaScriptAssignee: Jian Li <jianli>
Severity: Normal CC: abarth, ap, darin, dimich
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Description Flags
Proposed Patch
ap: review-, jianli: commit-queue-
Proposed Patch
jianli: commit-queue-
Proposed Patch ap: review+, jianli: commit-queue-

Description Jian Li 2010-05-14 13:09:58 PDT
This is the cleanup for the FileReader patch (https://bugs.webkit.org/show_bug.cgi?id=38157) per the comment from Adam Barth:

 +          if (m_rawData.size() >= 2 && m_rawData[0] == '\xFE' && m_rawData[1] == '\xFF') {
Yuck.  Is this in the spec?  We should move this to a more generic location.
Comment 1 Jian Li 2010-05-14 13:11:21 PDT
Created attachment 56101 [details]
Proposed Patch
Comment 2 Alexey Proskuryakov 2010-05-14 16:49:06 PDT
Comment on attachment 56101 [details]
Proposed Patch

Encoding sniffing is definitely not something the TextEncoding class should responsible for. And I think that we already have all the code for this - can't you just feed the data to TextResourceDecoder as text/plain?
Comment 3 Jian Li 2010-05-19 15:51:26 PDT
Created attachment 56525 [details]
Proposed Patch

Changed to call TextResourceDecoder::decode.
Comment 4 Alexey Proskuryakov 2010-05-19 16:26:23 PDT
+            // Per the File API spec, we SHOULD use the supplied encoding and do not check for BOM if the encoding is provided and valid.

Since this is SHOULD, and not a MUST, we could also ignore this requirement, like we do for Web content (it's also debatable whether it's right to always have the BOM override every other indication of character set, but that's what we do).

I don't have a strong opinion, just wanted to point out that it might be good to be consistent within WebKit.
Comment 5 Jian Li 2010-05-19 16:40:34 PDT
Created attachment 56531 [details]
Proposed Patch

Your suggestion makes sense. I changed to ignore this requirement in order to be consistent with other part of WebKit.
Comment 6 Alexey Proskuryakov 2010-05-19 16:54:29 PDT
Comment on attachment 56531 [details]
Proposed Patch

+    m_result = m_decoder->decode(&m_rawData.at(0), m_rawData.size());

One also needs to call flush() to make sure that e.g. incomplete UTF-8 sequences at the end get converted to U+FFFD REPLACEMENT CHARACTER, and to be generally compatible with TextResourceDecoder design.

r=me if you call flush() and append the result.
Comment 7 Jian Li 2010-05-19 17:25:15 PDT
Fixed as suggested and committed as http://trac.webkit.org/changeset/59797.