Bug 47691

Summary: Support readAsArrayBuffer in FileReader and FileReaderSync
Product: WebKit Reporter: Jian Li <jianli>
Component: WebCore JavaScriptAssignee: Jian Li <jianli>
Status: RESOLVED FIXED    
Severity: Normal CC: dimich, ericu, kinuko, levin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
Proposed Patch
jianli: commit-queue-
Proposed Patch
jianli: commit-queue-
Proposed Patch
jianli: commit-queue-
Proposed Patch
levin: review-, jianli: commit-queue-
Proposed Patch
jianli: commit-queue-
Proposed Patch
jianli: commit-queue-
Proposed Patch levin: review+, jianli: commit-queue-

Jian Li
Reported 2010-10-14 13:47:36 PDT
We need to support readAsArrayBuffer in FileReader and FileReaderSync per the latest File API spec: http://dev.w3.org/2006/webapi/FileAPI/#dfn-readAsArrayBuffer
Attachments
Proposed Patch (45.03 KB, patch)
2010-10-14 16:43 PDT, Jian Li
jianli: commit-queue-
Proposed Patch (45.85 KB, patch)
2010-10-15 16:13 PDT, Jian Li
jianli: commit-queue-
Proposed Patch (45.85 KB, patch)
2010-10-19 18:04 PDT, Jian Li
jianli: commit-queue-
Proposed Patch (48.50 KB, patch)
2010-10-21 14:54 PDT, Jian Li
levin: review-
jianli: commit-queue-
Proposed Patch (51.38 KB, patch)
2010-10-25 17:40 PDT, Jian Li
jianli: commit-queue-
Proposed Patch (74.78 KB, patch)
2010-10-26 18:17 PDT, Jian Li
jianli: commit-queue-
Proposed Patch (74.75 KB, patch)
2010-10-26 18:26 PDT, Jian Li
levin: review+
jianli: commit-queue-
Jian Li
Comment 1 2010-10-14 16:43:45 PDT
Created attachment 70797 [details] Proposed Patch
Jian Li
Comment 2 2010-10-15 16:13:08 PDT
Created attachment 70911 [details] Proposed Patch Updated the patch to make it work for commit-queue.
Kinuko Yasuda
Comment 3 2010-10-19 16:06:02 PDT
Comment on attachment 70911 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=70911&action=review > WebCore/fileapi/FileReader.cpp:335 > + // If an error occur, return an empty result. nit: If an error occur -> occurs or has occured > WebCore/fileapi/FileReader.cpp:340 > + if (m_state == Completed && !m_isRawDataConverted && m_readType != ReadFileAsBinaryString) { Early exit might be better. Does this mean we no longer support returning partial result for ReadAsText? > WebCore/fileapi/FileReaderSync.cpp:188 > + convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder); Don't we need to set isRawDataConverted here (and below)?
Jian Li
Comment 4 2010-10-19 16:18:09 PDT
(In reply to comment #3) > (From update of attachment 70911 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=70911&action=review > > > WebCore/fileapi/FileReader.cpp:335 > > + // If an error occur, return an empty result. > > nit: If an error occur -> occurs or has occured Will fix and upload a new patch. > > > WebCore/fileapi/FileReader.cpp:340 > > + if (m_state == Completed && !m_isRawDataConverted && m_readType != ReadFileAsBinaryString) { > > Early exit might be better. I prefer not use exit earlier since otherwise we will have to duplicate the return code "return m_build.toSting(). > > Does this mean we no longer support returning partial result for ReadAsText? Yes, this is to be consistent with the latest spec and FF implementation. > > > WebCore/fileapi/FileReaderSync.cpp:188 > > + convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder); > > Don't we need to set isRawDataConverted here (and below)? No, we do not need to use a flag as in FileReader where delay-conversion is intended. We will do the conversion right when FileReaderSync::didReceiveData is called.
Jian Li
Comment 5 2010-10-19 18:04:46 PDT
Created attachment 71236 [details] Proposed Patch
Kinuko Yasuda
Comment 6 2010-10-20 16:25:42 PDT
> > Does this mean we no longer support returning partial result for ReadAsText? > > Yes, this is to be consistent with the latest spec and FF implementation. I may need to refresh my memory... I vaguely remember that Jonus mentioned that it'd be a bug in FF, though it was sometime ago. The Oct 19 version of spec still seems to have the notion about partial result for readAsText? http://dev.w3.org/2006/webapi/FileAPI/#filedata-attr > > > WebCore/fileapi/FileReaderSync.cpp:188 > > > + convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder); > > > > Don't we need to set isRawDataConverted here (and below)? > > No, we do not need to use a flag as in FileReader where delay-conversion is intended. We will do the conversion right when FileReaderSync::didReceiveData is called. I got it, thanks.
Jian Li
Comment 7 2010-10-21 14:54:21 PDT
Created attachment 71496 [details] Proposed Patch
Jian Li
Comment 8 2010-10-21 14:57:10 PDT
(In reply to comment #6) > > > Does this mean we no longer support returning partial result for ReadAsText? > > > > Yes, this is to be consistent with the latest spec and FF implementation. > > I may need to refresh my memory... I vaguely remember that Jonus mentioned that it'd be a bug in FF, though it was sometime ago. The Oct 19 version of spec still seems to have the notion about partial result for readAsText? > http://dev.w3.org/2006/webapi/FileAPI/#filedata-attr I read other part of the spec which does not say such thing clearly. I fixed readAsText plus readAsArrayBuffer to support returning partial data.
David Levin
Comment 9 2010-10-24 18:58:42 PDT
Comment on attachment 71496 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71496&action=review I may have more comments later, but there are enough issues here to sort out first. > WebCore/ChangeLog:34 > + functionalities to FileReaderSync. Why? > WebCore/fileapi/FileReader.cpp:99 > + What happens if FileReader::start is called a second time (while the first call is still proceeding)? > WebCore/fileapi/FileReader.cpp:215 > + if (static_cast<unsigned long long>(m_totalBytes) > static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())) { Why is this cast needed "static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())"? > WebCore/fileapi/FileReader.cpp:222 > + m_rawData = ArrayBuffer::create(static_cast<unsigned>(m_totalBytes), 1); static_cast<unsigned>(m_totalBytes) (Given the number of casts of m_totalBytes) Why not make m_totalBytes unsigned? You'll need to store the initial assignment of response.expectedContentLength(); in another variable but after the if check, you can store the result in m_totalBytes. > WebCore/fileapi/FileReader.cpp:@ > void FileReader::didReceiveData(const char* data, int lengthReceived) It looks like a fair amount of code is duplicated between FileReader and FilereaderSync can this be reduced? > WebCore/fileapi/FileReader.cpp:322 > +PassRefPtr<ArrayBuffer> FileReader::arrayBufferResult() const This should be a ArrayBuffer* since it isn't return ownership. > WebCore/fileapi/FileReader.cpp:342 > + // Do the coversion if not yet. typo: coversion > WebCore/fileapi/FileReader.h:157 > + // We have to keep track of all the raw data for all readings other than binary string. s/readings/encodings/ > WebCore/fileapi/FileReaderSync.cpp:50 > +#include <wtf/RefPtr.h> I don't think you have any test to verify that using a FileReaderSync twice in a row (using the binary string works) because it doesn't appear to at the moment. It looks like the second call will get its result appended to the results from the first call. Please add this test. > WebCore/fileapi/FileReaderSync.cpp:61 > +PassRefPtr<ArrayBuffer> FileReaderSync::readAsArrayBuffer(ScriptExecutionContext* scriptExecutionContext, Blob* blob, ExceptionCode& ec) This shouldn't be a PassRefPtr because it isn't returning ownership. Actually, why not return ownership? Just do return m_rawData.release(); > WebCore/fileapi/FileReaderSync.cpp:100 > { ASSERT(!m_rawData.get()); (or set it to 0). Actually there is a lot of state to be reset here that doesn't appear to be. This would happen nearly automatically if FileReaderSyncLoader existed. Why did that go away? > WebCore/fileapi/FileReaderSync.cpp:153 > + m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1); Why not use m_rawData for all cases? > WebCore/fileapi/FileReaderSync.cpp:188 > + convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder); Why isn't m_rawData cleared here? > WebCore/fileapi/FileReaderSync.cpp:191 > + FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder); Why isn't m_rawData cleared here? > WebCore/fileapi/FileReaderSync.h:89 > + // We have to keep track of all the raw data for all readings other than binary string. This comment doesn't explain to me the need to keep m_rawData. After all, the code previously supported other encodings without the need to keep around m_rawData.
Jian Li
Comment 10 2010-10-25 17:40:08 PDT
> > WebCore/ChangeLog:34 > > + functionalities to FileReaderSync. > > Why? To simplify the logic. I think embedding the loading logic in the FileReaderSync seems to be simpler. Updated the comment in ChangeLog. > > > WebCore/fileapi/FileReader.cpp:99 > > + > > What happens if FileReader::start is called a second time (while the first call is still proceeding)? FileReader::start will not be called 2nd time because FileReader::readInternal will only schedule delayedStart to be called once when m_state is None though readInternal could be called multiple times. > > WebCore/fileapi/FileReader.cpp:215 > > + if (static_cast<unsigned long long>(m_totalBytes) > static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())) { > > Why is this cast needed "static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())"? Removed. > > WebCore/fileapi/FileReader.cpp:222 > > + m_rawData = ArrayBuffer::create(static_cast<unsigned>(m_totalBytes), 1); > > static_cast<unsigned>(m_totalBytes) > > (Given the number of casts of m_totalBytes) Why not make m_totalBytes unsigned? > > You'll need to store the initial assignment of response.expectedContentLength(); in another variable but after the if check, you can store the result in m_totalBytes. Done. > > WebCore/fileapi/FileReader.cpp:@ > > void FileReader::didReceiveData(const char* data, int lengthReceived) > > It looks like a fair amount of code is duplicated between FileReader and FilereaderSync can this be reduced? Good thought. However, I rather do it in a separate patch because otherwise it will make the patch too big. I added a FIXME for it. > > WebCore/fileapi/FileReader.cpp:322 > > +PassRefPtr<ArrayBuffer> FileReader::arrayBufferResult() const > > This should be a ArrayBuffer* since it isn't return ownership. I think we want to return PassRefPtr<ArrayBuffer> so that ArrayBuffer object can still live after FileReader object is GC-ed. > > WebCore/fileapi/FileReader.cpp:342 > > + // Do the coversion if not yet. > > typo: coversion Done. > > WebCore/fileapi/FileReader.h:157 > > + // We have to keep track of all the raw data for all readings other than binary string. > > s/readings/encodings/ I think I mean reading for all other scenarios, like as text, as data URL, and as array buffer. > > WebCore/fileapi/FileReaderSync.cpp:50 > > +#include <wtf/RefPtr.h> > > I don't think you have any test to verify that using a FileReaderSync twice in a row (using the binary string works) because it doesn't appear to at the moment. > > It looks like the second call will get its result appended to the results from the first call. > > Please add this test. Done. > > WebCore/fileapi/FileReaderSync.cpp:61 > > +PassRefPtr<ArrayBuffer> FileReaderSync::readAsArrayBuffer(ScriptExecutionContext* scriptExecutionContext, Blob* blob, ExceptionCode& ec) > > This shouldn't be a PassRefPtr because it isn't returning ownership. > > Actually, why not return ownership? > Just do > return m_rawData.release(); Done. > > WebCore/fileapi/FileReaderSync.cpp:100 > > { > > ASSERT(!m_rawData.get()); (or set it to 0). > > Actually there is a lot of state to be reset here that doesn't appear to be. > > This would happen nearly automatically if FileReaderSyncLoader existed. Why did that go away? Fixed. I removed FileReaderSyncLoader to simplify the logic. Yes, it bring up other complexity. I am thinking of introducing FileReaderLoader that can be used by FileReader and FileReaderSync to compact the similar logic. I will do it in another patch. > > WebCore/fileapi/FileReaderSync.cpp:153 > > + m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1); > > Why not use m_rawData for all cases? Because I want to avoid having 2 buffers at one moment when reading as binary string. > > WebCore/fileapi/FileReaderSync.cpp:188 > > + convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder); > > Why isn't m_rawData cleared here? Done. > > WebCore/fileapi/FileReaderSync.cpp:191 > > + FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder); > > Why isn't m_rawData cleared here? Done. > > WebCore/fileapi/FileReaderSync.h:89 > > + // We have to keep track of all the raw data for all readings other than binary string. > > This comment doesn't explain to me the need to keep m_rawData. After all, the code previously supported other encodings without the need to keep around m_rawData. Removed.
Jian Li
Comment 11 2010-10-25 17:40:45 PDT
Created attachment 71826 [details] Proposed Patch
David Levin
Comment 12 2010-10-26 05:04:40 PDT
(In reply to comment #10) > > This should be a ArrayBuffer* since it isn't return ownership. > I think we want to return PassRefPtr<ArrayBuffer> so that ArrayBuffer object can still live after FileReader object is GC-ed. The point out ArrayBuffer living on should be managed by the caller not the function. The function should return an ArrayBuffer*. The caller may then stuff that into a RefPtr (to ensure that the item lives on for as long as it wishes to hold it). Basically, there is no need to return a PassRefPtr *unless* a function is letting go of its reference. (This return value is about what the function is doing. It is not about what the caller is doing.) > > > WebCore/fileapi/FileReaderSync.cpp:153 > > > + m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1); > > > > Why not use m_rawData for all cases? > Because I want to avoid having 2 buffers at one moment when reading as binary string. I don't understand this. Let me sketch out something: void FileReaderSync::didReceiveResponse(const ResourceResponse& response) { ... m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1); } void FileReaderSync::didReceiveData(const char* data, int lengthReceived) { ... memcpy(static_cast<char*>(m_rawData->data()) + m_totalLoaded, data, lengthReceived); m_totalLoaded += static_cast<unsigned>(lengthReceived); } void FileReaderSync::didFinishLoading(unsigned long) { ... StringBuilder builder; switch (m_readType) { case ReadAsArrayBuffer: // Nothing to do since we need no conversion. return; case ReadAsBinaryString: m_string = String(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength()); return; case ReadAsText: { convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), builder); m_rawData = 0; m_stringResult = build.toString(); return; } case ReadAsDataURL: FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder); m_rawData = 0; m_stringResult = builder.toString(); return; } ASSERT_NOT_REACHED(); } I omitted the error checking (with ...) for brevity. Then you get rid of the member variable StringBuilder m_builder and add the member variable String m_stringResult ; This gets rid of several special cases and concentrated them in the didFinishLoading stage. In fact you could probably change of didFinishLoading to if (hasError()) m_rawData = 0; and move the conversion logic into each read function since there is no sharing of logic here.
David Levin
Comment 13 2010-10-26 05:05:10 PDT
s/In fact you could probably change of didFinishLoading to/In fact you could probably change didFinishLoading to/
David Levin
Comment 14 2010-10-26 05:15:04 PDT
Comment on attachment 71826 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71826&action=review > WebCore/WebCore.xcodeproj/project.pbxproj:-21224 > - developmentRegion = English; Why did this line get removed? (Are you using the xcode version which removes this and someone else will add it?) > WebCore/bindings/js/JSFileReaderCustom.cpp:48 > + switch (imp->readType()) { A switch for a since case just seems odd (with a default -- it looks like an if/else to me) but ok. > WebCore/fileapi/FileReader.cpp:229 > + if (static_cast<unsigned long long>(response.expectedContentLength()) > std::numeric_limits<unsigned>::max()) { Why is this code different from the exact same code in the FileReaderSync which stores the value of expectedContentLength in a variable? Please make this one look the same (and in a subsequent patch make them the same function). > WebCore/fileapi/FileReaderSync.cpp:185 > + if (hasError()) m_rawData = 0; Here as well.
David Levin
Comment 15 2010-10-26 05:20:52 PDT
I'm having a really hard time reviewing this due to the duplication of code between FileReader and FileReaderSync. The problem is that I'll make suggestions for FileReaderSync but the ideas may not be applicable when the logic is de-duplicated. Also the duplication means that I'm reviewing similar code twice but it isn't quite the same and it becomes harder to review in this fashion. I would do a much better job if I only had to focus on one set of logic changes. In short, it would be much, much better if the de-duplication happened before this additional functionality was added. Otherwise, I'm not going to do as good of a job with understanding this code. I'm sorry to put this in the way of landing the patch but I think it will lead to better code in the end.
Jian Li
Comment 16 2010-10-26 18:17:37 PDT
Created attachment 71973 [details] Proposed Patch
Jian Li
Comment 17 2010-10-26 18:22:28 PDT
(In reply to comment #15) > I'm having a really hard time reviewing this due to the duplication of code between FileReader and FileReaderSync. > > The problem is that I'll make suggestions for FileReaderSync but the ideas may not be applicable when the logic is de-duplicated. > > Also the duplication means that I'm reviewing similar code twice but it isn't quite the same and it becomes harder to review in this fashion. I would do a much better job if I only had to focus on one set of logic changes. > > In short, it would be much, much better if the de-duplication happened before this additional functionality was added. Otherwise, I'm not going to do as good of a job with understanding this code. I'm sorry to put this in the way of landing the patch but I think it will lead to better code in the end. I've refactored the code to merge the similar code into FileReaderLoader. Also, I've addressed all the issues except PassRefPtr<ArrayBuffer>. I understand why we could return the pointer directly. However, when I tried it, I got the problem because the JS generated code assume PassRefPtr<ArrayBuffer>. Unless we want to introduce custom code for FileReaderSync::readAsArrayBuffer, we have to keep using PassRefPtr<ArrayBuffer>.
Jian Li
Comment 18 2010-10-26 18:26:32 PDT
Created attachment 71975 [details] Proposed Patch Fixed style error.
David Levin
Comment 19 2010-10-26 20:52:01 PDT
Comment on attachment 71975 [details] Proposed Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71975&action=review First impression is that this is awesome. Thanks for doing this! Ok, I only had a few comments. Several are simple enough to just handle quickly and commit. If any require significant changes (and you want another review on them), then please submit without the change and create another bug for the extra change. > WebCore/WebCore.xcodeproj/project.pbxproj:-21238 > - developmentRegion = English; Why is this line deleted? > WebCore/bindings/js/JSFileReaderCustom.cpp:50 > + else No else here due to return in if condition. > WebCore/bindings/v8/custom/V8FileReaderCustom.cpp:69 > + switch (imp->readType()) { You didn't need to convert the switch to an if in the jsc code. You can go either way, but it would be nice to do the same thing in the jsc bindings and v8 bindings, so pick one and do the same in both. > WebCore/fileapi/FileReaderLoader.cpp:111 > + m_loader = 0; please double check that it is ok to do to this when you get called from didFail. (That somehow the loader doesn't try to do something after that call and it already destroyed.) > WebCore/fileapi/FileReaderLoader.cpp:145 > + // Bail out if we encounetr an error. typo: encounetr > WebCore/fileapi/FileReaderLoader.cpp:204 > + return m_rawData; What does it mean to return this when m_rawData keeps changing as we get more data (in the async case)? Is the caller able to modify it? Do we need to make a copy here? > WebCore/fileapi/FileReaderLoader.cpp:219 > + if (m_readType == ReadAsBinaryString) switch on m_readType? > WebCore/fileapi/FileReaderLoader.cpp:293 > + Extra returns. > WebCore/fileapi/FileReaderLoaderClient.h:38 > +class FileReaderLoaderClient { Since FileReader is the only thing to implement FileReaderLoaderClient, why not get rid of this interface and just pass in FileReader?
David Levin
Comment 20 2010-10-26 20:52:52 PDT
(In reply to comment #19) > > WebCore/fileapi/FileReaderLoader.cpp:293 > > + > > Extra returns. > I should have said extra blank lines.
Jian Li
Comment 21 2010-10-29 11:24:59 PDT
Jian Li
Comment 22 2010-10-29 11:25:53 PDT
All fixed except: (In reply to comment #19) > > WebCore/fileapi/FileReaderLoaderClient.h:38 > > +class FileReaderLoaderClient { > > Since FileReader is the only thing to implement FileReaderLoaderClient, why not get rid of this interface and just pass in FileReader? I will address this in another patch.
Note You need to log in before you can comment on or make changes to this bug.