Bug 47691 - Support readAsArrayBuffer in FileReader and FileReaderSync
Summary: Support readAsArrayBuffer in FileReader and FileReaderSync
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Jian Li
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-14 13:47 PDT by Jian Li
Modified: 2010-10-29 11:25 PDT (History)
4 users (show)

See Also:


Attachments
Proposed Patch (45.03 KB, patch)
2010-10-14 16:43 PDT, Jian Li
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (45.85 KB, patch)
2010-10-15 16:13 PDT, Jian Li
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (45.85 KB, patch)
2010-10-19 18:04 PDT, Jian Li
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (48.50 KB, patch)
2010-10-21 14:54 PDT, Jian Li
levin: review-
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (51.38 KB, patch)
2010-10-25 17:40 PDT, Jian Li
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (74.78 KB, patch)
2010-10-26 18:17 PDT, Jian Li
jianli: commit-queue-
Details | Formatted Diff | Diff
Proposed Patch (74.75 KB, patch)
2010-10-26 18:26 PDT, Jian Li
levin: review+
jianli: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jian Li 2010-10-14 13:47:36 PDT
We need to support readAsArrayBuffer in FileReader and FileReaderSync per the latest File API spec:
  http://dev.w3.org/2006/webapi/FileAPI/#dfn-readAsArrayBuffer
Comment 1 Jian Li 2010-10-14 16:43:45 PDT
Created attachment 70797 [details]
Proposed Patch
Comment 2 Jian Li 2010-10-15 16:13:08 PDT
Created attachment 70911 [details]
Proposed Patch

Updated the patch to make it work for commit-queue.
Comment 3 Kinuko Yasuda 2010-10-19 16:06:02 PDT
Comment on attachment 70911 [details]
Proposed Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=70911&action=review

> WebCore/fileapi/FileReader.cpp:335
> +    // If an error occur, return an empty result.

nit: If an error occur -> occurs or has occured

> WebCore/fileapi/FileReader.cpp:340
> +    if (m_state == Completed && !m_isRawDataConverted && m_readType != ReadFileAsBinaryString) {

Early exit might be better.

Does this mean we no longer support returning partial result for ReadAsText?

> WebCore/fileapi/FileReaderSync.cpp:188
> +        convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder);

Don't we need to set isRawDataConverted here (and below)?
Comment 4 Jian Li 2010-10-19 16:18:09 PDT
(In reply to comment #3)
> (From update of attachment 70911 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=70911&action=review
> 
> > WebCore/fileapi/FileReader.cpp:335
> > +    // If an error occur, return an empty result.
> 
> nit: If an error occur -> occurs or has occured

Will fix and upload a new patch.

> 
> > WebCore/fileapi/FileReader.cpp:340
> > +    if (m_state == Completed && !m_isRawDataConverted && m_readType != ReadFileAsBinaryString) {
> 
> Early exit might be better.

I prefer not use exit earlier since otherwise we will have to duplicate the return code "return m_build.toSting().

> 
> Does this mean we no longer support returning partial result for ReadAsText?

Yes, this is to be consistent with the latest spec and FF implementation.
> 
> > WebCore/fileapi/FileReaderSync.cpp:188
> > +        convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder);
> 
> Don't we need to set isRawDataConverted here (and below)?

No, we do not need to use a flag as in FileReader where delay-conversion is intended. We will do the conversion right when FileReaderSync::didReceiveData is called.
Comment 5 Jian Li 2010-10-19 18:04:46 PDT
Created attachment 71236 [details]
Proposed Patch
Comment 6 Kinuko Yasuda 2010-10-20 16:25:42 PDT
> > Does this mean we no longer support returning partial result for ReadAsText?
> 
> Yes, this is to be consistent with the latest spec and FF implementation.

I may need to refresh my memory... I vaguely remember that Jonus mentioned that it'd be a bug in FF, though it was sometime ago.  The Oct 19 version of spec still seems to have the notion about partial result for readAsText?
http://dev.w3.org/2006/webapi/FileAPI/#filedata-attr

> > > WebCore/fileapi/FileReaderSync.cpp:188
> > > +        convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder);
> > 
> > Don't we need to set isRawDataConverted here (and below)?
> 
> No, we do not need to use a flag as in FileReader where delay-conversion is intended. We will do the conversion right when FileReaderSync::didReceiveData is called.

I got it, thanks.
Comment 7 Jian Li 2010-10-21 14:54:21 PDT
Created attachment 71496 [details]
Proposed Patch
Comment 8 Jian Li 2010-10-21 14:57:10 PDT
(In reply to comment #6)
> > > Does this mean we no longer support returning partial result for ReadAsText?
> > 
> > Yes, this is to be consistent with the latest spec and FF implementation.
> 
> I may need to refresh my memory... I vaguely remember that Jonus mentioned that it'd be a bug in FF, though it was sometime ago.  The Oct 19 version of spec still seems to have the notion about partial result for readAsText?
> http://dev.w3.org/2006/webapi/FileAPI/#filedata-attr

I read other part of the spec which does not say such thing clearly. I fixed readAsText plus readAsArrayBuffer to support returning partial data.
Comment 9 David Levin 2010-10-24 18:58:42 PDT
Comment on attachment 71496 [details]
Proposed Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=71496&action=review

I may have more comments later, but there are enough issues here to sort out first.

> WebCore/ChangeLog:34
> +        functionalities to FileReaderSync.

Why?

> WebCore/fileapi/FileReader.cpp:99
> +

What happens if FileReader::start is called a second time (while the first call is still proceeding)?

> WebCore/fileapi/FileReader.cpp:215
> +    if (static_cast<unsigned long long>(m_totalBytes) >  static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())) {

Why is this cast needed "static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())"?

> WebCore/fileapi/FileReader.cpp:222
> +        m_rawData = ArrayBuffer::create(static_cast<unsigned>(m_totalBytes), 1);

static_cast<unsigned>(m_totalBytes)

(Given the number of casts of m_totalBytes) Why not make m_totalBytes unsigned?

You'll need to store the initial assignment of response.expectedContentLength(); in another variable but after the if check, you can store the result in m_totalBytes.

> WebCore/fileapi/FileReader.cpp:@
>  void FileReader::didReceiveData(const char* data, int lengthReceived)

It looks like a fair amount of code is duplicated between FileReader and FilereaderSync can this be reduced?

> WebCore/fileapi/FileReader.cpp:322
> +PassRefPtr<ArrayBuffer> FileReader::arrayBufferResult() const

This should be a ArrayBuffer* since it isn't return ownership.

> WebCore/fileapi/FileReader.cpp:342
> +    // Do the coversion if not yet.

typo: coversion

> WebCore/fileapi/FileReader.h:157
> +    // We have to keep track of all the raw data for all readings other than binary string.

s/readings/encodings/

> WebCore/fileapi/FileReaderSync.cpp:50
> +#include <wtf/RefPtr.h>

I don't think you have any test to verify that using a FileReaderSync twice in a row (using the binary string works) because it doesn't appear to at the moment.

It looks like the second call will get its result appended to the results from the first call.

Please add this test.

> WebCore/fileapi/FileReaderSync.cpp:61
> +PassRefPtr<ArrayBuffer> FileReaderSync::readAsArrayBuffer(ScriptExecutionContext* scriptExecutionContext, Blob* blob, ExceptionCode& ec)

This shouldn't be a PassRefPtr because it isn't returning ownership.

Actually, why not return ownership?
Just do
 return m_rawData.release();

> WebCore/fileapi/FileReaderSync.cpp:100
>  {

ASSERT(!m_rawData.get());  (or set it to 0).

Actually there is a lot of state to be reset here that doesn't appear to be.

This would happen nearly automatically if FileReaderSyncLoader existed. Why did that go away?

> WebCore/fileapi/FileReaderSync.cpp:153
> +        m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1);

Why not use m_rawData for all cases?

> WebCore/fileapi/FileReaderSync.cpp:188
> +        convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder);

Why isn't m_rawData cleared here?

> WebCore/fileapi/FileReaderSync.cpp:191
> +        FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder);

Why isn't m_rawData cleared here?

> WebCore/fileapi/FileReaderSync.h:89
> +    // We have to keep track of all the raw data for all readings other than binary string.

This comment doesn't explain to me the need to keep m_rawData. After all, the code previously supported other encodings without the need to keep around m_rawData.
Comment 10 Jian Li 2010-10-25 17:40:08 PDT
> > WebCore/ChangeLog:34
> > +        functionalities to FileReaderSync.
> 
> Why?
To simplify the logic. I think embedding the loading logic in the FileReaderSync seems to be simpler. Updated the comment in ChangeLog.

> 
> > WebCore/fileapi/FileReader.cpp:99
> > +
> 
> What happens if FileReader::start is called a second time (while the first call is still proceeding)?
FileReader::start will not be called 2nd time because FileReader::readInternal will only schedule delayedStart to be called once when m_state is None though readInternal could be called multiple times.

> > WebCore/fileapi/FileReader.cpp:215
> > +    if (static_cast<unsigned long long>(m_totalBytes) >  static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())) {
> 
> Why is this cast needed "static_cast<unsigned long long>(std::numeric_limits<unsigned>::max())"?
Removed.

> > WebCore/fileapi/FileReader.cpp:222
> > +        m_rawData = ArrayBuffer::create(static_cast<unsigned>(m_totalBytes), 1);
> 
> static_cast<unsigned>(m_totalBytes)
> 
> (Given the number of casts of m_totalBytes) Why not make m_totalBytes unsigned?
> 
> You'll need to store the initial assignment of response.expectedContentLength(); in another variable but after the if check, you can store the result in m_totalBytes.
Done.

> > WebCore/fileapi/FileReader.cpp:@
> >  void FileReader::didReceiveData(const char* data, int lengthReceived)
> 
> It looks like a fair amount of code is duplicated between FileReader and FilereaderSync can this be reduced?
Good thought. However, I rather do it in a separate patch because otherwise it will make the patch too big. I added a FIXME for it.

> > WebCore/fileapi/FileReader.cpp:322
> > +PassRefPtr<ArrayBuffer> FileReader::arrayBufferResult() const
> 
> This should be a ArrayBuffer* since it isn't return ownership.
I think we want to return PassRefPtr<ArrayBuffer> so that ArrayBuffer object can still live after FileReader object is GC-ed.

> > WebCore/fileapi/FileReader.cpp:342
> > +    // Do the coversion if not yet.
> 
> typo: coversion
Done.

> > WebCore/fileapi/FileReader.h:157
> > +    // We have to keep track of all the raw data for all readings other than binary string.
> 
> s/readings/encodings/
I think I mean reading for all other scenarios, like as text, as data URL, and as array buffer.

> > WebCore/fileapi/FileReaderSync.cpp:50
> > +#include <wtf/RefPtr.h>
> 
> I don't think you have any test to verify that using a FileReaderSync twice in a row (using the binary string works) because it doesn't appear to at the moment.
> 
> It looks like the second call will get its result appended to the results from the first call.
> 
> Please add this test.
Done.

> > WebCore/fileapi/FileReaderSync.cpp:61
> > +PassRefPtr<ArrayBuffer> FileReaderSync::readAsArrayBuffer(ScriptExecutionContext* scriptExecutionContext, Blob* blob, ExceptionCode& ec)
> 
> This shouldn't be a PassRefPtr because it isn't returning ownership.
> 
> Actually, why not return ownership?
> Just do
>  return m_rawData.release();
Done.

> > WebCore/fileapi/FileReaderSync.cpp:100
> >  {
> 
> ASSERT(!m_rawData.get());  (or set it to 0).
> 
> Actually there is a lot of state to be reset here that doesn't appear to be.
> 
> This would happen nearly automatically if FileReaderSyncLoader existed. Why did that go away?
Fixed. I removed FileReaderSyncLoader to simplify the logic. Yes, it bring up other complexity. I am thinking of introducing FileReaderLoader that can be used by FileReader and FileReaderSync to compact the similar logic. I will do it in another patch.

> > WebCore/fileapi/FileReaderSync.cpp:153
> > +        m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1);
> 
> Why not use m_rawData for all cases?
Because I want to avoid having 2 buffers at one moment when reading as binary string.

> > WebCore/fileapi/FileReaderSync.cpp:188
> > +        convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_builder);
> 
> Why isn't m_rawData cleared here?
Done.

> > WebCore/fileapi/FileReaderSync.cpp:191
> > +        FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder);
> 
> Why isn't m_rawData cleared here?
Done.

> > WebCore/fileapi/FileReaderSync.h:89
> > +    // We have to keep track of all the raw data for all readings other than binary string.
> 
> This comment doesn't explain to me the need to keep m_rawData. After all, the code previously supported other encodings without the need to keep around m_rawData.
Removed.
Comment 11 Jian Li 2010-10-25 17:40:45 PDT
Created attachment 71826 [details]
Proposed Patch
Comment 12 David Levin 2010-10-26 05:04:40 PDT
(In reply to comment #10)
> > This should be a ArrayBuffer* since it isn't return ownership.
> I think we want to return PassRefPtr<ArrayBuffer> so that ArrayBuffer object can still live after FileReader object is GC-ed.

The point out ArrayBuffer living on should be managed by the caller not the function.

The function should return an ArrayBuffer*. The caller may then stuff that into a RefPtr (to ensure that the item lives on for as long as it wishes to hold it).

Basically, there is no need to return a PassRefPtr *unless* a function is letting go of its reference. (This return value is about what the function is doing. It is not about what the caller is doing.)


> > > WebCore/fileapi/FileReaderSync.cpp:153
> > > +        m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1);
> > 
> > Why not use m_rawData for all cases?
> Because I want to avoid having 2 buffers at one moment when reading as binary string.

I don't understand this.


Let me sketch out something:
 void FileReaderSync::didReceiveResponse(const ResourceResponse& response)
 {
...
     m_rawData = ArrayBuffer::create(static_cast<unsigned>(totalBytes), 1);
 }

 void FileReaderSync::didReceiveData(const char* data, int lengthReceived)
 {
...
     memcpy(static_cast<char*>(m_rawData->data()) + m_totalLoaded, data, lengthReceived);
     m_totalLoaded += static_cast<unsigned>(lengthReceived);
 }
 void FileReaderSync::didFinishLoading(unsigned long)
 {
...
     StringBuilder builder;
     switch (m_readType) {
     case ReadAsArrayBuffer:
         // Nothing to do since we need no conversion.
         return;
     case ReadAsBinaryString:
         m_string = String(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength());         
         return;
     case ReadAsText: {
         convertToText(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), builder);
         m_rawData = 0;
         m_stringResult = build.toString();
         return;
     }
     case ReadAsDataURL:
         FileReader::convertToDataURL(static_cast<const char*>(m_rawData->data()), m_rawData->byteLength(), m_blobType, m_builder);
         m_rawData = 0;
         m_stringResult = builder.toString();
         return;
     }
 
     ASSERT_NOT_REACHED();
 }

I omitted the error checking (with ...) for brevity.


Then you get rid of the member variable StringBuilder m_builder and add the member variable String m_stringResult ;

This gets rid of several special cases and concentrated them in the didFinishLoading stage.

In fact you could probably change of didFinishLoading to 
   if (hasError())
       m_rawData = 0;

and move the conversion logic into each read function since there is no sharing of logic here.
Comment 13 David Levin 2010-10-26 05:05:10 PDT
s/In fact you could probably change of didFinishLoading to/In fact you could probably change didFinishLoading to/
Comment 14 David Levin 2010-10-26 05:15:04 PDT
Comment on attachment 71826 [details]
Proposed Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=71826&action=review

> WebCore/WebCore.xcodeproj/project.pbxproj:-21224
> -			developmentRegion = English;

Why did this line get removed? (Are you using the xcode version which removes this and someone else will add it?)

> WebCore/bindings/js/JSFileReaderCustom.cpp:48
> +    switch (imp->readType()) {

A switch for a since case just seems odd (with a default -- it looks like an if/else to me) but ok.

> WebCore/fileapi/FileReader.cpp:229
> +    if (static_cast<unsigned long long>(response.expectedContentLength()) >  std::numeric_limits<unsigned>::max()) {

Why is this code different from the exact same code in the FileReaderSync which stores the value of expectedContentLength in a variable?

Please make this one look the same (and in a subsequent patch make them the same function).

> WebCore/fileapi/FileReaderSync.cpp:185
> +    if (hasError())

m_rawData = 0;

Here as well.
Comment 15 David Levin 2010-10-26 05:20:52 PDT
I'm having a really hard time reviewing this due to the duplication of code between FileReader and FileReaderSync.

The problem is that I'll make suggestions for FileReaderSync but the ideas may not be applicable when the logic is de-duplicated.

Also the duplication means that I'm reviewing similar code twice but it isn't quite the same and it becomes harder to review in this fashion. I would do a much better job if I only had to focus on one set of logic changes.

In short, it would be much, much better if the de-duplication happened before this additional functionality was added. Otherwise, I'm not going to do as good of a job with understanding this code. I'm sorry to put this in the way of landing the patch but I think it will lead to better code in the end.
Comment 16 Jian Li 2010-10-26 18:17:37 PDT
Created attachment 71973 [details]
Proposed Patch
Comment 17 Jian Li 2010-10-26 18:22:28 PDT
(In reply to comment #15)
> I'm having a really hard time reviewing this due to the duplication of code between FileReader and FileReaderSync.
> 
> The problem is that I'll make suggestions for FileReaderSync but the ideas may not be applicable when the logic is de-duplicated.
> 
> Also the duplication means that I'm reviewing similar code twice but it isn't quite the same and it becomes harder to review in this fashion. I would do a much better job if I only had to focus on one set of logic changes.
> 
> In short, it would be much, much better if the de-duplication happened before this additional functionality was added. Otherwise, I'm not going to do as good of a job with understanding this code. I'm sorry to put this in the way of landing the patch but I think it will lead to better code in the end.

I've refactored the code to merge the similar code into FileReaderLoader.

Also, I've addressed all the issues except PassRefPtr<ArrayBuffer>. I understand why we could return the pointer directly. However, when I tried it, I got the problem because the JS generated code assume PassRefPtr<ArrayBuffer>. Unless we want to introduce custom code for FileReaderSync::readAsArrayBuffer, we have to keep using PassRefPtr<ArrayBuffer>.
Comment 18 Jian Li 2010-10-26 18:26:32 PDT
Created attachment 71975 [details]
Proposed Patch

Fixed style error.
Comment 19 David Levin 2010-10-26 20:52:01 PDT
Comment on attachment 71975 [details]
Proposed Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=71975&action=review

First impression is that this is awesome.  Thanks for doing this!

Ok, I only had a few comments. Several are simple enough to just handle quickly and commit. If any require significant changes (and you want another review on them), then please submit without the change and create another bug for the extra change.

> WebCore/WebCore.xcodeproj/project.pbxproj:-21238
> -			developmentRegion = English;

Why is this line deleted?

> WebCore/bindings/js/JSFileReaderCustom.cpp:50
> +    else

No else here due to return in if condition.

> WebCore/bindings/v8/custom/V8FileReaderCustom.cpp:69
> +    switch (imp->readType()) {

You didn't need to convert the switch to an if in the jsc code. You can go either way, but it would be nice to do the same thing in the jsc bindings and v8 bindings, so pick one and do the same in both.

> WebCore/fileapi/FileReaderLoader.cpp:111
> +    m_loader = 0;

please double check that it is ok to do to this when you get called from didFail. (That somehow the loader doesn't try to do something after that call and it already destroyed.)

> WebCore/fileapi/FileReaderLoader.cpp:145
> +    // Bail out if we encounetr an error.

typo: encounetr

> WebCore/fileapi/FileReaderLoader.cpp:204
> +    return m_rawData;

What does it mean to return this when m_rawData keeps changing as we get more data (in the async case)?

Is the caller able to modify it?

Do we need to make a copy here?

> WebCore/fileapi/FileReaderLoader.cpp:219
> +    if (m_readType == ReadAsBinaryString)

switch on m_readType?

> WebCore/fileapi/FileReaderLoader.cpp:293
> +

Extra returns.

> WebCore/fileapi/FileReaderLoaderClient.h:38
> +class FileReaderLoaderClient {

Since FileReader is the only thing to implement FileReaderLoaderClient, why not get rid of this interface and just pass in FileReader?
Comment 20 David Levin 2010-10-26 20:52:52 PDT
(In reply to comment #19)
> > WebCore/fileapi/FileReaderLoader.cpp:293
> > +
> 
> Extra returns.
> 
I should have said extra blank lines.
Comment 21 Jian Li 2010-10-29 11:24:59 PDT
Committed as http://trac.webkit.org/changeset/70904.
Comment 22 Jian Li 2010-10-29 11:25:53 PDT
All fixed except:

(In reply to comment #19)

> > WebCore/fileapi/FileReaderLoaderClient.h:38
> > +class FileReaderLoaderClient {
> 
> Since FileReader is the only thing to implement FileReaderLoaderClient, why not get rid of this interface and just pass in FileReader?

I will address this in another patch.