Bug 58293 - NRWT: doesn't support webarchives
Summary: NRWT: doesn't support webarchives
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Dirk Pranke
Depends on:
Blocks: 34984 58625
  Show dependency treegraph
Reported: 2011-04-11 18:00 PDT by Chris Rogers
Modified: 2011-06-15 17:49 PDT (History)
6 users (show)

See Also:

Patch (2.39 KB, patch)
2011-04-11 18:03 PDT, Chris Rogers
no flags Details | Formatted Diff | Diff
Patch (4.32 KB, patch)
2011-06-14 20:17 PDT, Dirk Pranke
eric: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Rogers 2011-04-11 18:00:27 PDT
Follow-up fix to DRT for audio (write Content-Length in header)
Comment 1 Chris Rogers 2011-04-11 18:03:53 PDT
Created attachment 89133 [details]
Comment 2 Dirk Pranke 2011-04-11 18:24:46 PDT
patch looks good to me.
Comment 3 Alexey Proskuryakov 2011-04-11 22:09:17 PDT
What is special about audio files that they need Content-Length? We don't write it for any other types.
Comment 4 Dirk Pranke 2011-04-11 23:00:21 PDT
(In reply to comment #3)
> What is special about audio files that they need Content-Length? We don't write it for any other types.

We write the Content-Length for image files.

While it's true that run-webkit-tests could probably scan over the bytes and look for the #EOF sentinel, this makes the parsing on the client side trivial, and it doesn't seem to hurt much.
Comment 5 Alexey Proskuryakov 2011-04-11 23:36:34 PDT
The same dump() function that is modified here saves application/pdf and application/x-webarchive without Content-Length - why is audio different?
Comment 6 Dirk Pranke 2011-04-12 11:33:59 PDT
(In reply to comment #5)
> The same dump() function that is modified here saves application/pdf and application/x-webarchive without Content-Length - why is audio different?

It's a good question. the NRWT code processes text/plain a line at a time. I'm not actually sure if or how it handles PDF and webarchives, so I'll double check.
Comment 7 Dirk Pranke 2011-04-12 19:10:24 PDT
Okay, the NRWT code was skipping all of the webarchive files -> not so good.

I am implementing support for them now. It appears that all of the .webarchive files manage to end with a newline, causing the subsequent "#EOF" sentinel from DRT to be on a line by itself, so NRWT continues to work. .WAV files do not (currently) make that guarantee, and so NRWT's existing output-parsing will break.

I need to continue working on the patch and doing more testing (to confirm w/ PDFs and files of other content types, for example). I also need to double check the logic against ORWT to see if ORWT is doing something more sophisticated to find where the sentinels are.

Thanks for asking the question that revealed this deficiency.
Comment 8 Dirk Pranke 2011-04-12 19:11:18 PDT
cancelling the review for now.
Comment 9 Dirk Pranke 2011-05-31 16:31:32 PDT
Okay, it turns out that I can make NRWT handle base64-encoded wav output without needing a content-length header, and I think it doesn't break anything else (see bug 61819).

I *suspect* that all of the pdf and webarchive files end up having a newline at the end so they didn't end up tripping up NRWT, although NRWT doesn't currently support webarchives and I haven't investigated PDFs yet.

I am going to change the subject and use this bug to track adding support for the other output types to NRWT so I can continue to investigate this, but this'll take this bug off the blocking path for webaudio.
Comment 10 Dirk Pranke 2011-05-31 18:21:03 PDT
In response to Alexey's question in comment #5 (why is webaudio different from webarchive or pdf) it looks like all of the webarchive files's expected output ends in a newline, and we don't actually generate any pdf output at this point. (although there appear to be tests that once did, they all generate rendertrees + images now).

ORWT's code says that it expects "#EOF" to show up on a separate line, so in the absence of any other patches, we'd have to modify the base64-encoded output to also include a trailing newline.

That seems like kind of a goofy change, so instead I have modified ORWT to match against any line that ends in a #EOF$ and if there is content preceding the #EOF, stick it into the returned result (see bug 57992 for the patch in question). 

I will still leave this open until NRWT supports webarchives correctly.
Comment 11 Dirk Pranke 2011-06-14 20:17:30 PDT
Created attachment 97225 [details]
Comment 12 Eric Seidel (no email) 2011-06-14 21:08:39 PDT
Comment on attachment 97225 [details]

That's it?  Seems simple.  If this works, great.
Comment 13 Eric Seidel (no email) 2011-06-14 21:09:36 PDT
The PDF tests were added by me and immediately disabled.  They were for printing.  TUrns out that PDFs encode system information (including username!) so it's not possible to generate them in a system agnostic way... or at least didn't seem that way.
Comment 14 Dirk Pranke 2011-06-14 22:10:38 PDT
(In reply to comment #12)
> (From update of attachment 97225 [details])
> That's it?  Seems simple.  If this works, great.

Yeah, as far as I can tell, .webarchives are just XML, so all we had to do way make the code aware of the other extension.

I've edited the subject to remove reference to PDFs.
Comment 15 Dirk Pranke 2011-06-15 17:49:58 PDT
Committed r88986: <http://trac.webkit.org/changeset/88986>