Bug 108589 - REGRESSION (r137607): Loading of archives as substitute data is broken
Summary: REGRESSION (r137607): Loading of archives as substitute data is broken
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P1 Normal
Assignee: Nate Chapin
URL:
Keywords: InRadar, Regression
Depends on:
Blocks:
 
Reported: 2013-01-31 23:47 PST by Alexey Proskuryakov
Modified: 2013-02-04 14:21 PST (History)
4 users (show)

See Also:


Attachments
patch (no test) (1.31 KB, patch)
2013-02-01 16:41 PST, Nate Chapin
buildbot: commit-queue-
Details | Formatted Diff | Diff
Fix in MainResourceLoader (1.30 KB, patch)
2013-02-04 12:32 PST, Nate Chapin
no flags Details | Formatted Diff | Diff
Patch for landing (1.56 KB, patch)
2013-02-04 13:27 PST, Nate Chapin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2013-01-31 23:47:50 PST
It is possible to call FrameLoader::load with substitute data that is a WebArchive (or another type of archive, such as MHTML for Chromium). This is exposed via both WebKit1 and WebKit2 API on Mac, see e.g. WKPageLoadWebArchiveData() for the latter.

Our bisection shows that this functionality got broken in <http://trac.webkit.org/changeset/137607>. By the time MainDocumentLoader tries to decode the archive, the data it tries to use is null. Here is how a debug stack trace looks like, with an assertion being fired:

ASSERTION FAILED: data
/Volumes/Data/SafariSources/OpenSource/Source/WebCore/loader/archive/cf/LegacyWebArchive.cpp(269) : static PassRefPtr<WebCore::LegacyWebArchive> WebCore::LegacyWebArchive::create(const WebCore::KURL &, WebCore::SharedBuffer *)
1   0x104657d5a WebCore::LegacyWebArchive::create(WebCore::KURL const&, WebCore::SharedBuffer*)
2   0x10350402f WTF::PassRefPtr<WebCore::Archive> WebCore::archiveFactoryCreate<WebCore::LegacyWebArchive>(WebCore::KURL const&, WebCore::SharedBuffer*)
3   0x103503f18 WebCore::ArchiveFactory::create(WebCore::KURL const&, WebCore::SharedBuffer*, WTF::String const&)
4   0x1038f26ee WebCore::DocumentLoader::maybeCreateArchive()
5   0x1038f254c WebCore::DocumentLoader::finishedLoading()
6   0x1046772d1 WebCore::MainResourceLoader::didFinishLoading(double)
7   0x104677085 WebCore::MainResourceLoader::continueAfterContentPolicy(WebCore::PolicyAction, WebCore::ResourceResponse const&)
8   0x1046773e6 WebCore::MainResourceLoader::continueAfterContentPolicy(WebCore::PolicyAction)
9   0x10467732b WebCore::MainResourceLoader::callContinueAfterContentPolicy(void*, WebCore::PolicyAction)
10  0x1046779f8 WebCore::MainResourceLoader::responseReceived(WebCore::CachedResource*, WebCore::ResourceResponse const&)
11  0x104675824 WebCore::MainResourceLoader::handleSubstituteDataLoadNow(WebCore::RunLoopTimer<WebCore::MainResourceLoader>*)
12  0x104679f83 WebCore::RunLoopTimer<WebCore::MainResourceLoader>::fired()

One easy way to reproduce this that I'm aware of is:

1. Open http://www.apple.com/startpage/ in Safari.
2. Select Email This Page from the Share menu (a button to the left of address bar, or just Command+I).
3. In the new email message that appears in Mail.app, change the "Send as web content" selection at top right to PDF.
Comment 1 Alexey Proskuryakov 2013-01-31 23:51:02 PST
In Radar as <rdar://problem/13089321>.

It seems easy to find some way to access substitute data from maybeCreateArchive(), but not easy to decide which design is right. I suspect that more code paths that use DocumentLoader::mainResourceData() function for substitute data could be broken.

Nate, would you be interested to take a look at this?
Comment 2 Nate Chapin 2013-02-01 13:24:02 PST
(In reply to comment #1)
> In Radar as <rdar://problem/13089321>.
> 
> It seems easy to find some way to access substitute data from maybeCreateArchive(), but not easy to decide which design is right. I suspect that more code paths that use DocumentLoader::mainResourceData() function for substitute data could be broken.
> 
> Nate, would you be interested to take a look at this?

I think this is a pretty simple fix, MainResourceLoader::resourceData() should check m_substituteData for data if it is valid. I haven't tried to write a test yet, though.
Comment 3 Alexey Proskuryakov 2013-02-01 14:21:03 PST
Do you think that other callers of this function would be OK with that? An archive is not really truly the main resource, even though it's handled like that at the first stage of loading.

I do not know much about how archives work.
Comment 4 Nate Chapin 2013-02-01 15:33:19 PST
(In reply to comment #3)
> Do you think that other callers of this function would be OK with that? An archive is not really truly the main resource, even though it's handled like that at the first stage of loading.

Another option, it appears that we can safely exit early and get the right result in DocumentLoader::maybeCreateArchive() if mainResourceData() returns null. I don't know whether that has side effects, though.

I'm trying to decide whether there are other situations where this underlying issue (Non-empty SubstituteData loads leave DocumentLoader::mainResourceData() empty) could be a problem. My guess is Yes, but I don't have definitive proof.
Comment 5 Nate Chapin 2013-02-01 16:41:25 PST
Created attachment 186185 [details]
patch (no test)

I'm not sure I like this solution, and I'm not sure if there's a good way to test this (besides the repro steps in the initial report), but here's what I'm thinking at the moment.
Comment 6 Build Bot 2013-02-01 19:31:47 PST
Comment on attachment 186185 [details]
patch (no test)

Attachment 186185 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/16332442
Comment 7 Alexey Proskuryakov 2013-02-02 13:03:54 PST
Comment on attachment 186185 [details]
patch (no test)

View in context: https://bugs.webkit.org/attachment.cgi?id=186185&action=review

> Source/WebCore/loader/DocumentLoader.cpp:134
>      if (m_mainResourceLoader)
>          return m_mainResourceLoader->resourceData();

I've built r137606 now, and how it worked was that m_mainResourceLoader->resourceData() was returning non-null data. Is it intentional that it no longer works this way?
Comment 8 Alexey Proskuryakov 2013-02-02 23:36:44 PST
The reason ResourceData used to be non-null is that the data was accumulated in ResourceLoader once received. Now that MainResourceLoader does not inherit from ResourceLoader, the data goes nowhere.

A fix that restores previous behavior would be one that makes MainResourceLoader::resourceData() return actual data for substitute loads. I can think of two approaches - either try MainResourceLoader::m_substituteData when MainResourceLoader::m_resource is null, or make sure to generate m_resource for these loads.

Which one better fits the overall design?
Comment 9 Nate Chapin 2013-02-04 10:59:36 PST
(In reply to comment #8)
> The reason ResourceData used to be non-null is that the data was accumulated in ResourceLoader once received. Now that MainResourceLoader does not inherit from ResourceLoader, the data goes nowhere.
> 
> A fix that restores previous behavior would be one that makes MainResourceLoader::resourceData() return actual data for substitute loads. I can think of two approaches - either try MainResourceLoader::m_substituteData when MainResourceLoader::m_resource is null, or make sure to generate m_resource for these loads.
> 
> Which one better fits the overall design?

Given that m_resource has no knowledge of SubstituteData loads, fixing this in MainResourceLoader::resourceData() probably makes more sense. Will fix shortly.
Comment 10 Nate Chapin 2013-02-04 12:32:26 PST
Created attachment 186438 [details]
Fix in MainResourceLoader
Comment 11 Alexey Proskuryakov 2013-02-04 12:44:49 PST
Comment on attachment 186438 [details]
Fix in MainResourceLoader

View in context: https://bugs.webkit.org/attachment.cgi?id=186438&action=review

It seems wasteful to copy all the data. Is there a way to redesign the storage to avoid such copying?

> Source/WebCore/loader/MainResourceLoader.cpp:213
> +    if (m_substituteData.isValid())
> +        return ResourceBuffer::create(m_substituteData.content()->data(), m_substituteData.content()->size());
>      return m_resource ? m_resource->resourceBuffer() : 0;

I'd have written this differently, putting common case first. But this is OK too.
Comment 12 Nate Chapin 2013-02-04 13:23:36 PST
(In reply to comment #11)
> (From update of attachment 186438 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=186438&action=review
> 
> It seems wasteful to copy all the data. Is there a way to redesign the storage to avoid such copying?
> 
> > Source/WebCore/loader/MainResourceLoader.cpp:213
> > +    if (m_substituteData.isValid())
> > +        return ResourceBuffer::create(m_substituteData.content()->data(), m_substituteData.content()->size());
> >      return m_resource ? m_resource->resourceBuffer() : 0;
> 
> I'd have written this differently, putting common case first. But this is OK too.

Good point, will land with the reordering.
Comment 13 Nate Chapin 2013-02-04 13:27:08 PST
Created attachment 186448 [details]
Patch for landing
Comment 14 Darin Adler 2013-02-04 14:11:58 PST
Comment on attachment 186438 [details]
Fix in MainResourceLoader

View in context: https://bugs.webkit.org/attachment.cgi?id=186438&action=review

>>> Source/WebCore/loader/MainResourceLoader.cpp:213
>>>      return m_resource ? m_resource->resourceBuffer() : 0;
>> 
>> I'd have written this differently, putting common case first. But this is OK too.
> 
> Good point, will land with the reordering.

When it comes to early return, I like “unusual case first”, but I guess here there is a small performance reason to put the common case first.
Comment 15 Nate Chapin 2013-02-04 14:14:21 PST
(In reply to comment #14)
> (From update of attachment 186438 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=186438&action=review
> 
> >>> Source/WebCore/loader/MainResourceLoader.cpp:213
> >>>      return m_resource ? m_resource->resourceBuffer() : 0;
> >> 
> >> I'd have written this differently, putting common case first. But this is OK too.
> > 
> > Good point, will land with the reordering.
> 
> When it comes to early return, I like “unusual case first”, but I guess here there is a small performance reason to put the common case first.

I strongly agree with you in the case where the unusual case is much shorter than the common case. When they're both so short, I'm happy to do whatever the reviewer suggests :)
Comment 16 WebKit Review Bot 2013-02-04 14:21:22 PST
Comment on attachment 186448 [details]
Patch for landing

Clearing flags on attachment: 186448

Committed r141811: <http://trac.webkit.org/changeset/141811>
Comment 17 WebKit Review Bot 2013-02-04 14:21:27 PST
All reviewed patches have been landed.  Closing bug.