Bug 22466

Summary: REGRESSION (35867): Many resources missing when saving webarchive of webkit.org
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: WebCore Misc.Assignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, beidson, koivisto
Priority: P2 Keywords: HasReduction, InRadar, Regression
Version: 528+ (Nightly build)   
Hardware: Macintosh   
OS: OS X 10.5   
URL: http://webkit.org/
Attachments:
Description Flags
WebArchive saved from nightly build r38707
none
Patch v1 (naïve fix)
none
Patch v2
none
Patch v3
none
Patch v4
none
Patch v5 beidson: review+

Description David Kilzer (:ddkilzer) 2008-11-24 13:55:56 PST
* SUMMARY
When saving a webarchive of <http://webkit.org/> with WebKit nightly build r38707, there are many resources missing that are linked from the main web page.  (See Bug 11850 for images referenced within CSS that are not saved.)

* STEPS TO REPRODUCE
1. Launch Safari.
2. Open URL:  http://webkit.org/
3. Select "Save As..." from the File menu.
4. Change Format: to "Web Archive".
5. Save webkit.webarchive.
6. Open webkit.webarchive in "Property List Editor" application (installed with Xcode).

* RESULTS
Only 5 subresources are saved from the main web page.  Some of the <color>.css files are missing from the archive.

* REGRESSION
I think this is a regression, but I don't have the ability to verify this now.
Comment 1 David Kilzer (:ddkilzer) 2008-11-24 13:56:44 PST
Created attachment 25450 [details]
WebArchive saved from nightly build r38707

This is the webarchive file saved from WebKit nightly build r38707.
Comment 2 David Kilzer (:ddkilzer) 2008-11-24 14:07:37 PST
Created attachment 25451 [details]
Patch v1 (naïve fix)

This patch fixes the bug, but I don't understand why DocumentLoader::cachedResource() doesn't have a reference to resources loaded on the page.  Are we violating a contract, or should we just be checking for all resources in the WebCore cache instead?
Comment 3 David Kilzer (:ddkilzer) 2008-11-24 14:16:04 PST
Comment on attachment 25451 [details]
Patch v1 (naïve fix)

Clearing review flag and obsoleting since we need a ChangeLog and a layout test.
Comment 4 Antti Koivisto 2008-11-24 15:29:56 PST
http://trac.webkit.org/changeset/35867
Comment 5 David Kilzer (:ddkilzer) 2008-11-25 06:14:19 PST
Created attachment 25479 [details]
Patch v2

Patch with layout test and changelog.
Comment 6 Alexey Proskuryakov 2008-11-25 11:32:53 PST
See also: bug 17151.
Comment 7 David Kilzer (:ddkilzer) 2008-11-25 13:56:38 PST
Created attachment 25497 [details]
Patch v3

Same as Patch v2, except we make sure "Etag" and "Last-Modified" headers are normalized as well.
Comment 8 David Kilzer (:ddkilzer) 2008-11-26 11:23:24 PST
<rdar://problem/6403593>
Comment 9 David Kilzer (:ddkilzer) 2008-11-26 21:40:23 PST
Created attachment 25551 [details]
Patch v4

Null check ArchiveResource after creating it from a CacheResource.  Rearranged the logic in LegacyWebArchive::create() to use more continue statements.
Comment 10 David Kilzer (:ddkilzer) 2008-11-29 12:49:14 PST
Created attachment 25599 [details]
Patch v5

Changes since Patch v4:
- Also normalize the "Server" HTTP response header.
- Added radar bug number to ChangeLogs.
Comment 11 Brady Eidson 2008-12-01 16:34:51 PST
Comment on attachment 25599 [details]
Patch v5

Nice, r+
Comment 12 David Kilzer (:ddkilzer) 2008-12-01 16:54:03 PST
$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...
        M       LayoutTests/ChangeLog
        A       LayoutTests/http/tests/webarchive/resources/test-preload-resources.css
        A       LayoutTests/http/tests/webarchive/test-preload-resources-expected.webarchive
        A       LayoutTests/http/tests/webarchive/test-preload-resources.html
        M       LayoutTests/platform/qt/Skipped
        M       LayoutTests/platform/win/Skipped
        M       WebCore/ChangeLog
        M       WebCore/loader/archive/cf/LegacyWebArchive.cpp
        M       WebKitTools/ChangeLog
        M       WebKitTools/DumpRenderTree/mac/DumpRenderTree.mm
Committed r38884

http://trac.webkit.org/changeset/38884

Comment 13 David Kilzer (:ddkilzer) 2008-12-01 17:08:22 PST
Follow-up fix:

$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...
        M       WebCore/ChangeLog
        M       WebCore/loader/archive/cf/LegacyWebArchive.cpp
Committed r38885

http://trac.webkit.org/changeset/38885