RESOLVED FIXED 7266
Webarchive format saves duplicate WebSubresources to .webarchive file
https://bugs.webkit.org/show_bug.cgi?id=7266
Summary Webarchive format saves duplicate WebSubresources to .webarchive file
David Kilzer (:ddkilzer)
Reported 2006-02-14 22:07:08 PST
Version 1 of the .webarchive format has some issues. This may turn into a tracking bug later, but for now I'll just list them here. Add more to comments as needed. These issues occur in the latest WebKit (r12809 on anonsvn). I have not tested them yet on released Safari. 1. Neither JavaScript nor CSS files are stored in the .webarchive. Apparently these resources are reloaded from the web server once the .webarchive is reconstituted in the browser since the base URL for the main web page is preserved. 2. Duplicate copies of images are stored in the .webarchive: one copy for each time it appears on the web page (even though they reference the exact same URL).
Attachments
Test files in zip archive (838 bytes, application/zip)
2006-12-17 20:44 PST, David Kilzer (:ddkilzer)
no flags
Patch v1 (2.11 KB, patch)
2006-12-17 20:45 PST, David Kilzer (:ddkilzer)
no flags
Patch v2 (14.60 KB, patch)
2007-02-03 16:50 PST, David Kilzer (:ddkilzer)
darin: review+
David Kilzer (:ddkilzer)
Comment 1 2006-02-14 23:20:48 PST
I can't reproduce issue #1 listed in the description (needs further testing), so changing this bug to document issue #2: saving multiple copies of the same image in a .webarchive file. Steps to reproduce: 1. Open Safari (ToT WebKit or production release from 10.4.4/10.4.5). 2. Go to a page that reuses the same image multiple times on one page. I use: http://tv.yahoo.com/grid/ (enter your zip and choose your cable/satellite provider as needed) 3. Save the web page as a "Web Archive". 4. Open the "Web Archive" in the Property List Editor. 5. Note under Root->WebSubresources that buttonleft.gif and buttonright.gif are stored as many times in the file as they appear on the web page. Expected behavior: The buttonleft.gif and buttonright.gif images should only be stored once per .webarchive. Actual behavior: The buttonleft.gif and buttonright.gif images are stored once each time they .webarchive.
David Kilzer (:ddkilzer)
Comment 2 2006-02-14 23:42:01 PST
Filed Bug 7267 for issue #1 in the description (Comment #0) of this bug.
David Kilzer (:ddkilzer)
Comment 3 2006-12-01 03:25:29 PST
Another good test page is the BugReporter login page. Count the number of spacer.gif images that appear in .webarchive file: https://bugreport.apple.com/
David Kilzer (:ddkilzer)
Comment 4 2006-12-16 07:12:42 PST
WebSubresources may include images, CSS stylesheets and JavaScript sources, and all may be duplicated when saving a .webarchive file.
David Kilzer (:ddkilzer)
Comment 5 2006-12-17 20:44:28 PST
Created attachment 11899 [details] Test files in zip archive After unzipping the zip file, open bug-7266-test.html and save in webarchive. There should only be three WebSubresources, but shipping Safari and ToT will save six (three duplicates). After applying this patch, only three WebSubresources are saved.
David Kilzer (:ddkilzer)
Comment 6 2006-12-17 20:45:32 PST
Created attachment 11900 [details] Patch v1 First pass at a patch to prevent duplicate WebSubresources from being saved to webarchive files.
David Kilzer (:ddkilzer)
Comment 7 2006-12-18 10:04:03 PST
(In reply to comment #6) > Created an attachment (id=11900) [edit] > Patch v1 > > First pass at a patch to prevent duplicate WebSubresources from being saved to > webarchive files. Another approach I could take with this is to pull the code that finds subresources by NSURL out into a separate loop, and use either an NSMutableDictionary (keys are NSURLs, values are integers, to keep order of URLs the same as they would have been with an NSArray) or an NSMutableSet (if the order doesn't really matter) to store the unique list of NSURL objects.
David Kilzer (:ddkilzer)
Comment 8 2006-12-18 20:14:20 PST
BTW, the reason this works is that (void)addArchive:(WebArchive *)archive in WebUnarchivingState puts the subresource values into an NSMutableDictionary named archivedResources, which causes duplicates originally saved to the .webarchive file to be removed.
Darin Adler
Comment 9 2006-12-19 07:43:09 PST
Comment on attachment 11900 [details] Patch v1 r=me
David Kilzer (:ddkilzer)
Comment 10 2006-12-20 03:42:23 PST
Comment on attachment 11900 [details] Patch v1 Clearing review flag pending Bug 11882 and conversion to C++ of the archiving code.
David Kilzer (:ddkilzer)
Comment 11 2007-02-03 16:50:31 PST
Created attachment 12905 [details] Patch v2 Proposed fix.
Darin Adler
Comment 12 2007-02-04 18:41:44 PST
Comment on attachment 12905 [details] Patch v2 As we make more and more substantial changes it really makes me wish more of this was translated to C++. r=me
David Kilzer (:ddkilzer)
Comment 13 2007-02-05 11:08:58 PST
(In reply to comment #12) > As we make more and more substantial changes it really makes me wish more of > this was translated to C++. Given my time constraints and the apparent release schedule (e.g., the impending tree lock-down in two days), I don't think it's realistic to do this and still get the fixes into the next release (which is my primary goal).
David Kilzer (:ddkilzer)
Comment 14 2007-02-05 19:53:02 PST
Committed revision 19422.
Note You need to log in before you can comment on or make changes to this bug.