WebKit Bugzilla
RESOLVED FIXED
7266
Webarchive format saves duplicate WebSubresources to .webarchive file
https://bugs.webkit.org/show_bug.cgi?id=7266
David Kilzer (:ddkilzer)
Reported
2006-02-14 22:07:08 PST
Version 1 of the .webarchive format has some issues. This may turn into a tracking bug later, but for now I'll just list them here. Add more to comments as needed. These issues occur in the latest WebKit (r12809 on anonsvn). I have not tested them yet on released Safari.
1. Neither JavaScript nor CSS files are stored in the .webarchive. Apparently these resources are reloaded from the web server once the .webarchive is reconstituted in the browser since the base URL for the main web page is preserved.
2. Duplicate copies of images are stored in the .webarchive: one copy for each time it appears on the web page (even though they reference the exact same URL).
Attachments
Test files in zip archive (838 bytes, application/zip), 2006-12-17 20:44 PST, David Kilzer (:ddkilzer), no flags
Patch v1 (2.11 KB, patch), 2006-12-17 20:45 PST, David Kilzer (:ddkilzer), no flags
Patch v2 (14.60 KB, patch), 2007-02-03 16:50 PST, David Kilzer (:ddkilzer), darin: review+
David Kilzer (:ddkilzer)
Comment 1
2006-02-14 23:20:48 PST
I can't reproduce issue #1 listed in the description (needs further testing), so I am changing this bug to document issue #2: saving multiple copies of the same image in a .webarchive file.
Steps to reproduce:
1. Open Safari (ToT WebKit or production release from 10.4.4/10.4.5).
2. Go to a page that reuses the same image multiple times on one page. I use: http://tv.yahoo.com/grid/ (enter your zip and choose your cable/satellite provider as needed)
3. Save the web page as a "Web Archive".
4. Open the "Web Archive" in the Property List Editor.
5. Note under Root->WebSubresources that buttonleft.gif and buttonright.gif are stored as many times in the file as they appear on the web page.
Expected behavior: The buttonleft.gif and buttonright.gif images should only be stored once per .webarchive.
Actual behavior: The buttonleft.gif and buttonright.gif images are stored once for each time they appear on the web page.
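The manual check in the Property List Editor (step 5 in the comment above) could also be scripted. The following is a minimal Python sketch, not part of the bug report or the patch. It assumes the archive is a property list whose WebSubresources entries each carry a WebResourceURL key; that key name, the example.com URLs, and the helper names are assumptions made for illustration.

```python
import plistlib
from collections import Counter


def load_webarchive(path):
    """Parse a .webarchive file (a property list) into a dict."""
    with open(path, "rb") as f:
        return plistlib.load(f)


def duplicate_subresource_urls(archive):
    """Return {url: count} for every subresource URL stored more than once."""
    subresources = archive.get("WebSubresources", [])
    # "WebResourceURL" is the assumed per-resource key holding the URL string.
    counts = Counter(res["WebResourceURL"] for res in subresources)
    return {url: n for url, n in counts.items() if n > 1}


# Hypothetical in-memory archive standing in for a saved page.
sample = {
    "WebMainResource": {"WebResourceURL": "http://example.com/"},
    "WebSubresources": [
        {"WebResourceURL": "http://example.com/buttonleft.gif", "WebResourceData": b"L"},
        {"WebResourceURL": "http://example.com/buttonright.gif", "WebResourceData": b"R"},
        {"WebResourceURL": "http://example.com/buttonleft.gif", "WebResourceData": b"L"},
    ],
}

print(duplicate_subresource_urls(sample))
# prints {'http://example.com/buttonleft.gif': 2}
```

For a real file, `duplicate_subresource_urls(load_webarchive("page.webarchive"))` would report the same information; an empty result means no URL is stored twice.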
David Kilzer (:ddkilzer)
Comment 2
2006-02-14 23:42:01 PST
Filed Bug 7267 for issue #1 in the description (Comment #0) of this bug.
David Kilzer (:ddkilzer)
Comment 3
2006-12-01 03:25:29 PST
Another good test page is the BugReporter login page. Count the number of spacer.gif images that appear in the .webarchive file: https://bugreport.apple.com/
David Kilzer (:ddkilzer)
Comment 4
2006-12-16 07:12:42 PST
WebSubresources may include images, CSS stylesheets and JavaScript sources, and all may be duplicated when saving a .webarchive file.
David Kilzer (:ddkilzer)
Comment 5
2006-12-17 20:44:28 PST
Created attachment 11899: Test files in zip archive
After unzipping the zip file, open bug-7266-test.html and save it as a webarchive. There should be only three WebSubresources, but shipping Safari and ToT will save six (three duplicates). After applying the patch, only three WebSubresources are saved.
David Kilzer (:ddkilzer)
Comment 6
2006-12-17 20:45:32 PST
Created attachment 11900: Patch v1
First pass at a patch to prevent duplicate WebSubresources from being saved to webarchive files.
David Kilzer (:ddkilzer)
Comment 7
2006-12-18 10:04:03 PST
(In reply to comment #6)
> Created an attachment (id=11900)
> Patch v1
>
> First pass at a patch to prevent duplicate WebSubresources from being saved to
> webarchive files.
Another approach I could take with this is to pull the code that finds subresources by NSURL out into a separate loop, and use either an NSMutableDictionary (keys are NSURLs, values are integers, to keep order of URLs the same as they would have been with an NSArray) or an NSMutableSet (if the order doesn't really matter) to store the unique list of NSURL objects.
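The set-based variant described in the comment above can be sketched as follows. This is a Python analog written for illustration, not the Objective-C code from either patch: a Python `set` stands in for the NSMutableSet of NSURLs, a list stands in for the NSArray, and the `WebResourceURL` key and example.com URLs are assumptions.

```python
def unique_by_url(subresources):
    """Return subresources with repeated URLs dropped, keeping first-seen order."""
    seen_urls = set()  # stands in for the NSMutableSet of NSURL objects
    unique = []        # ordered result, like the original NSArray
    for resource in subresources:
        url = resource["WebResourceURL"]  # assumed key name
        if url in seen_urls:
            continue  # this URL is already scheduled to be saved
        seen_urls.add(url)
        unique.append(resource)
    return unique


# Hypothetical page that references two of its three resources twice.
page_resources = [
    {"WebResourceURL": "http://example.com/a.gif"},
    {"WebResourceURL": "http://example.com/style.css"},
    {"WebResourceURL": "http://example.com/a.gif"},
    {"WebResourceURL": "http://example.com/script.js"},
    {"WebResourceURL": "http://example.com/style.css"},
]

unique_resources = unique_by_url(page_resources)
print([r["WebResourceURL"] for r in unique_resources])
```

Pairing the set with a separate list keeps the original order of URLs even though the set itself is unordered, which addresses the ordering concern raised in the comment without needing integer values in a dictionary.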
David Kilzer (:ddkilzer)
Comment 8
2006-12-18 20:14:20 PST
BTW, the reason this works is that (void)addArchive:(WebArchive *)archive in WebUnarchivingState puts the subresource values into an NSMutableDictionary named archivedResources, which causes duplicates originally saved to the .webarchive file to be removed.
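The effect described in the comment above, where loading into a URL-keyed dictionary collapses duplicates, can be illustrated with a short Python sketch. It is an analog of the behavior attributed to archivedResources, not the WebKit code itself; `index_by_url`, the `WebResourceURL` key, and the sample data are hypothetical.

```python
def index_by_url(subresources):
    """Build a URL-keyed mapping; a repeated URL overwrites the earlier entry."""
    archived_resources = {}
    for resource in subresources:
        # Assumed key name; the dictionary can hold only one value per URL.
        archived_resources[resource["WebResourceURL"]] = resource
    return archived_resources


# Hypothetical archive contents: six saved entries, three distinct URLs.
saved = [
    {"WebResourceURL": "http://example.com/a.gif"},
    {"WebResourceURL": "http://example.com/a.gif"},
    {"WebResourceURL": "http://example.com/style.css"},
    {"WebResourceURL": "http://example.com/style.css"},
    {"WebResourceURL": "http://example.com/script.js"},
    {"WebResourceURL": "http://example.com/script.js"},
]

loaded = index_by_url(saved)
print(len(saved), len(loaded))
# prints 6 3
```

Because the mapping keeps one entry per URL, an archive saved with duplicates still loads with one resource per URL, so dropping the duplicates at save time changes only the file size, not what is loaded.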
Darin Adler
Comment 9
2006-12-19 07:43:09 PST
Comment on attachment 11900: Patch v1
r=me
David Kilzer (:ddkilzer)
Comment 10
2006-12-20 03:42:23 PST
Comment on attachment 11900: Patch v1
Clearing review flag pending Bug 11882 and conversion to C++ of the archiving code.
David Kilzer (:ddkilzer)
Comment 11
2007-02-03 16:50:31 PST
Created attachment 12905: Patch v2
Proposed fix.
Darin Adler
Comment 12
2007-02-04 18:41:44 PST
Comment on attachment 12905: Patch v2
As we make more and more substantial changes it really makes me wish more of this was translated to C++. r=me
David Kilzer (:ddkilzer)
Comment 13
2007-02-05 11:08:58 PST
(In reply to comment #12)
> As we make more and more substantial changes it really makes me wish more of
> this was translated to C++.
Given my time constraints and the apparent release schedule (e.g., the impending tree lock-down in two days), I don't think it's realistic to do this and still get the fixes into the next release (which is my primary goal).
David Kilzer (:ddkilzer)
Comment 14
2007-02-05 19:53:02 PST
Committed revision 19422.