Bug 7211 - Support save as "Web page, complete" into a directory with separate files for resources
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc.
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Enhancement
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-11 23:33 PST by David Kilzer (:ddkilzer)
Modified: 2012-10-08 11:43 PDT
CC List: 7 users

See Also:



Description David Kilzer (:ddkilzer) 2006-02-11 23:33:05 PST
It would be nice if WebKit supported saving a web page as "Web page, complete" in the same way that Firefox saves web pages (storing the HTML file with a similarly named directory for resources from the original page).
Comment 1 Joost de Valk (AlthA) 2006-02-12 02:53:54 PST
Agreed, this would make the life of bug reducers a LOT easier; now I have to fire up Firefox to do these things :)
Comment 2 Marco Barisione 2008-06-25 09:31:35 PDT
I'm interested in fixing this for the GTK port but I suspect that most of the code would be shared with other ports.

Did someone already try to do this? Do you have implementation suggestions?
Comment 3 David Kilzer (:ddkilzer) 2008-06-25 09:51:50 PDT
(In reply to comment #2)
> Did someone already try to do this?

No, not that I'm aware of.

> Do you have implementation suggestions?

Basically, the code will need to walk (or otherwise traverse) the HTML DOM looking for references to external resources (images, CSS files, JavaScript files, etc.) and (1) modify the references to point to (new) local copies that will be saved on disk and (2) queue the resources for later saving to disk. This has to be done "recursively" for all resources, since the outer HTML file could reference an <iframe>, which could reference another <iframe>, etc.
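A minimal sketch of steps (1) and (2) for plain HTML, using Python's standard html.parser instead of WebCore's DOM. The tag/attribute table, ResourceRewriter, rewrite_document, and the counter-prefixed file naming are hypothetical illustrations, not WebKit or Chromium APIs. Queue entries flagged as HTML (iframe/frame) are the documents that need the recursive treatment.

```python
import html
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical (tag, attribute) pairs that reference subresources. A real
# implementation would need many more (srcset, <object data>, <input src>, ...).
RESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href"),
                  ("iframe", "src"), ("frame", "src")}
# Subresources that are themselves HTML documents and must be walked recursively.
FRAME_TAGS = {"iframe", "frame"}


class ResourceRewriter(HTMLParser):
    """Re-emits HTML with subresource URLs pointed at local copies."""

    def __init__(self, base_url, files_dir):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.files_dir = files_dir
        self.out = []          # rewritten HTML pieces
        self.queue = []        # (absolute_url, local_path, is_html) to save later
        self.local_paths = {}  # absolute URL -> local path; duplicates share one file

    def local_path_for(self, absolute_url):
        if absolute_url not in self.local_paths:
            name = posixpath.basename(urlparse(absolute_url).path) or "index.html"
            # A counter prefix keeps same-named files from different paths apart.
            index = len(self.local_paths)
            self.local_paths[absolute_url] = f"{self.files_dir}/{index}_{name}"
        return self.local_paths[absolute_url]

    def rewrite_tag(self, tag, attrs, self_closing):
        changed = False
        new_attrs = []
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value:
                absolute = urljoin(self.base_url, value)
                if absolute not in self.local_paths:
                    # Step (2): queue the resource for saving to disk later.
                    self.queue.append((absolute, self.local_path_for(absolute),
                                       tag in FRAME_TAGS))
                # Step (1): point the reference at the local copy.
                value = self.local_path_for(absolute)
                changed = True
            new_attrs.append((name, value))
        if not changed:
            return self.get_starttag_text()  # untouched tags are copied verbatim
        parts = [tag] + [n if v is None else f'{n}="{html.escape(v)}"'
                         for n, v in new_attrs]
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self.rewrite_tag(tag, attrs, False))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.rewrite_tag(tag, attrs, True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_document(source, base_url, files_dir):
    """Returns (rewritten_html, queue) for one HTML document."""
    parser = ResourceRewriter(base_url, files_dir)
    parser.feed(source)
    parser.close()
    return "".join(parser.out), parser.queue
```

The caller would then fetch every queued entry, save it at its local path, and run rewrite_document again on each entry whose is_html flag is set (with that frame's own resource directory), which gives the recursive walk through nested <iframe>s.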

Note that you'll have to walk the CSS object model (CSS OM) as well, since references to other CSS files and images may appear in CSS source. (Firefox 2.0.0.x currently doesn't do this, so you don't truly get a complete web page with this feature. Haven't tried Firefox 3 yet.)
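The CSS side can be sketched in the same style. This assumes a regular expression over CSS text is good enough to illustrate the idea; it is a stand-in for walking the CSS OM and ignores comments, escapes, and image-set(). rewrite_css and its parameters are hypothetical.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Alternative 1: url(...) with optional quotes, optionally preceded by "@import ".
# Alternative 2: @import "..." / @import '...' written without url().
CSS_REF_RE = re.compile(
    r"""(?P<imp>@import\s+)?url\(\s*(?P<q1>['"]?)(?P<u1>[^'")]*?)(?P=q1)\s*\)"""
    r"""|@import\s+(?P<q2>['"])(?P<u2>[^'"]*)(?P=q2)""",
    re.IGNORECASE,
)


def rewrite_css(css, base_url, files_dir):
    """Rewrites url() and @import references in CSS text.

    base_url is the URL the CSS was loaded from (relative references resolve
    against it). files_dir is the local directory prefix as seen from the file
    the CSS ends up in. Returns (rewritten_css, queue), where queue holds
    (absolute_url, local_path, is_stylesheet); imported stylesheets need the
    same treatment recursively.
    """
    local_paths, queue = {}, []

    def localize(url, is_stylesheet):
        absolute = urljoin(base_url, url)
        if absolute not in local_paths:
            name = posixpath.basename(urlparse(absolute).path) or "resource"
            local_paths[absolute] = f"{files_dir}/{len(local_paths)}_{name}"
            queue.append((absolute, local_paths[absolute], is_stylesheet))
        return local_paths[absolute]

    def replace(match):
        if match.group("u1") is not None:
            url = match.group("u1").strip()
            if not url or url.lower().startswith("data:"):
                return match.group(0)  # inline data: URLs need no separate file
            prefix = match.group("imp") or ""
            return f'{prefix}url("{localize(url, bool(prefix))}")'
        return f'@import "{localize(match.group("u2"), True)}"'

    return CSS_REF_RE.sub(replace, css), queue
```

Returning the queue instead of saving immediately mirrors the HTML side: the same walker can run over inline <style> blocks, style attributes, and external stylesheets, with every discovered file saved in one later pass.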

Finally, I'd use Firefox 2/3 as a guide for how to structure the output, and then improve on the design as needed. Firefox saves the top-level HTML file with a "_files" directory beside it, but any subresources that are HTML pages use a "_data" suffix for their corresponding directory.
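The "_files" / "_data" layout can be expressed as a small planning step. This sketches the convention as described in this comment, not Firefox's actual code: the Frame tree, plan_layout, and the assumption that a subframe's page is saved inside its parent's directory with its own "<name>_data" directory are illustrations, and name collisions are ignored.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Frame:
    """Hypothetical frame tree: a document's own subresources plus child frames."""
    name: str
    resources: list = field(default_factory=list)
    children: list = field(default_factory=list)


def plan_layout(frame, html_path, top_level=True):
    """Maps output paths to what gets saved there.

    The top-level page gets a sibling "<stem>_files" directory; each subframe
    document is saved inside its parent's directory and gets "<stem>_data".
    """
    path = PurePosixPath(html_path)
    suffix = "_files" if top_level else "_data"
    resource_dir = path.with_name(path.stem + suffix)
    plan = {str(path): f"document: {frame.name}"}
    for resource in frame.resources:
        plan[str(resource_dir / resource)] = f"resource: {resource}"
    for child in frame.children:
        # Subframe documents live in the parent's directory and recurse.
        child_path = resource_dir / f"{child.name}.html"
        plan.update(plan_layout(child, str(child_path), top_level=False))
    return plan
```

For a page with one iframe, plan_layout(Frame("page", ["logo.png"], [Frame("ad", ["banner.gif"])]), "Saved/page.html") places the frame at Saved/page_files/ad.html and its image at Saved/page_files/ad_data/banner.gif.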
Comment 4 David Kilzer (:ddkilzer) 2008-06-25 09:53:58 PDT
(In reply to comment #2)
> Do you have implementation suggestions?

Also, the current WebArchive code in WebCore walks the HTML DOM and CSS OM in a similar fashion, so that code may be reused or at least serve as a starting point.

Comment 5 Marco Barisione 2008-09-02 10:34:24 PDT
(In reply to comment #2)
> I'm interested in fixing this for the GTK port but I suspect that most of the
> code would be shared with other ports.

In the end we decided that MHTML was better for our requirements, so I implemented that instead. As a user I hope that somebody else will fix this :)
Comment 6 David Kilzer (:ddkilzer) 2008-10-30 09:51:40 PDT
In <https://lists.webkit.org/pipermail/webkit-dev/2008-October/005537.html>, Darin Fisher wrote:

> We have code to support this feature in the Chromium code base. You can find it here:
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup
>
> It is something we would love to one day see as part of WebKit.

Comment 7 Alexey Proskuryakov 2012-10-08 11:43:34 PDT
Renaming, since this enhancement request is not about Firefox compatibility at all.