Bug 7211 - Support save as "Web page, complete" into a directory with separate files for resources
Status: NEW
Product: WebKit
Component: WebKit Misc.
Version: 420+
Platform: Macintosh Mac OS X 10.4
Importance: P2 Enhancement
Reported: 2006-02-11 23:33 PST
Modified: 2012-10-08 11:43 PST



Description From 2006-02-11 23:33:05 PST
It would be nice if WebKit supported saving a web page as "Web page, complete" in the same way that Firefox saves web pages (storing the HTML file with a similarly-named directory for resources from the original page).
------- Comment #1 From 2006-02-12 02:53:54 PST -------
Agreed, this would make the life of bug reducers a LOT easier. Right now I have to fire up Firefox to do these things :)
------- Comment #2 From 2008-06-25 09:31:35 PST -------
I'm interested in fixing this for the GTK port but I suspect that most of the code would be shared with other ports.

Did someone already try to do this? Do you have implementation suggestions?
------- Comment #3 From 2008-06-25 09:51:50 PST -------
(In reply to comment #2)
> Did someone already try to do this?

No, not that I'm aware of.

> Do you have implementation suggestions?

Basically the code will need to walk (or somehow traverse) the HTML DOM looking for references to external resources (like images, CSS files, JavaScript files, etc.) and (1) modify the references to point to (new) local copies that will be saved on disk and (2) queue the resources for later saving to disk.  And this will have to be done "recursively" for all resources since the outer HTML file could reference an <iframe>, which could reference another <iframe>, etc.
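
As a rough illustration of steps (1) and (2), here is a minimal Python sketch built on the standard library's html.parser. It is not WebKit code: the names ResourceRewriter, RESOURCE_ATTRS and rewrite_html are made up for this example, and the set of tags it inspects is an assumption, not a complete list. It rewrites each reference to a path inside a resource directory and queues the original URL for saving later. Any queued HTML resource (such as an <iframe> source) could be run through the same function again to get the "recursive" behavior.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical, incomplete map of tag -> attribute that may hold an external
# reference. Every <link href> is treated as a resource, which is a
# simplification (rel="canonical" and similar are not resources).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}


class ResourceRewriter(HTMLParser):
    """Re-emits HTML with resource references pointing at local copies."""

    def __init__(self, base_url, files_dir):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.files_dir = files_dir
        self.queue = []  # (absolute URL, local path) pairs to save later
        self.out = []    # pieces of the rewritten document

    def _local_path(self, url):
        # Name collisions between resources are not handled in this sketch.
        name = posixpath.basename(urlparse(url).path) or "resource"
        return f"{self.files_dir}/{name}"

    def _attrs(self, tag, attrs):
        target = RESOURCE_ATTRS.get(tag)
        text = ""
        for name, value in attrs:
            if name == target and value:
                absolute = urljoin(self.base_url, value)
                value = self._local_path(absolute)
                self.queue.append((absolute, value))
            if value is None:
                text += f" {name}"
            else:
                text += f' {name}="{escape(value, quote=True)}"'
        return text

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    # Comments and doctype declarations are dropped to keep the sketch short.


def rewrite_html(html_text, base_url, files_dir):
    """Return (rewritten HTML, queue of (absolute URL, local path))."""
    rewriter = ResourceRewriter(base_url, files_dir)
    rewriter.feed(html_text)
    rewriter.close()
    return "".join(rewriter.out), rewriter.queue
```

For example, rewrite_html('<img src="img/logo.png">', "http://example.com/a/page.html", "page_files") rewrites the src to page_files/logo.png and queues http://example.com/a/img/logo.png for download. A real implementation inside WebCore would walk the live DOM rather than re-parse markup.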

Note that you'll have to walk the CSS object model (CSS OM) as well since references such as other CSS files and images may be included in CSS source.  (Firefox 2.0.0.x currently doesn't do this, so you don't truly get a complete web page with this feature.  Haven't tried Firefox 3 yet.)
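
To show what kinds of references hide in CSS, here is a small text-based Python sketch. It is only a stand-in for a real CSS OM walk: it uses a regular expression (CSS_REF_RE and css_references are names invented for this example), so it will miss or misread unusual syntax such as escaped characters or references inside comments. It collects url(...) values and quoted @import targets, and skips data: URLs since those are already embedded.

```python
import re

# Group 2 ("url") matches url(...) with optional quotes;
# group 4 ("imp") matches @import "..." written without url().
CSS_REF_RE = re.compile(
    r"""url\(\s*(['"]?)(?P<url>[^'")]+)\1\s*\)|@import\s+(['"])(?P<imp>[^'"]+)\3""",
    re.IGNORECASE,
)


def css_references(css_text):
    """Return external references found in CSS source, in document order."""
    refs = []
    for match in CSS_REF_RE.finditer(css_text):
        ref = (match.group("url") or match.group("imp")).strip()
        if not ref.lower().startswith("data:"):
            refs.append(ref)
    return refs
```

Each returned reference would then be resolved against the stylesheet's own URL (not the page's), queued for saving, and rewritten in the saved CSS. Imported stylesheets need the same treatment in turn.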

Finally, I'd use Firefox 2/3 as a guide for how to structure the output (it saves the top level HTML file with a "_files" directory beside it, but any subresources that are HTML pages use a "_data" suffix for their corresponding directory), and then improve on the design as needed.
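
The directory naming described above could be sketched like this in Python. The function name resource_dir and the top_level flag are assumptions made for the example, and the exact naming Firefox uses is taken only from the description in this comment.

```python
from pathlib import PurePosixPath


def resource_dir(html_path, top_level=True):
    """Return the sibling directory that holds resources for a saved HTML file.

    Top-level pages get a "_files" suffix; HTML subresources (for example
    the document inside an <iframe>) get "_data", per the comment above.
    """
    path = PurePosixPath(html_path)
    suffix = "_files" if top_level else "_data"
    return str(path.with_name(path.stem + suffix))
```

With this, "saved/Example.html" maps to "saved/Example_files", and a frame saved as "saved/Example_files/frame.html" maps to "saved/Example_files/frame_data" when called with top_level=False.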
------- Comment #4 From 2008-06-25 09:53:58 PST -------
(In reply to comment #2)
> Do you have implementation suggestions?

Also the current WebArchive code in WebCore walks the HTML DOM and CSS OM in a similar fashion today, so that code may be reused or at least act as a starting point.
------- Comment #5 From 2008-09-02 10:34:24 PST -------
(In reply to comment #2)
> I'm interested in fixing this for the GTK port but I suspect that most of the
> code would be shared with other ports.

In the end we decided that MHTML was better for our requirements, so I implemented that instead. As a user I hope that somebody else will fix this :)
------- Comment #6 From 2008-10-30 09:51:40 PST -------
In <https://lists.webkit.org/pipermail/webkit-dev/2008-October/005537.html>, Darin Fisher wrote:

We have code to support this feature in the Chromium code base.  You can find it here:
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup

It is something we would love to one day see as part of WebKit.
------- Comment #7 From 2012-10-08 11:43:34 PST -------
Renaming, since this enhancement request is not about Firefox compatibility at all.