I hope that summary doesn't sound too grandiose. After a brief chat with Alp on IRC I'm going to describe/discuss what (for the requirements of the [open-source] product I work on) our dream embedding API would look like.
To recap on the IRC discussion, I'm looking to provide an alternative embedded HTML/web rendering implementation for our open-source Second Life client <http://www.secondlife.com/>; we currently use Gecko (on Win32, OSX, and Linux) but this is not proving to be a satisfactory embedded solution as we move forward.
WebKit's existing official GTK+ API is delightful but its I/O operates at a different, somewhat higher level than is useful to us.
As an OpenGL application we need to suck the rendered web-view's pixel data into an OpenGL texture. Practically speaking, this means being able to reliably access the raw pixel data for the page in some canonical RGB format; I believe Cairo is already rendering to precisely such a backbuffer but app-level access to such isn't supported by the WebKit/GTK+ API.
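To make the pixel-format point concrete, here is a minimal sketch of the conversion step the host app would need if it were handed a Cairo image buffer. It assumes the backbuffer is in Cairo's ARGB32 layout (premultiplied alpha, one native-endian 32-bit word per pixel) and that the app wants tightly packed, non-premultiplied RGBA bytes. The function name `argb32_to_rgba8` is hypothetical and not part of any WebKit or Cairo API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Converts an ARGB32-style buffer (premultiplied alpha, one native-endian
// 32-bit word per pixel, rows `stride_bytes` apart) into tightly packed,
// non-premultiplied RGBA bytes, e.g. for a GL_RGBA / GL_UNSIGNED_BYTE upload.
// Hypothetical helper: the name and signature are illustrative only.
std::vector<std::uint8_t> argb32_to_rgba8(const std::uint8_t* src,
                                          int width, int height,
                                          int stride_bytes)
{
    assert(src != nullptr && width > 0 && height > 0);
    assert(stride_bytes >= width * 4);

    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height * 4);
    std::size_t o = 0;
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * stride_bytes;
        for (int x = 0; x < width; ++x) {
            std::uint32_t px;
            std::memcpy(&px, row + x * 4, sizeof px);  // safe unaligned read
            const std::uint32_t a = (px >> 24) & 0xFF;
            std::uint32_t r = (px >> 16) & 0xFF;
            std::uint32_t g = (px >> 8) & 0xFF;
            std::uint32_t b = px & 0xFF;
            if (a != 0 && a != 255) {
                // Undo the premultiplication, rounding to nearest.
                r = (r * 255 + a / 2) / a;
                g = (g * 255 + a / 2) / a;
                b = (b * 255 + a / 2) / a;
            }
            // Defensive clamp in case a channel exceeded alpha in the input.
            out[o++] = static_cast<std::uint8_t>(r > 255 ? 255 : r);
            out[o++] = static_cast<std::uint8_t>(g > 255 ? 255 : g);
            out[o++] = static_cast<std::uint8_t>(b > 255 ? 255 : b);
            out[o++] = static_cast<std::uint8_t>(a);
        }
    }
    return out;
}
```

If the GL implementation accepts BGRA data directly, the swizzle could probably be skipped and the buffer uploaded as-is; the copy above is just the most portable fallback.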
We'd also (for efficiency) like to know when this backbuffer has been drawn to so that we can be informed of updates to the backbuffer instead of polling it - ChromeClient::addToDirtyRegion() looks perfect but it does not seem to expose itself to the embedding app (I imagine it easily could, by emitting a signal).
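As a sketch of what the host side of such a notification might look like: the app accumulates the rectangles it is told about and re-uploads only their bounding box once per frame. `Rect` and `DirtyRegion` are hypothetical names for illustration; how WebKit would actually deliver the rectangles (signal, callback, or otherwise) is exactly the open question here.

```cpp
#include <algorithm>
#include <cassert>

struct Rect {
    int x, y, w, h;
    bool empty() const { return w <= 0 || h <= 0; }
};

// Accumulates damage rectangles into one bounding box, clipped to the surface.
// Hypothetical host-side helper, not an existing WebKit class.
class DirtyRegion {
public:
    DirtyRegion(int surface_w, int surface_h)
        : surface_w_(surface_w), surface_h_(surface_h)
    {
        assert(surface_w > 0 && surface_h > 0);
    }

    // Called whenever the engine reports that a rectangle was repainted.
    void add(const Rect& r)
    {
        // Clip the incoming rectangle to the surface bounds.
        const int cx0 = std::max(0, r.x);
        const int cy0 = std::max(0, r.y);
        const int cx1 = std::min(surface_w_, r.x + r.w);
        const int cy1 = std::min(surface_h_, r.y + r.h);
        if (cx1 <= cx0 || cy1 <= cy0) return;  // nothing visible was damaged

        if (bounds_.empty()) {
            bounds_ = Rect{cx0, cy0, cx1 - cx0, cy1 - cy0};
            return;
        }
        // Grow the bounding box to cover both the old and the new damage.
        const int x0 = std::min(bounds_.x, cx0);
        const int y0 = std::min(bounds_.y, cy0);
        const int x1 = std::max(bounds_.x + bounds_.w, cx1);
        const int y1 = std::max(bounds_.y + bounds_.h, cy1);
        bounds_ = Rect{x0, y0, x1 - x0, y1 - y0};
    }

    bool dirty() const { return !bounds_.empty(); }

    // Returns the accumulated damage and resets it; call once per frame.
    Rect take()
    {
        const Rect r = bounds_;
        bounds_ = Rect{0, 0, 0, 0};
        return r;
    }

private:
    int surface_w_;
    int surface_h_;
    Rect bounds_{0, 0, 0, 0};
};
```

A single bounding box is the simplest policy; an app with many small, scattered updates might prefer to keep a list of rectangles instead.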
That's the core of the graphics-scraping requirement. Input to the embedded widget is another matter.
The mouse and keyboard input events which we need to propagate to the embedded widget are by no means GTK native events (for example, the mouse co-ordinates from a canvas inside the 3D world need to come through another input abstraction and be reverse-projected, ultimately having little resemblance to the mouse events arriving to the 'real' application window even if the 'real' application window used GTK events natively, which it doesn't).
So, we really need to be able to inject synthetic mouse and keyboard events into a given WebKit view. I don't mind *terribly* much if I have to construct fake Gdk events for these, I'd mind even less if I could inject these through glib signals, but a toolkit-neutral way to inject events seems ideal. I imagine such a thing exists internally to WebKit.
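For illustration, this is roughly the last step of the reverse projection on our side: once the 3D pick has produced a texture coordinate on the in-world canvas, it has to be turned into page pixel coordinates before any synthetic event can be built. The names `PagePoint` and `uv_to_page` are hypothetical, and the sketch assumes the page is mapped onto the full [0,1] x [0,1] texture range.

```cpp
#include <cassert>
#include <cmath>

struct PagePoint {
    int x;
    int y;
    bool inside;  // false if the pick missed the page entirely
};

// Maps a texture coordinate (u, v) in [0,1] to page pixel coordinates.
// flip_v should be true when v = 0 is the bottom edge (the usual GL texture
// convention) while page coordinates have y = 0 at the top.
// Hypothetical helper; nothing here is WebKit API.
PagePoint uv_to_page(double u, double v, int page_w, int page_h, bool flip_v)
{
    assert(page_w > 0 && page_h > 0);

    const bool inside = u >= 0.0 && u <= 1.0 && v >= 0.0 && v <= 1.0;
    if (!inside) return PagePoint{0, 0, false};  // also rejects NaN input

    if (flip_v) v = 1.0 - v;

    int x = static_cast<int>(std::floor(u * page_w));
    int y = static_cast<int>(std::floor(v * page_h));
    // u == 1.0 or v == 1.0 would land one pixel past the edge; clamp it back.
    if (x >= page_w) x = page_w - 1;
    if (y >= page_h) y = page_h - 1;
    return PagePoint{x, y, true};
}
```

The resulting (x, y) is what would be fed into whatever injection call the API ends up exposing, whether that is a fake Gdk event, a glib signal, or a toolkit-neutral entry point.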
One of my goals with WebKit integration is to be able to use the officially-exported API of the 'official' WebKit trunk; I hate unnecessary forks. I understand that the API isn't really fleshed-out to that point yet.
Now, I asked on #webkit-gtk whether this should be a new webkit-target or whether this can adequately layer on top of the nice GTK+ interface. Alp suggested (correct me if I'm wrong, Alp) that there has been interest in making the API expose the required functionality (the underlying Cairo buffer/context, at least, at the WebKit/GTK+ level) but uncertainty about what the API requirements really look like.
I'm hoping we can work out answers to some of these issues here.
I'm going to attach the interface definition of our app's new work-in-progress media abstraction layer which is probably a fair discussion point - this is the same interface that's currently wrapping the functionality we need from Gecko but it's not Gecko-centric (e.g. it wraps some other media types such as QuickTime movies).
Created attachment 18463
The interface I'd like to be able to implement on top of WebKit's API.
Here's llmediabase.h. It describes our app-level abstraction on top of (primarily) Gecko, QuickTime, and WebKit. I don't, of course, expect WebKit to implement this interface or a particularly close analogue; I hope merely that WebKit's official API exports enough functionality to implement this interface with a reasonably straightforward layer of code.
The header file is briefly commented, but here are the most interesting bits:
* Graphic scraping:
// returns pointer to raw media pixels
virtual unsigned char* getMediaData() = 0;
(also setRequestedMediaSize(), getMediaWidth(), getMediaHeight(), getTextureFormatInteral() etc.)
* Synthetic user input:
// mouse and keyboard interaction
virtual bool mouseDown( int x_pos, int y_pos ) = 0;
virtual bool mouseUp( int x_pos, int y_pos ) = 0;
virtual bool mouseMove( int x_pos, int y_pos ) = 0;
virtual bool keyPress( int key_code ) = 0;
virtual bool scrollByLines( int lines ) = 0;
virtual bool focus( bool focus ) = 0;
virtual bool unicodeInput( unsigned long uni_char ) = 0;
// niceties: set/clear URL to visit when a 404 page is reached
virtual bool set404RedirectUrl( std::string redirect_url ) = 0;
virtual bool clr404RedirectUrl() = 0;
virtual bool navigateTo( const std::string url ) = 0;
virtual bool navigateForward() = 0;
virtual bool navigateBack() = 0;
virtual bool canNavigateForward() = 0;
virtual bool canNavigateBack() = 0;
virtual bool enableCookies( bool enable ) = 0;
virtual bool clearCache() = 0;
virtual bool clearCookies() = 0;
virtual bool enableProxy(bool enable, std::string proxy_host_name, int proxy_port) = 0;
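To show how thin the layer on top of an engine is meant to be, here is a sketch of a trimmed-down copy of the interface together with an engine-free stub that implements it. Only the method names come from the header; the `MediaSource` and `StubMediaSource` class names, the `setRequestedMediaSize()` signature, the `const` qualifiers, and the 4-bytes-per-pixel assumption are all illustrative guesses rather than what llmediabase.h actually declares.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trimmed, hypothetical subset of the interface quoted above.
class MediaSource {
public:
    virtual ~MediaSource() = default;
    virtual bool setRequestedMediaSize(int width, int height) = 0;  // assumed signature
    virtual int getMediaWidth() const = 0;
    virtual int getMediaHeight() const = 0;
    virtual unsigned char* getMediaData() = 0;  // raw pixels, owned by the source
    virtual bool mouseDown(int x_pos, int y_pos) = 0;
    virtual bool navigateTo(const std::string& url) = 0;
};

// Engine-free stand-in: a real backend would forward these calls to Gecko,
// QuickTime, or WebKit instead of recording them.
class StubMediaSource : public MediaSource {
public:
    bool setRequestedMediaSize(int width, int height) override
    {
        if (width <= 0 || height <= 0) return false;
        width_ = width;
        height_ = height;
        // Opaque white, 4 bytes per pixel (assumed format).
        pixels_.assign(static_cast<std::size_t>(width) * height * 4, 0xFF);
        return true;
    }
    int getMediaWidth() const override { return width_; }
    int getMediaHeight() const override { return height_; }
    unsigned char* getMediaData() override
    {
        return pixels_.empty() ? nullptr : pixels_.data();
    }
    bool mouseDown(int x_pos, int y_pos) override
    {
        if (x_pos < 0 || y_pos < 0 || x_pos >= width_ || y_pos >= height_) return false;
        last_click_x = x_pos;
        last_click_y = y_pos;
        return true;
    }
    bool navigateTo(const std::string& url) override
    {
        if (url.empty()) return false;
        current_url = url;
        return true;
    }

    int last_click_x = -1;
    int last_click_y = -1;
    std::string current_url;

private:
    int width_ = 0;
    int height_ = 0;
    std::vector<unsigned char> pixels_;
};

// Host-side helper: size of the buffer behind getMediaData(), assuming RGBA8.
std::size_t frameBytes(const MediaSource& media)
{
    assert(media.getMediaWidth() >= 0 && media.getMediaHeight() >= 0);
    return static_cast<std::size_t>(media.getMediaWidth()) * media.getMediaHeight() * 4;
}
```

The host application only ever talks to `MediaSource`, which is why the question for WebKit is whether its exported API offers enough to fill in the bodies of a class like the stub.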
As a final note, I'm taking a slightly Linux-centric (and GTK-centric) implementor's view, but ideally we'd like to support WebKit across OSX, Win32, Linux and as much as possible. The chances of this seem increased when the API is built merely upon glib+cairo without bringing GTK itself into the equation (esp. when GTK is just being sidestepped as a fakey translation layer) but I'm not deeply worried either way.
(In reply to comment #5)
> As a final note, I'm taking a slightly Linux-centric (and GTK-centric)
> implementor's view, but ideally we'd like to support WebKit across OSX, Win32,
> Linux and as much as possible. The chances of this seem increased when the API
> is built merely upon glib+cairo without bringing GTK itself into the equation
> (esp. when GTK is just being sidestepped as a fakey translation layer) but I'm
> not deeply worried either way.
Replying to your last comment first (the other stuff looks well considered, I look forward to reviewing your proposal).
As you noticed, the GTK+ port already does work headlessly without actually using GTK+. It should be fairly straightforward to make build fixes to allow for a GTK+-less build, but dropping GLib is less ideal since it would be equivalent to developing a brand new port.
So, I recommend that you ship the product with GLib and develop an API that's shared with the GTK+ port.
I can help make the necessary build system changes and code fixes to allow for a GLib+Cairo-only build (without GTK+) once we're done designing and implementing the API. This seems to provide portability without heavy dependencies and lets us use the same WebKit project infrastructure. Does that sound good?
Thanks for the feedback. That does sound good - I have no objection to shipping glib on Win32 or OSX.
One gotcha (which is a source of pain to work around in Gecko) is form widget rendering, most particularly <select> widgets; <select> causes a transient pop-up which can extend beyond the bounds of the parent window. That's detrimental to the idea of a flat pre-composited rendering of the page view, and I don't know the extent to which WebKit relies on breaking this idea by assuming high-level 'real' window management from the UI layer.
Two more nice-to-haves from the theoretical exposed low-level interface:
* Cursor change notifications to the host app. These don't have to be as detailed as specifying precise bitmaps, but should hint when the app should change the cursor to 'ibeam', 'hand', 'arrow', etc (perhaps with the same cursor naming as http://www.w3schools.com/htmldom/prop_style_cursor.asp ).
* Simple hooks for copying and pasting text from the host app rather than some 'native' clipboard.
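A sketch of the cursor-hint idea, assuming the notification carries a CSS cursor keyword and the host app maps it onto its own small set of cursors. `HostCursor` and `cursor_from_css_keyword` are hypothetical names, and the table covers only a handful of the standard CSS keywords.

```cpp
#include <cassert>
#include <cstring>

// The host app's own cursor set (hypothetical).
enum class HostCursor { Arrow, IBeam, Hand, Crosshair, Wait };

// Maps a CSS cursor keyword to a host cursor, falling back to the arrow for
// anything the host does not know about.
HostCursor cursor_from_css_keyword(const char* keyword)
{
    assert(keyword != nullptr);

    struct Entry {
        const char* css;
        HostCursor cursor;
    };
    static const Entry table[] = {
        {"default", HostCursor::Arrow},
        {"text", HostCursor::IBeam},
        {"pointer", HostCursor::Hand},
        {"crosshair", HostCursor::Crosshair},
        {"wait", HostCursor::Wait},
    };

    for (const Entry& e : table) {
        if (std::strcmp(e.css, keyword) == 0) return e.cursor;
    }
    return HostCursor::Arrow;
}
```

Passing a keyword rather than a bitmap keeps the notification cheap and lets each host draw the cursor in its own style.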
Ok, I have a proof of concept implementation of pure/direct GL rendering using the new OpenVG Cairo backend and ShivaVG.
Good news, performance is stunning (~500 FPS full-rerendering of content). It works using only GLib, though WebView is still used as a surrogate loader (without requiring a windowing system or realization).
Will post patches soon. The API is very basic and not what I'd propose, but it is a start. I hope having a real implementation will help get the API discussion started.
Wow, that's an unexpected surprise. :)
So, this is... WebKit using a Cairo back-end which targets OpenVG, and that OpenVG implementation is ShivaVG (which uses OpenGL)?
(In reply to comment #11)
> Wow, that's an unexpected surprise. :)
> So, this is... WebKit using a Cairo back-end which targets OpenVG, and that
> OpenVG implementation is ShivaVG (which uses OpenGL)?
Yep, some early screenshots:
http://www.ndesk.org/tmp/WebKitOpenVG.png (color issue has been fixed)
From looking at a proper clutter webkit implementation (rather than using the gtk WebView as a surrogate loader) this is my concept of an API. I don't know how this would be best implemented to fit in the WebKit system. I suppose there would just be a backend class like the gtk backend which uses the cairo backend to draw, then people could wrap their own frontend object/widget/actor around it, implementing their own WebView to their needs?
/* methods */
void draw (cairo_t *cr, IntRect *clip_rect); /* Draws to the cairo context */
void set_size (int width, int height); /* Sets the width/height of the page */
void set_origin (int x, int y); /* Sets the origin of the page to x,y */
/* Synthesizes events from frontend to backend */
gboolean button_press_event (ButtonEvent *event);
gboolean button_release_event (ButtonEvent *event);
gboolean motion_event (MotionEvent *event);
gboolean key_press_event (KeyEvent *event);
gboolean key_release_event (KeyEvent *event);
gboolean scroll_event (ScrollEvent *event);
/* signals */
RegionInvalidated (IntRect *region); /* Notifies the front end that the page has been invalidated and should be redrawn */
CursorChanged (Cursor cursor); /* Notifies the front end that the cursor should be changed */
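To illustrate how a front end might wrap itself around such a backend, here is a toolkit-free sketch of the two signals using plain callbacks in place of GObject signals. Everything in it (`Signal`, `PageBackend`, the member names) is hypothetical; a real port would more likely use the GLib signal machinery, and `set_size()` here only pretends to be the engine by invalidating the whole page.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct IntRect { int x, y, width, height; };
enum class Cursor { Arrow, IBeam, Hand };

// Minimal stand-in for a signal: a list of callbacks invoked in order.
template <typename... Args>
class Signal {
public:
    using Handler = std::function<void(Args...)>;

    void connect(Handler h)
    {
        assert(static_cast<bool>(h));  // reject empty handlers
        handlers_.push_back(std::move(h));
    }
    void emit(Args... args) const
    {
        for (const Handler& h : handlers_) h(args...);
    }

private:
    std::vector<Handler> handlers_;
};

// Hypothetical backend object exposing the two signals from the sketch above.
struct PageBackend {
    Signal<const IntRect&> region_invalidated;
    Signal<Cursor> cursor_changed;

    // Resizing invalidates the whole page, so the front end must redraw it all.
    void set_size(int width, int height)
    {
        width_ = width;
        height_ = height;
        region_invalidated.emit(IntRect{0, 0, width, height});
    }

    int width_ = 0;
    int height_ = 0;
};
```

A front end (widget, actor, or GL texture owner) would connect to `region_invalidated`, then call the backend's draw method for just the reported rectangle.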
Some other thoughts:
* In the API above there is the method getMediaData() which gets the specific pixel data about the page. It is possible to get this data from a cairo surface if required.
* A GObject binding for accessing the DOM would be useful, as would the ability to get an object at a specific co-ordinate (or can this be retrieved through the DOM?)
* For popup menus could we pass an array of structures (similar to GtkAction) that describes a popup menu for the host app to draw?
* So that the backend does not need to depend on a specific toolkit like Gtk for form elements, there might be a need for methods that the backend calls on the frontend, passing a cairo_t on which the front end draws the UI element as it desires.
* I've not mentioned clipboard support or any of the other niceties that the llmediabase.h file mentions because these things can be implemented as needed, as in the gtk WebView class.
Created attachment 19119
event observer API
Attached a missing part of the demonstrative API - the observers.
Hi, We (http://www.lateralvisions.co.uk) are also very interested in a cross platform web-browser that can be embedded into a 3D world. The suggested API looks like it would be fantastic for what we need, but I do have a couple more 'dream api' suggestions:
* We use a cross-api graphics layer, so we do not want to be tied to OpenGL. This is no problem as long as the api gets the raw (system memory) pixel data rather than an actual gl texture or anything. Looking at the suggestions above, this is what is being planned anyway.
* It would be nice to be able to deactivate the embedded WebKit: for example, when it's not visible to the viewer, or just off in the distance, there may be no real need for it to redraw itself (even if animations/plugins etc are changing). Further, there would be times when it would be good to deactivate it fully, so it doesn't even update animations/script/plugins etc (to save CPU cycles). Don't know how practical this is though.
* It would be useful to be able to request that it draw at a 'different resolution than the layout is at'. For example, if we normally draw to a 512x512 texture when up close, it would be nice to get it to render the same content, laid out in exactly the same way, but 'zoomed out' on a 128x128 target instead of 512x512. This would be useful, for example, when the browser is off in the distance, and there's no need to spend lots of video memory on a high-res version of it. It would be inefficient to render to 512x512 then downsample to 128x128. I guess this might use the same approach as when doing a full page zoom in a normal web browser.
* It would be great to be able to reroute audio output too (so it can play in a 3D audio system), but I guess most audio in a browser actually comes from plugins like Flash, so this is probably not practical.
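On the resolution point, the arithmetic involved is small. The sketch below assumes the page keeps its layout size (say 512x512) and is drawn through a uniform scale onto a smaller target (say 128x128), with invalidated rectangles converted from layout coordinates to target coordinates and rounded outward so no damaged pixel is missed. `Scale`, `layout_to_target_scale`, and `scale_rect_outward` are hypothetical names; with a Cairo renderer the scale itself would presumably be applied with `cairo_scale()` before drawing.

```cpp
#include <cassert>
#include <cmath>

struct Rect { int x, y, w, h; };
struct Scale { double sx, sy; };

// Scale factors that map layout pixels onto the (usually smaller) target.
Scale layout_to_target_scale(int layout_w, int layout_h, int target_w, int target_h)
{
    assert(layout_w > 0 && layout_h > 0 && target_w > 0 && target_h > 0);
    return Scale{static_cast<double>(target_w) / layout_w,
                 static_cast<double>(target_h) / layout_h};
}

// Converts a rectangle from layout to target coordinates, rounding outward so
// that every target pixel touched by the source rectangle is included.
Rect scale_rect_outward(const Rect& r, const Scale& s)
{
    const int x0 = static_cast<int>(std::floor(r.x * s.sx));
    const int y0 = static_cast<int>(std::floor(r.y * s.sy));
    const int x1 = static_cast<int>(std::ceil((r.x + r.w) * s.sx));
    const int y1 = static_cast<int>(std::ceil((r.y + r.h) * s.sy));
    return Rect{x0, y0, x1 - x0, y1 - y0};
}
```

Mouse coordinates would go through the inverse mapping, so the page still sees events in its own 512x512 layout space.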
Taking those points in order:
1) Right, GL should just be an app-level implementation detail which the WebKit API doesn't care about (except insofar as it should be 'easy' to grab the raw buffer in one of the pixel formats which GL favours). Perhaps I should rename this issue to make that clearer?
2) I imagine that you can simply stop servicing the event loop for a WebKit instance if you want to 'pause' it, though there are probably more elegant methods.
3) That would be quite nice, though not an urgent requirement for us - maybe the underlying Cairo renderer makes this pretty easy to implement.
4) Again a nice-to-have but probably very difficult for now.
Please add us to the group of people who want to integrate webkit using a custom back-end (in our case, OpenGL, or cairo on OpenGL). I was excited to find this bug and to see Alp's blog entry regarding webkit rendering to OpenGL:
I've been waiting anxiously to see if this API discussion would evolve, or if some webkit->OpenGL patches would show up.
Failing that, I have looked at the work that Iain has done on clutter webkit integration, which can be found here:
It seems that the approach was basically to clone the gtk back-end to make a new clutter back-end, then add the clutter back-end target everywhere that it was needed (a lot of places). It looks like a significant reworking of the gtk back-end was required, making this both a lot of work and somewhat of a challenge to maintain going forward. (I might be misunderstanding or overstating this, so Iain please feel free to jump in if this characterization isn't accurate).
We also looked at the earlier approach that was taken for clutter, and which can be found here:
This method didn't seem to work for what clutter needed, and was abandoned. The patch to webkitwebframe.h/cpp was very straightforward, and seemed promising at least from the standpoint of getting something to show up in our environment. But enough has changed in webkit/gtk that this patch doesn't work anymore.
Assuming that we want to render webkit to OpenGL, and of course be able to interact with the embedded webkit context in all the ways that an app hosting a web view would need (pretty much what has been outlined above on this bug), what is the best approach? It seems like our current options are to wait for progress on this bug (which doesn't seem to be getting much attention), or to do what Iain has done with clutter integration. We're not too excited about the latter approach, but we're wondering if that is our only option.
> Failing that, I have looked at the work that Iain has done on clutter webkit
> integration, which can be found here:
> It seems that the approach was basically to clone the gtk back-end to make a
> new clutter back-end, then add the clutter back-end target everywhere that it
> was needed (a lot of places). It looks like a significant reworking of the gtk
> back-end was required, making this both a lot of work and somewhat of a
> challenge to maintain going forward. (I might be misunderstanding or
> overstating this, so Iain please feel free to jump in if this characterization
> isn't accurate).
It actually wasn't a lot of work, maybe 2 days to get it working, and then a few things tweaked here and there to fix some bugs. (Bearing in mind I was starting from a point of not being familiar with the internal code at all.)
Maintaining it isn't much work either, to be honest. Git is very good for merging upstream and I guess it's as much work as maintaining any backend port.
> We also looked at the earlier approach that was taken for clutter, and which
> can be found here:
> This method didn't seem to work for what clutter needed, and was abandoned.
> The patch to webkitwebframe.h/cpp was very straightforward, and seemed
> promising at least from the standpoint of getting something to show up in our
> environment. But enough has changed in webkit/gtk that this patch doesn't work
While it is very simple, there are many, many places where it simply won't work, which makes it useless for anything other than displaying static HTML. It was also quite a processor hog, as it redrew the entire widget even if nothing changed.
> do what Iain has done with clutter integration. We're not too excited about
> the latter approach, but we're wondering if that is our only option.
If you want a fully functional web browser then that is your only option I'm afraid to say.
It appears that there is some movement on this issue at http://www.atoker.com/blog/2008/06/12/webkit-meta-a-new-standard-for-in-game-web-content/
And yes, it's not lost on me that the author of this bug and at least one contributor are directly involved with the effort. ;)
Has there been any progress on this bug? Specifically, is there now a way to compile WebKit on GLib only (no GTK) and somehow get access to the actual pixel data needed to fill a GL texture?
Add one more to the interested parties on this enhancement. Is there any recent progress on the issue? Particularly the ability to build with only glib+cairo, and to retrieve image data.
Removing the GTK+ flag as this isn't necessarily a WebKitGTK+ bug.
This is really sounding more and more like the Nix port: http://nix.openbossa.org/