Requirements of a browser
A web browser is made of a few major components:
- A graphical user interface (GUI), written with, say, Cocoa or GTK.
- A renderer (e.g. Gecko or WebKit).
- A controller (the glue code, which includes user configuration and interaction).
The current landscape of browsers
The graphical user interface can be a prominent part of the experience for browsers targeting a larger audience, like Firefox or Safari. By contrast, browsers targeting power users tend to make minimal use of the toolkit, leaving most of the screen real estate to the renderer.
A renderer can be extremely difficult to write, so most browsers end up reusing an existing one, mostly without change. WebKit enjoys some success among power-user browsers.
Last but not least, the controllers are mostly divided into two groups that correlate with those of the graphical user interfaces: when predominantly "user-friendly", the controller will often be an obscure, unhackable black box; when targeting power users, it will often be configurable with INI-style config files, or sometimes even an interpreted language such as Python or Lua.
The poor state of language bindings for GUI toolkits
Initially, Next was 100% written in Common Lisp, which means that the GUI toolkit had to be driven by language bindings.
Indeed, the most common GUI toolkits are written in C or a variant of C, so if we want to use a different language, we need some form of translation between our language and the library's language. Those language bindings can be tedious and difficult to write. Even the ones developed to the point of being stable enough to be useful might not fully cover the library, which means that some important features will be missing.
Common Lisp has a rather complete Cocoa-binding library, but it mostly works with Clozure CL and not so well with other Common Lisp compilers.
Sadly, the GTK bindings are not in good enough shape to fulfill our needs. We tried for many months to get Next running with CL-CFFI-GTK on GNU/Linux, to no avail.
It turns out that this is not an isolated issue. Because of the complexity of GUI toolkits and how challenging it is to write bindings for various languages, there might not be a single language out there in which we can program complex graphical user interfaces that work well enough on all platforms.
Split-process design: A cure for portable interfaces
Next is all about the controller. Its selling point is to be infinitely extensible, so we really needed to write it in Lisp.
The GUI toolkit and the renderer are only secondary. If we can't get all three components to work together in Lisp, then couldn't we just get the GUI and the renderer to work in a separate piece of software written in their native language?
Then we would have only one thing left: the controller, manipulating the GUI and the renderer via an RPC protocol. This way, we would get both the GUI and the renderer out of the way and solve both the language and the portability issues once and for all.
Choosing an RPC protocol
We chose XML-RPC, mostly because we had to choose one of the many options. XML-RPC is simple, widely supported, old and stable.
It operates very simply: the server registers a set of callbacks to execute when receiving a method name. The client sends an HTTP request whose body is a simple XML document with just two entries: the method name and its parameters, if any.
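To make the shape of such a request concrete, here is a minimal sketch using Python's standard xmlrpc.client module rather than Next's own code. The method name load_url, its argument, and the callbacks table are hypothetical and chosen only for illustration; a plain dictionary stands in for the server's set of registered callbacks.

```python
import xmlrpc.client

# Hypothetical method name and argument, chosen only for illustration.
request_body = xmlrpc.client.dumps(("https://example.org",), methodname="load_url")

# This is the XML document that would travel in the HTTP request body.
print(request_body)

# The receiving side decodes the document back into its two entries:
# the parameters and the method name.
params, method_name = xmlrpc.client.loads(request_body)

# A plain dict stands in for the server's table of registered callbacks.
callbacks = {"load_url": lambda url: f"loading {url}"}
result = callbacks[method_name](*params)
print(result)
```

In a real setup the decoding and the callback lookup are done by the XML-RPC server library, so neither side manipulates the XML by hand.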
While XML is more verbose (and arguably uglier) than JSON, it does not matter much because even from a developer's perspective we never get to look at the XML part. It's all handled by the XML-RPC clients and servers.
XML-RPC can be used over HTTP, and this is what we do. This comes with a nice side effect: we don't even need to run the separate parts on the same machine. They can be connected remotely over the Internet!
Implementation: The platform port and the Lisp core
We've implemented two processes:
- The Lisp core, which is a regular Common Lisp application firing up an XML-RPC server on startup.
- The "platform port", which does three things:
  - all the GUI parts,
  - all the web rendering,
  - and it also fires up its own XML-RPC server which responds to a set of specific requests.
The Lisp core and the platform port are both XML-RPC clients and servers, because they both independently need to send messages to the other party.
For instance, whenever the user presses a key, say C-l, it is intercepted by the GUI toolkit and sent over to the Lisp core. From there, the Lisp core checks whether it knows the binding or not. If not, it sends a response to the GUI telling it to forward the binding to the other handlers in the application (a key like space would then scroll the page in the web view). If the binding is recognized, then the Lisp core calls the associated function, which in turn might send a new message to the GUI (in the case of C-l, it would tell the GUI to open the minibuffer).
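The round trip described above can be sketched with two small XML-RPC servers talking to each other on localhost. This is only an illustration in Python, not Next's actual code: the method names key_pressed and open_minibuffer, the KEY_BINDINGS table and the gui_log list are all hypothetical, and the "core" here is a Python stand-in for the Lisp core.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_server(functions):
    """Serve a {name: callable} table on a free localhost port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    for name, function in functions.items():
        server.register_function(function, name)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


# --- Platform port side (stand-in for the C or Objective-C program) ---
gui_log = []  # records what the fake GUI was asked to do


def open_minibuffer():
    gui_log.append("minibuffer opened")
    return True


port_server, port_url = start_server({"open_minibuffer": open_minibuffer})


# --- Core side (stand-in for the Lisp core) ---
def open_minibuffer_command():
    # The core acts as a client of the platform port here.
    xmlrpc.client.ServerProxy(port_url).open_minibuffer()


KEY_BINDINGS = {"C-l": open_minibuffer_command}  # hypothetical binding table


def key_pressed(key):
    """Return True if the core handled the key, False otherwise."""
    command = KEY_BINDINGS.get(key)
    if command is None:
        return False
    command()
    return True


core_server, core_url = start_server({"key_pressed": key_pressed})


# --- Back on the platform port: what happens on each key press ---
def on_key_press(key):
    # The platform port acts as a client of the core here.
    if xmlrpc.client.ServerProxy(core_url).key_pressed(key):
        return "consumed by core"
    return "forwarded to web view"


print(on_key_press("C-l"), gui_log)
print(on_key_press("space"))
```

Each side is both a server (it exposes callbacks) and a client (it calls the other side), which is why the two processes can notify each other independently.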
Indeed, once a platform port has been made to work, there is no more need to touch its code base. All future development can happen on the Lisp side!
Benefits of a split-process design
High-level browser library
In the end, the platform port is just a graphical user interface and some web rendering queries grouped together. That's it. In fewer than 1000 lines of C or Objective-C code, we have a fully working browser interface.
This means that, from a programmer's point of view, this is possibly the simplest working example of a fully functional web browser interface with a clear, minimal list of all the features required by the controller.
Native look & feel
A nice consequence of the separation between the platform port and the controller is that it allows us to provide an excellent user experience on any possible platform.
Multiple renderer support
Another cool side effect of this approach is that it is, by design, very natural to add support for extra web renderers.
For the initial release, we've opted for WebKit for a few reasons:
- It's easy enough to embed into a project, possibly more so than Gecko or Blink.
- It works well on all platforms and has both Cocoa and GTK support.
- In 2018, it's a stand against the looming monopoly of Blink. We would not be too happy if the Internet ever became driven by a single renderer whose development is not particularly community-oriented and which is subject to arbitrary corporate policies.
Robustness and security
Because the platform port is so minimal, it is much easier to maintain and to fix bugs.
The security-sensitive part, i.e. the renderer, is contained in a relatively simple executable. It's possible to start this executable within a container, so that security issues with the renderer (or, who knows, with the GUI toolkit) won't ever reach beyond the boundaries of the RPC calls.
Resistance to web renderer API breakages
Last but not least, our design means that the web renderer is no longer a critical dependency.
Many times in the past, a web renderer API has broken backward compatibility, thus breaking all the web browsers depending on it.
When WebKit 2.0 came out, many WebKit-1-based browsers became obsolete. Rewriting them was so much work that many of them were simply abandoned.
More recently, it happened again with Firefox extensions when XUL was dropped. This affected not only the extensions but also Conkeror, a full-fledged browser written in XUL, which became obsolete from one day to the next.
With our split-process design, should a web renderer break, we would only have to update or rewrite a platform port. Thus we get the guarantee that the Lisp core and all the community-written extensions will never break.