Literate programming: combining documentation and source code
One of the difficulties of programming is that code must be ordered as the language requires rather than in the order in which we create it and think about it. Even when working with a system as productive and effective as Java, writing code usually consists of adding some code to one class, then some code to another class, building up to a complete working system. Sitting down and explaining code to someone else often follows a similar route: you trace a thread of execution from one class to another until it can be seen how a complete task is accomplished. This is sometimes captured in a UML sequence diagram.
Donald Knuth first popularized a solution to the problem of understanding code in the early 1980s, with an idea he called literate programming. His WEB system (which predates, and has nothing to do with, the World Wide Web) takes a single source document containing code and explanatory text, laid out in the order in which you would describe a program.
Run a WEB document through his Tangle program and it extracts the code parts and assembles them into a syntactically correct, though not very readable, program. Run the same source through the Weave program and you get a TeX source that can create a beautifully typeset document. TeX produces a device-independent print format called DVI, which needs one further step to translate it into a particular printer language such as PostScript. In the hands of a skilled practitioner, a literate program can be read as a well-structured essay.
Something slightly similar to Weave already exists in the Java Development Kit supplied by Sun: the "javadoc" utility works with some simple Java commenting conventions to produce HTML documents that act as an excellent reference to a collection of classes. But reading a reference manual is not necessarily the best way of understanding how to use a complicated API, hence the huge number of "how to" programming books that can be found in bookstores these days. The documents produced by Weave are anything but reference manuals: they are often works of art and can be read for enjoyment as well as explanation.
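As a small illustration of the commenting conventions javadoc relies on, here is a minimal sketch. The Counter class is hypothetical and exists only for this example. The /** ... */ comment form and the @param and @return tags are standard javadoc conventions. The class is left package-private so that it can sit in any source file, whereas javadoc by default documents only public and protected classes and members, so a real class would normally be declared public.

```java
/**
 * A running total, used only to illustrate javadoc comments.
 * (Hypothetical example class, not part of Marius.)
 */
class Counter {
    /** The current total. */
    private int count;

    /**
     * Adds an amount to the running total.
     *
     * @param amount how much to add; may be negative
     * @return the new total after the addition
     */
    int add(int amount) {
        count += amount;
        return count;
    }
}
```

The result is reference material: each comment is attached to the declaration it describes, which is exactly why it cannot tell the story of how the pieces fit together.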
WEB produced its output via Knuth's TeX system and worked for programs written in Pascal. Marius is a partial recreation of Knuth's ideas that uses Java as its programming language and outputs HTML. The project also takes advantage of another relatively recent technology: XML.
Since the use of HTML has become common, the idea of marking up a document with a fixed set of tags has become quite familiar. XML is a standard related to SGML that allows a document to be marked up with a set of tags that can be specially defined for the purpose. There are a number of programmers' tools available, such as parsers, that make it easy to work with XML. In the case of Marius, the source document has to be marked up into areas that represent code and areas that represent explanation, and XML is the ideal way to accomplish this. Taking advantage of the free XML parser available from Sun means that projects such as Marius can be created in a remarkably short time: days rather than months.
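To make the idea concrete, here is a minimal sketch of reading such a marked-up document with the standard Java XML parsing API (javax.xml.parsers). The tag names "narrative" and "code", the "class" attribute, and the SourceReader and Chunk names are all assumptions made for this illustration; they are not necessarily the markup or classes that Marius itself uses.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SourceReader {
    /** One marked-up area of the source document. */
    static final class Chunk {
        final String kind;      // element name, e.g. "narrative" or "code" (assumed tags)
        final String className; // value of the assumed "class" attribute, "" if absent
        final String text;      // trimmed text content of the element

        Chunk(String kind, String className, String text) {
            this.kind = kind;
            this.className = className;
            this.text = text;
        }
    }

    /** Parses the document and returns its top-level areas in document order. */
    static List<Chunk> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse source document", e);
        }
        List<Chunk> chunks = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between areas
            }
            Element area = (Element) node;
            chunks.add(new Chunk(area.getTagName(),
                                 area.getAttribute("class"),
                                 area.getTextContent().trim()));
        }
        return chunks;
    }
}
```

Because the parser does the tokenizing and well-formedness checking, the tool itself only has to walk a list of elements, which is where most of the time saving comes from.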
The Marius source document will contain chunks of explanatory text, called "narrative", and chunks of Java code that will ultimately be assembled into working classes. However, since we want to present code in an order suitable for explanation, we will allow code to be presented before its class is defined, or indeed at any point in the document.
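One way to allow code to appear anywhere is to collect chunks under the name of the class they belong to and only emit each class once the whole document has been read. The sketch below illustrates that idea under stated assumptions: the ChunkAssembler class and its methods are hypothetical, each chunk is assumed to name its target class, and the emitted class has no modifiers, imports, or package declaration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChunkAssembler {
    // Code collected so far, keyed by target class name, in first-seen order.
    private final Map<String, StringBuilder> bodies = new LinkedHashMap<>();

    /** Records a code chunk for the named class; the class need not have been seen before. */
    void add(String className, String code) {
        StringBuilder body = bodies.computeIfAbsent(className, k -> new StringBuilder());
        for (String line : code.split("\n")) {
            body.append("    ").append(line).append('\n'); // indent one level inside the class
        }
    }

    /** Returns the assembled source for one class, or null if no chunk named it. */
    String assemble(String className) {
        StringBuilder body = bodies.get(className);
        if (body == null) {
            return null;
        }
        return "class " + className + " {\n" + body + "}\n";
    }
}
```

Chunks for different classes can be interleaved freely in the document; within one class they come out in the order in which they were presented.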
We rely on our version of Weave to take the source document and produce HTML that can be read with an ordinary web browser. We will use another program to take the source document and produce syntactically correct Java. In this case we would like the Java to be readable as well, so our program is called "Comb" rather than Knuth's original "Tangle".
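The HTML side can be sketched in the same spirit. The example below assumes that narrative becomes a paragraph and code becomes a preformatted block; the HtmlWeaver class and its method names are hypothetical, and the real Weave output may be laid out quite differently. The one step that cannot be skipped is escaping, since Java code routinely contains the characters <, > and &.

```java
class HtmlWeaver {
    /** Escapes the characters that have special meaning in HTML text. */
    static String escape(String text) {
        // The ampersand must be replaced first so the other entities are not escaped twice.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Renders a chunk of narrative as an HTML paragraph. */
    static String narrative(String text) {
        return "<p>" + escape(text) + "</p>\n";
    }

    /** Renders a chunk of code as a preformatted block, preserving its layout. */
    static String code(String text) {
        return "<pre>" + escape(text) + "</pre>\n";
    }
}
```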
