API Documentation Extraction

This page is about making documentation of the Vesta APIs available in a usable on-line format.

The Documentation Gap

Vesta has a sizable set of libraries whose APIs are available in 5 programming languages. There is not nearly enough documentation to help people make sense of how to use these interfaces.

The core code is written in C++ and is divided into 13 distinct libraries. (There are 3 sub-libraries that are a part of basics_umb and 10 that are a part of vesta_umb.) We've added interfaces to a subset of these libraries using SWIG to make them callable from 4 more languages (Perl, Python, Tcl, and Java). Those interfaces have some subtle differences to make them map more naturally onto the different languages. (For example, some parts of the C++ API use call-back functions to process a set of items, while the wrapped APIs instead return a list of data structures.)
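The difference between the call-back style and the list-returning style can be sketched as follows. This is only an illustration in Python: for_each_entry and list_entries are hypothetical names invented for this page, not functions from the Vesta libraries or their SWIG wrappers.

```python
from typing import Callable, List


# Hypothetical stand-in for a C++-style API that hands each item to a
# call-back. The name and signature are illustrative, not Vesta's.
def for_each_entry(entries: List[str], callback: Callable[[str], bool]) -> None:
    """Call `callback` on each entry; stop early if it returns False."""
    for entry in entries:
        if not callback(entry):
            break


# Hypothetical wrapper in the style a scripting-language binding might use:
# it supplies its own call-back, collects the items, and returns a list.
def list_entries(entries: List[str]) -> List[str]:
    collected: List[str] = []

    def collect(entry: str) -> bool:
        collected.append(entry)
        return True  # keep iterating

    for_each_entry(entries, collect)
    return collected
```

A caller of the wrapped form writes a plain loop over list_entries(...) and never sees the call-back, which is exactly the kind of difference a reader of the C++ headers alone would not learn about.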

Unfortunately, there's not much documentation of all this. The C++ header files do include some useful comments, but making use of them requires finding the right source file and then reading through the C++. This is a non-trivial exercise for programmers who wish to work in one of the other languages. Also, the subtle changes made in the wrappers are documented only in the SWIG input files. So, in order to make use of the wrapped APIs in these other languages, someone would have to read both the C++ header files that define the core API and the SWIG input files. This is obviously too much effort to expect.

What Users Need

Processing and Intermediate Formats

For the core APIs, it seems best to extract documentation from the code itself (i.e. a literate programming approach). Getting the most out of this will require adding structured comments to the source code, though some tools should allow us to generate at least some minimal indexing and documentation with the code as it is today.
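As a rough sketch of what extracting structured comments could look like, the Python below pairs each run of "///" comment lines with the declaration that immediately follows it. The "///" marker is one comment style that doxygen recognizes; whether Vesta's sources would adopt it is an assumption, and the sample header and the extract_docs function are made up for illustration.

```python
from typing import Dict, List

# Made-up C++ header text; none of these declarations come from Vesta.
SAMPLE_HEADER = """\
/// Open the named file and return a handle.
int open_file(const char *path);

int undocumented(void);

/// Close a handle returned by open_file.
/// Closing twice is an error.
void close_file(int handle);
"""


def extract_docs(source: str) -> Dict[str, str]:
    """Map each documented declaration line to its joined '///' comment."""
    docs: Dict[str, str] = {}
    pending: List[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            # Accumulate comment text until a declaration shows up.
            pending.append(stripped[3:].strip())
        elif stripped:
            # A non-comment line: attach any pending comment to it.
            if pending:
                docs[stripped] = " ".join(pending)
            pending = []
        else:
            # A blank line detaches the comment from what follows.
            pending = []
    return docs
```

Declarations with no structured comment, such as undocumented above, are simply left out of the result; a real tool would still want to index them.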

Documenting the SWIG interfaces will probably require creating our own documentation annotation and extraction system. (SWIG 1.1 had a documentation generation system. We're using SWIG 1.3 which currently lacks a cohesive documentation system, though it has some limited support for generating documentation for certain languages.) Hopefully the limited scope will keep it from being too much of a burden.

We believe that the right solution will involve XML as an intermediate format. This should enable both flexible presentation through XSL and some advanced cross-indexing and searching capabilities (such as searching for every function which takes a particular type as an argument).
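The kind of search mentioned above might look like the following Python sketch. The XML layout (api, function, and param elements with name and type attributes) is an assumption invented for this example; it is not the output format of doxygen, SWIG, or any other tool.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical intermediate XML. Element and attribute names are
# assumptions for this sketch, not any real tool's schema.
SAMPLE_XML = """
<api>
  <function name="open_file">
    <param name="path" type="string"/>
    <param name="mode" type="int"/>
  </function>
  <function name="close_file">
    <param name="handle" type="int"/>
  </function>
  <function name="describe">
    <param name="label" type="string"/>
  </function>
</api>
"""


def functions_taking(xml_text: str, type_name: str) -> List[str]:
    """Return the names of functions with at least one parameter of type_name."""
    root = ET.fromstring(xml_text)
    matches: List[str] = []
    for func in root.iter("function"):
        # Check only this function's own parameters.
        if any(p.get("type") == type_name for p in func.findall("param")):
            matches.append(func.get("name"))
    return matches
```

With the sample data, functions_taking(SAMPLE_XML, "string") returns the functions open_file and describe. The same intermediate file could also feed an XSL stylesheet for presentation, so indexing and rendering would share one source.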

It seems that doxygen can output XML, which may prove quite useful for the C++ APIs. SWIG (which we already use) also has an XML output capability, but it is totally undocumented as of Dec 6, 2005.

What Exists Today

The only guide which provides any help today in understanding the Vesta APIs is the vcheckout dissection.

Next Steps

There are at least two other kinds of documentation that it would be nice to incorporate into the code and keep up to date automatically: