Project documentation and structure

In many cases, one of the major goals of any project involving code is to share it with others, either for them to use or to modify. To do either, other users must first learn the purpose and functionality of the code, both on a microscopic and macroscopic level. Lightly commenting code, although useful, is often not enough, and it is greatly beneficial to write complete documentation for the project. Rather than writing documentation from scratch, there are tools that can create documentation using the code itself and the comments surrounding it. These are called documentation generators.

Documentation generators

Documentation generators are programs that take a source directory containing code written in a particular language and output HTML or PDF files containing human-readable documentation (as opposed to comments that are just embedded in the code). There are various documentation generators for many languages, but they all have similar underlying concepts.

For every function and class, ideally there would be an entry in the corresponding documentation that specifies its purpose, inputs and outputs, potential errors it could throw, or anything else relevant for understanding its purpose. These explanations should be present in the code as well. There is a potential for duplicate and unnecessary work here.

However, documentation generators solve both problems at once: the explanation is written once, in the code, and the external documentation is generated from it. This is done by writing a block comment either just before or just after the definition of the function or class. Such block comments have a specified format that allows for an overall summary, followed by more in-depth descriptions of inputs, outputs, and any other useful information. The exact format depends on the documentation generator, and thus on the programming language. Typically, the format of the comments complements the naming conventions of the corresponding programming language.

For example, Python makes use of docstrings to embed comments in code that are directly linked to their corresponding functions or classes, and documentation generators take advantage of this. One particular format comes from the Google Python Style Guide.

def foo_bar(foo, bar):
    """
    An example of Google style docstring.

    Args:
        foo: This is the first parameter.
        bar: This is the second parameter.

    Returns:
        Description of function output.

    Raises:
        TypeError: If foo and bar cannot be added.

    """

    # Adding incompatible types raises TypeError, which propagates to the caller.
    foobar = foo + bar

    return foobar

This is one particular style of docstring, but there are others. Another format is referred to as reST style, named after the reStructuredText files that Sphinx uses to generate its documentation.

def foo_bar(foo, bar):
    """
    An example of reST style docstring.

    :param foo: This is the first parameter.
    :param bar: This is the second parameter.
    :returns: Description of function output.
    :raises TypeError: If foo and bar cannot be added.

    """

    # Adding incompatible types raises TypeError, which propagates to the caller.
    foobar = foo + bar

    return foobar
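Because a docstring is attached to the object it documents, a tool can read it back at run time. The following is a minimal sketch of that idea using the standard library's inspect module. It is not how any particular documentation generator is implemented, and summarize is a hypothetical helper introduced only for illustration.

```python
import inspect


def foo_bar(foo, bar):
    """
    An example of Google style docstring.

    Args:
        foo: This is the first parameter.
        bar: This is the second parameter.

    Returns:
        Description of function output.
    """
    return foo + bar


def summarize(obj):
    """Return the signature and first docstring line of obj (hypothetical helper)."""
    # inspect.getdoc strips the common indentation and surrounding blank lines.
    doc = inspect.getdoc(obj) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return f"{obj.__name__}{inspect.signature(obj)}: {first_line}"


print(summarize(foo_bar))
# prints: foo_bar(foo, bar): An example of Google style docstring.
```

A documentation generator does essentially the same lookup for every function and class it finds, then formats the results as pages.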

To provide an example for another coding language, we turn to Javadoc, which documents code written in Java and has been around long enough to influence similar documentation generation tools for other languages.

/**
 * An example of Javadoc.
 *
 * @param foo This is the first argument
 * @param bar This is the second argument
 * @return Description of function output
 * @throws Exception Description of when the exception is thrown
 */
public FooBar addFooBar(Foo foo, Bar bar) throws Exception {

    FooBar foobar = add(foo, bar);

    return foobar;

}

The format differences between all of these, even across languages, are mainly cosmetic. It is convenient to choose a valid format before writing code so that a documentation generator can be used afterwards. We will describe how this is typically done, but first it is important to consider the structure of a project.

Project structure

Making your code easy to read, understand and use does not necessarily start or end with documentation, although that is a critical component. Something as simple as the directory structure for your project can make all the difference in the world.

For any complete repository or project, there are many moving parts: source code, test code, documentation, readme or installation instructions, and so on. Especially for large endeavors, it is important to separate and organize these parts as much as possible. A common way to organize your project can be seen in the directory structure below.

|- project -|
   :
   :- README
   :- INSTALL
   :- LICENSE
   :- VERSION
   :
   |- source -|
   :  :
   :  : <source code here>
   :
   |- test -|
   :  :
   :  : <test code here>
   :
   |- doc -|
   :  :
   :  |- source -|
   :  :  :
   :  :  : <source code for building docs>
   :  :
   :  |- build -|
   :     :
   :     : <build location for docs>
   :
   |- examples -|
      :
      :- <possible example instances>

This is a fairly complete example. The main directory contains files like README and INSTALL, which respectively describe the project and provide instructions on how to install or compile it. Other files that might be here are LICENSE, which describes permissions for using, distributing or modifying the code, and sometimes a VERSION file that describes the newest major changes to your code. These are either plaintext files or files written in a lightweight markup language like Markdown.

There are separate directories for source code and test code, which keeps everything organized. The directory structures of the two should resemble each other, since there is usually a one-to-one correspondence between source files and test files. The examples directory is self-explanatory and not always required.
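The correspondence between source and test files can be checked mechanically. The sketch below assumes a naming convention that is not stated above: the test for source/pkg/mod.py lives at test/pkg/test_mod.py. Both that convention and the function name missing_tests are hypothetical and only serve the illustration.

```python
from pathlib import Path


def missing_tests(source_dir, test_dir):
    """Return source files that have no mirrored test file.

    Assumes the (hypothetical) convention that source/pkg/mod.py is
    tested by test/pkg/test_mod.py.
    """
    source_dir, test_dir = Path(source_dir), Path(test_dir)
    missing = []
    for src in sorted(source_dir.rglob("*.py")):
        # Path of the source file relative to the source directory.
        relative = src.relative_to(source_dir)
        expected = test_dir / relative.parent / f"test_{relative.name}"
        if not expected.exists():
            missing.append(relative.as_posix())
    return missing
```

Calling missing_tests("project/source", "project/test") from the directory that contains the project would then list every source file that still lacks a test under that convention.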

The other directory is doc, referring to the documentation of the code in the repository. There are two subdirectories here: one that contains the source code for the documentation, and the other that contains the HTML and/or PDF files. This suggests that documentation is created using a build system of some kind. Now we return to the question of how the files in the build directory are created.

Documentation build systems

Documentation generators use build systems to generate the desired HTML or PDF files. The bare minimum entails reformatting the block comments in the source code into a more readable format. But sometimes this is not enough. In some cases a developer may want to add context to the documentation that is not already in the block comments. Indeed, many documentation generators have this functionality.
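The bare-minimum case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Sphinx or any other real generator works: it handles top-level functions and classes only, and build_page is a hypothetical name.

```python
import ast
import html


def build_page(source_code, title):
    """Render the docstrings of top-level functions and classes as HTML.

    Hypothetical helper: a real generator would also handle nested
    definitions, cross-references, and the docstring's own markup.
    """
    tree = ast.parse(source_code)
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.get_docstring returns None when there is no docstring.
            doc = ast.get_docstring(node) or "(undocumented)"
            parts.append(f"<h2>{html.escape(node.name)}</h2>")
            parts.append(f"<pre>{html.escape(doc)}</pre>")
    return "\n".join(parts)


example = '''
def foo_bar(foo, bar):
    """Add foo and bar."""
    return foo + bar
'''
print(build_page(example, "example module"))
```

Because the source is parsed rather than imported, the code being documented is never executed, which is one reason a generator might prefer this approach over reading docstrings at run time.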

For example, one documentation generation tool for Python called Sphinx allows one to easily write HTML pages or a hyperlinked PDF file using many reStructuredText files. These files can be created automatically from the docstrings in the source code. When additional context is desired, one can modify the reStructuredText files to provide further explanations, relevant internal and external hyperlinks, and any other necessary additions. To see how it works, install Sphinx and check out the documentation example.
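As a rough illustration, a Sphinx project is configured through a conf.py file in the documentation source directory. The sketch below assumes the directory layout shown earlier (doc/source alongside source) and uses a placeholder project name. The sphinx.ext.autodoc extension pulls docstrings from the code, and sphinx.ext.napoleon lets it read Google style docstrings.

```python
# doc/source/conf.py (sketch; the paths and project name are assumptions)
import os
import sys

# Make the code in project/source importable so autodoc can find it.
sys.path.insert(0, os.path.abspath(os.path.join("..", "..", "source")))

project = "project"  # placeholder name

extensions = [
    "sphinx.ext.autodoc",   # generate entries from docstrings
    "sphinx.ext.napoleon",  # understand Google style docstrings
]
```

With a configuration along these lines and an index.rst in the same directory, running sphinx-build -b html doc/source doc/build from the project directory should write the HTML pages into the build directory.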