Options documentation in CLI

After announcing CLI 1.0.0, the most requested feature was automatic generation of documentation in the form of program usage information and man/html pages. I wished for this feature myself while writing essentially the same description of the CLI compiler options in three different places. This also seems to be the last line of defense for the Boost program_options advocates ;-).

We have already considered support for documentation when we first talked about the CLI language. At that point the goal was to think about it just enough to make sure it will be possible without a major language redesign. Now that we are ready to implement this, we will need to think things through more thoroughly. Based on my past experience of documenting a large number of options for the XSD and XSD/e compilers, I have identified the following requirements for this feature:

  • Support for both short (usage) and long (man/html pages) descriptions.
  • Support for basic text formatting, namely, italic, bold, and monospace (code) fonts as well as paragraphs.
  • The documentation in the .cli file should look as close to plain text as possible.
  • The CLI language syntax used to capture the documentation should model C++ as closely as possible.

The first requirement stems from the fact that the usage information printed by the application is usually an abridged version of the complete documentation found in man/html pages. There are several ways to achieve this. We can provide two versions of the documentation, short and long. Or we can use the first sentence of the long description as the short version. Finally, for simple options, the short and long descriptions can be the same. Each alternative makes sense in some situations, so we will need to support all three.
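
To make the second alternative concrete, here is a minimal C++ sketch of how the short description could be derived from the long one. The function name and the rule for where a sentence ends are my assumptions for illustration, not something CLI defines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CLI): derive the short description
// from the long one by taking its first sentence. A sentence is assumed
// to end at the first '.', '!' or '?' that is followed by whitespace or
// by the end of the text, so "0.5" does not end a sentence.
std::string
first_sentence (const std::string& text)
{
  for (std::size_t i = 0; i != text.size (); ++i)
  {
    char c = text[i];

    if (c == '.' || c == '!' || c == '?')
    {
      if (i + 1 == text.size () || text[i + 1] == ' ' || text[i + 1] == '\n')
        return text.substr (0, i + 1);
    }
  }

  return text; // No terminator: the whole text doubles as the short form.
}

// Usage example.
void
check_first_sentence ()
{
  assert (first_sentence ("Set compression level. 5 is the default.") ==
          "Set compression level.");
}
```

A rule this simple will be fooled by an abbreviation that is followed by a space, which is one more reason to also allow an explicit short description.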

When it comes to providing basic formatting support, there are many ways to implement this. We could use the HTML tag system but it is fairly obtrusive. Alternatively, we could use one of the Wiki notations, for example, ''italic'', '''bold''', etc., but that is also quite verbose. I am leaning towards a LaTeX-like notation that can also be viewed as an extension of the C++ character escaping mechanism: \i{italic}, \b{bold}, \c{code}, \bc{boldcode}, etc. It is also fairly light on the eyes when viewed in the source code. For the paragraph separation, a blank line seems like a natural choice. There is also the option argument, which is normally set out with the italic style (man/html pages) or by enclosing it in angle brackets (e.g., <name>). While we could use the above formatting mechanism for this, it would be convenient to provide a shortcut for this special case by automatically recognizing the angle brackets and replacing them with italicized text where possible.
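
As a rough illustration of how little machinery this notation needs, here is a C++ sketch that translates it into HTML. Everything in it is an assumption on my part: the function name, the choice of \c for the monospace style, the exact HTML tags, and the decision to ignore nested escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch (not the CLI compiler's code): translate the
// proposed escapes into HTML. Assumed mapping: \i -> <i>, \b -> <b>,
// \c -> <code>, \bc -> <b><code>. Nested escapes and escaping of
// HTML special characters are deliberately left out.
std::string
to_html (const std::string& s)
{
  const std::size_t npos = std::string::npos;
  std::string r;

  for (std::size_t i = 0; i < s.size (); )
  {
    if (s[i] == '\\')
    {
      std::size_t open = s.find ('{', i);
      std::size_t close = open == npos ? npos : s.find ('}', open);

      if (close != npos)
      {
        std::string name (s, i + 1, open - i - 1);
        std::string body (s, open + 1, close - open - 1);
        std::string b, e;

        if      (name == "i")  {b = "<i>";       e = "</i>";}
        else if (name == "b")  {b = "<b>";       e = "</b>";}
        else if (name == "c")  {b = "<code>";    e = "</code>";}
        else if (name == "bc") {b = "<b><code>"; e = "</code></b>";}

        if (!b.empty ())
        {
          r += b + body + e;
          i = close + 1;
          continue;
        }
      }
    }
    else if (s[i] == '<')
    {
      // Shortcut for option arguments: <name> becomes italic name.
      std::size_t close = s.find ('>', i);

      if (close != npos)
      {
        r += "<i>" + s.substr (i + 1, close - i - 1) + "</i>";
        i = close + 1;
        continue;
      }
    }

    r += s[i++]; // Ordinary character or unrecognized escape.
  }

  return r;
}

// Usage example.
void
check_to_html ()
{
  assert (to_html ("a higher value \\i{may} help") ==
          "a higher value <i>may</i> help");
}
```

The man page output would follow the same structure with different replacement strings.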

One part of ensuring that the option documentation looks as close to plain text as possible is to carefully select the formatting mechanism, which we have already done. The other part is to make sure the language syntax is not too obtrusive. Ideally, we would allow straight plain text in certain parts of the language but that makes it difficult to figure out where the text stops. Plus, such a mechanism would be fairly foreign to C++ and thus require some getting used to. Furthermore, one of the reasons for keeping the CLI language as syntactically close to C++ as possible is to allow the use of existing C++ code editors and indenters on .cli files.

To represent arbitrary text in C++ we would use a string literal. Since we may need to provide more than one string, the string array initialization syntax seems like a good choice. For example:

class options
{
  bool --help {"Show usage and exit."}
 
  int --compression = 5
  {
    "Set compression level.",
    "Set compression level between 0 (no compression)
     and 9 (maximum compression). 5 is the default.
 
     Setting the level to a higher value \i{may}
     result in smaller output but may also require
     more memory and CPU time."
  }
};

Notice that in the long documentation for the last option we use a multi-line string literal, which is illegal in C++ due to the way the C++ preprocessor works. Since we don’t have a preprocessor and such multi-line strings are quite convenient, we can allow them.

When we print the usage information for the above options, we would expect an output along these lines:

--help               Show usage and exit.
--compression <num>  Set compression level.

As you can see, we haven’t specified the argument name (<num>) for the second option anywhere in the documentation. To capture this information we will need to introduce a third string for non-flag options (those of a type other than bool). For example:

class options
{
  bool --help {"Show usage and exit."}
 
  int --compression = 5
  {
    "<num>",
    "Set compression level.",
    "Set the compression level to <num> which should
     be between 0 (no compression) and 9 (maximum
     compression). 5 is the default.
 
     Setting the level to a higher value \i{may}
     result in smaller output but may also require
     more memory and CPU time."
  }
};

The <num> word will be automatically converted to an italicized num in the option description when producing the man and html output.
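
To show how the one-, two-, and three-string forms might be told apart, here is a small C++ sketch. The struct, the function, and in particular the rule that a non-flag option always lists its argument name first are my assumptions; the design above does not yet settle these details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one option's documentation.
struct option_doc
{
  std::string arg;   // Argument name, e.g. "<num>"; empty for flags.
  std::string brief; // Short description (usage).
  std::string full;  // Long description (man/html pages).
};

// Assumed rules: for a non-flag option the first string is the argument
// name. Of the remaining strings, a single one serves as both
// descriptions, while two are taken as the short and the long one.
option_doc
interpret (std::vector<std::string> s, bool flag)
{
  option_doc d;

  if (!flag && !s.empty ())
  {
    d.arg = s.front ();
    s.erase (s.begin ());
  }

  if (!s.empty ())
  {
    d.brief = s.front ();
    d.full = s.back (); // Same string when only one is given.
  }

  return d;
}

// Usage example.
void
check_interpret ()
{
  option_doc d (interpret ({"Show usage and exit."}, true));
  assert (d.arg.empty () && d.brief == d.full);
}
```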

While the option documentation mechanism should be sufficient for the majority of cases, there will be situations where more advanced formatting is required. To support such cases we can provide a compiler option which would allow specifying a pre-formatted description for individual options.

When we need to print usage, the option description is only a part of the output. There is normally at least the command line synopsis, for example:

Usage: program [options] argument
 
Options:
--help               Show usage and exit.
--compression <num>  Set compression level.

While the options class will only print the option information, the rest can be printed manually, for example:

cerr << "Usage: program [options] argument" << endl
     << endl
     << "Options:" << endl;
 
options::print_usage (cerr);
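
For the curious, here is a hand-written C++ sketch of what a print_usage() function for the two options above could look like. It is only a stand-in to show the column alignment; the table of entries and the two-space gap are my assumptions, and the code that the CLI compiler will actually generate may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hand-written stand-in for what the CLI compiler might generate for the
// options class; the real generated code is not shown here.
struct options
{
  static void
  print_usage (std::ostream& os)
  {
    struct entry {const char* name; const char* brief;};

    static const entry entries[] =
    {
      {"--help",              "Show usage and exit."},
      {"--compression <num>", "Set compression level."}
    };

    // First pass: find the widest option column.
    std::size_t width = 0;
    for (const entry& e: entries)
    {
      std::size_t n = std::string (e.name).size ();
      if (n > width)
        width = n;
    }

    // Second pass: pad each name to that width plus a two-space gap.
    for (const entry& e: entries)
    {
      std::string name (e.name);
      os << name << std::string (width - name.size () + 2, ' ')
         << e.brief << '\n';
    }
  }
};

// Usage example: capture the output in a string instead of cerr.
void
check_print_usage ()
{
  std::ostringstream os;
  options::print_usage (os);
  assert (os.str ().find ("--compression <num>  Set compression level.") !=
          std::string::npos);
}
```

Taking a std::ostream reference rather than writing to cerr directly is what lets the caller print the synopsis first and then append the options to the same stream.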

A similar situation arises when we create the man/html pages. That is, the beginning of a man page as well as the end would normally be written manually. The CLI compiler can output just the options description which can then be combined with the prologue and epilogue manually. We can also provide two options, --prologue and --epilogue, which would allow the caller to specify the documentation prologue and epilogue files that will be automatically copied to the output.
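
The effect of --prologue and --epilogue could be as simple as the following C++ sketch, which copies the prologue, then the generated options text, then the epilogue to the output. The function names are hypothetical, and the streams would normally be std::ifstream objects opened on the files named by the two options.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Copy everything that is left in one stream to another. Iterators are
// used so that an empty input does not put the output into a failed state.
void
copy_all (std::istream& in, std::ostream& out)
{
  std::copy (std::istreambuf_iterator<char> (in),
             std::istreambuf_iterator<char> (),
             std::ostreambuf_iterator<char> (out));
}

// Hypothetical sketch of the --prologue/--epilogue behavior: the
// generated options text is sandwiched between the two files verbatim.
void
assemble_page (std::istream& prologue,
               const std::string& options_text,
               std::istream& epilogue,
               std::ostream& out)
{
  copy_all (prologue, out);
  out << options_text;
  copy_all (epilogue, out);
}

// Usage example with in-memory streams standing in for the files.
void
check_assemble_page ()
{
  std::istringstream p ("PROLOGUE\n"), e ("EPILOGUE\n");
  std::ostringstream o;
  assemble_page (p, "OPTIONS\n", e, o);
  assert (o.str () == "PROLOGUE\nOPTIONS\nEPILOGUE\n");
}
```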

I am going to think about this feature for a few more days and hopefully implement it over the next weekend. As always, if you have any thoughts, feel free to add them in the comments.
