CLI in C++: CLI to C++ Mapping

This is the eighth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++.

In the last post we designed our CLI language, a domain-specific language (DSL) for defining a program’s command line interface. Today we are going to see how to map the CLI language constructs to C++ constructs.

At the end of the previous post we had a list of high-level CLI language features which I am going to repeat here:

  • comments
  • namespaces
  • option class
  • option declaration
  • option inheritance
  • C++ inclusion
  • CLI inclusion
  • using declarations/directives and typedef’s
  • option documentation

We also agreed that only a subset of them will end up being supported in the initial release. But for the same reasons as with the CLI language itself, we are going to discuss the mapping of all of these constructs to C++ even though initially we are only implementing a small subset of them.

At the file level, the CLI compiler will map each CLI file to a set of C++ files: a C++ header file, a C++ inline file, and a C++ source file. In other words, if we compile options.cli, we will end up with options.hxx, options.ixx, and options.cxx (or the same files but with alternative extensions).

Comments are ignored. In some cases it could make sense to copy comments over into the generated code. However, there is no way to distinguish between such “documentation” comments and comments that are for the CLI definition itself. For example:

class options
{
  // Show help.
  //
  bool --help|-h;
 
  // Note: version has two aliases.
  //
  bool --version|-v|-V;
};

In this example it could be useful to copy the first comment to the generated code but not the second. The first comment should actually be made a documentation string which can then be reproduced as a comment in the generated C++ code.

CLI namespaces are mapped to C++ namespaces. This one is simple.

Similarly, option classes are mapped to C++ classes. We will need to provide the copy constructor and assignment operator. We will also need to provide a constructor to instantiate this class from the argc, argv pair.

Since the options may be followed by a number of arguments, this last constructor will need a way to tell the caller where the options end and the arguments begin. There are two ways this can be done. The first is to pass by reference an index argument which is set by the constructor to the position of the first argument. The second approach is to modify the argc and argv data by removing the entries that were consumed by the options class. The second approach is more convenient but is not usable if we need to re-examine the argv elements corresponding to the options. Finally, both versions will have one additional argument which will allow us to specify where in the argv array we should start. By default it will be the second element, after the program name. Here is how all this will look in C++:

class options
{
public:
  options (int& argc,
           char** argv,
           size_t start = 1);
 
  options (int argc,
           char** argv,
           size_t& end,
           size_t start = 1);
 
  options (const options&);
  options& operator= (const options&);
 
  ...
};
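
To make the difference between the two approaches concrete, here is a minimal sketch. It is not the generated code: the function names scan_options and erase_options, the is_option rule (anything that starts with a dash is a value-less option), and the stop-at-first-argument behavior are all assumptions made for illustration. The sketch uses two free functions with distinct names instead of overloaded constructors; with a non-const argc, a three-argument call would otherwise match both constructors above equally well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical rule for this sketch only: an "option" is any argument that
// starts with '-' and has at least one more character. Options take no values.
inline bool is_option (const char* a)
{
  return a[0] == '-' && a[1] != '\0';
}

// First approach: leave argv untouched and set end to the position of the
// first argument that is not an option.
inline void scan_options (int argc, char** argv,
                          std::size_t& end, std::size_t start = 1)
{
  assert (argc >= 0 && start <= static_cast<std::size_t> (argc));

  end = start;
  while (end < static_cast<std::size_t> (argc) && is_option (argv[end]))
    ++end;
}

// Second approach: remove the consumed entries from argv and adjust argc so
// that only the entries before start and the arguments remain.
inline void erase_options (int& argc, char** argv, std::size_t start = 1)
{
  std::size_t end;
  scan_options (argc, argv, end, start);

  std::size_t w = start;
  for (std::size_t r = end; r < static_cast<std::size_t> (argc); ++r)
    argv[w++] = argv[r];

  argv[w] = nullptr; // keep the argv[argc] == nullptr convention
  argc = static_cast<int> (w);
}
```

For the command line prog --help -v file.txt, the first function sets end to 3 and leaves argv alone, while the second leaves argc equal to 2 with argv[1] pointing to file.txt.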

Another aspect that we will need to take care of is error handling. In particular, the argc/argv parsing constructors may fail for a number of reasons, including an unknown option, a missing option value, or an invalid option value. The user of our class will need a way to access the declarations of the corresponding exceptions. To keep everything relevant to the functionality of our options parser in one place, we can add them to the generated options class, for example:

class options
{
public:
  // Exceptions.
  //
  typedef ... unknown_option;
  typedef ... missing_value;
  typedef ... invalid_value;
 
  // Constructors.
  //
  options (int& argc,
           char** argv,
           size_t start = 1);
 
  options (int argc,
           char** argv,
           size_t& end,
           size_t start = 1);
 
  options (const options&);
  options& operator= (const options&);
 
  ...
};
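
From the caller's point of view, the nested typedefs mean that a single class name is enough to both parse and handle errors. The sketch below shows one way this could look. The exception type behind the typedef is left open above, so the sketch namespace, the unknown_option class with its option() accessor, and the simplified two-argument constructor that only knows --help are all hypothetical.

```cpp
#include <exception>
#include <string>

namespace sketch // hypothetical home for the exception types
{
  class unknown_option: public std::exception
  {
  public:
    explicit unknown_option (const std::string& o): option_ (o) {}

    const std::string& option () const {return option_;}
    const char* what () const noexcept override {return "unknown option";}

  private:
    std::string option_;
  };
}

class options
{
public:
  typedef sketch::unknown_option unknown_option;

  // Simplified constructor for this sketch: recognizes only --help and
  // stops at the first argument that does not start with '-'.
  options (int argc, char** argv)
    : help_ (false)
  {
    for (int i = 1; i < argc; ++i)
    {
      std::string a (argv[i]);

      if (a == "--help")
        help_ = true;
      else if (!a.empty () && a[0] == '-')
        throw unknown_option (a);
      else
        break;
    }
  }

  bool help () const {return help_;}

private:
  bool help_;
};
```

The application can then write catch (const options::unknown_option& e) and report e.option () without including or naming anything other than the options class.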

Now let’s consider the central construct of our language, the option declaration. For each option we will generate a set of accessors and, optionally, modifiers to access/modify this option’s value. Most applications won’t need to modify option values after parsing so we will only generate modifiers if explicitly requested by the user with a compiler flag (e.g., --generate-modifiers). We will also need to generate a member variable which will store the option’s value. The names of the accessors, modifiers, and member variable will be derived from the option name. Finally, if the option has a default value or if it is a flag, we will need to add initializers to the constructors. For example:

class options
{
public:
  ...
 
  options (int& argc, char** argv, size_t start = 1)
    : help_ (false), compression_ (5)
  {
  }
 
  ...
 
  bool help () const;
  void help (bool); // optional
 
  short compression () const;
  void compression (short); // optional
 
protected:
  bool help_;
  short compression_;
};
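
How exactly the names are derived from the option name is not spelled out here. As an illustration only, the following sketch shows one plausible rule: strip the leading dashes, replace every character that is not valid in a C++ identifier with an underscore, and append an underscore for the member variable. The helper names derive_name and member_name are hypothetical, and names that start with a digit are not handled.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turn an option name such as "--dry-run" into an
// accessor/modifier name such as "dry_run".
inline std::string derive_name (const std::string& option)
{
  std::string r;
  std::string::size_type i = 0;

  // Strip the leading dashes.
  while (i < option.size () && option[i] == '-')
    ++i;

  // Replace anything that cannot appear in an identifier with '_'.
  for (; i < option.size (); ++i)
  {
    unsigned char c = static_cast<unsigned char> (option[i]);
    r += (std::isalnum (c) || c == '_') ? static_cast<char> (c) : '_';
  }

  return r;
}

// Hypothetical helper: the member variable is the derived name followed
// by an underscore, as in help_ and compression_ above.
inline std::string member_name (const std::string& option)
{
  return derive_name (option) + "_";
}
```

With this rule --compression gives the accessor name compression and the member name compression_, which matches the example above.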

Option inheritance is naturally mapped to C++ public inheritance. For example, these CLI definitions:

class common_options
{
  bool --help|-h;
  bool --version|-v;
};
 
class options: common_options
{
  short --compression = 5;
};

will be mapped to the following C++ definitions:

class common_options
{
  ...
};
 
class options: public common_options
{
  ...
};
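
One practical consequence of using public inheritance is that code written against the base options class also accepts any derived options class. The sketch below fills in the elided class bodies just enough to show this. The default constructors (in place of the argc/argv ones), the inline definitions, and the informational_only function are assumptions made to keep the example short, and the modifiers are included as if they had been requested.

```cpp
class common_options
{
public:
  common_options (): help_ (false), version_ (false) {}

  bool help () const {return help_;}
  void help (bool v) {help_ = v;}

  bool version () const {return version_;}
  void version (bool v) {version_ = v;}

protected:
  bool help_;
  bool version_;
};

class options: public common_options
{
public:
  options (): compression_ (5) {}

  short compression () const {return compression_;}
  void compression (short v) {compression_ = v;}

protected:
  short compression_;
};

// Hypothetical application code that only knows about the common options
// but can be called with an options object thanks to public inheritance.
inline bool informational_only (const common_options& o)
{
  return o.help () || o.version ();
}
```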

The C++ inclusion is mapped pretty much verbatim to the C++ preprocessor #include. The only thing that we may need to do is to strip the ‘cxx:’ prefix from the path.

The CLI inclusion is a bit more complex. The purpose of CLI inclusion is to make CLI class declarations in one file visible in another file. This is necessary to support option inheritance. Since option inheritance is mapped to C++ class inheritance, the derived C++ class declaration in one file will also need to “see” the base class declaration in another file. As a result, we will need to map CLI inclusions to C++ header inclusions. Consider the following two CLI files:

// file: common.cli
//
class common_options
{
  bool --help|-h;
  bool --version|-v;
};
 
// file: options.cli
//
include "cli:common.cli"
 
class options: common_options
{
  short --compression = 5;
};

When we compile these files, the generated C++ header files would look like this:

// file: common.hxx
//
class common_options
{
  ...
};
 
// file: options.hxx
//
#include "common.hxx"
 
class options: public common_options
{
  ...
};

Here, the CLI include is mapped to the C++ preprocessor #include with the CLI file name being transformed to the corresponding C++ header file name.
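
The two inclusion mappings can be summarized with a small path-transformation function. This is a sketch under assumptions, not the compiler's actual code: the name map_include is hypothetical, the header suffix is assumed to default to .hxx, and paths without a recognized prefix are simply returned unchanged.

```cpp
#include <string>

// Hypothetical helper: given the path from a CLI include directive, return
// the path to place in the generated C++ #include.
inline std::string map_include (const std::string& path,
                                const std::string& header_suffix = ".hxx")
{
  const std::string cxx ("cxx:");
  const std::string cli ("cli:");

  // C++ inclusion: strip the prefix and keep the rest verbatim.
  if (path.compare (0, cxx.size (), cxx) == 0)
    return path.substr (cxx.size ());

  // No recognized prefix: left as is in this sketch.
  if (path.compare (0, cli.size (), cli) != 0)
    return path;

  // CLI inclusion: strip the prefix and replace the file extension, if
  // any, with the C++ header suffix.
  std::string r (path.substr (cli.size ()));
  std::string::size_type dot (r.rfind ('.'));
  std::string::size_type slash (r.rfind ('/'));

  if (dot != std::string::npos &&
      (slash == std::string::npos || dot > slash))
    r.erase (dot);

  return r + header_suffix;
}
```

For example, map_include ("cli:common.cli") returns common.hxx, while a path with the cxx: prefix only loses the prefix.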

Using declarations and directives as well as typedef’s are copied verbatim to the generated C++ code.

Option documentation can be used to produce several kinds of output. Outside of C++ it can be used to generate man pages and HTML-formatted documentation (or fragments thereof). In C++, the user of the options class may want to print the usage information. To support this we can add a static usage() function to our class which prints the usage information to std::ostream, for example:

class options
{
public:
  ...
 
  static void usage (std::ostream&);
};

Some applications may also need to access individual option documentation in which case we can generate a set of static functions that will allow one to access this information. Finally, the short option documentation strings can be added as comments for the corresponding accessor and modifier functions.

And that covers the basics of the mapping between CLI and C++. Next time we will consider the pros and cons of self-sufficient generated code vs generated code that depends on a runtime library. Then we will need to decide which approach to use. In the meantime I am going to start working on the CLI language parser. Hopefully by next time I will have some code to show. As always, if you have any thoughts, feel free to add them in the comments.
