CLI in C++: The Ideal Solution
This is the third installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:
Today I would like to explore the solution space and get an idea about what the ideal solution might look like.
Using the terminology introduced in the previous post, an application may need to access the following three objects that result from the command line parsing: commands, options, and arguments. Both commands (or, usually, just one command) and arguments are homogeneous arrays of strings. It is normally sufficient to present them to the application as such, either directly in the argv array by identifying their start/end positions or as separate string sequences. Options, on the other hand, are a more interesting problem.
If we start thinking about the form in which we could make the parsed options information available to our applications, several alternatives come to mind. In a very simple application we might have a variable (global or declared in main()) for each option. The CLI parser then sets these variables to the values provided in the command line. Something along these lines:
    bool help = false;
    bool version = false;
    unsigned short compression = 5;

    int main (int argc, char* argv[])
    {
      cli::parser p;

      p.option ("--help", help);
      p.option ("--version", version);
      p.option ("--compression", compression);

      p.parse (argc, argv);

      if (help)
      {
        ...
      }
    }
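To make the idea of a parser that "sets these variables" more concrete, here is a minimal sketch of how such a parser could be implemented. The sketch::parser class below is a hypothetical stand-in for cli::parser, not an existing library. It assumes that a bool option is a flag without a value, that any other option takes its value from the next argument, and that values can be read with operator>>.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch
{
  // Hypothetical stand-in for cli::parser: binds option names to
  // variables owned by the application.
  class parser
  {
  public:
    // Flag: its presence on the command line sets the variable to true.
    void option (const std::string& name, bool& var)
    {
      flags_[name] = &var;
    }

    // Option with a value: the next argument is converted using operator>>.
    template <typename T>
    void option (const std::string& name, T& var)
    {
      T* p = &var;
      values_[name] = [p, name] (const std::string& s)
      {
        std::istringstream is (s);
        if (!(is >> *p) || !is.eof ())
          throw std::invalid_argument ("invalid value for " + name);
      };
    }

    void parse (int argc, char* argv[])
    {
      for (int i = 1; i < argc; ++i)
      {
        std::string a (argv[i]);

        auto f = flags_.find (a);
        if (f != flags_.end ())
        {
          *f->second = true;
          continue;
        }

        auto v = values_.find (a);
        if (v == values_.end ())
          throw std::invalid_argument ("unknown option " + a);

        if (++i == argc)
          throw std::invalid_argument ("missing value for " + a);

        v->second (argv[i]);
      }
    }

  private:
    std::map<std::string, bool*> flags_;
    std::map<std::string, std::function<void (const std::string&)>> values_;
  };
}
```

Note that the parser only stores pointers to the variables, so they must outlive the call to parse(). This is exactly why all the variables have to exist before parsing can begin.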
The major problem with this approach is that it does not scale to a more modularized design. In such applications each module may have a specific set of options. For example, in the XSD and XSD/e compilers the compiler driver, the frontend, and each code generator have their own unique sets of options. Placing the corresponding variables all in the global namespace is cumbersome. They are more naturally represented as member variables in the corresponding module classes.
Of course, nothing prevents us from parsing directly into member variables using the above solution. However, it requires that all the classes that hold option values be instantiated before command line parsing can begin. This creates a chicken and egg problem since these classes often need the option values in their constructors. The only way to resolve this problem with the above approach is to first parse the options into temporary variables which are then used to initialize the modules. Here is an example:
    struct compressor
    {
      compressor (unsigned short level);
    };

    int main (int argc, char* argv[])
    {
      bool help = false;
      bool version = false;
      unsigned short compression = 5;

      cli::parser p;

      p.option ("--help", help);
      p.option ("--version", version);
      p.option ("--compression", compression);

      p.parse (argc, argv);

      compressor c (compression);
    }
Another drawback of this approach is the need to repeat each option name twice: first as the variable name (e.g., help) and then as the option name (e.g., "--help"). Furthermore, in the case of global variables, there are two distinct places in the source code where each option must be recorded: first as the variable name and then as the call to option(). In non-trivial applications the global option variables would most likely also be declared as extern in a header file so that they can be accessed from other modules. This brings the number of places where each option is recorded to three.
The alternative approach to storing the option values in individual variables is to have a dedicated object which holds them all. The application can then query this object for individual values. Logically, such an object is a heterogeneous map of option names to their values and we can use the map interface to access individual option values. Here is how this might look:
    int main (int argc, char* argv[])
    {
      cli::parser p;

      p.option<bool> ("--help");
      p.option<bool> ("--version");
      p.option<unsigned short> ("--compression", 5);

      cli::options o (p.parse (argc, argv));

      if (o.value<bool> ("--help"))
      {
        ...
      }
    }
There are a number of drawbacks with this interface. The first is the use of strings to identify options. If we misspell one, the error will only be detected at runtime. The second drawback is the need to specify the value type every time we access the option value. Then we have the verbosity problem as in the previous approach. Option names and option types are repeated in several places in the source code which makes it hard to maintain.
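The first two drawbacks are easy to see in a minimal sketch of such a heterogeneous map. The sketch::options class below is a hypothetical stand-in for cli::options. It assumes C++17 and uses std::any for value storage, which is just one possible implementation choice. Both a misspelled option name and a wrong value type compile cleanly and only fail when the program runs.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch
{
  // Hypothetical stand-in for cli::options: a map from option names
  // to values of arbitrary types.
  class options
  {
  public:
    template <typename T>
    void set (const std::string& name, T value)
    {
      values_[name] = std::move (value);
    }

    // The caller has to name the type on every access. A misspelled
    // name throws std::out_of_range and a wrong type throws
    // std::bad_any_cast, both at runtime.
    template <typename T>
    const T& value (const std::string& name) const
    {
      auto i = values_.find (name);

      if (i == values_.end ())
        throw std::out_of_range ("unknown option " + name);

      return std::any_cast<const T&> (i->second);
    }

  private:
    std::map<std::string, std::any> values_;
  };
}
```

With this class, o.value<bool> ("--hepl") and o.value<int> ("--compression") are both accepted by the compiler even though neither can succeed.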
The alternative interface design would be to have an individual accessor for each option. Something along these lines:
    struct options: cli::options
    {
      options ()
        : help_ (false),
          version_ (false),
          compression_ (5)
      {
        // The option() function is provided by cli::options.
        //
        option ("--help", help_);
        option ("--version", version_);
        option ("--compression", compression_);
      }

      bool help () const;
      bool version () const;
      unsigned short compression () const;

    private:
      bool help_;
      bool version_;
      unsigned short compression_;
    };

    int main (int argc, char* argv[])
    {
      cli::parser<options> p;
      options o (p.parse (argc, argv));

      if (o.help ())
      {
        ...
      }
    }
While we have solved all the problems with accessing the option values, the declaration of the options class is very verbose. For each option we repeat its name five times plus we have to manually implement each accessor, initialize each option variable with the default value, as well as register each option with cli::options. We could automate some of these steps by using functor objects to store the option values as well as implement the accessors, for example:
    struct options: cli::options
    {
      options ()
        : help (false),
          version (false),
          compression (5)
      {
        option ("--help", help);
        option ("--version", version);
        option ("--compression", compression);
      }

      cli::option<bool> help;
      cli::option<bool> version;
      cli::option<unsigned short> compression;
    };
We could also get rid of the explicit calls to the option() function by making the cli::option object automatically register with the containing object (we would need to use a global variable or a thread-local storage (TLS) slot to store the current containing object). Here is how the resulting options class could look:
    struct options: cli::options
    {
      options ()
        : help (false, "--help"),
          version (false, "--version"),
          compression (5, "--compression")
      {
      }

      cli::option<bool> help;
      cli::option<bool> version;
      cli::option<unsigned short> compression;
    };
With this approach we have reduced the number of option name repetitions from five to three.
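The automatic registration trick relies on the C++ construction order: the base class constructor runs before the member constructors, so the base can record itself as the "current" containing object and each member can then register with it. Below is a minimal sketch of this mechanism using a thread_local pointer (C++11). All the names in the sketch namespace are hypothetical stand-ins for cli::options and cli::option, and the sketch sidesteps copying (the registry holds pointers to members) by simply disabling it.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch
{
  class option_base;

  // Hypothetical stand-in for cli::options.
  class options
  {
  public:
    // Runs before any member of a derived class is constructed.
    options () { current_ = this; }

    // The registry stores pointers to members, so copying is disabled
    // in this sketch.
    options (const options&) = delete;
    options& operator= (const options&) = delete;

    option_base* find (const std::string& name) const
    {
      auto i = registry_.find (name);
      return i != registry_.end () ? i->second : nullptr;
    }

  private:
    friend class option_base;

    std::map<std::string, option_base*> registry_;
    static thread_local options* current_;
  };

  thread_local options* options::current_ = nullptr;

  class option_base
  {
  protected:
    // Register with the containing object that is being constructed
    // in this thread, if any.
    explicit option_base (const std::string& name)
    {
      if (options::current_ != nullptr)
        options::current_->registry_[name] = this;
    }
  };

  // Hypothetical stand-in for cli::option: a functor that stores the
  // value and doubles as its accessor.
  template <typename T>
  class option: public option_base
  {
  public:
    option (T def, const std::string& name)
      : option_base (name), value_ (def) {}

    const T& operator() () const { return value_; }
    void set (const T& v) { value_ = v; }

  private:
    T value_;
  };

  struct my_options: options
  {
    my_options ()
      : help (false, "--help"), compression (5, "--compression") {}

    option<bool> help;
    option<unsigned short> compression;
  };
}
```

The sketch never resets the current object pointer, so an option object created outside of a containing object later in the same thread would register with the most recently constructed one. A real implementation would need to address this as well as the copying issue.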
How does the above approach address the issue of modularized applications that we brought up earlier? One alternative would be to have the corresponding member variables added manually to module classes and then initialized with values from the options object. For example:
    struct compressor
    {
      compressor (unsigned short level)
        : level_ (level)
      {
      }

    private:
      unsigned short level_;
    };

    int main (int argc, char* argv[])
    {
      cli::parser<options> p;
      options o (p.parse (argc, argv));

      compressor c (o.compression ());
    }
Alternatively, we could use the options object directly by inheriting the module class from it. For that, however, we would also need to split the options object into several module-specific parts, for example:
    struct compression_options: virtual cli::options
    {
      compression_options ()
        : compression (5)
      {
        option ("--compression", compression);
      }

      cli::option<unsigned short> compression;
    };

    struct compressor: private compression_options
    {
      compressor (const compression_options& o)
        : compression_options (o)
      {
      }
    };

    struct options: compression_options
    {
      options ()
        : help (false, "--help"),
          version (false, "--version")
      {
      }

      cli::option<bool> help;
      cli::option<bool> version;
    };

    int main (int argc, char* argv[])
    {
      cli::parser<options> p;
      options o (p.parse (argc, argv));

      // The compressor is initialized with its part of the options object.
      //
      compressor c (o);
    }
At this point it appears that we have analyzed the drawbacks of all the practical approaches and can now list the properties of an ideal solution:
- Aggregation: options are stored in an object
- Static naming: option accessors have names derived from option names
- Static typing: option accessors have return types fixed to option types
- No repetition: the option name and option type are specified only once for each option
With these properties figured out, next time we will examine the drawbacks of the existing solutions, namely the Program Options library from Boost as well as my previous attempt at the CLI library which is part of libcult. As usual, if you have any thoughts, feel free to add them as comments.
July 14th, 2009 at 8:11 pm
I’m sorry that I don’t have time to come up with a concrete code suggestion, but I’m wondering if Boost Fusion’s associative tuples might be able to come to the rescue.
Very interesting blog, a pleasure to read! Also, I’ve been using XSD in a project and it’s very nice! Thanks.
July 17th, 2009 at 8:07 pm
Jesse,
Thanks, I am glad you are enjoying both (the blog and XSD).
Regarding Fusion’s associative tuples: This is the same approach as the one used in the CLI parser from libcult which is discussed in the next post. In particular, there is the limit on how many tuples you can have in the map (10 by default, which is way too low for a CLI parser). They also use partial template specialization to implement all this apparatus.
As mentioned in the next post, this approach does not scale to a large number of options (say, 100). In this case the compilation time and the object code size (due to the enormous symbol length) become unacceptable. There is also the verbosity problem since we have to repeat each option name at least three times.
The implementation can be improved using the C++0x variadic templates feature. There is nothing that we can do about the verbosity problem, however (see the next two posts for details).
But nice idea, nevertheless. This is probably as good as we can get if we have to define the command line interface in the C++ source code.