Archive for the ‘Design’ Category

CLI in C++: DSL-based Designs

Sunday, July 19th, 2009

This is the sixth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:

In the last post we analyzed design approaches which have the command line interface defined in the C++ source code. Today we will start exploring designs that rely on domain-specific languages (DSL).

A DSL is a special-purpose language tailored for a specific domain or problem. We have two broad choices when it comes to the DSL-based designs. We can try to reuse or retrofit an existing language to describe the command line interface. Or we can design our own command line interface definition language. The main advantage of the first approach is the ability to base our implementation on an existing compiler implementation. The main disadvantage lies in the difficulty of reusing an existing language for a different purpose. If a language is fairly generic, then the resulting CLI definition will most likely end up overly verbose. On the other hand, if a language is tailored to address a more specific problem, we may be unable to use it to capture some of the aspects of the command line interface. A good example of this problem would be a hypothetical language that describes objects containing typed name-value pairs. We could use the pair’s name to capture the option name. However, options may have aliases (e.g., --help and -h) and it would be impossible to capture them in such a language. If we decide to design our own language for CLI definition, then we can make it a perfect fit for our requirements. However, we will have to implement the compiler from scratch.

One existing DSL, suggested by Malisha Mogilny, is YANG. YANG is a data modeling language used to describe configuration and state data. Here is how we could model the CLI definition using YANG:

module example
{
  container options
  {
    leaf help
    {
      type boolean;
    }
 
    leaf version
    {
      type boolean;
    }
 
    leaf compression
    {
      type uint16;
      default 5;
    }
  }
}

This definition would be mapped to C++ code along these lines:

namespace example
{
  class options
  {
  public:
    options ()
      : help_ (false),
        version_ (false),
        compression_ (5)
    {
    }
 
    bool help () const;
    bool version () const;
    unsigned short compression () const;
 
  private:
    bool help_;
    bool version_;
    unsigned short compression_;
  };
}

There are a number of problems with reusing YANG for command line interface definition. The language is very big and 90% of it does not apply to CLI. There is no easy way to define name aliases for options (we could use the extension mechanism, but it gets quite verbose). The YANG type system uses names for built-in types that differ from those in C++ (for example, uint16 instead of unsigned short). As a result, we will need to provide a mapping between YANG types and C++ types. Finally, the definition presented above is verbose: it has too much syntax. Compare it to the following definition, which we can achieve with our own language:

namespace example
{
  class options
  {
    bool --help|-h;
    bool --version;
    unsigned short --compression = 5;
  };
}

Which brings us to the custom DSL design alternative. The above example is the most elegant and concise CLI definition that we have seen so far. We can also support user-defined C++ types, which won’t be possible if we are reusing an existing language. For example:

#include <string>
#include <vector>
#include <boost/regex.hpp>
 
namespace example
{
  class options
  {
    std::vector<std::string> --names;
    boost::regex --expr (".*", boost::regex::perl);
  };
}
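To make the idea more concrete, here is a rough sketch of the kind of C++ class a CLI compiler might produce for the three-option definition shown earlier (--help|-h, --version, and --compression with the default of 5). The constructor signature, the parse_ushort() helper, and the error handling via std::invalid_argument are all assumptions made for illustration; the actual mapping has not been designed yet.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

namespace example
{
  // Hypothetical compiler output; none of this is a settled design.
  class options
  {
  public:
    options (int argc, const char* const* argv)
      : help_ (false), version_ (false), compression_ (5)
    {
      assert (argc == 0 || argv != 0);

      for (int i = 1; i < argc; ++i)
      {
        const std::string a (argv[i]);

        if (a == "--help" || a == "-h") // Both names come from --help|-h.
          help_ = true;
        else if (a == "--version")
          version_ = true;
        else if (a == "--compression")
        {
          if (++i == argc)
            throw std::invalid_argument ("missing value for " + a);

          compression_ = parse_ushort (argv[i]);
        }
        else
          throw std::invalid_argument ("unknown option " + a);
      }
    }

    // Statically-named and statically-typed accessors.
    bool help () const {return help_;}
    bool version () const {return version_;}
    unsigned short compression () const {return compression_;}

  private:
    // Hypothetical helper: accept only decimal digits that fit into
    // unsigned short (assumed here to be 16 bits wide).
    static unsigned short parse_ushort (const char* s)
    {
      char* end = 0;
      unsigned long v = std::strtoul (s, &end, 10);

      if (*s < '0' || *s > '9' || *end != '\0' || v > 65535)
        throw std::invalid_argument (std::string ("invalid value ") + s);

      return static_cast<unsigned short> (v);
    }

    bool help_;
    bool version_;
    unsigned short compression_;
  };
}
```

Note that each option name and type appears only once in the DSL definition; all the repetition ends up in the generated code, where nobody has to maintain it by hand.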

Until now we have identified and analyzed three broad design alternatives: the native design, reusing an existing DSL, and creating our own language for CLI definition. The first approach is the simplest but, as we have discussed in the previous posts, it has a number of problems, including verbosity and implementation issues. Reusing an existing DSL will most likely also result in a sub-optimal solution as we have seen today. Designing our own language involves the largest amount of work but gives us complete control and theoretically allows us to design a truly ideal solution. Since we are after an ideal solution, having our own DSL appears to be the only viable way to achieve this. So next time we will start designing our own CLI definition language. As always, you are welcome to add your thoughts on this in the comments.

CLI in C++: Native Designs

Sunday, July 12th, 2009

This is the fifth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:

Today we will start exploring the possible design alternatives for a CLI parser. But first, let’s divide all the possible designs into two categories. In the first category there are designs that define the command line interface in the C++ source code itself. We will call them native. In the second category there are designs that define the command line interface outside of C++, in the so-called domain-specific language (DSL). Such a definition is then translated to C++ using a DSL compiler. We will call these types of design DSL-based. The first approach is preferable since it is more flexible, easier to maintain, and, overall, keeps things simple. If we cannot achieve the ideal solution using this design, then we will need to decide whether the drawbacks of the best solutions from the first category outweigh the trouble of going the DSL route. Today we will concentrate on the native designs.

Let’s also reiterate the properties of the ideal solution that we have established so far:

  1. Aggregation: options are stored in an object
  2. Static naming: option accessors have names derived from option names
  3. Static typing: option accessors have return types fixed to option types
  4. No repetition: the option name and option type are specified only once for each option

The two native solutions that we have seen so far and that have come closest to the ideal are the functor-based design and the template-based design. Here is the recap of the functor-based CLI definition:

struct options: cli::options
{
  options ()
    : help (false, "--help"),
      version (false, "--version"),
      compression (5, "--compression")
  {
  }
 
  cli::option<bool> help;
  cli::option<bool> version;
  cli::option<unsigned short> compression;
};

And here is the template-based version:

extern const char help[] = "help";
extern const char version[] = "version";
extern const char compression[] = "compression";
 
typedef
cli::options<help, bool,
             version, bool,
             compression, unsigned short>
options;
 
typedef cli::options_spec<options> options_spec;
 
int main ()
{
  options_spec spec;
  spec.option<compression> ().default_value (5);
  ...
}

Both solutions satisfy the first three properties but fail the “No repetition” one. In both cases we have to repeat the option name at least three times.

To see whether we can improve on the functor-based design, we can try to analyze it on a more elementary level. To satisfy the second rule (static naming), we will have to have a C++ identifier (i.e., a function or a functor name) corresponding to the option name. We will also need to have a string representation of the option name so that we can compare it to command line array elements during parsing. Since there is no easy way to get one from the other (the easiest method would probably be to use the debug information), we will have to repeat the option name at least twice. Thus the best definition that we can hope to achieve would be something along these lines (pseudo C++):

struct options: cli::options
{
  cli::option<bool, "--help"> help;
  cli::option<bool, "--version"> version;
  cli::option<unsigned short, 
              "--compression",
              5> compression;
};

Unfortunately, string literals cannot be template arguments, either in the current C++98 or in the upcoming C++0x. As a result, the function/functor declaration and the place where it is “connected” to the string representation of the option name have to be separated, which brings the number of required option name repetitions to three.

With the template-based design, even if we could use string literals directly as template arguments, it would violate the second property (static naming). The use of variable names in accessing the option values guarantees that if we misspell any of them, it will be detected by the compiler.

Each approach also has a number of implementation-related problems. In the functor-based design the use of functors instead of normal member functions makes the resulting options class harder to understand. Functors cannot be easily overridden should we decide to make some of the accessors virtual. This design also needs a global (or thread-local) variable to implement automatic option registration. There is nothing we can do about any of these drawbacks without greatly increasing the verbosity of the CLI definition.
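To illustrate why the registration variable is needed, here is a minimal sketch of how the functor-based design could be implemented. Everything inside the cli namespace (option_base, current(), add(), parse()) is a guess made for this example rather than the interface of any existing library, and the sketch assumes a compiler that supports thread_local, deleted functions, and <type_traits>.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <type_traits>

namespace cli
{
  // Type-erased base so that options of different types share one registry.
  struct option_base
  {
    virtual ~option_base () {}
    virtual bool flag () const = 0;            // True if no value follows.
    virtual void set (const std::string&) = 0; // Parse and store the value.
  };

  class options;

  // The registration hook: the options object most recently constructed
  // on this thread. This is the global (thread-local) variable in question.
  inline options*& current ()
  {
    static thread_local options* p = 0;
    return p;
  }

  class options
  {
  public:
    options () {current () = this;}

    // The registry stores member addresses, so copying would be unsafe.
    options (const options&) = delete;
    options& operator= (const options&) = delete;

    void add (const std::string& name, option_base& o) {map_[name] = &o;}

    // Returns false on an unknown option or a missing value.
    bool parse (int argc, const char* const* argv)
    {
      assert (argc == 0 || argv != 0);

      for (int i = 1; i < argc; ++i)
      {
        std::map<std::string, option_base*>::iterator it (
          map_.find (argv[i]));

        if (it == map_.end ())
          return false;

        if (it->second->flag ())
          it->second->set ("1");
        else if (++i < argc)
          it->second->set (argv[i]);
        else
          return false;
      }

      return true;
    }

  private:
    std::map<std::string, option_base*> map_;
  };

  template <typename T>
  class option: public option_base
  {
  public:
    // Registers itself with whichever options object was constructed last;
    // this only works for members of a class derived from cli::options.
    option (const T& def, const std::string& name)
      : value_ (def)
    {
      current ()->add (name, *this);
    }

    const T& operator() () const {return value_;} // The functor accessor.

    virtual bool flag () const {return std::is_same<T, bool>::value;}

    virtual void set (const std::string& s)
    {
      std::istringstream is (s); // No error handling in this sketch.
      is >> value_;
    }

  private:
    T value_;
  };
}

struct options: cli::options
{
  options ()
    : help (false, "--help"),
      version (false, "--version"),
      compression (5, "--compression")
  {
  }

  cli::option<bool> help;
  cli::option<bool> version;
  cli::option<unsigned short> compression;
};
```

After a successful call to parse(), the values are read as o.help () and o.compression (). The sketch works only because the base class constructor runs before the members are initialized, which is exactly the kind of subtlety that makes this design harder to understand.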

As we have discussed in the previous post, the template-based approach does not scale to a large number of options. But can its implementation be improved using C++0x? At first glance, variadic templates look promising. However, this feature only supports a single unbounded template argument. In other words, there is no way to have a “parallel” pair of unbounded template arguments (option type and option name in our case). One way to resolve this is to wrap each option declaration into a separate type, for example:

typedef
cli::options<cli::option<help, bool>,
             cli::option<version, bool>,
             cli::option<compression, unsigned short>>
options;

So with the help of C++0x we can make the template-based implementation scale, but this comes at the cost of increased verbosity.
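For the curious, here is one way the wrapped-option variant could be implemented, assuming a compiler with variadic template support. The select() helper and the value_ member are names invented for this sketch; only the cli::options and cli::option spellings come from the typedef above. Storage is obtained by deriving from every option type in the pack, and the accessor relies on template argument deduction to pick the one base class whose name matches.

```cpp
namespace cli
{
  // One option declaration: a name (a character array with static
  // storage duration) plus a value type.
  template <const char* Name, typename T>
  struct option
  {
    option (): value_ () {}
    T value_;
  };

  // Hypothetical helper: with Name given explicitly, T is deduced from
  // the single option<Name, T> base class of the argument.
  template <const char* Name, typename T>
  T& select (option<Name, T>& o) {return o.value_;}

  // A single parameter pack is enough since each element carries both
  // the option name and the option type.
  template <typename... Options>
  struct options: Options...
  {
    template <const char* Name>
    auto value () -> decltype (select<Name> (*this))
    {
      return select<Name> (*this);
    }
  };
}

extern const char help[] = "help";
extern const char version[] = "version";
extern const char compression[] = "compression";

typedef
cli::options<cli::option<help, bool>,
             cli::option<version, bool>,
             cli::option<compression, unsigned short>>
options;
```

With this in place, o.value<compression> () yields an unsigned short& and a misspelled name such as o.value<compresion> () fails to compile. Parsing is left out; the point is only that a single pack removes the fixed limit on the number of options, while each name is still repeated three times.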

In the next post we will explore possible DSL-based design alternatives. Once this is done we will have to weigh the pros and cons of using native vs DSL-based designs and decide which way to go. If you have any thoughts or maybe another promising native design that I have missed, feel free to add them as comments.

CLI in C++: Existing Solutions

Sunday, July 5th, 2009

This is the fourth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:

In the last post we analyzed various ways to represent the options information in the application and established a set of properties that the ideal solution would have. Today we will examine two existing implementations, the Program Options library from Boost and the CLI library from libcult, and determine how closely these solutions approach the ideal.

The Boost Program Options library provides three ways to represent the options information: as a set of variables defined by the user (except for options without values which we call flags), as a heterogeneous map of option names to values, and by calling and passing the value to a user-provided callback function. All three methods can be used simultaneously for different options. In the previous post we have analyzed the first two approaches to storing the options information. The third approach seems rather cumbersome since it makes the user go an extra mile to get the option value. I can’t think of any scenario where it would be much more convenient than the other two. Here is an example of using the first approach with Program Options:

#include <boost/program_options.hpp>
 
namespace po = boost::program_options;
 
unsigned short compression;
 
int main (int argc, char* argv[])
{
  po::options_description desc;
  desc.add_options ()
    ("compression",
     po::value<unsigned short>(&compression)->
       default_value (5),
     "compression level");
 
  po::variables_map vm;
  po::store (
    po::parse_command_line (argc, argv, desc), vm);
  po::notify (vm);
}

As we have noted before, this approach does not scale well to the modular design of more complex applications and suffers from the verbosity problem (the option name is repeated three times, the option type — twice).

And here is an example that instead uses the heterogeneous map to store option values:

int main (int argc, char* argv[])
{
  po::options_description desc;
  desc.add_options ()
    ("help", "show usage information")
    ("version", "show version")
    ("compression",
     po::value<unsigned short>()->default_value (5),
     "compression level");
 
  po::variables_map vm;
  po::store (
    po::parse_command_line (argc, argv, desc), vm);
  po::notify (vm);
 
  if (vm.count ("help"))
  {
    ...
  }
 
  compressor c (
    vm["compression"].as<unsigned short> ());
}

Again, as we have discussed before, this approach has a number of drawbacks including the use of strings to identify options and the need to specify the option type every time we retrieve its value. And we also have the verbosity problem as in the previous approach.
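The following sketch shows why string-based identification and repeated type specification are a problem in practice. It does not use Program Options itself: std::any (C++17) stands in for the type-erased value, and option_map and as() are names made up for this example. Both a misspelled option name and a mismatched type compile cleanly and only fail when the code is executed.

```cpp
#include <any>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a heterogeneous map of option names to values. This is
// an illustration only, not the Program Options interface.
typedef std::map<std::string, std::any> option_map;

// Hypothetical accessor. Throws std::out_of_range if the name is not in
// the map and std::bad_any_cast if T differs from the stored type.
template <typename T>
const T& as (const option_map& m, const std::string& name)
{
  return std::any_cast<const T&> (m.at (name));
}
```

For example, if the map holds an unsigned short under "compression", then as<unsigned short> (m, "compresion") and as<int> (m, "compression") both compile but throw at runtime. A statically-named and statically-typed accessor would reject both at compile time.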

Overall, the use of operator() in Program Options to collect option descriptions makes the code feel foreign to the conventional ways of doing things in C++. Every time I look at it I need to make an effort to understand what’s going on there. The need to make three function calls just to parse the simplest command line feels arbitrary.

The next implementation to consider is the CLI library from libcult which was my previous attempt at designing a statically-named and typed options representation. Here is how we would handle the above example using this library:

extern const char help[] = "help";
extern const char version[] = "version";
extern const char compression[] = "compression";
 
typedef
cli::options<help, bool,
             version, bool,
             compression, unsigned short>
options;
 
typedef cli::options_spec<options> options_spec;
 
int main (int argc, char* argv[])
{
  options_spec spec;
  spec.option<compression> ().default_value (5);
  options o (cli::parse (spec, argc, argv));
 
  if (o.value<help> ())
  {
    ...
  }
 
  compressor c (o.value<compression> ());
}

The option accessors are statically-typed. The option names used as the template arguments are C++ variables so any misspelling is detected by the compiler. There is still the verbosity problem with three repetitions of the name for each option.

The more important problem with this approach, however, is in the implementation details. Templates and template specializations are used heavily to make this interface possible. With a handful of options this is not a problem. However, for applications with hundreds of options this becomes taxing in terms of the compilation time and object code size. The code size issue stems from the long symbol names caused by template instantiations with hundreds of template arguments.

To summarize, the Program Options library falls far short of the ideal we have established. The CLI library from libcult has most of the properties of the ideal solution but does not scale to a large number of options.

Next time we will start exploring the possible ways of implementing our ideal solution. If you have any thoughts, feel free to add them as comments.