CLI in C++: DSL-based Designs
This is the sixth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:
In the last post we analyzed design approaches which have the command line interface defined in the C++ source code. Today we will start exploring designs that rely on domain-specific languages (DSL).
A DSL is a special-purpose language tailored for a specific domain or problem. We have two broad choices when it comes to DSL-based designs. We can try to reuse or retrofit an existing language to describe the command line interface. Or we can design our own command line interface definition language.
The main advantage of the first approach is the ability to base our implementation on an existing compiler. The main disadvantage lies in the difficulty of reusing an existing language for a different purpose. If a language is fairly generic, then the resulting CLI definition will most likely end up overly verbose. On the other hand, if a language is tailored to address a more specific problem, we may be unable to use it to capture some aspects of the command line interface. A good example of this problem would be a hypothetical language that describes objects containing typed name-value pairs. We could use the pair’s name to capture the option name. However, options may have aliases (e.g., --help and -h) and a single name per pair leaves no place to capture them.
If we decide to design our own language for CLI definition, then we can make it a perfect fit for our requirements. However, we will have to implement the compiler from scratch.
One existing DSL that was suggested by Malisha Mogilny is YANG. YANG is a data modeling language used to describe configuration and state data. Here is how we could model the CLI definition using YANG:
module example
{
  container options
  {
    leaf help
    {
      type boolean;
    }
 
    leaf version
    {
      type boolean;
    }
 
    leaf compression
    {
      type uint16;
      default 5;
    }
  }
}
This definition would be mapped to C++ code along these lines:
namespace example
{
  class options
  {
  public:
    options ()
      : help_ (false),
        version_ (false),
        compression_ (5)
    {
    }
 
    bool help () const;
    bool version () const;
    unsigned short compression () const;
 
  private:
    bool help_;
    bool version_;
    unsigned short compression_;
  };
}
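The translation above silently turns the YANG type boolean into bool and uint16 into unsigned short. A compiler that accepts YANG would need a lookup of this kind somewhere. Here is a minimal sketch of such a table; the function name yang_to_cxx and the particular C++ types chosen are assumptions made for illustration, and only a subset of the YANG built-in types is covered:

```cpp
#include <map>
#include <string>

// Hypothetical helper: translate a YANG built-in type name into the
// name of a C++ type. The C++ types picked here are one plausible
// choice (they assume the usual 8/16/32/64-bit integer widths).
// Returns an empty string for types that the table does not cover.
std::string yang_to_cxx (const std::string& yang_type)
{
  static const std::map<std::string, std::string> table = {
    {"boolean", "bool"},
    {"int8",    "signed char"},
    {"int16",   "short"},
    {"int32",   "int"},
    {"int64",   "long long"},
    {"uint8",   "unsigned char"},
    {"uint16",  "unsigned short"},
    {"uint32",  "unsigned int"},
    {"uint64",  "unsigned long long"},
    {"string",  "std::string"}
  };

  std::map<std::string, std::string>::const_iterator i (
    table.find (yang_type));

  return i != table.end () ? i->second : std::string ();
}
```

With this table, yang_to_cxx ("uint16") yields "unsigned short", which is the type used for compression in the class above. Types that have no obvious C++ counterpart simply fall through to the empty string in this sketch.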
There are a number of problems with reusing YANG for command line interface definition. The language is very big and 90% of it does not apply to CLI. There is no easy way to define name aliases for options (we could use the extension mechanism, but it gets quite verbose). The YANG type system uses names for built-in types that differ from those in C++. As a result, we will need to provide a mapping between YANG types and C++ types. Finally, the definition presented above is verbose: it has too much syntax. Compare it to the following definition which we can achieve with our own language:
namespace example
{
  class options
  {
    bool --help|-h;
    bool --version;
    unsigned short --compression = 5;
  };
}
Which brings us to the custom DSL design alternative. The above example is the most elegant and concise CLI definition that we have seen so far. We can also support user-defined C++ types, which won’t be possible if we are reusing an existing language. For example:
#include <string>
#include <vector>
#include <boost/regex.hpp>
 
namespace example
{
  class options
  {
    std::vector<std::string> --names;
    boost::regex --expr (".*", boost::regex::perl);
  };
}
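To make the idea more concrete, here is a hand-written sketch of the kind of C++ class that a compiler for such a language might produce for the earlier definition with --help|-h, --version, and --compression. This is only an illustration: the constructor signature, the error handling, and the parsing loop are all assumptions, and the code that a real compiler generates may look quite different.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

namespace example
{
  // Sketch of a possible generated class (assumed, not actual output).
  class options
  {
  public:
    options (int argc, const char* const* argv)
      : help_ (false),
        version_ (false),
        compression_ (5) // Default value from the definition.
    {
      for (int i (1); i < argc; ++i)
      {
        const std::string a (argv[i]);

        if (a == "--help" || a == "-h") // Both aliases set the same flag.
          help_ = true;
        else if (a == "--version")
          version_ = true;
        else if (a == "--compression")
        {
          if (++i == argc)
            throw std::invalid_argument ("missing value for " + a);

          // Simplification: the value is not checked for syntax or range.
          compression_ = static_cast<unsigned short> (
            std::strtoul (argv[i], 0, 10));
        }
        else
          throw std::invalid_argument ("unknown option " + a);
      }
    }

    bool help () const {return help_;}
    bool version () const {return version_;}
    unsigned short compression () const {return compression_;}

  private:
    bool help_;
    bool version_;
    unsigned short compression_;
  };
}
```

Note how the alias, which was a problem for the existing languages, becomes a single extra comparison in the parsing code. The three-line definition carries all the information needed to produce the accessors, the defaults, and the parsing logic.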
So far we have identified and analyzed three broad design alternatives: the native design, reusing an existing DSL, and creating our own language for CLI definition. The first approach is the simplest but, as we have discussed in the previous posts, it has a number of problems, including verbosity and implementation issues. Reusing an existing DSL will most likely also result in a sub-optimal solution, as we have seen today. Designing our own language involves the largest amount of work but gives us complete control and, in theory, allows us to design a truly ideal solution. Since we are after an ideal solution, having our own DSL appears to be the only viable way to get there. So next time we will start designing our own CLI definition language. As always, you are welcome to add your thoughts on this in the comments.
October 21st, 2009 at 8:58 pm
Hi Boris,
Very interesting blog. Why not use an XML Schema to represent the command/argument/option structure and implement a parser that creates a CLI reader?
October 22nd, 2009 at 8:17 am
Hi Jack,
There would be several issues with XML Schema, similar to YANG. First, it would be very verbose. Second, there is no language mechanism for capturing option aliases. While we could use default values in attributes to model default option values, they will only work for built-in XML Schema types. Then there is the issue of using custom C++ types for options which won’t be easy with XML Schema.
I am planning to release CLI 1.0.0 on Sunday. You can try it and then see if you can achieve something that simple and elegant with XML Schema ;-).
October 23rd, 2009 at 12:40 pm
I will definitely give it a go because I need that and I think you did a fantastic job with xsd/xsde.
Also, do you think your project could be adapted for other kinds of interfaces? I have xlw in mind (Excel interfacing).
Keep up the good work, it makes developing a real pleasure! =D
October 23rd, 2009 at 1:22 pm
Hi Jack,
I am not sure what you want to achieve with XLW. Do you want to be able to access the command line data from Excel?
The CLI language is specific to the concepts of the command line interface and, to a lesser extent, C++. The upside of this is that it is very clean and simple. The downside is that it is hard to reuse for something else.