CLI in C++: Problem and Terminology
This is the second installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous post was CLI in C++: Project Introduction. Today I would like to talk about the problem that we are trying to solve as well as establish the terminology.
Command line interface is the most universal way of passing information from a caller to a program. The concept of a command line is part of most operating systems and programming languages including C++. However, the model and terminology for command line processing vary significantly among different platforms.
The Single UNIX Specification includes Utility Argument Syntax Conventions and Guidelines which document basic terminology for command line processing. The Single UNIX Specification model is the least common denominator for different UNIX implementations. It is minimal and targets system utilities rather than a wider range of applications and their possible requirements. Another de-facto command line processing model is the GNU Standard for Command Line Interfaces which generally encourages conformance to the Single UNIX Specification but adds a few extensions and uses different terminology. Our CLI parser will need to handle a wider range of use cases than those covered by these two standards. We would therefore need to establish a more complete model and associated terminology for command line processing.
A command line is an array of character strings, not just a string with spaces between words as is sometimes incorrectly assumed. Each string in the command line array is referred to as an argument. The first argument usually contains the name of the executable.
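To make the "array of strings" point concrete, here is a minimal sketch of how the array that C++ passes to main() could be captured as is. The function name to_arguments is made up for illustration and is not part of any library.

```cpp
#include <string>
#include <vector>

// Copy the raw argv array into a vector of strings, one element per
// argument. Element 0 is usually (but not necessarily) the executable name.
// The name to_arguments is hypothetical.
std::vector<std::string> to_arguments(int argc, const char* const* argv)
{
  return std::vector<std::string>(argv, argv + argc);
}
```

Calling to_arguments(argc, argv) from main() preserves the argument boundaries exactly as the program received them. An argument such as "my file.txt" stays a single element even though it contains a space, which is precisely what is lost if the command line is treated as one string.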
The interpretation of arguments is completely up to the program logic; however, conventions exist that vary among different systems. Usually some arguments are translated into higher-level objects such as commands and options. These objects form a model for command line processing and are defined below.
A command is usually a word or a single letter that tells the program logic what to do. Neither the Single UNIX Specification nor the GNU Standard for Command Line Interfaces has the notion of a command. Other terms for command include action and function. A command is usually (but not necessarily) the first argument after the executable name. Here are a few examples:
tar x
Here we have a one-letter command x (extract). In the GNU tar manual it is called a functional letter.
tar xvf
Here we have three commands encoded as a single letter each. Semantically, only x is a command while v (verbose) and f (read from a file) are options.
openssl req
Here we have a word command req (operations with certificate requests).
cvs checkout foo
Here we have a word command checkout and command argument foo.
tar --help
Even though --help is usually considered an option, semantically it is a command.
An option consists of an option name and, optionally, one or more option values. Options are normally optional. Non-optional options are better represented by commands or arguments.
An option name takes up one argument. Option names usually start with a prefix, for example, --compile-only, -c or /c. This helps distinguish them from commands and arguments. Option names may have aliases, for example, for option name --output-dir there could be the -o alias.
An option without a value is always optional and represents an option with an implied binary value ({0, 1} or {false, true}). Such an option is sometimes called a flag.
An option can be associated with a program or a command. Thus the concept of an option can be further refined to program option and command option. Program options alter the behavior of the program as a whole while command options affect, and may only be valid for, a particular command. For example:
g++ -o hello.o hello.cpp
Here we have an option with name -o which has value hello.o. hello.cpp is an argument.
ls -l
Here we have a flag with name -l.
cvs -z 6 checkout -P foo
Here we have a program option with name -z and value 6 (set compression level to 6). checkout is a command. -P is a command flag (prune empty directories). foo is a command argument.
An argument usually represents an input value or a parameter and can be mandatory or optional. The interpretation of arguments is application-specific. As with options, the concept of an argument can be further refined to program argument and command argument.
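One way to picture the model described so far is as a small set of data structures. The sketch below is only an illustration: the type and member names (cli_option, command_line, and so on) are hypothetical and do not come from any existing library, and for brevity it stores only command arguments, not program arguments.

```cpp
#include <string>
#include <vector>

// An option: a name plus zero or more values. A flag is an option
// without any values. All names in this sketch are hypothetical.
struct cli_option
{
  std::string name;
  std::vector<std::string> values;

  bool flag() const { return values.empty(); }
};

// A possible in-memory model of an interpreted command line.
struct command_line
{
  std::string executable;
  std::vector<cli_option> program_options;
  std::string command;                      // empty if there is no command
  std::vector<cli_option> command_options;
  std::vector<std::string> arguments;       // command arguments
};

// The model filled in by hand for: cvs -z 6 checkout -P foo
command_line cvs_example()
{
  command_line cl;
  cl.executable = "cvs";
  cl.program_options.push_back(cli_option{"-z", {"6"}}); // program option
  cl.command = "checkout";                               // command
  cl.command_options.push_back(cli_option{"-P", {}});    // command flag
  cl.arguments.push_back("foo");                         // command argument
  return cl;
}
```

Nothing here does any parsing. The point is only that each term defined above (program option, command, command flag, command argument) has an obvious place in the structure.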
Note that above we are using the term argument to mean both an element in the command line string array and an input value to the program that is distinct from commands and options. From the operating system point of view every item that is passed to a program via the command line is an argument. It is up to the program to interpret them as commands, options, or arguments proper. The special -- argument is often used to indicate that all the following arguments must be treated as arguments proper.
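The effect of the -- argument can be sketched as a simple split of the argument array. The names split_result and split_at_separator are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Result of splitting the argument array at the first "--".
// Both names are hypothetical.
struct split_result
{
  std::vector<std::string> before; // may still contain commands and options
  std::vector<std::string> after;  // arguments proper, never interpreted
};

split_result split_at_separator(const std::vector<std::string>& args)
{
  // Find the first argument that is exactly "--".
  auto sep = std::find(args.begin(), args.end(), "--");

  split_result r;
  r.before.assign(args.begin(), sep);
  if (sep != args.end())
    r.after.assign(sep + 1, args.end()); // the separator itself is dropped
  return r;
}
```

With this split, something like rm -- -f leaves -f in the after part, so it would be treated as a file name rather than as an option.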
It may seem premature to establish such a complete model for the initial version of the CLI parser that we are designing, especially because most applications will only use the basic subset of this model (options and arguments). I, however, prefer to think things through on the conceptual level even if there are no immediate plans to support them in the code. This way when designing the first version I can make sure that I at least understand how the complete model will fit or can be supported in the future versions without a complete redesign.
In its simplest form the task of parsing a command line boils down to determining if one or more options are specified in the command line string array and presenting this information to the rest of the application in a convenient way. This process is complicated by the fact that options can normally appear in arbitrary order. Some options may also have values, in which case they need to be extracted and converted into suitable data types (for example, the compression level probably needs to be converted to an integer). While most option values will use simple types such as integers and strings, it is plausible that conversions to application-specific types may be required. The parsing code also needs to perform reasonable error handling, such as detecting unknown options, missing option values, and value conversion failures.
The application may also need to set the default values for some options. These values are then used by the program logic in case the corresponding options were not specified.
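To illustrate what this involves when done by hand, here is a rough sketch for an imaginary archiver with three options. Everything in it is an assumption made for the example: the option names (-v, -z, -o), the defaults, and the names archiver_options, to_int, and parse_options. It only handles options that precede the first command or argument.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options with their default values.
struct archiver_options
{
  bool verbose = false;          // flag; absent means false
  int compression = 6;           // used when -z is not specified
  std::string output_dir = ".";  // used when -o is not specified
};

// Convert an option value to int, rejecting partial matches such as "9x".
int to_int(const std::string& name, const std::string& value)
{
  std::size_t pos = 0;
  int r = 0;
  try { r = std::stoi(value, &pos); }
  catch (const std::exception&) { pos = 0; } // not a number or out of range

  if (pos == 0 || pos != value.size())
    throw std::invalid_argument(
      "invalid value '" + value + "' for option " + name);
  return r;
}

// Parse the leading options in args (args[0] is the executable name).
// On return, next is the index of the first non-option argument.
archiver_options parse_options(const std::vector<std::string>& args,
                               std::size_t& next)
{
  archiver_options o;
  std::size_t i = args.empty() ? 0 : 1;

  for (; i < args.size(); ++i)
  {
    const std::string& a = args[i];

    if (a == "--") { ++i; break; }           // the rest are arguments proper
    if (a.size() < 2 || a[0] != '-') break;  // first command or argument

    if (a == "-v" || a == "--verbose")
      o.verbose = true;
    else if (a == "-z" || a == "-o")
    {
      if (i + 1 == args.size())
        throw std::invalid_argument("missing value for option " + a);

      const std::string& v = args[++i];
      if (a == "-z") o.compression = to_int(a, v);
      else           o.output_dir = v;
    }
    else
      throw std::invalid_argument("unknown option " + a);
  }

  next = i;
  return o;
}
```

Even for three options the code is mostly bookkeeping: matching names and aliases, fetching values, converting them, and reporting the three kinds of errors mentioned above. The defaults live in the struct itself, so an option that is not specified simply keeps its initial value.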
Handling of commands and arguments is usually quite a bit simpler. Once the options have been parsed, the starting positions of a command and arguments in the command line array become known and they can be accessed by the application directly.
There is also the related problem of producing command line documentation, such as program usage information and man pages. Some applications, especially with a large number of options, may also want to allow their users to specify command line arguments in one or more files in addition to the command line proper.
And that’s it for today. If you have any thoughts, feel free to add them as comments. Next time we will try to understand what an ideal solution to the CLI parsing problem might look like. We will also analyze the shortcomings of some of the existing implementations. For that I would like to consider the Program Options library from Boost as well as my previous attempt at the CLI library which is part of libcult. We will also briefly examine whether any new features planned for C++0x could be used to address these shortcomings.
June 11th, 2009 at 6:55 am
Have you considered adhering to the YANG specification from IETF?
June 11th, 2009 at 1:09 pm
Malisha,
Interesting idea, thanks for bringing it up (for those like myself who have never heard of YANG, here is the website: http://www.yang-central.org). I am definitely going to think about this some more and bring it up in one of the coming posts. Also, if you have any thoughts on how we might describe/handle CLI with YANG, feel free to add them here. Perhaps an example command line interface description written in YANG?
Boris