CLI in C++: Separate vs Embedded Runtime
Monday, September 21st, 2009This is the ninth installment in the series of posts about designing a Command Line Interface (CLI) parser for C++. The previous posts were:
- Project Introduction
- Problem and Terminology
- The Ideal Solution
- Existing Solutions
- Native Designs
- DSL-based Designs
- CLI Definition Language
- CLI to C++ Mapping
In the last post we discussed the mapping of our CLI language to C++ and established that there will be some common support code, such as exception definitions, that we will need to place somewhere. There are two places where we can keep this code: We can either create a separate runtime library that will contain the support code and on which the generated code will depend. Or we can embed this code directly into the generated C++ files. Today we are going to consider the pros and cons of each approach.
The embedded runtime has the following advantages compared to the separate runtime library:
- No external dependencies
- Simple cases will not require extra generated files
- Can have source code (compared to a header-only library)
- Can easily support various naming conventions
- Can minimize the code by only generating what’s needed
- No runtime/generated code version mismatches
- Makes the use of the generated code in CLI much easier
Let’s consider each of these points in order. The embedded runtime will not require inclusion of any external headers or linking to any external libraries other than the C++ standard library. This will make the adoption of CLI very easy, in fact, easier than adopting a header-only library. All that needs to be done is to generate the C++ files from the CLI definition and add them to the application source code. This is especially important for a relatively inconsequential functionality such as command line parsing. The requirement to add an extra dependency, even a header-only, may override all the benefits that the CLI compiler will bring.
Most applications that use the CLI language will only have one options file. When we have only one file we can generate the runtime code into the same set of C++ files as the one containing the option class(es). Things are a bit more complicated when we have multiple options files. In this case we cannot generate the runtime code directly into the resulting C++ files because this will lead to re-definitions (if two generated header files are included into the same translation unit) or duplicate symbols. In this case we will need to generate the runtime code into a separate set of C++ files and then include the resulting header into other generated files. For example, we could have the --generate-runtime file
option which instructs the compiler to generate the runtime code in a separate set of C++ files and the --runtime file
option which tells the compiler that the runtime is in these C++ files.
The embedded runtime can have C++ source code unlike a header-only external runtime library. We would want to restrict the external runtime to be a header-only library in order to simplify adoption, since a header-only library does not require building. However, this restriction may force us to declare certain functions inline
even if they shouldn’t normally be inlined because of the potential code bloat.
One common complaint about generated code in general is that it fits poorly with the hand-written code. The major reason for this is that the generated code often doesn’t follow the same identifier naming convention as the one used in the project. For example, the project may be using “upper camel case” for type names (e.g., SimpleName
) while the generated code uses the standard C++ lower case and underscores (e.g., simple_name
). There is no technical reason (except for, maybe, complexity) why a code generator can’t support configurable naming conventions. In fact, that’s what we did in the C++/Tree mapping in XSD and it made a lot of people very happy. The only problem is that it is virtually impossible to support configurable naming conventions in a hand-written runtime library. But it should be quite easy to do with the embedded runtime since it is also generated by the compiler.
Because the code for the embedded runtime is generated for each application, we can minimize the output by omitting unused optional components. We can also decide whether to generate certain functions inline based on the application developer preferences.
Since with the embedded runtime there are no external dependencies, there are also no version mismatches that can occur when one of the components (generated code or runtime library) was upgraded and the other was not.
Finally, the embedded runtime approach makes it much easier to use the generated code in the CLI implementation itself. With the separate runtime library we will either have to keep an old copy around or risk breaking the generated code with backwards-incompatible changes that occur during development.
The embedded runtime approach also has a number of disadvantages:
- Hard to develop and maintain
- Bug fixes to the runtime require compiler rebuild
- Impractical for large runtimes
The embedded runtime is harder to develop and maintain than a separate runtime library. This is because the code has to be emitted by the compiler instead of simply sitting in a file. In particular, because the runtime code is embedded into the compiler source code as a collection of strings, it is a lot harder to read and write.
Fixing any bug that is found in the embedded runtime code will require a compiler rebuild. In case of a header-only runtime library the same can be accomplished by patching a few files and recompiling the application.
Finally, the embedded runtime approach quickly becomes impractical as the size of the runtime code grows. The difficulty of development and maintenance is one reason. The other reason is the lack of separate compilation. All of the embedded runtime code is contained in a single generated C++ source file. As the amount of code in the runtime grows, this file takes longer and longer to compile.
Now, which approach should we use in our case? The CLI runtime is going to be pretty small, or, at least, I expect it to be. Initially it will contain a few exception definitions and maybe a few helper classes. So the size shouldn’t be an issue. On the other hand, as we discussed above, it is very important to make the generated code as easy to adopt as possible. Making it self-sufficient and dependency-free sounds very attractive. So it looks like in our situation the advantages of the embedded runtime significantly outweigh its disadvantages.
At this point we have covered enough ground to make the first usable release of the CLI compiler. From the last status update I have started working on the backend infrastructure and at this stage the compiler is able to generate the output C++ files with all the #include
directives and the proper namespace structure. It is all in the source repository if you would like to take a look. From now on I will be working on generating the C++ mapping. Once this is done, we should be in good shape to release cli-1.0.0
. As always, if you have any thoughts, feel free to add them in the comments.