CLI in C++: Status Update
The semantic graph for the CLI language and the traversal mechanism are ready. I have also updated the parser to build the graph as parsing progresses. If you would like to check it out, you can download the source distribution via these links:
See the first status update for more information on what to do with these files. This release adds a dependency on libcutl, which is a small C++ utility library that I developed. If you are not using the cli+dep package, you will also need to get this library:
You can follow the development, including accessing a more detailed log of changes, via the git repository:
The semantic graph and the traversal mechanisms are quite interesting pieces of software. I have developed the ideas behind them over several years, starting in CCF (CORBA Compiler Frontend) and then continuing in XSD and XSD/e. The semantic graph is an in-memory data structure that captures the semantics of a particular language in a way convenient for analysis and code generation. The traversal mechanism is similar to the visitor pattern except that it is inheritance-aware and more flexible (more about this below). We can use it to implement passes over the semantic graph that perform various analyses, graph transformations, or code generation.
The semantic graph is a heterogeneous directed graph. Its nodes are concepts of a particular language. For our CLI language we have: namespace, class, option, C++ type, and expression. Edges represent relationships between nodes as dictated by the language semantics. For example, in our case, a namespace names one or more namespaces or classes. A class names one or more options. An option belongs to a type and can be initialized by an expression. So we have the following edges: names, belongs, and initialized. Notice that all nodes are nouns and all edges are verbs.
Some concepts in the language have a common behavior or trait. For example, both namespace and class are a kind of scope. Namespace, class, and option all have a name (or, more accurately, are named by a scope). To capture this, we define a few base nodes, such as scope and nameable. The following listing shows the inheritance relationships between nodes:
type
nameable
scope: nameable
class: scope
namespace: scope
option: nameable
Each node and edge in the graph provides a way to navigate to the other nodes and edges to which it is connected. For example, given a class node, we can get all its outgoing names edges and from each such edge get to the corresponding option node. While such manual navigation can be used to, say, generate code, it is quite tedious. It is much more convenient to use the traversal mechanism.
For each node and edge type in the semantic graph there is a traverser class that implements common navigation patterns for that node or edge. For example, a namespace traverser by default iterates over and traverses all its names edges. The traversal mechanism also performs automatic, inheritance-aware type matching. An example will help illustrate what this means. Suppose we are traversing a namespace node. It can name two kinds of nodes: classes and namespaces. If we don't care which kind of node it is, we can pass the generic scope-based traverser. Because both namespace and class are a kind of scope, our scope traverser will be called for both kinds of objects.
But suppose we need to do something special for classes. In this case, we can pass a class-based traverser in addition to the scope-based one. Classes will then be handled by the class traverser (because it is a better match than the scope traverser) and everything else by the scope traverser. If you would like to see how this type matching works in a simple example, take a look at the tests/compiler/traverser test in libcutl.
The implementation of the semantic graph and the traversal mechanism completes the so-called frontend part of our compiler. Now we are ready to move on to the backend part, which is where we generate the code. Next time we will have the long-promised discussion of the pros and cons of self-sufficient generated code versus generated code that depends on a runtime library. I will also start working on the backend infrastructure, such as opening the output C++ files, writing include guards, generating #include directives, etc.