Archive for the ‘Development’ Category

CLI in C++: Status Update

Wednesday, August 12th, 2009

Over the weekend I went ahead and implemented the lexical analyzer for the CLI language. If you would like to check it out, you can download the source distribution via these links:

The +dep version includes the project’s dependencies. See the INSTALL file inside for details. Alternatively, you can follow the development in the git repository:

There are a couple of interesting things to note about the lexer design. The first is the handling of include paths. To mimic the C++ preprocessor as closely as possible, we decided to allow both the "foo" and <foo> styles of paths. However, in our case the include statement is part of the language and is therefore tokenized by the lexer. The dilemma then is how to handle < and >, which can appear both in paths and, later, in option types that contain template-ids (for example, std::vector<int>) or expressions (for example, 2 < 3). If we always treat them as separate tokens, then the handling of include paths becomes very tricky. For example, the <foo/bar.hxx> path would be split into several tokens. At the parser level it would be indistinguishable from, say, <foo / bar.hxx>, which may not be the same path, or even a valid one.

To overcome this problem, the lexer treats < after the include keyword as the start of a path literal rather than as a separate token (path literals are handled in the same way as string literals except with < > instead of " "). That’s one area where we had to bring a little bit of language semantics knowledge into the lexical analyzer.
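To make this concrete, here is a minimal sketch of the idea in C++. Everything below, names and structure alike, is illustrative rather than the actual CLI lexer. The only piece of semantics the lexer carries is a flag noting whether the previous token was the include keyword:

// Sketch only: '<' opens a path literal solely when the previous
// token was 'include'; everywhere else it is an ordinary punctuator.
// (String literals and other token types are omitted.)

#include <cctype>
#include <istream>
#include <stdexcept>
#include <string>

struct Token
{
  enum Type {eos, include_kw, identifier, path_literal, punctuator};

  Token (Type t, const std::string& l = ""): type (t), lexeme (l) {}

  Type type;
  std::string lexeme;
};

class Lexer
{
public:
  explicit
  Lexer (std::istream& is): is_ (is), after_include_ (false) {}

  Token
  next ()
  {
    while (std::isspace (is_.peek ()))
      is_.get ();

    int c (is_.peek ());

    if (c == std::istream::traits_type::eof ())
      return Token (Token::eos);

    if (c == '<' && after_include_)
    {
      after_include_ = false;
      return path ();
    }

    if (std::isalpha (c) || c == '_')
    {
      std::string l;
      while (std::isalnum (is_.peek ()) || is_.peek () == '_')
        l += char (is_.get ());

      after_include_ = (l == "include");
      return Token (after_include_ ? Token::include_kw : Token::identifier, l);
    }

    after_include_ = false;
    return Token (Token::punctuator, std::string (1, char (is_.get ())));
  }

private:
  // Scan <...> as a single token, the same way a string literal
  // would be scanned, except with <> instead of "".
  //
  Token
  path ()
  {
    std::string l;
    is_.get (); // Consume '<'.

    for (int c (is_.get ()); c != '>'; c = is_.get ())
    {
      if (c == std::istream::traits_type::eof () || c == '\n')
        throw std::runtime_error ("unterminated path literal");

      l += char (c);
    }

    return Token (Token::path_literal, l);
  }

  std::istream& is_;
  bool after_include_;
};

With this arrangement, <foo/bar.hxx> after include is a single token while the same characters anywhere else are lexed as punctuators and identifiers.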

Another interesting aspect is the handling of option names. To be convenient to use, option names should allow various additional symbols, such as - and /, that are not allowed in C++ identifiers. Consider this option definition as an example:

class options
{
  bool --help-me|-h|/h;
};

The question is how do we treat these additional symbols: as part of the option identifier or as separate tokens? Handling them as separate tokens presents the same problem as in the include path case: the same option name could be written in many different ways that all result in the same token sequence. Making these additional symbols part of the option identifier can be done in two ways. We can either recognize option names as a special kind of identifier or we can “relax” all the identifiers in the language to include these symbols. The first approach would be cleaner but is hard to implement: the lexer would need to recognize the places where option names can appear and scan them accordingly. Since there is no keyword that easily identifies such places, the lexer would have to implement pretty much the same parsing logic as the parser itself. This is more semantics knowledge than I am willing to bring into the lexical analyzer.

This leaves us with the relaxed identifier option. One major drawback of this approach is the difficulty of handling expressions that involve -, /, etc. Consider this definition:

class options
{
  int -a-b = -a-b;
};

Semantically, the first -a-b is the option name while the second is an expression. In the initial version of the CLI compiler I decided not to support expressions other than literals (negative integer literals, such as -5, are supported by recognizing them as a special case). More complex expressions can instead be defined as constants in C++ headers and included into the CLI file, for example:

// types.hxx
//
const int a = 1;
const int b = 2;
const int a_b = -a-b;

// options.cli
//
include "types.hxx";
 
class options
{
  int -a-b = a_b;
};
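To give an idea of what the relaxed identifiers look like in the lexer, here is a minimal sketch. The exact set of extra characters and the handling of the negative integer literal special case are simplified, and none of the names below come from the actual implementation:

#include <cctype>
#include <cstddef>
#include <string>

// Extra characters allowed in relaxed identifiers on top of the
// usual C++ identifier characters (the exact set is illustrative).
//
inline bool
relaxed_id_char (char c)
{
  return std::isalnum (static_cast<unsigned char> (c)) ||
    c == '_' || c == '-' || c == '/';
}

// Scan one token starting at position p in s (p is assumed to point
// at a token character). A '-' followed by a digit is recognized as
// a negative integer literal (the special case mentioned above);
// any other '-' is simply part of the relaxed identifier.
//
inline std::string
scan (const std::string& s, std::size_t& p, bool& int_literal)
{
  std::size_t b (p);

  int_literal = s[p] == '-' && p + 1 < s.size () &&
    std::isdigit (static_cast<unsigned char> (s[p + 1]));

  if (int_literal)
    for (++p; p < s.size () &&
           std::isdigit (static_cast<unsigned char> (s[p])); ++p) ;
  else
    while (p < s.size () && relaxed_id_char (s[p]))
      ++p;

  return std::string (s, b, p - b);
}

Note that with this scheme both occurrences of -a-b in the earlier definition scan to exactly the same token, which is precisely why expressions other than literals had to go.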

In the future we can support full expressions by recognizing places in the language where they can appear. That is, after = as well as between ( ) and < >. For example:

class options
{
  int -a = 2-1;
  int -b (2-1);
  foo<2-1> -c;
};

At the parser level, we will also need to tighten these relaxed identifiers back to the C++ subset for namespace, class, and option type names.
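The tightening itself can be as simple as checking that a relaxed identifier also happens to be a valid C++ identifier wherever a C++ name is required. A sketch:

#include <cctype>
#include <cstddef>
#include <string>

// Return true if a relaxed identifier is also a valid C++
// identifier and can therefore be used where the language
// requires a C++ name.
//
inline bool
cxx_identifier (const std::string& s)
{
  if (s.empty () ||
      !(std::isalpha (static_cast<unsigned char> (s[0])) || s[0] == '_'))
    return false;

  for (std::size_t i (1); i < s.size (); ++i)
  {
    unsigned char c (static_cast<unsigned char> (s[i]));

    if (!(std::isalnum (c) || c == '_'))
      return false;
  }

  return true;
}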

With the initial version of the lexer complete, the next thing to implement is the parser. I will let you know as soon as I have something working.

Wabi-sabi in software design

Monday, May 11th, 2009

When I first heard about wabi-sabi, I thought the concept sounded interesting, particularly in the context of good software design. The other day I picked up a thin volume titled “Wabi-Sabi for Artists, Designers, Poets & Philosophers”. From its back cover:

“Wabi-sabi is the quintessential Japanese aesthetic. It is a beauty of things imperfect, impermanent, and incomplete. It is a beauty of things modest and humble. It is a beauty of things unconventional.”

The book makes a number of interesting points that I believe apply quite well to software design but this excerpt about the material simplicity of all things wabi-sabi is the one to frame and put on the wall:

“The simplicity of wabi-sabi is best described as the state of grace arrived at by a sober, modest, heartfelt intelligence. The main strategy of this intelligence is economy of means. Pare down to the essence but don’t remove the poetry. Keep things clean and unencumbered, but don’t sterilize. (Things wabi-sabi are emotionally warm, never cold.) This implies a limited palette of materials. It also means keeping conspicuous features to a minimum. But it doesn’t mean removing the invisible connective tissue that somehow binds the elements into a meaningful whole. It also doesn’t mean in any way diminishing something’s ‘interestingness’, the quality that compels us to look at that something over, and over, and over again.”

A profiling war story

Monday, May 4th, 2009

The other day I ran across an XML schema that took the XSD compiler unusually long to translate to C++. While the schema itself was quite large, almost 2MB of definitions spread over 80 files, the compilation time on my machine was excessive: about 160 seconds. We also had an earlier report, from several months ago, where XSD would seem to run indefinitely while trying to compile a really large schema (15K element declarations across about 200 files). Unfortunately, the reporter couldn’t provide us with that schema for debugging due to distribution restrictions. All we could do was give them a new version of the compiler and have them tell us whether it worked.

I set out to figure out what was going on using the schema I had, suspecting that it was the same problem as the one affecting the other schema. The first step was to understand which part of the compilation process takes so much time. A good profiler can be very helpful in accomplishing this. I personally prefer OProfile, mainly because: 1) it is system-wide, so you can see the complete picture (your application, libraries, even the kernel), 2) it is transparent, in the sense that you don’t need a special build of your application or of the libraries it depends on, and 3) it has low overhead, which means that your application doesn’t take 10 times as long to execute when profiled.

I ran OProfile while the XSD compiler was compiling the schema. My first step in analyzing the program execution was to get a flat symbol list that shows how much time is spent in each function. Here is the top of the listing for XSD:

samples  %        image name  symbol name
 
405527   14.4254  libc.so     _int_malloc
334761   11.9081  libc.so     _int_free
287232   10.2174  libc.so     free
285081   10.1409  xsd         Cult::MM::allocate
199565    7.0989  libc.so     malloc
124309    4.4219  xsd         Schema::find
119779    4.2608  xsd         operator delete
104893    3.7312  xsd         std::map<...>::find
94964     3.3780  libc.so     wmemcmp
85689     3.0481  xsd         Cult::MM::Key::size
80308     2.8567  xsd         operator new
62390     2.2193  libc.so     strlen
37687     1.3406  xsd         std::string::_S_create
35205     1.2523  libc.so     memcpy

When you read articles about profiling, you are normally presented with a listing similar to the one above where one or two functions at the top account for 90% of the execution time. This, however, doesn’t happen very often in complex programs built by competent engineers. Complex programs perform several operations in a pipeline manner instead of just one algorithmic task (image conversion is a good example of an algorithmic application where you may see a single function at the top taking 90% of the time). Competent engineering means that there are no silly programming or architectural mistakes resulting in single-function bottlenecks. Instead, you normally see a listing like the one above: execution time is spread over many functions, with none of them standing out as unreasonable. Such information is completely useless in understanding what’s going on in the program.

What we need instead are the aggregate times for each function, that is, the time spent in the function itself plus the time spent in all the functions it calls. We also need to know which other functions each function calls, with their respective times. With this information we can find the operation in our program that takes most of the execution time as well as figure out which parts of that operation are responsible for the slowdown.

The best way to get this information from OProfile is to collect the samples with the call graph information (see the --callgraph option), then convert them to the gprof format with the opgprof command, and finally generate the call graph with the gprof -q command. One limitation of the call graph information is that it is very inaccurate if your application is optimized. To get a useful result you will need to recompile your application with optimization turned off.
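Concretely, the steps look something like this. This is a sketch using the opcontrol interface of OProfile as it was at the time; the call-graph depth and the schema name are illustrative:

# Configure and start system-wide collection with call graph
# information (the depth value is illustrative).
opcontrol --no-vmlinux --callgraph=16
opcontrol --start

# Run the application being profiled.
xsd cxx-tree schema.xsd
opcontrol --stop

# Flat, per-symbol listing (like the one shown earlier).
opreport --symbols ./xsd

# Convert the samples to the gprof format (writes gmon.out) and
# print the call graph.
opgprof ./xsd
gprof -q ./xsd gmon.out >callgraph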

The call graph output is quite a bit more complex and can be a lot more verbose compared to the flat symbol list presented above. The gprof call graph format is explained in detail in the gprof documentation.

One way to get a summary of the gprof call graph that just tells us which functions have the largest aggregate times is to run the following sed script on the call graph output:

sed -e '/^\[/p;d' callgraph >summary

Here is the top of the summary listing for XSD:

% time   self  children  called           name
 
  99.8      0    645554                   main
  94.0  16021    591541  588759+3889477   Schema::find
  82.6   1024    533243  512793+36104907  <cycle 4 as a whole>
  82.4      0    532686  512443           Parser::parse
  55.5      8    358965  465329           Node<Element>::trampoline
  49.5  27896    291869  265608+36968     <cycle 1 as a whole>
  47.9      0    309389  299838           Resolver::resolve_element
  42.6  18427    256696  215619           operator new
  38.6      1    249727  242001           Resolver::traverse

From this listing it is immediately apparent that the problem is most likely in the Schema::find operation. It takes up 94% of the program execution time and is called half a million times from other functions and almost four million times recursively.

The Schema::find() function looks up namespaces with the specified XML namespace name in the schema graph. To understand how it works, one needs a basic understanding of how the schema graph is organized. Each schema file is represented by a Schema object which contains a list of references to the included and imported schema files (also represented as Schema objects) along with a Namespace object which contains the actual type, element, etc., declarations. Schema::find() returns a list of Namespace objects corresponding to a namespace name. To collect this list, it traverses the include/import graph to make sure it visits every Schema object visible, directly or indirectly, from the initial schema.
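In heavily simplified form, the graph and the traversal look something like this. All the types and names below are illustrative stand-ins; in particular, the traversed_ flag stands in for the Context operations the real implementation uses:

#include <list>
#include <string>

struct Namespace
{
  std::string name; // XML namespace name.
};

struct Schema
{
  std::list<Schema*> includes;      // Included/imported schemas.
  std::list<Namespace*> namespaces;

  Schema (): traversed_ (false) {}

  // Collect all visible Namespace objects with this name.
  //
  void
  find (const std::string& name, std::list<Namespace*>& r)
  {
    if (traversed_) // Guards against recursive inclusion only.
      return;

    traversed_ = true;

    for (std::list<Namespace*>::iterator i (namespaces.begin ());
         i != namespaces.end (); ++i)
    {
      if ((*i)->name == name)
        r.push_back (*i);
    }

    for (std::list<Schema*>::iterator i (includes.begin ());
         i != includes.end (); ++i)
      (*i)->find (name, r);

    traversed_ = false; // Reset so that other traversals can enter.
  }

private:
  bool traversed_;
};

The important property of this version is that the flag only guards against cycles: it is reset once the traversal of a schema completes, so nothing prevents the same schema from being traversed again via a different include/import path.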

There are two general reasons why Schema::find() could take a disproportionate amount of time: it may be called too many times by other functions, or the problem may be in its implementation (or both). Seeing that Schema::find() is called more than six times more often recursively than from outside (3889477 recursive calls against 588759 outside calls), the suspicion falls on the implementation. Here is the fragment of the call graph listing corresponding to this function:

self  children    called       name
 
16021  591541  588759+3889477  Schema::find
 8275  217657  208177/208268    Context::set
 3528  174629  158008/158008    std::list<...>::insert
 3190   45372  65030/65055      Context::remove
  346   47712  43879/43880      std::list<...>::clear
 3500   28218  32082/37821      Scope::find
 3560    9619  20461/20586      Context::count

As you can see, the time is dominated by two sets of operations. The first is the Context operations, which are used to detect recursive imports and inclusions (a flag is set in the Schema’s context to indicate that it is being traversed). The second is the list insertion and removal operations. This tells us two things: a lot of Schema objects are being traversed and a lot of Namespace objects are being inserted into the resulting list.

The size of the schema, and in particular the number of files (which corresponds to the number of Schema objects), could not by itself explain the long compilation time since there are comparable and even larger schemas that compile much faster. There had to be something else special about this schema. As it turned out, and this is something I had to figure out on my own since no profiling tool can help with this last step, the schema had a very expansive include/import graph. Most files in this schema included or imported a handful of other files, which in turn did the same. The end result was that each schema file was included or imported, directly or indirectly, hundreds of times. This meant that the Schema::find() implementation had to traverse the same files over and over again (hence all the Context operations) and insert the same Namespace objects into the resulting list over and over again (hence all the list operations).

The solution was fairly straightforward: I added a set which keeps track of all the Schema objects that have already been seen in this execution of Schema::find(). The compilation time dropped from 160 seconds to 1.5 seconds, a more than 100-fold speedup. And, as it turned out, this was the same problem that affected the larger schema that we could not get.
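In terms of the sketch above, the fix amounts to threading a set of already-visited schemas through the traversal (again illustrative, not the actual code):

#include <list>
#include <set>
#include <string>

// The fix, in terms of the sketch above: remember every Schema
// object already visited in this execution of find() so that each
// one is traversed, and its namespaces inserted, exactly once.
//
void
find (Schema& s,
      const std::string& name,
      std::list<Namespace*>& r,
      std::set<const Schema*>& seen)
{
  if (!seen.insert (&s).second) // Already visited via another path.
    return;

  for (std::list<Namespace*>::iterator i (s.namespaces.begin ());
       i != s.namespaces.end (); ++i)
  {
    if ((*i)->name == name)
      r.push_back (*i);
  }

  for (std::list<Schema*>::iterator i (s.includes.begin ());
       i != s.includes.end (); ++i)
    find (**i, name, r, seen);
}

Each Schema object is now visited at most once per lookup, no matter how many include/import paths lead to it.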