Fun and dull of making development tools

January 4th, 2009

Writing software development tools, such as compilers, IDE’s, etc., is often considered the top of the software development food chain. You get to create products for like-minded people who are able to appreciate elegant design and quality implementation. As a result, the development tools business is highly competitive and attracts the best minds in the field.

From the outside it may seem that creating this type of software is pure fun. On the inside, however, there are a lot of dull moments that lead to a quality release. Without things like testing, examples, and documentation, the product will be buggy and nobody except the authors will know how to use it. A couple of weeks ago I was working on a feature of medium complexity for CodeSynthesis XSD and decided to capture a high-level list of all the things I had to take care of. This should give anyone a practical idea of the amount of detail involved in creating software development tools.

  1. First I gathered the use-cases and requirements from each interested customer. In this case I had two.
  2. Then I generalized the requirements and came up with an overall design. Most of the time the solutions proposed by customers only cover their particular use-case. I had to think of other similar things that people may want to do and come up with a general enough design to cover all reasonable scenarios. This is also why I prefer to wait until at least two people ask for any non-trivial feature. This way you get two “projections” which makes it easier to “see” the general feature behind them.
  3. After that I ran the overall design by each interested customer to make sure they are satisfied with the solution. This took the form of sample code fragments and explanations that showed how the proposed feature cover their use-cases.
  4. Once all the parties were satisfied with the overall design I had to think it through in more detail. In particular, I had to consider how this feature will interact with other features already in the product. In this case I needed to consider support for polymorphism and customizable naming conventions. While it turned out support for polymorphism didn’t require anything extra, I needed to make changes to the naming convention code.
  5. Once the design had been thought through in detail, it was time to write the code. This involved adding new command line options, updating the name processor, and adding support for the new feature in the code generators.
  6. Once the feature was implemented, I had to add a test to the test suite to make sure everything worked as expected.
  7. Next I added an example which shows how to use the most important aspects of the new feature. Besides implementing the sample code itself, creating an example involves the following main tasks (we have a “new example” checklist):
    1. Write a README file describing the example.
    2. Write a Makefile for the UNIX build system.
    3. Create VC++ 7.1, 8.0, and 9.0 project/solution files.
  8. Since new command line options were added, I needed to update the usage information and man pages.
  9. I then added a section to the User Manual documenting the new feature as well as a note to the Getting Started Guide with a reference to the new section in the Manual.
  10. I then added the information about the new feature to the NEWS file with references to the man pages, the new section in the Manual, and the new example.
  11. Finally, I built pre-release binaries of the product for the two customers so that they could test and/or start using the new feature.

Rvalue-references: how they affect your code

December 1st, 2008

In the previous post I covered the motivation and basics of rvalue-references, one of the language additions in the next C++ standard. Today I will examine the changes that rvalue-references will bring to the way we write C++ code.

We will start with the following example:

typedef string Word;
typedef vector<Word> Words;
typedef map<Word, Words> SynonymMap;
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words& syn = map[*i];
  // Load synonyms into syn.
}

Seeing that the code that loads all the synonyms for a particular word is quite long, we would like to move it into a separate function. The straightforward change would look like this:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  map[*i] = load (*i);
}

With the current state of C++ this code has a potential efficiency problem. As was discussed in the previous post, depending on the complexity of the load() function and C++ compiler used, the addition of the function call can result in the synonym list being copied twice. The first time in the load() function from r into a temporary and the second time from the temporary into the object stored in the map.

There are two common ways in which this inefficiency is addressed right now. The first involves passing a reference to the result object instead of returning it by value:

void load (const Word& w, Words& r)
{
  // Load synonyms into r.
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  load (*i, map[*i]);
}

This approach has a number of drawbacks. For one, the code is unnatural. We wouldn’t have written it like this if we weren’t optimizing. More importantly, this approach is limited to cases where the type has the default constructor or we have enough information to create an empty instance before passing it to the load() function. Consider, for example, the following simple change to the above code:

typedef string Word;
typedef vector<Word> Words;
 
class Synonyms: Words
{
public:
  Synonyms (const string& source);
  ...
};
 
typedef map<Word, Synonyms> SynonymMap;

Now only the load() function knows how to construct the Synonyms object.

Both the return by value and pass by reference approaches have another drawback that may not be apparent at first. As we insert new entries, the synonym map may need to grow the underlying storage from time to time. When this happens, all the existing Words object are copied from the old storage to the new one.

The second way to deal with the inefficiency of returning the Words object by value is to allocate it on the heap and return a pointer instead:

typedef string Word;
typedef vector<Word> Words;
typedef map<Word, shared_ptr<Words> > SynonymMap;
 
shared_ptr<Words> load (const Word& w)
{
  shared_ptr<Words> r = new Words;
  // Load synonyms into r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  map[*i] = load (*i);
}

This approach forced us to change the way we store the synonym list in the map. It also requires an extra heap allocation for each Words object.

As we can see, with the current state of C++ there is no perfect solution to the problem of returning complex objects. If we return the object by value, we pay with two copies per call plus copying during the map growth. The pass by reference approach cannot always be used and also suffers from the map growth overhead. Finally, the return by pointer method gets rid of the function return and map growth overheads but has the penalty of allocating each Words object on the heap. Normally, the return by pointer approach is chosen as the least inefficient.

With the introduction of rvalue-references this situation changes dramatically. Provided the types we use are rvalue-optimized by including a constructor and an assignment operator for rvalue-references (std::vector and std::map are), the return by value approach becomes the most efficient. The deep copies during the function return are eliminated and when the map grows, the rvalue constructor is used to “move” the existing objects to the new storage.

Another advantage of the return by value approach is that it allows the user to choose where to place the object, on the stack or on the heap. The stack rvalue can be cheaply moved to the heap using the rvalue constructor:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
shared_ptr<Words> syn (load ("value"));

One, however, needs to understand how rvalue-references work to be able to write efficient code using return by value and rvalue-optimized types. Otherwise it is fairly easy to shoot oneself in the foot, as shown in the following example:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words syn (load (*i));
  map[*i] = syn;
}

Here we introduced a variable that temporarily holds the result of the load() function. With it we also introduced an unnecessary copy since now the non-rvalue assignment operator has to be used. It does not mean, however, that we cannot do something with the list of synonyms before we insert it into the map. We just need to use std::move() to “move” the object instead of copying it.

for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words syn (load (*i));
 
  if (syn.size () > 0)
    map[*i] = std::move (syn);
}

I suspect that as with many other parts of C++, people who make the effort to understand rvalue-references will write even more elegant and efficient code while those who don’t will continue complaining that C++ is too difficult.

While it may seem that the only two places where rvalue-references will be used are constructors and assignment operators, on a more conceptual level, rvalue-references provide a mechanism to distinguish between objects whose state needs to be copied and objects whose state can be reused (with rvalue objects automatically treated as the latter). As such, rvalue-references can be used in other places where such a distinction can be helpful. For example, std::vector defines an overloaded version of the push_back() function that takes an rvalue-reference:

template <typename T, ...>
class vector
{
public:
  void push_back (const T&);  // copy
  void push_back (T&&);       // reuse
  ...
};

To summarize, the introduction of rvalue-references in C++ solves the problem of efficiently returning complex objects by value. They will allow writing simpler, faster, and more natural code. I also believe that, at least in well-designed programs, the use of heap-allocated objects will decrease which should make these programs even more efficient.

Rvalue-references: the basics

November 24th, 2008

One of the language changes in the next C++ standard is the addition of rvalue-references. While some people believe this mechanism will be used by a small number of library writers, I think that it will have a wider impact on the way we write software in C++. In either case, however, it will be beneficial to understand the nuances of rvalue-references even if you don’t use them in your code directly. Today I will cover the motivation and basics of rvalue-references, including some counter-intuitive aspects towards the end. The next installment will deal with the changes that rvalue-references will bring to C++ programs.

While the next C++ standard still has some way to go before being published, the GCC team has implemented support for some parts of the standard in g++ 4.3 and 4.4 (see C++0x Language Support in GCC for details). All the sample code presented below can be tried with g++ 4.3.2 in the C++0x mode (the -std=c++0x option).

Every expression in C++ evaluates to either lvalue or rvalue. The standard has a page-long list of rules that define these two terms. It is more intuitive (but less precise) to think of an lvalue as referring to an object that has a name somewhere in the program. This name can be a variable name or a pointer. An rvalue refers to an object that does not have a name. Here are some examples:

struct S {};
S s;
 
s    // lvalue
S () // rvalue
 
S  f ();
S& g ();
S* h ();
 
f ()  // rvalue
g ()  // lvalue
*h () // lvalue

The last two expressions are lvalues because one needs a named object to bind a reference to it or take its address.

Because an rvalue object is always allocated on the stack and is unnamed, its life is short (unless bound to a const reference variable). As a result, modifying an rvalue does not make much sense and usually means a programming error. To prevent such errors C++ does not allow binding of non-const references to rvalues:

void f (S&);
void g (const S&);
 
f (S ()); // error
g (S ()); // ok

An issue with the above rules arises when we want to make a copy of an rvalue and this operation is expensive. Because the rvalue object will be gone immediately after we make the copy, “transplanting” the internals of an rvalue to our copy seems like a natural optimization. Consider the following example:

struct IntVector
{
  IntVector (const IntVector& x)
  {
    v_ = new int[x.size_];
    size_ = x.size_;
    memcpy (v_, x.v_, size_ * sizeof (int));
  }
  ...
private:
  int* v_;
  size_t size_;
};
 
IntVector load ()
{
  IntVector r;
  // Load the data into r.
  return r;
}
 
IntVector v (load ());

While some compilers may optimize the call to the copy constructor away (RVO/NRVO optimization), this depends on the implementation of load() and is not guaranteed.

What we would like here is to transfer the contents of the returned rvalue into v, something along these lines:

IntVector (IntVector& x)
{
  v_ = x.v_;
  size_ = x.size_;
  x.v_ = 0;
  x.size_ = 0;
}

This approach does not work for two reasons: First, because we modify the source object, we have to pass it as a non-const reference. But an rvalue can only be bound to a const reference. Second, our new version of the copy constructor will just as happily transplant the internals of lvalues:

IntVector v1;
IntVector v2 (v1); // v1 is no longer usable

What we need is a special kind of reference that will bind to rvalues and allow their modification. And that’s exactly what the next version of the C++ standard adds: rvalue-references. Unfortunately, the interaction of rvalue-references with lvalue-references (that’s what the plain old references are now called) and the rest of the language is quite complex. It is also fairly easy to shoot yourself in the foot, even as just the user of rvalue-optimized types. What follows is the description of some of the rules governing rvalue-references.

An rvalue-reference of type T is declared as T&&. The overload resolution between rvalue- and lvalue-references is similar to the way it works for const and non-const lvalue-references. For example:

void f (int&);
void f (int&&);
void g (int&&);
 
int h ();
int i;
 
f (i);     // f (int&) is called
f (h ());  // f (int&&) is called
g (i);     // ok

That is, when passing an lvalue and the lvalue overload is available, it is preferred. Otherwise, the rvalue version is called since an lvalue can be used anywhere an rvalue can. This is a fairly important point which means that in the majority of cases when you provide a constructor or a function taking an rvalue-reference, you will also want to provide the overloaded version taking const an lvalue-reference. Otherwise, lvalues will be treated as rvalues (std::move(), which is discussed shortly, is one example where this behavior is actually desired). Here is how we could overload our IntVector constructors:

IntVector (const IntVector& x)
{
  v_ = new int[x.size_];
  size_ = x.size_;
  memcpy (v_, x.v_, size_ * sizeof (int));
}
 
IntVector (IntVector&& x)
{
  v_ = x.v_;
  size_ = x.size_;
  x.v_ = 0;
  x.size_ = 0;
}

Let’s consider the call to load() once again:

IntVector v (load ());

If the compiler didn’t optimize the copying of the return value, the process works as follows. The r object inside the function is first copied into a temporary and then destroyed. After load() returns, the v object is copy-constructed from the temporary and the temporary is destroyed.

The new constructor now takes care of efficiently copying the data from the temporary returned by load() to the v object. But what about the other half of the process. Namely copying of the r object inside the load() function to the temporary? The r object is an lvalue and, as a result, the expensive copy constructor will be used. One way to solve this would be to explicitly convert r to an rvalue-reference with the help of std::move():

IntVector load ()
{
  IntVector r;
  // Load the data into r.
  return std::move (r);
}

std::move() simply converts any rvalue or lvalue to an rvalue reference:

template <typename T>
typename remove_reference<T>::type&&
move (T&& x)
{
  return x;
}

While this works, it sure is verbose and easy to forget. Plus, in template code we don’t know whether the argument type has a constructor capable of taking an rvalue-reference. The C++ standard fixes this in quite an interesting way. The standard provides criteria when a C++ compiler is allowed to optimize away temporaries and copying that is used to return a value from a function (called elision of copy operations). While compilers can perform this optimization, they are not required to do so. But then the standard adds that when the criteria for elision of copy operations are met but the compiler is not performing such elision and the object to copy is an lvalue, then the overload resolution to select the copy constructor is first performed as if the object were an rvalue. One of the criteria for elision of copy operations is returning an automatic (stack-allocated) object. If you think about it, it makes perfect sense: the object being copied is essentially an rvalue since it will be destroyed once the copying is complete and nobody else can access it between now and then.

All this means that in our implementation of load() when we return the result, the r object is first considered as if it were an rvalue and the fast constructor will be called automatically without us explicitly calling std::move().

Now to the less intuitive side of rvalue-references. Named rvalue-references are treated as lvalues while unnamed rvalue-references are treated as rvalues. In other words, once an rvalue-reference is given a name, it starts acting as an lvalue. The following code snippet helps illustrate this point:

void f (int&);
int&& g ();
 
int&& i = 2;
f (i);       // ok
f (g ());    // error

Without keeping this in mind, it is natural to expect one thing but get something completely different:

void f (const int&);
void f (int&&);
 
void g (int&& x)
{
  f (x); // calls f (const int&)
}

If we wanted to call f(int&&), we would need to first convert x to an unnamed rvalue-reference, for example, with the help of std::move():

void g (int&& x)
{
  f (std::move (x)); // calls f (int&&)
}

Another interesting aspect of rvalue-references is what happens when we try to create an rvalue-reference to lvalue-reference or vice versa. While this cannot be done directly (T&&& is illegal), it is easy to achieve, most likely unintentionally, with typedef or template type arguments. In this case, the resulting type will always be an lvalue-reference:

typedef int&  ilr;
typedef int&& irr;
 
void f (int&)  {}
void f (ilr&&) {} // error: same as void f (int&)
void f (irr&)  {} // error: same as void f (int&)
 
template <typename T>
void g (T&);
 
template <typename T>
void h (T&&);
 
g<int&&> (..); // void g (int&)
h<int&> (..);  // void h (int&)

Next time: how rvalue-references will affect the way we write software in C++.