Rvalue-references: how they affect your code

December 1st, 2008

In the previous post I covered the motivation and basics of rvalue-references, one of the language additions in the next C++ standard. Today I will examine the changes that rvalue-references will bring to the way we write C++ code.

We will start with the following example:

typedef string Word;
typedef vector<Word> Words;
typedef map<Word, Words> SynonymMap;
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words& syn = map[*i];
  // Load synonyms into syn.
}

Seeing that the code that loads all the synonyms for a particular word is quite long, we would like to move it into a separate function. The straightforward change would look like this:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  map[*i] = load (*i);
}

With the current state of C++ this code has a potential efficiency problem. As was discussed in the previous post, depending on the complexity of the load() function and the C++ compiler used, the addition of the function call can result in the synonym list being copied twice: the first time in the load() function, from r into a temporary, and the second time from the temporary into the object stored in the map.

There are two common ways in which this inefficiency is addressed right now. The first involves passing a reference to the result object instead of returning it by value:

void load (const Word& w, Words& r)
{
  // Load synonyms into r.
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  load (*i, map[*i]);
}

This approach has a number of drawbacks. For one, the code is unnatural. We wouldn’t have written it like this if we weren’t optimizing. More importantly, this approach is limited to cases where the type has a default constructor or where we have enough information to create an empty instance before passing it to the load() function. Consider, for example, the following simple change to the above code:

typedef string Word;
typedef vector<Word> Words;
 
class Synonyms: Words
{
public:
  Synonyms (const string& source);
  ...
};
 
typedef map<Word, Synonyms> SynonymMap;

Now only the load() function knows how to construct the Synonyms object.
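
In particular, the pass by reference version shown above no longer compiles (a hypothetical illustration of the problem):

for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  // Error: map::operator[] has to default-construct the mapped Synonyms
  // object, but Synonyms has no default constructor. And even if it did,
  // only load() knows which source string to construct it from.
  load (*i, map[*i]);
}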

Both the return by value and pass by reference approaches have another drawback that may not be apparent at first. As we insert new entries, the synonym map may need to grow the underlying storage from time to time. When this happens, all the existing Words objects are copied from the old storage to the new one.

The second way to deal with the inefficiency of returning the Words object by value is to allocate it on the heap and return a pointer instead:

typedef string Word;
typedef vector<Word> Words;
typedef map<Word, shared_ptr<Words> > SynonymMap;
 
shared_ptr<Words> load (const Word& w)
{
  shared_ptr<Words> r (new Words);
  // Load synonyms into *r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  map[*i] = load (*i);
}

This approach forces us to change the way we store the synonym lists in the map. It also requires an extra heap allocation for each Words object.

As we can see, with the current state of C++ there is no perfect solution to the problem of returning complex objects. If we return the object by value, we pay with two copies per call plus copying during the map growth. The pass by reference approach cannot always be used and also suffers from the map growth overhead. Finally, the return by pointer method gets rid of the function return and map growth overheads but has the penalty of allocating each Words object on the heap. Normally, the return by pointer approach is chosen as the least inefficient.

With the introduction of rvalue-references this situation changes dramatically. Provided the types we use are rvalue-optimized by including a constructor and an assignment operator for rvalue-references (std::vector and std::map are), the return by value approach becomes the most efficient. The deep copies during the function return are eliminated and when the map grows, the rvalue constructor is used to “move” the existing objects to the new storage.
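
Being rvalue-optimized simply means that, next to the usual copy operations, the type provides overloads that take an rvalue-reference and take over the source object’s storage instead of copying it. Here is a simplified sketch of the relevant part of std::vector’s interface:

template <typename T>
class vector
{
public:
  vector (const vector&);            // copy constructor: duplicates the elements
  vector (vector&&);                 // rvalue constructor: takes over the storage
  vector& operator= (const vector&); // copy assignment
  vector& operator= (vector&&);      // rvalue assignment: takes over the storage
};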

Another advantage of the return by value approach is that it allows the user to choose where to place the object, on the stack or on the heap. The stack rvalue can be cheaply moved to the heap using the rvalue constructor:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
shared_ptr<Words> syn (new Words (load ("value")));

One, however, needs to understand how rvalue-references work to be able to write efficient code using return by value and rvalue-optimized types. Otherwise it is fairly easy to shoot oneself in the foot, as shown in the following example:

Words load (const Word& w)
{
  Words r;
  // Load synonyms into r.
  return r;
}
 
Words words = ...
SynonymMap map;
 
for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words syn (load (*i));
  map[*i] = syn;
}

Here we introduced a variable that temporarily holds the result of the load() function. With it we also introduced an unnecessary copy: syn is a named object (an lvalue), so the non-rvalue assignment operator has to be used when inserting it into the map. This does not mean, however, that we cannot do something with the list of synonyms before we insert it into the map. We just need to use std::move() to “move” the object instead of copying it:

for (Words::const_iterator i = words.begin ();
     i != words.end ();
     ++i)
{
  Words syn (load (*i));
 
  if (syn.size () > 0)
    map[*i] = std::move (syn);
}

I suspect that as with many other parts of C++, people who make the effort to understand rvalue-references will write even more elegant and efficient code while those who don’t will continue complaining that C++ is too difficult.

While it may seem that the only two places where rvalue-references will be used are constructors and assignment operators, on a more conceptual level, rvalue-references provide a mechanism to distinguish between objects whose state needs to be copied and objects whose state can be reused (with rvalue objects automatically treated as the latter). As such, rvalue-references can be used in other places where such a distinction can be helpful. For example, std::vector defines an overloaded version of the push_back() function that takes an rvalue-reference:

template <typename T, ...>
class vector
{
public:
  void push_back (const T&);  // copy
  void push_back (T&&);       // reuse
  ...
};
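
The choice between the two overloads is then made automatically based on what we pass (a hypothetical snippet that reuses the load() function from earlier; "word" is made up for illustration):

vector<Words> lists;
Words w = load ("word");

lists.push_back (w);              // lvalue: push_back (const T&), w is copied
lists.push_back (load ("word"));  // rvalue: push_back (T&&), the result is reused
lists.push_back (std::move (w));  // w is turned into an rvalue and reused;
                                  // w should not be used after this point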

To summarize, the introduction of rvalue-references in C++ solves the problem of efficiently returning complex objects by value. They will allow writing simpler, faster, and more natural code. I also believe that, at least in well-designed programs, the use of heap-allocated objects will decrease which should make these programs even more efficient.

Rvalue-references: the basics

November 24th, 2008

One of the language changes in the next C++ standard is the addition of rvalue-references. While some people believe this mechanism will be used by a small number of library writers, I think that it will have a wider impact on the way we write software in C++. In either case, however, it will be beneficial to understand the nuances of rvalue-references even if you don’t use them in your code directly. Today I will cover the motivation and basics of rvalue-references, including some counter-intuitive aspects towards the end. The next installment will deal with the changes that rvalue-references will bring to C++ programs.

While the next C++ standard still has some way to go before being published, the GCC team has implemented support for some parts of the standard in g++ 4.3 and 4.4 (see C++0x Language Support in GCC for details). All the sample code presented below can be tried with g++ 4.3.2 in the C++0x mode (the -std=c++0x option).

Every expression in C++ evaluates to either an lvalue or an rvalue. The standard has a page-long list of rules that define these two terms. It is more intuitive (but less precise) to think of an lvalue as referring to an object that has a name somewhere in the program. This name can be a variable name or an expression through which the object can be reached, such as a dereferenced pointer. An rvalue refers to an object that does not have a name. Here are some examples:

struct S {};
S s;
 
s    // lvalue
S () // rvalue
 
S  f ();
S& g ();
S* h ();
 
f ()  // rvalue
g ()  // lvalue
*h () // lvalue

The last two expressions are lvalues because one needs a named object to bind a reference to it or take its address.

Because an rvalue object is an unnamed temporary, its lifetime is short (unless it is bound to a const reference variable). As a result, modifying an rvalue does not make much sense and usually indicates a programming error. To prevent such errors C++ does not allow binding of non-const references to rvalues:

void f (S&);
void g (const S&);
 
f (S ()); // error
g (S ()); // ok
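
As an aside, the “bound to a const reference variable” case mentioned above looks like this; binding a temporary to a const reference extends its lifetime to that of the reference:

const S& r = S (); // ok: the temporary lives as long as r does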

An issue with the above rules arises when we want to make a copy of an rvalue and this operation is expensive. Because the rvalue object will be gone immediately after we make the copy, “transplanting” the internals of an rvalue to our copy seems like a natural optimization. Consider the following example:

struct IntVector
{
  IntVector (const IntVector& x)
  {
    v_ = new int[x.size_];
    size_ = x.size_;
    memcpy (v_, x.v_, size_ * sizeof (int));
  }
  ...
private:
  int* v_;
  size_t size_;
};
 
IntVector load ()
{
  IntVector r;
  // Load the data into r.
  return r;
}
 
IntVector v (load ());

While some compilers may optimize the call to the copy constructor away (the RVO/NRVO optimizations), this depends on the implementation of load() and the compiler used, and is not guaranteed.
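
For instance, here is a hypothetical variant of load() (not part of the original example) for which most compilers cannot apply NRVO, because two different named objects may be returned:

IntVector load (bool use_cache)
{
  IntVector cached;
  IntVector fresh;
  // Fill cached or fresh, depending on use_cache.

  // There is no single named object that could be constructed directly
  // in the caller's storage, so a real copy is made on return.
  return use_cache ? cached : fresh;
}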

What we would like here is to transfer the contents of the returned rvalue into v, something along these lines:

IntVector (IntVector& x)
{
  v_ = x.v_;
  size_ = x.size_;
  x.v_ = 0;
  x.size_ = 0;
}

This approach does not work for two reasons. First, because we modify the source object, we have to pass it as a non-const reference, but an rvalue can only be bound to a const reference. Second, our new version of the copy constructor will just as happily transplant the internals of lvalues:

IntVector v1;
IntVector v2 (v1); // v1 is no longer usable

What we need is a special kind of reference that will bind to rvalues and allow their modification. And that’s exactly what the next version of the C++ standard adds: rvalue-references. Unfortunately, the interaction of rvalue-references with lvalue-references (that’s what the plain old references are now called) and the rest of the language is quite complex. It is also fairly easy to shoot yourself in the foot, even as just the user of rvalue-optimized types. What follows is the description of some of the rules governing rvalue-references.

An rvalue-reference of type T is declared as T&&. The overload resolution between rvalue- and lvalue-references is similar to the way it works for const and non-const lvalue-references. For example:

void f (int&);
void f (int&&);
void g (int&&);
 
int h ();
int i;
 
f (i);     // f (int&) is called
f (h ());  // f (int&&) is called
g (i);     // ok

That is, when an lvalue is passed and an lvalue overload is available, it is preferred. Otherwise, the rvalue version is called since an lvalue can be used anywhere an rvalue can. This is an important point: in the majority of cases, when you provide a constructor or a function taking an rvalue-reference, you will also want to provide an overloaded version taking a const lvalue-reference. Otherwise, lvalues will be treated as rvalues (std::move(), which is discussed shortly, is one example where this behavior is actually desired). Here is how we could overload our IntVector constructors:

IntVector (const IntVector& x)
{
  v_ = new int[x.size_];
  size_ = x.size_;
  memcpy (v_, x.v_, size_ * sizeof (int));
}
 
IntVector (IntVector&& x)
{
  v_ = x.v_;
  size_ = x.size_;
  x.v_ = 0;
  x.size_ = 0;
}
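
The assignment operator can be overloaded along the same lines. The original example only shows the constructors, but a sketch of the rvalue version might look like this:

IntVector& operator= (IntVector&& x)
{
  if (this != &x)
  {
    delete[] v_;
    v_ = x.v_;
    size_ = x.size_;
    x.v_ = 0;
    x.size_ = 0;
  }
  return *this;
}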

Let’s consider the call to load() once again:

IntVector v (load ());

If the compiler doesn’t optimize the copying of the return value, the process works as follows: the r object inside the function is first copied into a temporary and then destroyed. After load() returns, the v object is copy-constructed from the temporary and the temporary is destroyed.

The new constructor now takes care of efficiently copying the data from the temporary returned by load() to the v object. But what about the other half of the process, namely, copying the r object inside the load() function to the temporary? The r object is an lvalue and, as a result, the expensive copy constructor will be used. One way to solve this would be to explicitly convert r to an rvalue-reference with the help of std::move():

IntVector load ()
{
  IntVector r;
  // Load the data into r.
  return std::move (r);
}

std::move() simply converts any lvalue or rvalue to an rvalue-reference. Its implementation looks along these lines:

template <typename T>
typename remove_reference<T>::type&&
move (T&& x)
{
  return static_cast<typename remove_reference<T>::type&&> (x);
}

While this works, it sure is verbose and easy to forget. Plus, in template code we don’t know whether the argument type has a constructor capable of taking an rvalue-reference. The C++ standard fixes this in quite an interesting way. The standard provides criteria for when a C++ compiler is allowed to optimize away the temporaries and copying used to return a value from a function (called elision of copy operations). While compilers can perform this optimization, they are not required to do so. But the standard then adds that when the criteria for elision of copy operations are met, the compiler does not perform such elision, and the object to copy is an lvalue, the overload resolution that selects the copy constructor is first performed as if the object were an rvalue. One of the criteria for elision of copy operations is returning an automatic (stack-allocated) object. If you think about it, this makes perfect sense: the object being copied is essentially an rvalue since it will be destroyed once the copying is complete and nobody else can access it between now and then.

All this means that in our implementation of load() when we return the result, the r object is first considered as if it were an rvalue and the fast constructor will be called automatically without us explicitly calling std::move().
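
In other words, load() can be left exactly as it was originally written:

IntVector load ()
{
  IntVector r;
  // Load the data into r.
  return r; // r is treated as an rvalue here and the fast constructor is used
}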

Now to the less intuitive side of rvalue-references. Named rvalue-references are treated as lvalues while unnamed rvalue-references are treated as rvalues. In other words, once an rvalue-reference is given a name, it starts acting as an lvalue. The following code snippet helps illustrate this point:

void f (int&);
int&& g ();
 
int&& i = 2;
f (i);       // ok
f (g ());    // error

Without keeping this in mind, it is natural to expect one thing but get something completely different:

void f (const int&);
void f (int&&);
 
void g (int&& x)
{
  f (x); // calls f (const int&)
}

If we wanted to call f(int&&), we would need to first convert x to an unnamed rvalue-reference, for example, with the help of std::move():

void g (int&& x)
{
  f (std::move (x)); // calls f (int&&)
}

Another interesting aspect of rvalue-references is what happens when we try to create an rvalue-reference to an lvalue-reference or vice versa. While this cannot be done directly (T&&& is illegal), it is easy to achieve, most likely unintentionally, with a typedef or a template type argument. In such cases the resulting type is always an lvalue-reference:

typedef int&  ilr;
typedef int&& irr;
 
void f (int&)  {}
void f (ilr&&) {} // error: same as void f (int&)
void f (irr&)  {} // error: same as void f (int&)
 
template <typename T>
void g (T&);
 
template <typename T>
void h (T&&);
 
g<int&&> (...); // void g (int&)
h<int&> (...);  // void h (int&)

Next time: how rvalue-references will affect the way we write software in C++.

delete, operator delete() and NULL pointers

November 17th, 2008

In C++ both delete (or, more precisely, the delete expression) and operator delete() can be called with a NULL pointer. The standard requires that such a call have no effect. There is, however, a subtle difference in how these two cases are handled under the hood, which can be useful to know, especially when writing performance-critical applications.

When we deallocate a memory block using the delete expression, the compiler inserts a check for NULL before calling the destructor and operator delete(). Consider the following code fragment as an example:

buffer::~buffer ()
{
  delete[] buf_;
}

During compilation this code snippet is transformed to something along these lines (there is no destructor call because, presumably, buf_ is declared as char*):

buffer::~buffer ()
{
  if (buf_)
    operator delete[] (buf_);
}

Consider now an alternative implementation of the buffer destructor (here we assume that the constructor was also changed to use operator new(n) instead of new char[n]):

buffer::~buffer ()
{
  operator delete (buf_);
}

In this case nothing extra is inserted by the compiler and the check for NULL is performed by the implementation of operator delete(). Here is the default implementation of this operator provided by g++:

void
operator delete (void* ptr) throw ()
{
  if (ptr)
    std::free (ptr);
}

As you can see, both alternatives ignore NULL pointers. However, the delete expression checks the pointer before calling the deallocation operator, while with operator delete() the function call is made unconditionally and the pointer is checked inside the implementation. If such a call is on a critical path and the pointer is frequently NULL (for example, the buffer is often left empty), then checking for NULL ourselves before calling operator delete() can make a difference.
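
For example, the alternative buffer destructor could hoist the check out of the call (a sketch of this pattern):

buffer::~buffer ()
{
  // Skip the function call altogether when nothing was allocated.
  if (buf_ != 0)
    operator delete (buf_);
}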