Archive for the ‘C++’ Category

Efficient argument passing in C++11, Part 1

Tuesday, June 19th, 2012

I don’t think anyone would argue that the most important new core language feature in C++11 is the addition of rvalue references. The sheer number of articles and blog posts that were written to explore various aspects of this new functionality further confirms this assertion. However, the more I read these articles and blogs and think about applying their ideas in real C++11 code, the more I realize that many of the suggested “best practices” are fairly theoretical, not confirmed by real-world usage. In this sense the addition or rvalue references is like the addition of a new dimension; it is hard to imagine the implications until you actually start “living” there. As a result, in many cases, the interactions between rvalue references and other C++ features are not yet very well understood, especially when it comes to how the design of our applications changes with C++11.

After reading that last statement you may be asking, hasn’t STL itself been updated to be rvalue reference-aware and isn’t it the perfect ground for discovering new best practices? While the lessons learned from STL are definitely valuable, one thing to keep in mind is that STL is very “generic”, meaning that it mostly consists of templates. While the techniques used in STL can be used in other templates, they may not be applicable or, more often, practical in normal, non-template code. We will see an example of this later in this post.

One fundamental design decision that we have to re-evaluate when moving from C++98 to C++11 is how we pass arguments to our functions. This is because types in addition to being copyable can now also be movable and that moving is expected to be significantly cheaper than copying. Not surprisingly, there were quite a few articles discussing this topic, especially lately as more and more people started using C++11 and discover holes in what was considered best practices. I am also lucky to be able to use new language features by adding C++11 support to ODB, an ORM for C++. So I started thinking about updating ODB interfaces to take advantage of C++11 if C++11 is available (ODB supports both C++98 and C++11 from the same codebase). However, I pretty quickly realized that the current “state of the art” of argument passing in C++11 still doesn’t feel right. In this series of posts I am going to outline the problem and see if we can come up with a solution.

There are quite a few blog posts and articles that give background on this problem. However, none of them cover all the issues completely. So let’s quickly recap the situation.

Say we are writing a C++98 function that takes a string and does something with it other than modifying it. That is, we have an “in” parameter. Our function could just access the string or it could make a copy. We don’t know. I am sure you’ve seen many functions like this:

 
void f (const std::string&);
 

That’s a pretty idiomatic signature for C++98. Here is how we could use it:

 
f ("foo");
 
std::string s ("bar");
f (s);
 

This function will work just fine in C++11. However, passing its argument as const reference may not be the most efficient way, depending on what this function does with the string. If the function simply accesses the string without making any copies, then using const reference is as efficient as it gets. However, if the function makes a copy of the passed string, then its performance can be improved, potentially significantly. To see why, consider the first call to f() in the above example. Here is what will happen: a temporary (rvalue) string will be initialized with "foo" and then passed to f(). f(), in turn, will create a copy of the passed temporary and, once it returns, the temporary will be destroyed.

Do you see the potential improvement here? If what is passed to f() is a temporary (rvalue), then we could move it instead of copying! If your compiler provides a copy-on-write implementation of std::string, then the improvement might not seem that significant. However, if we replace std::string with, say, std::vector and pass vectors that contain a large number of elements, then the performance improvement could be dramatic.

At first this may also seem as not that big of a deal if all that we are optimizing are cases where we pass temporaries. For example, if we were passing std::vector, then you could reason that nobody will write an initializer list with thousands of elements and therefore we can deem such cases insignificant:

 
void f (const std::vector<std::string>&);
 
f ({"foo", "bar", "baz"});
 

Note, however, that there is a much more important use case for this optimization other than inline initialization. And that is passing an object that was returned by value from another function. Consider this example:

 
void f (const std::vector<std::string>&);
std::vector<std::string> g ();
 
f (g ());
 

As you are probably aware, returning a movable type by value from a function is the most efficient way to return in C++11. As a result, making sure that the receiving function in a call chain won’t make a copy is crucial for the whole to work as efficiently as possible.

Ok, so how do we make sure our function copies its argument if we pass an lvalue and moves it if we pass an rvalue? The most straightforward way is to overload the function with a version that takes the rvalue reference:

 
void f (const std::string& s)
{
  std::string s1 (s); // copy
  ...
}
 
void f (std::string&& s)
{
  std::string s1 (std::move (s)); // move
  ...
}
 

There are a couple of problems with this approach, however. The most obvious one is that we now have two functions instead of one. As a result, we either have to duplicate the logic or factor it out into yet another “common implementation” function. Not very elegant.

But there is a bigger problem. A typical example of a function that makes copies of its arguments is a constructor that initializes a bunch of data members with the passed value. Consider the email class, for instance:

 
struct email
{
  email (const std::string& first,
         const std::string& last,
         const std::string& addr)
    : first_ (first), last_ (last), addr_ (addr) {}
 
  ...
 
private:
  std::string first_;
  std::string last_;
  std::string addr_;
};
 

To make this constructor rvalue-aware, we will have to provide an overload for every rvalue/lvalue combination of its three arguments. That’s 6 overloads in this case (2^N in the general case, where N is the number of by-value arguments). Remember that “new dimension” analogy I used above?

STL uses this approach. For example, push_back() in std::vector is overloaded for the const lvalue reference and rvalue reference. In case of push_back() this approach works well: there is only one argument and std::vector is not implemented very often. Generally, my feeling is that this approach suits generic, reusable code best and quickly becomes impractical for application-level code that needs to be written quickly and re-written often.

Ok, so what can we do to solve this combinatorial explosion? Someone clever (not sure who got this idea first) came up with this trick: instead of passing the argument as a reference (rvalue or lvalue), let’s pass it by value:

 
void f (std::string s)
{
  std::string s1 (std::move (s)); // move
  ...
}
 

The idea here is to let the compiler decide at the function call site whether to move or copy the value. If we pass an lvalue, then it will be copied to the argument value and inside f() we move this argument value to our own instance (total cost: one move and one copy). If we pass an rvalue, then it will be moved to the argument value and inside f() we again move it to our own instance (total cost: two moves). If you count the cost of the first approach, then you will get one copy if we pass an lvalue and one move if we pass an rvalue. So the overhead of pass-by-value is one extra move in each case. Since moving is expected to be quite cheap, this is pretty good when the alternative is to write, say, 16 overloads for a 4-argument constructor.

This is the approach that is considered the “best practice” at the moment. However, as with most clever tricks, cracks become apparent if we look close enough. Let’s first observe that this approach only provides optimal performance if you know for sure that the function will need a copy of the passed argument. If the function doesn’t need a copy, then the overhead imposed by this approach compared to the previous one will be a whole extra copy. Seeing that we are counting extra moves, this is a potentially huge overhead. I’ve seen quite a bit of confusion regarding this where some people suggest that we should always pass things by value in C++11. So it is worth repeating this: pass-by-value only works if you are definitely making a copy of the argument.

In what circumstances do we not know whether a function needs a copy? Well, one is if the function needs to make a copy only in certain cases. The other is when the function can have various implementations that may or may not need a copy. Consider our email class as an example. Whether we need to make copies depends on how we store the email address. In the constructor implementation that we have seen above, the pass-by-value approach works well. But here is an alternative implementation where we have to pass arguments differently (”asymmetrically”) to achieve the optimal performance:

 
struct email
{
  email (std::string first,
         const std::string& last,
         const std::string& addr)
    : email_ (std::move (first))
  {
    email_ += ' ';
    email_ + last
    email_ += " <";
    email_ + addr;
    email_ += '>';
  }
 
  ...
 
private:
  std::string email_;
};
 

Think about it: with the pass-by-value approach we embed assumptions about the function’s implementation into its interface! Not very clean, to say the least.

Even if the interface and implementation are tied closely together, this approach may not quite work. One such example are binary operators. Let’s say we want to implement operator+ for a matrix class. An efficient implementation of this operator would copy one of its arguments and only access the other. Here is a typical implementation in C++98:

 
matrix operator+ (const matrix& x, const matrix& y)
{
  matrix r (x);
  r += y;
  return r;
}
 

The pass-by-value approach prescribes that we must pass the argument that is copied by value so that in case it is an rvalue, the copy can be replaced with the move. So here we go:

 
matrix operator+ (matrix x, const matrix& y)
{
  x += y;
  return x;
}
 

This works well if the temporary is on the left-hand side. But what if it is on the right-hand side? Here is a perfectly plausible example:

 
matrix r = a + 2 * b;
 

Now the first argument is an lvalue and we have to make a copy. But the second argument is an rvalue which we could have used instead to avoid copying! In this example we only need to copy one of the arguments. The problem is that with the pass-by-value approach we have to hardcode which one it is in the function signature.

Note also that the pass-by-value approach will only work if the class is movable (by “work” here I mean “won’t result in disastrous performance”). Particularly, that’s the reason why this approach should not be used in generic code, such as std::vector::push_back(). Again, there seems to be some confusion here with some people suggesting that std::vector could have used the pass-by-value approach but doesn’t for some historic or compatibility reasons.

There are also other, more obscure, problems with pass-by-value stemming from the fact that we factored out the move/copy constructor call from the function body to the function call site. One is the potential code bloat and code locality issues resulting from all these copy constructor calls. The other is the strange resulting exception safety guarantee. Oftentimes, the function itself will be noexcept because it only uses the move constructor which is often noexcept. However, the call to such a function can still throw because of all the copy constructor calls, which can normally throw, at the function call site. For more information on these issues refer to Dave Abrahams’ blog post, including the comments.

Ok, so the first approach is often impractical while the pass-by-value method smells a little. Are there any other alternatives that overcome all of the above issues? I don’t believe there is a solution that is based just on the built-in derived types (i.e., references, etc).

Let me elaborate a little bit on that last statement. It appears what we want is some sort of a derived type (here I use the term derived type to refer to pointers, references, etc., rather than class inheritance) that would (a) bind to both rvalues and lvalues and (b) allow us to distinguish between the two. We can achieve (a) with const lvalue references but not (b).

In fact, if we think about it, C++11 already has a similar mechanism: perfect forwarding. Essentially, with perfect forwarding we have a parameter that can become an rvalue or lvalue reference depending on the argument type. And we can distinguish between the two in the function body. The only catch is that it works at compile time. So what seems to be missing is a similar mechanism that works at runtime.

How does all of the above make you feel? To be honest, I feel disappointed. If I had to summarize it in one word, I would call it inelegant. The fact that C++11 has multiple approaches to something as fundamental as efficiently passing arguments to functions is really unfortunate. Hell, the fact that it takes multiple pages to explain all this is already alarming. I think a lot of people, including myself, hoped that rvalue references will help rid C++ of some of its ugly aspects (e.g., std::auto_ptr). While this certainly materialized for some parts of the language, such as efficient return-by-value, it seems we also managed to further complicate other things. In fact, it feels as if there is an asymmetry between return-by-value and pass-by-value. Return-by-value got a lot of attention and the result feels really elegant (i.e., in the majority of cases we don’t have to explicitly call std::move() to “move” a value out of the function). At the same time, it feels that pass-by-value didn’t get the same amount of attention, even though, as we have seen above, for efficient function chaining, it is just as important.

Ok, that’s pretty depressing. You must be also thinking that surely I didn’t write all this without coming up with at least some sort of a solution. And you would be right. As I mentioned above, I don’t believe there is a built-in mechanism in C++11 to achieve what we want. However, that doesn’t mean there isn’t a more involved approach. We will see what we can do next week, in Part 2 of this post.

Perfect forwarding and overload resolution

Wednesday, May 30th, 2012

One of the things that are made possible in C++11 thanks to rvalue references is perfect forwarding. I am sure you’ve already read about this feature and seen some examples. However, as with many new features in C++11, when we move beyond toy code and start using them in real applications, we often discover interactions with other language features that weren’t apparent initially. Interestingly, I’ve already run into a somewhat surprising side effect of perfect forwarding, which I described in the “Rvalue reference pitfalls” post. Today I would like to explore another such interaction, this time between perfect forwarding and overload resolution.

If we have just have one function (or constructor, etc.) that implements perfect forwarding, then the compiler’s job in resolving a call to such a function is straightforward. However, if a forwarding function is part of an overloaded set, then things can get interesting. Suppose we have the following existing set of overloaded functions:

void f (int);
void f (const std::string&);

And here are some sample calls of these functions:

int i = 1;
std::string s = "aaa";
 
f (i);     // f(int)
f (2);     // f(int)
f (s);     // f(string)
f ("bbb"); // f(string)

Suppose we now want to add a “fallback” function that forwards the call to some other function g(). The idea here is that our fallback function should only be used if none of the existing functions can be called. Sounds easy, right?

template <typename T>
void f (T&& x)
{
  g (x);
}

With this addition, let’s see which functions get called for the above example:

f (i);     // f(int)
f (2);     // f(int)
f (s);     // f(T&&) [T = std::string&]
f ("bbb"); // f(T&&) [T = const char (&)[4]]

Whoa, what just happen? This is definitely not something that we want. We expect the C++ compiler to select the most specialized functions which in this case means the compiler should prefer non-templates over our forwarding function. As it turns out, this is still the rule but things get a bit more complex because of the perfect forwarding. Remember that the whole idea of perfect forwarding is that we get the perfect parameter type for whatever argument we pass. The C++ compiler will still choose the most specialized function (non-template in our case) over the forwarding function if it has the parameter type that matches the argument perfectly. Otherwise the forwarding function is a better match.

With this understanding it is easy to explain the above example. In the first call the argument type is int&. Both f(int) and f(int&) (signature of the forwarding function after reference collapsing) match perfectly and the former is selected as the most specialized. The same logic applies to the second call except that the signature of the forwarding function becomes f(int&&).

The last two calls are more interesting. The argument type of the second last call is std::string&. Possible matches are f(const std::string&) and f(std::string&) (again, signature of the forwarding function after reference collapsing). In this case, the second signature is a better match.

In the last call, the argument type is const char (&)[4] (reference to an array of four constant characters). While the forwarding function is happy to oblige again, f(const std::string&) can only be applied with implicit conversion. So it is not even considered! Note something else interesting here: it is perfectly plausible that the call to g() inside forwarding f() will also require an implicit conversion. But that fact is not taken into account during overload resolution of the call to f(). In other words, perfect forwarding can hide implicit conversions.

Ok, now we know why our code doesn’t work. The question is how do we fix it? From experience (see the link above), it seems that the best way to resolve such issues is to disable the forwarding function for certain argument types using std::enable_if. Ideally, the test that we would like to perform in plain English would sound like this: “can any of the non-forwarding functions be called with this argument type”? If the answer is yes, then we disable the forwarding function. Here is the outline:

template <typename T,
          typename std::enable_if<!test<T>::value, int>::type = 0>
void f (T&& x)
{
  ...
}

Unfortunately, there doesn’t seem to be a way to create such a test (i.e., there is no way to exclude the forwarding function from the overload resolution; though if you know how to achieve something like this, do tell). The next best thing is to test whether the argument type is the same as or can be implicitly converted to the parameter type of a non-forwarding function. Here is how we can fix our code using this method:

template <typename T,
          typename std::enable_if<
            !(std::is_same<T, std::string&>::value ||
              std::is_convertible<T, std::string>::value), int>::type = 0>
void f (T&& x)
{
  ...
}

Besides looking rather hairy, the obvious disadvantage of this approach is that we have to do this test for every non-forwarding function. To make the whole thing tidier we can create a helper that allows us to test multiple types at once:

template <typename T,
          typename disable_forward<T, std::string, std::wstring>::type = 0>
void f (T&& x)
{
  ...
}

Here is the implementation of disable_forward using C++11 variadic templates:

template <typename T, typename T1, typename ... R>
struct same_or_convertible
{
  static const bool value =
    std::is_same<T, T1>::value ||
    std::is_convertible<T, T1>::value ||
    same_or_convertible<T, R...>::value;
};
 
template <typename T, typename T1>
struct same_or_convertible<T, T1>
{
  static const bool value =
    std::is_same<T, T1>::value ||
    std::is_convertible<T, T1>::value;
};
 
template <typename T, typename ... R>
struct disable_forward
{
  static const bool value = same_or_convertible<
    typename std::remove_reference<
      typename std::remove_cv<T>::type>::type,
    R...>::value;
 
  typedef typename std::enable_if<value, int> type;
};

And the overall moral of the story is this: when using perfect forwarding watch out for unexpected interactions with other language features.

C++11 range-based for loop

Wednesday, May 16th, 2012

On the surface, the new range-based for loop may seem like a simple feature, perhaps the simplest of all the core language changes in C++11. However, like with most higher-level abstractions, there are quite a few nuances once we start digging a little bit deeper. So in this post I am going to dig a little bit deeper with the intent to get a better understanding of this feature as well as the contexts in which it can and cannot be used.

The range-based for loop has the following form:

for ( declaration : expression ) statement

According to the standard, this is equivalent to the the following plain for loop:

1   {
2     auto&& __range = expression;
3     for (auto __begin = begin-expression,
4               __end = end-expression;
5          __begin != __end;
6          ++__begin)
7     {
8       declaration = *__begin;
9       statement
10    }
11  }

Note that when the standard says equivalent, it means that the resulting logic is equivalent and not that this is the actual translation; in particular, the variable names (e.g., __range, __begin, etc.) are for exposition only and cannot be referred to by the application.

Ok, the equivalent plain for loop version looks quite a bit more complicated compared to the range-based one. Let’s start our examination with the __range initialization (line 2). We use automatic type deduction to determine the type of the range variable based on the initializing expression. Note also that the resulting variable is made an r-value reference. This is done to allow us to iterate over temporaries without making any copies and without imposing additional const restrictions. To see where the use of the r-value reference becomes important, consider this example:

std::vector<int> f ();
 
for (int& x: f ())
  x = 0;

What can we have for the expression? Well, it can be a standard container, an array, a brace initializer list (in which case __range will be std::initializer_list), or anything that supports the concept of iteration by providing suitable begin() and end() functions. Here are a few examples:

int primes[] = {1, 2, 3, 5, 7, 11};
 
for (int x: primes)
  ...;
 
for (int x: {1, 2, 3, 5, 7, 11})
  ...;
 
template <typename T>
struct istream_range
{
  typedef std::istream_iterator<T> iterator_type;
 
  istream_range (std::istream& is): is_ (is) {}
 
  iterator_type begin () const
  {
    return iterator_type (is_);
  }
 
  iterator_type end () const
  {
    return iterator_type ();
  }
 
private:
  std::istream& is_;
};
 
for (int x: istream_range<int> (cin))
  ...;

The begin-expression and end-expression (lines 3 and 4) are determined as follows:

  • If expression is an array, then begin-expression and end-expression are __range and __range + __bound, respectively, where __bound is the array bound.
  • If expression is of a class type that declares begin() and end() member functions, then begin-expression and end-expression are __range.begin() and __range.end(), respectively.
  • Otherwise, begin-expression and end-expression are begin(__range) and end(__range), respectively, where the begin() and end() functions are looked up using the argument-dependent lookup (ADL) which also includes the std namespace.

With arrays taken care of by the first rule, the second rule makes sure that all the standard containers as well as all the user-defined ones that follow the standard sequence interface will work with range-based for out of the box. For example, in ODB (an ORM for C++), we have the container-like result class template which allows iteration over the query result. Because it has the standard sequence interface with a forward iterator, we didn’t have to do anything extra to make it work with range-based for.

The last rule (the fallback to the free-standing begin()and end() functions) allows us to non-invasively adapt an existing container to the range-based for loop interface.

You may be wondering why did the standard explicitly add the std namespace to ADL in the last rule? That’s a good question since the implementations provided in std simply call the corresponding member functions (which, if existed, would have satisfied the second rule). My guess is that it allows for a single place where a custom container can be adapted to the standard interface by specializing std::begin() and std::end().

The last interesting bit is the declaration (line 8). If we specified the type explicitly, then things are pretty straightforward. However, we can also let the compiler deduce the type for us, for example:

std::vector<int> v = {1, 2, 3, 5, 7, 11};
for (auto x: v)
  ...;

When automatic type deduction is used, generally, the resulting type will be the type of the *(__range.begin()) or *(begin(__range)) expression. When standard containers are used, however, the type will be const element type when __range is const and we are forming a reference and just the element type otherwise. For example:

std::vector<int> v = {1, 2, 3, 5, 7, 11};
const std::vector<int> cv = {1, 2, 3, 5, 7, 11};
 
for (auto x: v) // x is int
  ...;
 
for (auto x: cv) // x is int
  ...;
 
for (auto& x: v) // x is int&
  ...;
 
for (auto& x: cv) // x is const int&
  ...;

Another thing to note is the caching of the end iterator which makes the range-based for as efficient as what we could have written ourselves. There is, however, no provision for handling cases where the container is modified during iteration, unless iterator stability is guaranteed.

While the range-based for loop only supports straight iteration, it is easy to add support for reverse iteration with a simple adapter. In fact, it is strange that something like this is not part of the standard library:

template <typename T>
struct reverse_range
{
private:
  T& x_;
 
public:
  reverse_range (T& x): x_ (x) {}
 
  auto begin () const -> decltype (this->x_.rbegin ())
  {
    return x_.rbegin ();
  }
 
  auto end () const -> decltype (this->x_.rend ())
  {
    return x_.rend ();
  }
};
 
template <typename T>
reverse_range<T> reverse_iterate (T& x)
{
  return reverse_range<T> (x);
}
 
std::vector<int> v = {1, 2, 3, 5, 7, 11};
 
for (auto x: reverse_iterate (v))
  ...;