Archive for the ‘Design’ Category

Efficient argument passing in C++11, Part 3

Tuesday, July 3rd, 2012

Last week, in Part 2 of this post, we saw yet another method of efficient argument passing in C++11, this time using a custom wrapper type. Some people called it a smart pointer, though it looks more like a smart reference with smartness coming from its ability to distinguish between lvalues and rvalues. You can download an improved (thanks to your feedback) version of the in class template along with a test: in.tar.gz.

So, now we have a total of four alternatives: pass by const reference, pass by value, overload on lvalue/rvalue references, and, finally, the smart reference approach. I would have liked to tell you that there is a single method that works best in all cases. Unfortunately, it is not the case, at least not in C++11. Every one of these methods works best in some situations and has serious drawbacks when applied in others.

In fact, one can argue that C++11 actually complicated things compared to C++98. While you may not be able to achieve the same efficiency in C++98 when it comes to argument passing, at least the choice was simple: pass by const reference and move on to more important things. In this area C++11 became even more of a craftsman’s language where every case needs to be carefully analyzed and an intricate mechanism used to achieve the best result.

If we can’t have a single, fit-all solution, let’s at least try to come up with a set of guidelines that would allow us to select an appropriate method without spending too much time thinking about it.

Let’s start with the smart reference approach since it comes closest to the fit-all solution. As you may remember from last week’s post, its main issue is the need for a custom wrapper type and the resulting non-idiomatic interface. This is a problem both at the interface level (people looking at the function signature that uses the in class template may not know what’s going on) as well as at the implementation level (we have to “unwrap” the argument to access its member functions). As a result, I wouldn’t recommend using this approach in code that is meant to be used by a wider audience (e.g., libraries, frameworks, etc). However, for application code that is only meant to be seen and understood by the people developing it, smart references can free your team from agonizing about which method to use in each specific case in order to achieve the best performance.

If we decide not to use the smart reference approach, then we have the other three alternatives to choose from. Let’s first say that we want to select only one method and always use that. This may not be a bad idea since what you get in return is the freedom not to think about this stuff anymore. You simply apply the rule and concentrate on more important things. One can also argue that all this discussion is one misguided exercise in premature optimization because in the majority of cases and in the grand scheme of things, it won’t matter which approach we use. And the few cases that do matter which, as experience tells us, we can only recognize with the help of a profiler, we can always change to use a more optimal method.

Ok, so if we had to choose just one method, which one would it be? The overload on lvalue/rvalue references is out since it epitomizes premature optimization that we pay for with complexity and code bloat. So that leaves us with pass by const reference and pass by value. If we use pass by reference and our function makes a copy of the argument, we will miss out on the move optimization in case the argument is an rvalue. If we use pass by value and our function doesn’t make a copy of the argument, we will incur a copy overhead in case the argument is an lvalue. Predictably, the loss aversion principle kicks in (people’s tendency to strongly prefer avoiding losses to acquiring gains) and I personally prefer to miss out on the optimization than to incur the overhead. More rationally, though, I tend to think that in the general case more functions will simply use the argument rather than making a copy.

So it is the pass by const reference method if we had to choose only one. It has a couple of other nice properties. First of all, it is the same as what we would use in C++98. So if our code has to compile in both C++98 and C++11 modes or if we are migrating from C++98 to C++11, then it makes our life a little bit easier. The other nice property of this approach is that we can convert it to the overload on lvalue/rvalue method by simply adding another function.

What if we relax our requirements a little and allow ourselves to select between two methods? Can we come up with a set of simple rules that would allow us to make a correct choice in most cases and without spending too much time thinking about it? The choice here is between pass by reference and pass by value, with overload on lvalue/rvalue references reserved for fine-tuning a select few cases. As we know, whether the first two methods will result in optimal performance depends solely on whether the function makes a copy of its argument. And, as we have discussed in Part 1 of this post, in quite a few real-world situations this can be really hard and often impossible to determine. It also makes the signature of the function (i.e., the interface) depend on its implementation, which can have all sorts of negative consequences.

One approximation that we can use to resolve this problem is to think of argument copying conceptually rather than actually. That is, when we decide how to pass an argument, we ask ourselves whether this function conceptually needs to make a copy of this argument. For example, for the email class constructor that we’ve seen in Part 1 and 2, the answer is clearly yes, since the resulting email instance is expected to contain copies of the passed data.

Similarly, if we ask ourselves whether the matrix operator+ conceptually makes copies of its arguments, then the answer is no, even though the implementation is most likely to make a copy of one of its arguments and use operator+= on that (as we have seen, passing one or both arguments by value in operator+ doesn’t really produce the desired optimization in all the cases).

As another example, consider operator+= itself. For matrix it clearly doesn’t make a copy of its argument, conceptually and actually. For std::string, on the other hand, it does make a copy of its argument, conceptually but, most likely, not actually. For std::list, it does make a copy of its argument, conceptually and, chances are good, actually.

While this approximation won’t produce the optimal result every time, I believe it will have a pretty good average while significantly simplifying the decision making. So these are the rules I am going to start using in my C++11 code, summarized in the list list:

  1. Does the function conceptually make a copy of its argument?
  2. If the answer is NO, then pass by const reference.
  3. If the answer is YES, then pass by value.
  4. Based on the profiler result or other evidence, optimize a select few cases by providing lvalue/rvalue overloads.

I think the only kind of code where going straight to lvalue/rvalue overloads is justified are things like generic containers, matrices, etc. I would also like to know what you think. You can do it in the comments below or in the /r/cpp discussion of this post.

Efficient argument passing in C++11, Part 1

Tuesday, June 19th, 2012

I don’t think anyone would argue that the most important new core language feature in C++11 is the addition of rvalue references. The sheer number of articles and blog posts that were written to explore various aspects of this new functionality further confirms this assertion. However, the more I read these articles and blogs and think about applying their ideas in real C++11 code, the more I realize that many of the suggested “best practices” are fairly theoretical, not confirmed by real-world usage. In this sense the addition or rvalue references is like the addition of a new dimension; it is hard to imagine the implications until you actually start “living” there. As a result, in many cases, the interactions between rvalue references and other C++ features are not yet very well understood, especially when it comes to how the design of our applications changes with C++11.

After reading that last statement you may be asking, hasn’t STL itself been updated to be rvalue reference-aware and isn’t it the perfect ground for discovering new best practices? While the lessons learned from STL are definitely valuable, one thing to keep in mind is that STL is very “generic”, meaning that it mostly consists of templates. While the techniques used in STL can be used in other templates, they may not be applicable or, more often, practical in normal, non-template code. We will see an example of this later in this post.

One fundamental design decision that we have to re-evaluate when moving from C++98 to C++11 is how we pass arguments to our functions. This is because types in addition to being copyable can now also be movable and that moving is expected to be significantly cheaper than copying. Not surprisingly, there were quite a few articles discussing this topic, especially lately as more and more people started using C++11 and discover holes in what was considered best practices. I am also lucky to be able to use new language features by adding C++11 support to ODB, an ORM for C++. So I started thinking about updating ODB interfaces to take advantage of C++11 if C++11 is available (ODB supports both C++98 and C++11 from the same codebase). However, I pretty quickly realized that the current “state of the art” of argument passing in C++11 still doesn’t feel right. In this series of posts I am going to outline the problem and see if we can come up with a solution.

There are quite a few blog posts and articles that give background on this problem. However, none of them cover all the issues completely. So let’s quickly recap the situation.

Say we are writing a C++98 function that takes a string and does something with it other than modifying it. That is, we have an “in” parameter. Our function could just access the string or it could make a copy. We don’t know. I am sure you’ve seen many functions like this:

 
void f (const std::string&);
 

That’s a pretty idiomatic signature for C++98. Here is how we could use it:

 
f ("foo");
 
std::string s ("bar");
f (s);
 

This function will work just fine in C++11. However, passing its argument as const reference may not be the most efficient way, depending on what this function does with the string. If the function simply accesses the string without making any copies, then using const reference is as efficient as it gets. However, if the function makes a copy of the passed string, then its performance can be improved, potentially significantly. To see why, consider the first call to f() in the above example. Here is what will happen: a temporary (rvalue) string will be initialized with "foo" and then passed to f(). f(), in turn, will create a copy of the passed temporary and, once it returns, the temporary will be destroyed.

Do you see the potential improvement here? If what is passed to f() is a temporary (rvalue), then we could move it instead of copying! If your compiler provides a copy-on-write implementation of std::string, then the improvement might not seem that significant. However, if we replace std::string with, say, std::vector and pass vectors that contain a large number of elements, then the performance improvement could be dramatic.

At first this may also seem as not that big of a deal if all that we are optimizing are cases where we pass temporaries. For example, if we were passing std::vector, then you could reason that nobody will write an initializer list with thousands of elements and therefore we can deem such cases insignificant:

 
void f (const std::vector<std::string>&);
 
f ({"foo", "bar", "baz"});
 

Note, however, that there is a much more important use case for this optimization other than inline initialization. And that is passing an object that was returned by value from another function. Consider this example:

 
void f (const std::vector<std::string>&);
std::vector<std::string> g ();
 
f (g ());
 

As you are probably aware, returning a movable type by value from a function is the most efficient way to return in C++11. As a result, making sure that the receiving function in a call chain won’t make a copy is crucial for the whole to work as efficiently as possible.

Ok, so how do we make sure our function copies its argument if we pass an lvalue and moves it if we pass an rvalue? The most straightforward way is to overload the function with a version that takes the rvalue reference:

 
void f (const std::string& s)
{
  std::string s1 (s); // copy
  ...
}
 
void f (std::string&& s)
{
  std::string s1 (std::move (s)); // move
  ...
}
 

There are a couple of problems with this approach, however. The most obvious one is that we now have two functions instead of one. As a result, we either have to duplicate the logic or factor it out into yet another “common implementation” function. Not very elegant.

But there is a bigger problem. A typical example of a function that makes copies of its arguments is a constructor that initializes a bunch of data members with the passed value. Consider the email class, for instance:

 
struct email
{
  email (const std::string& first,
         const std::string& last,
         const std::string& addr)
    : first_ (first), last_ (last), addr_ (addr) {}
 
  ...
 
private:
  std::string first_;
  std::string last_;
  std::string addr_;
};
 

To make this constructor rvalue-aware, we will have to provide an overload for every rvalue/lvalue combination of its three arguments. That’s 6 overloads in this case (2^N in the general case, where N is the number of by-value arguments). Remember that “new dimension” analogy I used above?

STL uses this approach. For example, push_back() in std::vector is overloaded for the const lvalue reference and rvalue reference. In case of push_back() this approach works well: there is only one argument and std::vector is not implemented very often. Generally, my feeling is that this approach suits generic, reusable code best and quickly becomes impractical for application-level code that needs to be written quickly and re-written often.

Ok, so what can we do to solve this combinatorial explosion? Someone clever (not sure who got this idea first) came up with this trick: instead of passing the argument as a reference (rvalue or lvalue), let’s pass it by value:

 
void f (std::string s)
{
  std::string s1 (std::move (s)); // move
  ...
}
 

The idea here is to let the compiler decide at the function call site whether to move or copy the value. If we pass an lvalue, then it will be copied to the argument value and inside f() we move this argument value to our own instance (total cost: one move and one copy). If we pass an rvalue, then it will be moved to the argument value and inside f() we again move it to our own instance (total cost: two moves). If you count the cost of the first approach, then you will get one copy if we pass an lvalue and one move if we pass an rvalue. So the overhead of pass-by-value is one extra move in each case. Since moving is expected to be quite cheap, this is pretty good when the alternative is to write, say, 16 overloads for a 4-argument constructor.

This is the approach that is considered the “best practice” at the moment. However, as with most clever tricks, cracks become apparent if we look close enough. Let’s first observe that this approach only provides optimal performance if you know for sure that the function will need a copy of the passed argument. If the function doesn’t need a copy, then the overhead imposed by this approach compared to the previous one will be a whole extra copy. Seeing that we are counting extra moves, this is a potentially huge overhead. I’ve seen quite a bit of confusion regarding this where some people suggest that we should always pass things by value in C++11. So it is worth repeating this: pass-by-value only works if you are definitely making a copy of the argument.

In what circumstances do we not know whether a function needs a copy? Well, one is if the function needs to make a copy only in certain cases. The other is when the function can have various implementations that may or may not need a copy. Consider our email class as an example. Whether we need to make copies depends on how we store the email address. In the constructor implementation that we have seen above, the pass-by-value approach works well. But here is an alternative implementation where we have to pass arguments differently (”asymmetrically”) to achieve the optimal performance:

 
struct email
{
  email (std::string first,
         const std::string& last,
         const std::string& addr)
    : email_ (std::move (first))
  {
    email_ += ' ';
    email_ + last
    email_ += " <";
    email_ + addr;
    email_ += '>';
  }
 
  ...
 
private:
  std::string email_;
};
 

Think about it: with the pass-by-value approach we embed assumptions about the function’s implementation into its interface! Not very clean, to say the least.

Even if the interface and implementation are tied closely together, this approach may not quite work. One such example are binary operators. Let’s say we want to implement operator+ for a matrix class. An efficient implementation of this operator would copy one of its arguments and only access the other. Here is a typical implementation in C++98:

 
matrix operator+ (const matrix& x, const matrix& y)
{
  matrix r (x);
  r += y;
  return r;
}
 

The pass-by-value approach prescribes that we must pass the argument that is copied by value so that in case it is an rvalue, the copy can be replaced with the move. So here we go:

 
matrix operator+ (matrix x, const matrix& y)
{
  x += y;
  return x;
}
 

This works well if the temporary is on the left-hand side. But what if it is on the right-hand side? Here is a perfectly plausible example:

 
matrix r = a + 2 * b;
 

Now the first argument is an lvalue and we have to make a copy. But the second argument is an rvalue which we could have used instead to avoid copying! In this example we only need to copy one of the arguments. The problem is that with the pass-by-value approach we have to hardcode which one it is in the function signature.

Note also that the pass-by-value approach will only work if the class is movable (by “work” here I mean “won’t result in disastrous performance”). Particularly, that’s the reason why this approach should not be used in generic code, such as std::vector::push_back(). Again, there seems to be some confusion here with some people suggesting that std::vector could have used the pass-by-value approach but doesn’t for some historic or compatibility reasons.

There are also other, more obscure, problems with pass-by-value stemming from the fact that we factored out the move/copy constructor call from the function body to the function call site. One is the potential code bloat and code locality issues resulting from all these copy constructor calls. The other is the strange resulting exception safety guarantee. Oftentimes, the function itself will be noexcept because it only uses the move constructor which is often noexcept. However, the call to such a function can still throw because of all the copy constructor calls, which can normally throw, at the function call site. For more information on these issues refer to Dave Abrahams’ blog post, including the comments.

Ok, so the first approach is often impractical while the pass-by-value method smells a little. Are there any other alternatives that overcome all of the above issues? I don’t believe there is a solution that is based just on the built-in derived types (i.e., references, etc).

Let me elaborate a little bit on that last statement. It appears what we want is some sort of a derived type (here I use the term derived type to refer to pointers, references, etc., rather than class inheritance) that would (a) bind to both rvalues and lvalues and (b) allow us to distinguish between the two. We can achieve (a) with const lvalue references but not (b).

In fact, if we think about it, C++11 already has a similar mechanism: perfect forwarding. Essentially, with perfect forwarding we have a parameter that can become an rvalue or lvalue reference depending on the argument type. And we can distinguish between the two in the function body. The only catch is that it works at compile time. So what seems to be missing is a similar mechanism that works at runtime.

How does all of the above make you feel? To be honest, I feel disappointed. If I had to summarize it in one word, I would call it inelegant. The fact that C++11 has multiple approaches to something as fundamental as efficiently passing arguments to functions is really unfortunate. Hell, the fact that it takes multiple pages to explain all this is already alarming. I think a lot of people, including myself, hoped that rvalue references will help rid C++ of some of its ugly aspects (e.g., std::auto_ptr). While this certainly materialized for some parts of the language, such as efficient return-by-value, it seems we also managed to further complicate other things. In fact, it feels as if there is an asymmetry between return-by-value and pass-by-value. Return-by-value got a lot of attention and the result feels really elegant (i.e., in the majority of cases we don’t have to explicitly call std::move() to “move” a value out of the function). At the same time, it feels that pass-by-value didn’t get the same amount of attention, even though, as we have seen above, for efficient function chaining, it is just as important.

Ok, that’s pretty depressing. You must be also thinking that surely I didn’t write all this without coming up with at least some sort of a solution. And you would be right. As I mentioned above, I don’t believe there is a built-in mechanism in C++11 to achieve what we want. However, that doesn’t mean there isn’t a more involved approach. We will see what we can do next week, in Part 2 of this post.

Rvalue reference pitfalls

Tuesday, March 6th, 2012

I just finished adding initial C++11 support to ODB (will write more about that in a separate post) and, besides other things, this involved using rvalue references, primarily to implement move constructors and assignment operators. While I talked about some of the tricky aspects of rvalue references in the “Rvalue references: the basics” post back in 2008 (it is also a good general introduction to the subject), the experience of writing real-world code that uses this feature brought a whole new realization of the potential pitfalls.

Probably the most important thing to always keep in mind when working with rvalue references is this: once an rvalue reference is given a name, it becomes an lvalue reference. Consider this function:

void f (int&& x)
{
}

What is the type of the argument in this function? It is an rvalue reference to int (i.e., int&&). And what is the type of the x variable in f()’s body? It is a normal (lvalue) reference to int (i.e., int&). This takes some getting used to.

Let’s see what happens when we forget about this rule. Here is a naive implementation of a move constructor in a simple class:

struct base
{
  base (const base&);
  base (base&&);
};
 
struct object: base
{
  object (object&& obj)
      : base (obj),
        nums (obj.nums)
  {
  }
 
private:
  std::vector<int> nums;
};

Here, instead of calling move constructors for base and nums we call their copy constructors! Why? Because obj is an lvalue reference, not an rvalue one. The really bad part to this story is this: if you make such a mistake, there will be no compilation or runtime error. It will only manifest itself as sub-optimal performance which can easily go unnoticed for a long time.

So how do we fix this? The fix is to always remember to convert the lvalue reference back to rvalue with the help of std::move():

  object (object&& obj)
      : base (std::move (obj)),
        nums (std::move (obj.nums))
  {
  }

What if one of the members doesn’t provide a move constructor? In this case the copy constructor will silently be called instead. This can also be sub-optimal or even plain wrong, for example, in case of a raw pointer. If the member’s type provides swap(), then this can be a good backup plan:

  object (object&& obj)
      : base (std::move (obj))
  {
    nums.swap (obj.nums);
  }

Ok, that was a warmup. Ready for some heavy lifting? Let’s start with this simple code fragment:

typedef int& rint;
typedef rint& rrint;

What is rrint? Right, it is still a reference to int. The same logic holds for rvalue references:

typedef int&& rint;
typedef rint&& rrint;

Here rrint is still an rvalue reference to int. Things become more interesting when we try to mix rvalue and lvalue references:

typedef int&& rint;
typedef rint& rrint;

What is rrint? Is it an rvalue, lvalue, or some other kind of reference (lrvalue reference, anyone)? The correct answer is it’s an lvalue reference to int. The general rule is that as soon as we have an lvalue reference anywhere in the mix, the resulting type will always be an lvalue reference.

You may be wondering why on earth would anyone create an lvalue reference to rvalue reference or an rvalue reference to lvalue reference. While you probably won’t do it directly, this can happen more often than one would think in template code. And I don’t think the resulting interactions with other C++ mechanisms, such as automatic template argument deductions, are well understood yet.

Here is a concrete example from my work on C++11 support in ODB. But first a bit of context. For standard smart pointers, such as std::shared_ptr, ODB provides lazy versions, such as odb::lazy_shared_ptr. In a nutshell, when an object that contains lazy pointers to other objects is loaded from the database, these other objects are not loaded right away (which would be the case for normal, eager pointers). Instead, just the object ids are loaded and the objects themselves can be loaded later, when and if required.

A lazy pointer can be initialized with an actual pointer to a persistent object, in which case the pointer is said to be loaded. Or we can initialize it with an object id, in which case the pointer is unloaded. Here are the signatures of the two constructors in question:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
};

Seeing that we now have rvalue references, I’ve decided to go ahead and add move versions for these two constructors. Here is my first attempt:

  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class ID>
  lazy_shared_ptr (database&, ID&&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
 
  template <class T1>
  lazy_shared_ptr (database&, std::shared_ptr<T1>&&);

Let’s now see what happens when we try to create a loaded lazy pointer:

shared_ptr<object> p (db.load<object> (...));
lazy_shared_ptr<object> lp (db, p);

One would expect that the third constructor will be used in this fragment but that’s not what happens. Let’s see how the overload resolution and template argument deduction work here. The type of the second argument in the lazy_shared_ptr constructor call is shared_ptr<object>& and here are the signatures that we get:

lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, (shared_ptr<object>&)&&);
lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&&);

Take a closer look at the second signature. Here the template argument is an lvalue reference. On top of that we add an rvalue reference. But, as we now know, this is still just an lvalue reference. So in effect our candidate list is as follows and, unlike our expectations, the second constructor is selected:

lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&);
lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&&);

In other words, the second constructor, which was supposed to take an rvalue reference was transformed to a constructor that takes an lvalue reference. This is some profound stuff. Just think about it: given its likely implementation, this constructor can now silently “gut” an lvalue without us ever indicating this desire with an explicit std::move() call.

So how can we fix this? My next attempt was to strip the lvalue reference from the template argument, just like std::move() does for its return value:

  template <class ID>
  lazy_shared_ptr (
    database&,
    typename std::remove_reference<ID>::type&&);

But this inhibits template argument deduction and there is no way (nor desire, in my case) to specify template arguments explicitly for constructor templates. So, in effect, the above constructor becomes uncallable.

So what did I do in the end? Nothing. There doesn’t seem to be a way to provide such a move constructor. The more general conclusion seems to be this: function templates with arguments of rvalue references that are formed directly from template arguments can be transformed, with unpredictable results, to functions that have lvalue references instead. Preventing this using std::remove_reference will inhibit template argument deduction.

Update: I have written a follow up to this post that discusses some of the suggestions left in the comments as well as presents a solution for the above problem.