Efficient argument passing in C++11, Part 3

Last week, in Part 2 of this post, we saw yet another method of efficient argument passing in C++11, this time using a custom wrapper type. Some people called it a smart pointer, though it looks more like a smart reference with smartness coming from its ability to distinguish between lvalues and rvalues. You can download an improved (thanks to your feedback) version of the in class template along with a test: in.tar.gz.

So, now we have a total of four alternatives: pass by const reference, pass by value, overload on lvalue/rvalue references, and, finally, the smart reference approach. I would have liked to tell you that there is a single method that works best in all cases. Unfortunately, it is not the case, at least not in C++11. Every one of these methods works best in some situations and has serious drawbacks when applied in others.

In fact, one can argue that C++11 actually complicated things compared to C++98. While you may not be able to achieve the same efficiency in C++98 when it comes to argument passing, at least the choice was simple: pass by const reference and move on to more important things. In this area C++11 became even more of a craftsman’s language where every case needs to be carefully analyzed and an intricate mechanism used to achieve the best result.

If we can’t have a single, fit-all solution, let’s at least try to come up with a set of guidelines that would allow us to select an appropriate method without spending too much time thinking about it.

Let’s start with the smart reference approach since it comes closest to the fit-all solution. As you may remember from last week’s post, its main issue is the need for a custom wrapper type and the resulting non-idiomatic interface. This is a problem both at the interface level (people looking at the function signature that uses the in class template may not know what’s going on) as well as at the implementation level (we have to “unwrap” the argument to access its member functions). As a result, I wouldn’t recommend using this approach in code that is meant to be used by a wider audience (e.g., libraries, frameworks, etc). However, for application code that is only meant to be seen and understood by the people developing it, smart references can free your team from agonizing about which method to use in each specific case in order to achieve the best performance.

If we decide not to use the smart reference approach, then we have the other three alternatives to choose from. Let’s first say that we want to select only one method and always use that. This may not be a bad idea since what you get in return is the freedom not to think about this stuff anymore. You simply apply the rule and concentrate on more important things. One can also argue that all this discussion is one misguided exercise in premature optimization because in the majority of cases and in the grand scheme of things, it won’t matter which approach we use. And the few cases that do matter which, as experience tells us, we can only recognize with the help of a profiler, we can always change to use a more optimal method.

Ok, so if we had to choose just one method, which one would it be? The overload on lvalue/rvalue references is out since it epitomizes premature optimization that we pay for with complexity and code bloat. So that leaves us with pass by const reference and pass by value. If we use pass by reference and our function makes a copy of the argument, we will miss out on the move optimization in case the argument is an rvalue. If we use pass by value and our function doesn’t make a copy of the argument, we will incur a copy overhead in case the argument is an lvalue. Predictably, the loss aversion principle kicks in (people’s tendency to strongly prefer avoiding losses to acquiring gains) and I personally prefer to miss out on the optimization than to incur the overhead. More rationally, though, I tend to think that in the general case more functions will simply use the argument rather than making a copy.

So it is the pass by const reference method if we had to choose only one. It has a couple of other nice properties. First of all, it is the same as what we would use in C++98. So if our code has to compile in both C++98 and C++11 modes or if we are migrating from C++98 to C++11, then it makes our life a little bit easier. The other nice property of this approach is that we can convert it to the overload on lvalue/rvalue method by simply adding another function.

What if we relax our requirements a little and allow ourselves to select between two methods? Can we come up with a set of simple rules that would allow us to make a correct choice in most cases and without spending too much time thinking about it? The choice here is between pass by reference and pass by value, with overload on lvalue/rvalue references reserved for fine-tuning a select few cases. As we know, whether the first two methods will result in optimal performance depends solely on whether the function makes a copy of its argument. And, as we have discussed in Part 1 of this post, in quite a few real-world situations this can be really hard and often impossible to determine. It also makes the signature of the function (i.e., the interface) depend on its implementation, which can have all sorts of negative consequences.

One approximation that we can use to resolve this problem is to think of argument copying conceptually rather than actually. That is, when we decide how to pass an argument, we ask ourselves whether this function conceptually needs to make a copy of this argument. For example, for the email class constructor that we’ve seen in Part 1 and 2, the answer is clearly yes, since the resulting email instance is expected to contain copies of the passed data.

Similarly, if we ask ourselves whether the matrix operator+ conceptually makes copies of its arguments, then the answer is no, even though the implementation is most likely to make a copy of one of its arguments and use operator+= on that (as we have seen, passing one or both arguments by value in operator+ doesn’t really produce the desired optimization in all the cases).

As another example, consider operator+= itself. For matrix it clearly doesn’t make a copy of its argument, conceptually and actually. For std::string, on the other hand, it does make a copy of its argument, conceptually but, most likely, not actually. For std::list, it does make a copy of its argument, conceptually and, chances are good, actually.

While this approximation won’t produce the optimal result every time, I believe it will have a pretty good average while significantly simplifying the decision making. So these are the rules I am going to start using in my C++11 code, summarized in the list list:

Does the function conceptually make a copy of its argument?
If the answer is NO, then pass by const reference.
If the answer is YES, then pass by value.
Based on the profiler result or other evidence, optimize a select few cases by providing lvalue/rvalue overloads.

I think the only kind of code where going straight to lvalue/rvalue overloads is justified are things like generic containers, matrices, etc. I would also like to know what you think. You can do it in the comments below or in the /r/cpp discussion of this post.

This entry was posted on Tuesday, July 3rd, 2012 at 7:55 am and is filed under Design, C++. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

4 Responses to “Efficient argument passing in C++11, Part 3”

alexander turner Says:
July 4th, 2012 at 2:02 am
Hi, this is a really thoughtful post. It seems tp me that there are differemt types of programming going on here. There is application programming in which one should stick to the c98 style as it is fast to implement and works. Then there is systems programming where crafting l r value overloads makes sense. Then, if there are issues in a prpfile, late optimizing application code makes sense. Early optimising application code might well fall fowel of optimisers in different compilers. My general rule of thumb is that the more complex your code is the less able an optimiser is to deal with it.
segfault Says:
July 6th, 2012 at 1:31 am
I tried this out (on gcc 4.7.1) with a in parameter and got a segfault. The problem seems to be with struct storage. The data field is defined as
[code]
std::aligned_storage data;
[/code]
when it should be
[code]
typename std::aligned_storage::type data;
[/code]
And the placement new should be *new(&s.data) instead of *new(&s).

Other than that, it is an interesting idea.
segfault Says:
July 6th, 2012 at 1:36 am
Here it is again with escaping the less-than:
typename std::aligned_storage<sizeof (T), alignof (T)>::type data;
Boris Kolpackov Says:
July 6th, 2012 at 3:24 am
You are, of course, absolutely correct. I’ve applied the fixes and updated the archive. Thanks for reporting this!