Rvalue reference pitfalls, an update

March 14th, 2012

My original post about rvalue reference pitfalls from last week was followed by quite a few comments, including some interesting suggestions that are worth discussing.

While most of the discussion centered around the second problem, Jonathan Rogers made the following interesting observation. Consider again the lazy_shared_ptr constructor from the original article that takes shared_ptr:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>& p)
    : p_ (p)
  {
  }
};

If we want to support efficient initialization (shall we call it move initialization?), then it seems natural to add an rvalue reference overload:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>& p)
    : p_ (p)
  {
  }
 
  template <class T1>
  lazy_shared_ptr (database&, std::shared_ptr<T1>&& p)
    : p_ (std::move (p))
  {
  }
};

While this works, as Jonathan pointed out, another alternative to provide the same functionality is to just have a single constructor that takes its argument by value:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class T1>
  lazy_shared_ptr (database&, std::shared_ptr<T1> p)
    : p_ (std::move (p))
  {
  }
};

Let’s consider what happens when we use both versions to initialize lazy_shared_ptr with lvalues and rvalues. When we use the original implementation (the one with two overloads) with an lvalue, the first constructor (the one taking const lvalue reference) is selected. The value is then copied to p_ using shared_ptr’s copy constructor. Using the second implementation (the one taking its argument by value) with an lvalue causes that copy constructor to be called right away to create the by-value argument. This argument is then moved to p_ inside the lazy_shared_ptr constructor. So in this case the second implementation requires an extra move constructor call.

Let’s now pass an rvalue. In the first implementation the second constructor is selected and the value is passed as an rvalue reference. It is then moved to p_. When the second implementation is used, the by-value argument is again created but this time using a move instead of a copy constructor (a compiler may elide this move when the argument is a temporary). The argument is then moved to p_. In this case, again, the second implementation requires at most one extra move constructor call.
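The bookkeeping above can be checked with a small sketch. The tracked type, the two holder classes, and the helper functions are all hypothetical names made up for this illustration (they stand in for shared_ptr and lazy_shared_ptr). The rvalue case uses std::move() on a named object so that copy elision does not hide the extra move:

```cpp
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct tracked
{
  static int copies;
  static int moves;

  tracked () {}
  tracked (const tracked&) {++copies;}
  tracked (tracked&&) {++moves;}
};

int tracked::copies = 0;
int tracked::moves = 0;

// First implementation: const lvalue reference and rvalue reference overloads.
struct two_overloads
{
  two_overloads (const tracked& v): v_ (v) {}
  two_overloads (tracked&& v): v_ (std::move (v)) {}

  tracked v_;
};

// Second implementation: a single constructor taking its argument by value.
struct by_value
{
  by_value (tracked v): v_ (std::move (v)) {}

  tracked v_;
};

struct counts {int copies; int moves;};

// Construct Holder from an lvalue and report the constructor calls made.
template <class Holder>
counts from_lvalue ()
{
  tracked t;
  tracked::copies = tracked::moves = 0;
  Holder h (t);
  (void) h;
  return counts {tracked::copies, tracked::moves};
}

// Construct Holder from an rvalue (std::move() of a named object).
template <class Holder>
counts from_rvalue ()
{
  tracked t;
  tracked::copies = tracked::moves = 0;
  Holder h (std::move (t));
  (void) h;
  return counts {tracked::copies, tracked::moves};
}
```

With these definitions, from_lvalue<two_overloads>() reports one copy and no moves, from_lvalue<by_value>() one copy and one move, from_rvalue<two_overloads>() one move, and from_rvalue<by_value>() two moves.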

Considering that move constructors are normally very cheap, the pass by value approach makes for a good way to keep your code short and concise. But the real advantage of this approach becomes apparent when we have multiple arguments that we want to pass efficiently (this was also a topic of Sumant’s post from a few days ago). If we use the rvalue reference approach, then for n arguments we will need 2^n constructor versions.
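To make the 2^n growth concrete, here is a rough sketch for n = 2. The class names and members are invented for this example and are not part of ODB:

```cpp
#include <string>
#include <utility>

// Reference overloads: two arguments need 2^2 = 4 constructors.
struct name_overloads
{
  name_overloads (const std::string& f, const std::string& l)
    : first (f), last (l) {}
  name_overloads (const std::string& f, std::string&& l)
    : first (f), last (std::move (l)) {}
  name_overloads (std::string&& f, const std::string& l)
    : first (std::move (f)), last (l) {}
  name_overloads (std::string&& f, std::string&& l)
    : first (std::move (f)), last (std::move (l)) {}

  std::string first, last;
};

// Pass by value: one constructor covers all four combinations.
struct name_by_value
{
  name_by_value (std::string f, std::string l)
    : first (std::move (f)), last (std::move (l)) {}

  std::string first, last;
};
```

Both classes accept any mix of lvalue and rvalue arguments. The by-value version pays for this with one extra move per argument.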

Note, however, that the pass by value approach is only a good idea if you know for sure that the argument type provides a move constructor. If that’s not the case, then this approach will perform significantly worse compared to the rvalue reference version. This is the reason, for example, why it is not a good idea to use this technique in std::vector’s push_back().
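The cost for a type without a move constructor can be sketched in the same way. Here copy_only is a hypothetical type that declares only a copy constructor, so std::move() on it quietly falls back to copying:

```cpp
#include <utility>

// Hypothetical legacy type. Declaring the copy constructor suppresses the
// implicit move constructor, so rvalues of this type are copied as well.
struct copy_only
{
  static int copies;

  copy_only () {}
  copy_only (const copy_only&) {++copies;}
};

int copy_only::copies = 0;

// Pass by value: copy into the argument, then "move" (really copy) into v_.
struct by_value
{
  by_value (copy_only v): v_ (std::move (v)) {}

  copy_only v_;
};

// Reference overloads: a single copy into v_ in either case.
struct two_overloads
{
  two_overloads (const copy_only& v): v_ (v) {}
  two_overloads (copy_only&& v): v_ (std::move (v)) {}

  copy_only v_;
};

// Construct Holder from an lvalue and return the number of copies made.
template <class Holder>
int copies_from_lvalue ()
{
  copy_only c;
  copy_only::copies = 0;
  Holder h (c);
  (void) h;
  return copy_only::copies;
}
```

In this sketch copies_from_lvalue<by_value>() returns 2 while copies_from_lvalue<two_overloads>() returns 1, so for a type that is expensive to copy the by-value constructor doubles the cost.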

Ok, let’s now turn to the problem that triggered a lot of comments and suggestions. Here is a quick recap. We have two constructors like these:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
};

One initializes a lazy pointer using an object id, creating an unloaded pointer. The other initializes it with the pointer to the actual object, creating a loaded pointer.

Now we want to add move initialization overloads for these two constructors. As it turns out, the straightforward approach doesn’t quite work:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class ID>
  lazy_shared_ptr (database&, ID&&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
 
  template <class T1>
  lazy_shared_ptr (database&, std::shared_ptr<T1>&&);
};

As you may recall, we have two problems here. The first manifests itself when we try to initialize lazy_shared_ptr with an lvalue of the shared_ptr type:

shared_ptr<object> p = ...;
lazy_shared_ptr<object> lp (db, p);

Instead of selecting the third constructor, the overload resolution rules select the second because the rvalue reference in its second argument becomes an lvalue reference (see the original post for details on why this happens).

The second problem occurs when we try to initialize a lazy pointer with an object id that is again an lvalue. For example:

string id = ...;
lazy_shared_ptr<object> lp (db, id);

In this case, instead of selecting the first constructor, overload resolution again selects the second constructor, which is again transformed into a version that has an lvalue instead of an rvalue reference for its second argument. If you are wondering why this is a problem (after all, the first two constructors accomplish essentially the same thing), consider that while the first constructor’s hypothetical implementation will use a copy constructor to initialize the id, the second constructor will most likely use the move constructor. This means that the state of our lvalue will be transferred without us explicitly asking for it, as we normally do with the std::move() call.
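The selection can be observed with a pair of free function templates that mirror the first two constructors. This is only a sketch: selected() and the with_*() helpers are hypothetical names, and the functions return a label instead of initializing anything:

```cpp
#include <string>
#include <utility>

// Mirrors lazy_shared_ptr (database&, const ID&).
template <class ID>
const char* selected (const ID&) {return "const ID&";}

// Mirrors lazy_shared_ptr (database&, ID&&).
template <class ID>
const char* selected (ID&&) {return "ID&&";}

// Non-const lvalue: ID is deduced as std::string&, so the second overload
// takes std::string& and beats const std::string&.
inline std::string with_lvalue ()
{
  std::string id ("abc");
  return selected (id);
}

// Const lvalue: both overloads take const std::string&; the const ID&
// template wins as the more specialized one.
inline std::string with_const_lvalue ()
{
  const std::string id ("abc");
  return selected (id);
}

// Rvalue: the second overload takes std::string&&, as intended.
inline std::string with_rvalue ()
{
  std::string id ("abc");
  return selected (std::move (id));
}
```

Only the const lvalue ends up in the const ID& overload. A plain, non-const lvalue goes to the ID&& overload, which is exactly the surprising case.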

As Thomas noted in the comments and as I should have mentioned explicitly in the original post, the C++ mechanisms that are causing problems here are exactly the same ones that allow for perfect argument forwarding. In fact, rvalue references are primarily used to implement two related but also quite distinct things: the move semantics and perfect forwarding. What happens here is that we are trying to implement the move semantics but are getting perfect forwarding instead. To paraphrase the conclusion of my original post, any time you write a function like this:

template <typename T>
void f (T&&);

You always get perfect forwarding and never move semantics.

If we look closely at the two problematic cases above, we will notice that they both happen when the template argument is an lvalue reference, which results in our rvalue reference becoming an lvalue reference. When we pass an rvalue, everything works great. In fact, we never want our move initialization constructor to be called for lvalues since we have other overloads (const lvalue reference) taking care of these cases. So what we want is to disable the move initialization constructor when the template argument is an lvalue reference. As it turns out, this is not that difficult in C++11:

  template <class ID,
            typename std::enable_if<
              !std::is_lvalue_reference<ID>::value,
              int>::type = 0>
  lazy_shared_ptr (database&, ID&&)

To put this in more general terms, it is possible to reduce perfect forwarding back to just move semantics by disabling a function for template arguments that are lvalue references.
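Here is a minimal sketch of the same constraint applied to a pair of free function templates, so that the effect on overload selection is easy to see. The selected() and with_*() names are hypothetical, and the labels stand in for what a real constructor would do with its argument:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Copy version: handles all lvalues.
template <class ID>
const char* selected (const ID&) {return "copy";}

// Move version: removed from overload resolution when ID is deduced as an
// lvalue reference, that is, when the argument is an lvalue.
template <class ID,
          typename std::enable_if<
            !std::is_lvalue_reference<ID>::value,
            int>::type = 0>
const char* selected (ID&&) {return "move";}

inline std::string with_lvalue ()
{
  std::string id ("abc");
  return selected (id);
}

inline std::string with_rvalue ()
{
  std::string id ("abc");
  return selected (std::move (id));
}
```

With the constraint in place a non-const lvalue is handled by the copy version, and only an rvalue (a temporary or the result of std::move()) reaches the move version.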

Rvalue reference pitfalls

March 6th, 2012

I just finished adding initial C++11 support to ODB (will write more about that in a separate post) and, besides other things, this involved using rvalue references, primarily to implement move constructors and assignment operators. While I talked about some of the tricky aspects of rvalue references in the “Rvalue references: the basics” post back in 2008 (it is also a good general introduction to the subject), the experience of writing real-world code that uses this feature brought a whole new realization of the potential pitfalls.

Probably the most important thing to always keep in mind when working with rvalue references is this: once an rvalue reference is given a name, it is treated as an lvalue. Consider this function:

void f (int&& x)
{
}

What is the type of the argument in this function? It is an rvalue reference to int (i.e., int&&). And how does the x variable behave in f()’s body? Its declared type is still int&&, but the expression x is an lvalue, so it behaves like a normal (lvalue) reference to int (i.e., int&). This takes some getting used to.
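One way to see this rule in action is to pass x on to a pair of overloads. The category(), by_name(), and by_move() functions below are hypothetical helpers written for this sketch:

```cpp
#include <string>
#include <type_traits>
#include <utility>

inline std::string category (int&)  {return "lvalue";}
inline std::string category (int&&) {return "rvalue";}

inline std::string by_name (int&& x)
{
  // The declared type of x is int&&, but the expression x is an lvalue.
  static_assert (std::is_same<decltype (x), int&&>::value,
                 "declared type is int&&");
  static_assert (std::is_same<decltype ((x)), int&>::value,
                 "the expression x is an lvalue");

  return category (x); // Selects category (int&).
}

inline std::string by_move (int&& x)
{
  return category (std::move (x)); // Selects category (int&&).
}
```

Calling by_name (1) returns "lvalue" even though the caller passed an rvalue, while by_move (1) returns "rvalue".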

Let’s see what happens when we forget about this rule. Here is a naive implementation of a move constructor in a simple class:

struct base
{
  base (const base&);
  base (base&&);
};
 
struct object: base
{
  object (object&& obj)
      : base (obj),
        nums (obj.nums)
  {
  }
 
private:
  std::vector<int> nums;
};

Here, instead of calling move constructors for base and nums we call their copy constructors! Why? Because obj, being a named variable, is treated as an lvalue, not an rvalue. The really bad part of this story is this: if you make such a mistake, there will be no compilation or runtime error. It will only manifest itself as sub-optimal performance which can easily go unnoticed for a long time.

So how do we fix this? The fix is to always remember to convert the lvalue back to an rvalue with the help of std::move():

  object (object&& obj)
      : base (std::move (obj)),
        nums (std::move (obj.nums))
  {
  }
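Since the mistake is silent, a quick way to catch it is to instrument the base class. The sketch below uses a counting base and two hypothetical derived classes, naive and fixed, that differ only in whether std::move() is used:

```cpp
#include <utility>

// Base class that counts how it gets constructed from another base.
struct base
{
  static int copies;
  static int moves;

  base () {}
  base (const base&) {++copies;}
  base (base&&) {++moves;}
};

int base::copies = 0;
int base::moves = 0;

struct naive: base
{
  naive () {}
  naive (naive&& obj): base (obj) {}             // obj is an lvalue: copies.
};

struct fixed: base
{
  fixed () {}
  fixed (fixed&& obj): base (std::move (obj)) {} // Back to an rvalue: moves.
};

// Returns true if move-constructing a T move-constructs its base.
template <class T>
bool move_uses_base_move ()
{
  T a;
  base::copies = base::moves = 0;
  T b (std::move (a));
  (void) b;
  return base::moves == 1 && base::copies == 0;
}
```

Here move_uses_base_move<naive>() returns false because the base copy constructor runs, while move_uses_base_move<fixed>() returns true.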

What if one of the members doesn’t provide a move constructor? In this case the copy constructor will silently be called instead. This can also be sub-optimal or even plain wrong, for example, in case of a raw pointer. If the member’s type provides swap(), then this can be a good backup plan:

  object (object&& obj)
      : base (std::move (obj))
  {
    nums.swap (obj.nums);
  }

Ok, that was a warmup. Ready for some heavy lifting? Let’s start with this simple code fragment:

typedef int& rint;
typedef rint& rrint;

What is rrint? Right, it is still a reference to int. The same logic holds for rvalue references:

typedef int&& rint;
typedef rint&& rrint;

Here rrint is still an rvalue reference to int. Things become more interesting when we try to mix rvalue and lvalue references:

typedef int&& rint;
typedef rint& rrint;

What is rrint? Is it an rvalue, lvalue, or some other kind of reference (lrvalue reference, anyone)? The correct answer is it’s an lvalue reference to int. The general rule is that as soon as we have an lvalue reference anywhere in the mix, the resulting type will always be an lvalue reference.
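All four combinations can be verified at compile time with std::is_same. The typedef names lref and rref are made up for this sketch:

```cpp
#include <type_traits>

typedef int&  lref;
typedef int&& rref;

// Only the && + && combination yields an rvalue reference.
static_assert (std::is_same<lref&,  int&>::value,  "&  + &  gives &");
static_assert (std::is_same<lref&&, int&>::value,  "&  + && gives &");
static_assert (std::is_same<rref&,  int&>::value,  "&& + &  gives &");
static_assert (std::is_same<rref&&, int&&>::value, "&& + && gives &&");
```

If any of these assumptions were wrong, the translation unit would fail to compile.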

You may be wondering why on earth anyone would create an lvalue reference to rvalue reference or an rvalue reference to lvalue reference. While you probably won’t do it directly, this can happen more often than one would think in template code. And I don’t think the resulting interactions with other C++ mechanisms, such as automatic template argument deduction, are well understood yet.

Here is a concrete example from my work on C++11 support in ODB. But first a bit of context. For standard smart pointers, such as std::shared_ptr, ODB provides lazy versions, such as odb::lazy_shared_ptr. In a nutshell, when an object that contains lazy pointers to other objects is loaded from the database, these other objects are not loaded right away (which would be the case for normal, eager pointers). Instead, just the object ids are loaded and the objects themselves can be loaded later, when and if required.

A lazy pointer can be initialized with an actual pointer to a persistent object, in which case the pointer is said to be loaded. Or we can initialize it with an object id, in which case the pointer is unloaded. Here are the signatures of the two constructors in question:

template <class T>
class lazy_shared_ptr
{
  ...
 
  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
};

Seeing that we now have rvalue references, I’ve decided to go ahead and add move versions for these two constructors. Here is my first attempt:

  template <class ID>
  lazy_shared_ptr (database&, const ID&);
 
  template <class ID>
  lazy_shared_ptr (database&, ID&&);
 
  template <class T1>
  lazy_shared_ptr (database&, const std::shared_ptr<T1>&);
 
  template <class T1>
  lazy_shared_ptr (database&, std::shared_ptr<T1>&&);

Let’s now see what happens when we try to create a loaded lazy pointer:

shared_ptr<object> p (db.load<object> (...));
lazy_shared_ptr<object> lp (db, p);

One would expect that the third constructor will be used in this fragment but that’s not what happens. Let’s see how the overload resolution and template argument deduction work here. The type of the second argument in the lazy_shared_ptr constructor call is shared_ptr<object>& and here are the signatures that we get:

lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, (shared_ptr<object>&)&&);
lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&&);

Take a closer look at the second signature. Here the template argument is an lvalue reference. On top of that we add an rvalue reference. But, as we now know, this is still just an lvalue reference. So in effect our candidate list is as follows and, unlike our expectations, the second constructor is selected:

lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&);
lazy_shared_ptr (database&, const shared_ptr<object>&);
lazy_shared_ptr (database&, shared_ptr<object>&&);

In other words, the second constructor, which was supposed to take an rvalue reference was transformed to a constructor that takes an lvalue reference. This is some profound stuff. Just think about it: given its likely implementation, this constructor can now silently “gut” an lvalue without us ever indicating this desire with an explicit std::move() call.
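The outcome of this overload resolution can be reproduced with four free function templates that mirror the constructor signatures. This is a sketch only: selected() and the with_*() helpers are hypothetical, the database argument is dropped, and int stands in for the object type:

```cpp
#include <memory>
#include <string>
#include <utility>

template <class ID>
const char* selected (const ID&) {return "const ID&";}

template <class ID>
const char* selected (ID&&) {return "ID&&";}

template <class T1>
const char* selected (const std::shared_ptr<T1>&)
{
  return "const shared_ptr<T1>&";
}

template <class T1>
const char* selected (std::shared_ptr<T1>&&)
{
  return "shared_ptr<T1>&&";
}

// Lvalue shared_ptr: the ID&& overload collapses to shared_ptr<int>& and
// wins over both const reference overloads.
inline std::string with_lvalue ()
{
  std::shared_ptr<int> p (new int (1));
  return selected (p);
}

// Rvalue shared_ptr: the shared_ptr<T1>&& overload is selected as expected.
inline std::string with_rvalue ()
{
  std::shared_ptr<int> p (new int (1));
  return selected (std::move (p));
}
```

So the rvalue case behaves as intended and only the lvalue case is hijacked by the ID&& overload.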

So how can we fix this? My next attempt was to strip the lvalue reference from the template argument, just like std::move() does for its return value:

  template <class ID>
  lazy_shared_ptr (
    database&,
    typename std::remove_reference<ID>::type&&);

But this inhibits template argument deduction and there is no way (nor desire, in my case) to specify template arguments explicitly for constructor templates. So, in effect, the above constructor becomes uncallable.
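The loss of deduction can be demonstrated on a free function template, where explicit template arguments are at least possible. In this sketch take(), forward_take(), and the two detector traits are hypothetical names. The detectors use expression SFINAE to ask whether a call without explicit template arguments would compile:

```cpp
#include <type_traits>
#include <utility>

// ID appears only inside remove_reference<ID>::type, a non-deduced context.
template <class ID>
void take (typename std::remove_reference<ID>::type&&) {}

// Plain forwarding signature, for comparison.
template <class ID>
void forward_take (ID&&) {}

// True if take (a) compiles for an argument of type A.
template <class A, class = void>
struct take_deducible: std::false_type {};

template <class A>
struct take_deducible<A, decltype (take (std::declval<A> ()))>
  : std::true_type {};

// True if forward_take (a) compiles for an argument of type A.
template <class A, class = void>
struct forward_deducible: std::false_type {};

template <class A>
struct forward_deducible<A, decltype (forward_take (std::declval<A> ()))>
  : std::true_type {};

static_assert (!take_deducible<int>::value,
               "ID cannot be deduced through remove_reference");
static_assert (forward_deducible<int>::value,
               "ID is deduced from a plain ID&& parameter");

// With an explicit template argument the call is fine. Constructor
// templates offer no syntax for this, which is what makes them uncallable.
inline void explicit_call ()
{
  take<int> (42);
}
```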

So what did I do in the end? Nothing. There doesn’t seem to be a way to provide such a move constructor. The more general conclusion seems to be this: function templates with arguments of rvalue references that are formed directly from template arguments can be transformed, with unpredictable results, to functions that have lvalue references instead. Preventing this using std::remove_reference will inhibit template argument deduction.

Update: I have written a follow up to this post that discusses some of the suggestions left in the comments as well as presents a solution for the above problem.

Who calls this function?

February 29th, 2012

Let’s say we have a large project and we want to find out from which places in our code a particular function is called. You may be wondering why you would want to know. The most common reason is to eliminate dead code; if the function is not called, then it may not be needed anymore. Or maybe you just want to refresh your memory about this area of your code base. The case that triggered this post involved changing one function call to another. I was adding support for composite object ids in ODB and was gradually changing the code to use a more generalized version of a certain function while still maintaining the old version for the code still to be ported. While I knew about most of the areas that needed changing, in the end I needed to verify that nobody was calling the old function and then remove it.

So how do we find out who calls a particular function? The method that I am sure most of you have used before is to comment the function out, recompile, and use the C++ compiler error messages to pin-point the calls. There are a few problems with this approach, however. First of all, depending on your build system, the compilation may stop before showing you all the call sites (make -k is helpful here but is still not a bulletproof solution). So to make sure that you have seen all the places, you may also have to keep commenting the calls and recompiling until you get no more errors. This is annoying.

This approach will also not work if a call can be resolved to one of the overloaded versions. This was exactly the situation I encountered. I had two functions that looked like this:

class traverser
{
  void traverse (type&);   // New version.
  void traverse (class_&); // Old version.
};

Where class_ derives from type so if I commented the old version out, the calls were happily resolved to the new version without giving any errors.
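A stripped-down sketch of this situation follows. The empty type and class_ structs and the string return values are stand-ins invented for illustration, not the real ODB classes:

```cpp
#include <string>

struct type {};
struct class_: type {};

struct traverser
{
  std::string traverse (type&) {return "new";}

  // Old version, commented out in the hope of getting errors at call sites:
  //
  // std::string traverse (class_&) {return "old";}
};

// A call site that used to reach the old version.
inline std::string call_with_class ()
{
  class_ c;
  traverser t;
  return t.traverse (c); // Compiles: class_& converts to type&.
}
```

The call compiles without any diagnostic and now returns "new", so the compiler gives no hint that this call site ever used the old overload.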

Another similar situation is when you have a function in the outer namespace that will be used if you comment a function in the inner namespace:

void f ();
 
namespace n
{
  void f ();
 
  void g ()
  {
    // Will resolve to outer f() if inner f() is
    // commented out.
    //
    f ();
  }
}

What’s worse is that in complex cases involving implicit conversions of arguments, some calls may be successfully resolved to an overloaded or outer version while some will trigger an error. As a result, you may not even realize that you didn’t see all the call sites.

Ok, so that approach didn’t work in my case. What else can we try? Another option is to just comment the definition of the function out and see if we get any unresolved symbol errors during linking. There are many problems with this method as well. First of all, if the function in question is virtual, then this method won’t work because the virtual function table will always contain a reference to the function. Plus, all the calls to this function will go through the vtable.

If the function is not virtual, then, at best, a linker will tell you that there is an undefined reference in a specific function in a specific translation unit. For example, here is an output from the GNU Binutils ld:

/tmp/ccXez0jI.o: In function `main':
test.cxx:(.text+0x10): undefined reference to `f()'
test.cxx:(.text+0x15): undefined reference to `f()'

In particular, there will be no line information so if a function calls the function of interest multiple times, we will have no way of knowing which call triggered the undefined symbol.

This approach also won’t work if we are building a shared library (unless we are using the -no-undefined or equivalent option) because the undefined reference won’t be reported until we link the library to an executable or try to load it (e.g., with dlopen()). And when that happens all we will get is just a note that there is an undefined reference in a library:

libtest.so: undefined reference to `f()'

In my case, since ODB is implemented as a shared library, all this method did was to confirm that I still had a call to the old version of the function. I, however, had no idea even which file(s) contained these calls.

As it happens, just the day before I was testing ODB with GCC in the C++11 mode. While everything worked fine, I got a few warnings about std::auto_ptr being deprecated. As I saw them scrolling by, I made an idle note to myself that when compiled in the C++11 mode libstdc++ probably marks auto_ptr using the GCC deprecated attribute. A day later this background note went off like a light bulb in my head: I can mark the old version of the function as deprecated and GCC will pin-point with a warning every single place where this function is called:

class traverser
{
  void traverse (type&);
 
  void traverse (class_&) __attribute__ ((deprecated));
};

And the diagnostic is:

model.cxx: In function ‘void object_columns::traverse(data_member&)’:
model.cxx:22:9: warning: ‘void traverser::traverse(class_&)’ is
deprecated

This method is also very handy for finding out which overloaded version was selected by the compiler without resorting to a runtime test:

void f (bool) __attribute__ ((deprecated));
void f (int) __attribute__ ((deprecated));
void f (double) __attribute__ ((deprecated));
 
void g ()
{
  f (true);
  f (123);
  f (123.1);
}

And the output is:

test.cxx:7:10: warning: ‘void f(bool)’ is deprecated
test.cxx:8:9: warning: ‘void f(int)’ is deprecated
test.cxx:9:11: warning: ‘void f(double)’ is deprecated

The obvious drawback of this method is that it relies on a GCC-specific extension, though some other compilers (Clang and probably Intel C++ for Linux) also support it. If you know of similar functionality in other compilers and/or IDEs, please mention it in the comments.