Archive for July, 2012

What are const rvalue references good for?

Tuesday, July 24th, 2012

You probably haven’t seen any practical code that uses const rvalue references (const T&&). And that’s not really surprising. The main purpose of rvalue references is to allow us to move objects instead of copying them. And moving the state of an object implies modification. As a result, the canonical signatures for the move constructor and the move assignment operator both take their argument as a non-const rvalue reference.
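For reference, this is what those canonical declarations look like for a hypothetical buffer class (the name is made up purely for illustration):

class buffer
{
public:
  buffer (buffer&&);            // move constructor: non-const rvalue reference
  buffer& operator= (buffer&&); // move assignment: non-const rvalue reference
};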

But const rvalue references are in the language and have well defined binding rules. Specifically, a const rvalue will prefer to bind to the const rvalue reference rather than the const lvalue reference. The following code fragment illustrates the binding preferences:

 
struct s {};
 
void f (      s&);  // #1
void f (const s&);  // #2
void f (      s&&); // #3
void f (const s&&); // #4
 
const s g ();
s x;
const s cx;
 
f (s ()); // rvalue        #3, #4, #2
f (g ()); // const rvalue  #4, #2
f (x);    // lvalue        #1, #2
f (cx);   // const lvalue  #2
 

Note the asymmetry: while a const lvalue reference can bind to an rvalue, a const rvalue reference cannot bind to an lvalue. In particular, this makes a const lvalue reference able to do everything a const rvalue reference can and more (i.e., bind to lvalues).

This makes const rvalue references pretty useless. Think about it: if all we need is a non-modifiable reference to an object (rvalue or lvalue), then a const lvalue reference does the job perfectly. The only situation where we want to single out rvalues is if we want to move them. And in that case we need a modifiable reference.

Plus, const rvalues don’t make much sense. Yes, we can have a function that returns a const value, like g() above, or call std::move() on a const object, but all this is rather pointless. Why place any restrictions on what a caller of our function can do with their own copy of the returned value? And why try to move an immutable object?
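To illustrate that last point, here is a small sketch (using std::string) of what happens if we do try to move a const object: the const rvalue produced by std::move() cannot bind to the move constructor’s non-const rvalue reference, so overload resolution quietly falls back to the copy constructor.

#include <string>
#include <utility>

void h ()
{
  const std::string cs ("hello");

  // std::move (cs) yields const std::string&&. It cannot bind to the move
  // constructor's std::string&& parameter, so the copy constructor
  // (const std::string&) is selected instead and we get a copy, not a move.
  std::string s (std::move (cs));
}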

So it seems the only use for const rvalue references is if you need to disable rvalue references altogether and you need to do it in a bullet-proof way that will deal with the above pointless but nevertheless legal cases.

Case in point is the std::reference_wrapper class template and its ref() and cref() helper functions. Because reference_wrapper is only meant to store references to lvalues, the standard disables ref() and cref() for rvalues. To make sure they cannot be called even for const rvalues, the standard uses the const rvalue reference:

 
template <class T> void ref (const T&&) = delete;
template <class T> void cref (const T&&) = delete;
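With these deleted overloads in place, any attempt to wrap an rvalue is rejected at compile time. A small illustration (the commented-out lines do not compile):

#include <string>
#include <utility>    // std::move
#include <functional> // std::ref, std::cref

void use_ref ()
{
  std::string s ("hello");

  auto r = std::ref (s);              // Ok: lvalue.
  //std::ref (std::string ("temp"));  // Error: selects the deleted const T&& overload.
  //std::cref (std::move (s));        // Error: an rvalue prefers the deleted const T&& overload.
}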
 

Besides supporting move semantics, the rvalue reference syntax is also used in perfect forwarding. Note, however, that adding const to T&& disables the special argument deduction rules used to support this feature. For example:

 
template <typename T> void f (T&&);
template <typename T> void g (const T&&);
 
s x;
 
f (s ()); // Ok, T = s
f (x);    // Ok, T = s&
 
g (s ()); // Ok, T = s
g (x);    // Error, binding lvalue to rvalue, T = s
 

Extended Database to C++ Type Mapping in ODB

Wednesday, July 18th, 2012

When it comes to development tools, I view features that they provide as being of two kinds. The majority are of the first kind which simply do something useful for the user of the tool. But the ones I really like are features that help people help themselves in ways that I might not have foreseen. The upcoming ODB 2.1.0 release has just such a feature.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

To understand this new feature let’s first get some background on the problem. As you probably know, these days all relational databases support pretty much the same set of “core” SQL data types. Things like integers, floating point types, strings, binary, date-time, etc. Each database, of course, has its own names for these types, but they provide more or less the same functionality across all the vendors. For each database ODB provides native support for all the core SQL types. Here by native I mean that the data is exchanged with the database in the most efficient, binary format. ODB also allows you to map any core SQL type to any C++ type so we can map TEXT to std::string, QString, or my_string (the former two mappings are provided by default).

This all sounds nice and simple and that would have been the end of the story if all that modern databases supported were core SQL types. However, most modern databases also support a slew of extended SQL types. Things like spatial types, user-defined types, arrays, XML, the kitchen sink, etc, etc (Ok, I don’t think any database supports that last one, yet). Here is a by no means complete list that should give you an idea about the vast and varying set of extended types available in each database supported by ODB:

MySQL
  • Spatial types (GEOMETRY, GEOGRAPHY)
SQLite
  • NUMERIC
  • Spatial types (GEOMETRY, GEOGRAPHY) [spatialite extension]
PostgreSQL
  • NUMERIC
  • XML
  • JSON
  • HSTORE (key-value store) [hstore extension]
  • Geometric types
  • Network address types
  • Enumerated types
  • Arrays
  • Range types
  • Composite types
  • Spatial types (GEOMETRY, GEOGRAPHY) [PostGIS extension]
Oracle
  • ANY
  • XML
  • MEDIA
  • Arrays (VARRAY, table type)
  • User-defined types
  • Spatial types (GEOMETRY, GEOGRAPHY)
SQL Server
  • XML
  • Alias types
  • CLR types
  • Spatial types (GEOMETRY, GEOGRAPHY)

When people just started using ODB, core SQL types were sufficient. But now, as projects become more ambitious, we have started getting questions about using extended SQL types in ODB. For example, ODB will handle std::vector<int> for us, but it will do so in a portable manner, which means it will create a separate, JOIN’ed table to store the vector elements. On the other hand, if we are using PostgreSQL, it would be much cleaner to map it to a single column of the integer array type (INTEGER[]). Clearly we needed a way to support extended SQL types in ODB.

The straightforward way to add this support would have been to handle extended types the same way we handle the core ones. That is, for each type implement a mapping that uses native database format. However, as types become more complex (e.g., arrays, user-defined types) so do the methods used to access them in the database-native format. In fact, for some databases, this format is not even documented and the only way to understand how things are represented is to study the database source code!

So the straightforward way appears to be very laborious and not very robust. What other options do we have? The idea that is implemented in ODB came from the way the OpenGIS specification handles reading and writing of spatial values (GEOMETRY, GEOGRAPHY). OpenGIS specifies the Well-Known Text (WKT) and Well-Known Binary (WKB) formats for representing spatial values. For example, point(10, 20) in WKT is represented as the "POINT(10 20)" string. Essentially, what OpenGIS did is define an interface for the spatial SQL types in terms of one of the core SQL types (text or binary). OpenGIS also defines a pair of functions for converting between, say, WKT and GEOMETRY values (GeomFromText/AsText).

As it turns out, this idea of interfacing with an extended SQL type using one of the core ones can be used to handle pretty much any extended type mentioned in the list above. In the vast majority of cases all we need to do is cast one value to another.

So in order to support extended SQL types, ODB allows us to map them to one of the built-in types, normally a string or a binary. Given the text or binary representation of the data we can then extract it into our chosen C++ data type and thus establish a mapping between an extended database type and its C++ equivalent.

The mapping between an extended type and a core SQL type is established with the map pragma:

#pragma db map type(regex) as(subst) to(subst) from(subst)

The type clause specifies the name of the database type that we are mapping, which we will call the mapped type from now on. The as clause specifies the name of the database type that we are mapping the mapped type to; we will call it the interface type. The optional to and from clauses specify the database conversion expressions between the mapped type and the interface type. They must contain the special (?) placeholder, which will be replaced with the actual value to be converted.

The name of the mapped type is actually a regular expression pattern so we can match a class of types, instead of just a single name. We will see how this can be useful in a moment. Similarly, the name of the interface type as well as the to/from conversion expressions are actually regex pattern substitutions.

Let’s now look at a concrete example that shows how all this fits together. Earlier I mentioned std::vector<int> and how it would be nice to map it to PostgreSQL INTEGER[] instead of creating a separate table. Let’s see what it takes to arrange such a mapping.

In PostgreSQL the array literal has the {n1,n2,...} form. As it turns out, if we cast an array to TEXT, then we will get a string in exactly this format. Similarly, Postgres is happy to convert a string in this form back to an array with a simple cast. With this knowledge, we can take a stab at the mapping pragma:

#pragma db map type("INTEGER\\[\\]") \
               as("TEXT")            \
               to("(?)::INTEGER[]")  \
               from("(?)::TEXT")

In plain English this pragma essentially says this: map INTEGER[] to TEXT. To convert from TEXT to INTEGER[], cast the value to INTEGER[]. To convert the other way, cast the value to TEXT. exp::TEXT is a shorter, Postgres-specific notation for CAST(exp AS TEXT).

The above pragma will do the trick if we always spell the type as INTEGER[]. However, INTEGER [] or INTEGER[123] are also valid. If we want to handle all the one-dimensional arrays of integers, then the regex support I mentioned above comes in very handy:

#pragma db map type("INTEGER *\\[(\\d*)\\]") \
               as("TEXT")                    \
               to("(?)::INTEGER[$1]")        \
               from("(?)::TEXT")

With the above pragma we can now have a persistent class that contains std::vector<int> mapped to INTEGER[]:

// test.hxx
//
#ifndef TEST_HXX
#define TEST_HXX
 
#include <vector>
 
#pragma db map type("INTEGER *\\[(\\d*)\\]") \
               as("TEXT")                    \
               to("(?)::INTEGER[$1]")        \
               from("(?)::TEXT")
 
#pragma db object
class object
{
public:
  #pragma db id auto
  unsigned long id;
 
  #pragma db type("INTEGER[]")
  std::vector<int> array;
};
#endif

Ok, that’s one half of the puzzle. The other half is to implement the conversion between std::vector<int> and the "{n1,n2,...}" text representation. For that we need to provide a value_traits specialization for the std::vector<int> C++ type and the TEXT PostgreSQL type. value_traits is the ODB customization mechanism I mentioned earlier that allows us to map any C++ type to any core SQL type. Here is a sample implementation which should be pretty easy to follow. I’ve instrumented it with a few print statements so that we can see what’s going on at runtime.

// traits.hxx
//
#ifndef TRAITS_HXX
#define TRAITS_HXX
 
#include <vector>
#include <sstream>
#include <iostream>
#include <cstring> // std::memcpy
 
#include <odb/pgsql/traits.hxx>
 
namespace odb
{
  namespace pgsql
  {
    template <>
    class value_traits<std::vector<int>, id_string>
    {
    public:
      typedef std::vector<int> value_type;
      typedef value_type query_type;
      typedef details::buffer image_type;
 
      static void
      set_value (value_type& v,
                 const details::buffer& b,
                 std::size_t n,
                 bool is_null)
      {
        v.clear ();
 
        if (!is_null)
        {
          char c;
          std::string s (b.data (), n);
          std::cerr << "in: " << s << std::endl;
          std::istringstream is (s);
 
          is >> c; // '{'
 
          for (c = static_cast<char> (is.peek ());
               c != '}';
               is >> c)
          {
            v.push_back (int ());
            is >> v.back ();
          }
        }
      }
 
      static void
      set_image (details::buffer& b,
                 std::size_t& n,
                 bool& is_null,
                 const value_type& v)
      {
        is_null = false;
        std::ostringstream os;
 
        os << '{';
 
        for (value_type::const_iterator i (v.begin ()),
               e (v.end ());
             i != e;)
        {
          os << *i;
 
          if (++i != e)
            os << ',';
        }
 
        os << '}';
 
        const std::string& s (os.str ());
        std::cerr << "out: " << s << std::endl;
        n = s.size ();
 
        if (n > b.capacity ())
          b.capacity (n);
 
        std::memcpy (b.data (), s.c_str (), n);
      }
    };
  }
}
#endif

Ok, now that we have both pieces of the puzzle, let’s put everything together. The first step is to compile test.hxx (the file that defines the persistent class) with the ODB compiler. At this stage we need to include traits.hxx (the file that defines the value_traits specialization) into the generated header file. We use the --hxx-epilogue option for that. Here is a sample ODB command line:

odb -d pgsql -s --hxx-epilogue '#include "traits.hxx"' test.hxx

Let’s also create a test driver that stores the object in the database and then loads it back. Here we want to see two things: the SQL statements that are being executed and the data that is being sent to and from the database:

// driver.cxx
//
#include <odb/transaction.hxx>
#include <odb/pgsql/database.hxx>
 
#include "test.hxx"
#include "test-odb.hxx"
 
using namespace std;
using namespace odb::core;
 
int main ()
{
  odb::pgsql::database db ("odb_test", "", "odb_test");
 
  object o;
  o.array.push_back (1);
  o.array.push_back (2);
  o.array.push_back (3);
 
  transaction t (db.begin ());
  t.tracer (stderr_tracer);
 
  unsigned long id (db.persist (o));
  db.load (id, o);
 
  t.commit ();
}

Now we can build and run our test driver:

g++ -o driver driver.cxx test-odb.cxx -lodb-pgsql -lodb
psql -U odb_test -d odb_test -f test.sql
./driver

The output of the test driver is shown below. Notice how the conversion expressions that we specified in the mapping pragma ended up in the SQL statements that ODB executed in order to persist and load the object.

out: {1,2,3}
INSERT INTO object(id,array) VALUES(DEFAULT,$2::INTEGER[]) RETURNING id
SELECT object.id,object.array::TEXT FROM object WHERE object.id=$1
in: {1,2,3}

For more information on custom database type mapping support in ODB refer to Section 12.6, “Database Type Mapping Pragmas” in the ODB manual. Additionally, the odb-tests package contains a set of tests in the <database>/custom directories that, for each database, shows how to provide custom mapping for a variety of SQL types.

While the 2.1.0 release is still several weeks out, if you would like to give the new type mapping support a try, you can use the 2.1.0.a1 pre-release.

Efficient argument passing in C++11, Part 3

Tuesday, July 3rd, 2012

Last week, in Part 2 of this post, we saw yet another method of efficient argument passing in C++11, this time using a custom wrapper type. Some people called it a smart pointer, though it looks more like a smart reference with smartness coming from its ability to distinguish between lvalues and rvalues. You can download an improved (thanks to your feedback) version of the in class template along with a test: in.tar.gz.
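As a quick reminder, here is a minimal sketch of the idea (the actual in class template in in.tar.gz is more elaborate; the names and details below are illustrative only): the wrapper binds to both lvalues and rvalues, remembers which one it got, and lets the function either observe the argument or take a copy/move of it, whichever is cheaper.

#include <utility> // std::move

// Minimal sketch only, not the actual 'in' implementation.
template <typename T>
class in
{
public:
  in (const T& l): ref_ (l), rv_ (false) {} // Binds to lvalues.
  in (T&& r): ref_ (r), rv_ (true) {}       // Binds to rvalues.

  // "Unwrap" the argument to access its member functions.
  const T& get () const {return ref_;}

  // Copy if we were passed an lvalue, move if we were passed an rvalue.
  T pass () const
  {
    if (rv_)
      return std::move (const_cast<T&> (ref_));
    else
      return ref_;
  }

private:
  const T& ref_;
  bool rv_;
};

// Usage sketch: name_ = n.pass () copies for lvalue arguments and moves for
// rvalue arguments.
//
// void set_name (in<std::string> n) {name_ = n.pass ();}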

So, now we have a total of four alternatives: pass by const reference, pass by value, overload on lvalue/rvalue references, and, finally, the smart reference approach. I would have liked to tell you that there is a single method that works best in all cases. Unfortunately, that is not the case, at least not in C++11. Every one of these methods works best in some situations and has serious drawbacks when applied in others.

In fact, one can argue that C++11 actually complicated things compared to C++98. While you may not be able to achieve the same efficiency in C++98 when it comes to argument passing, at least the choice was simple: pass by const reference and move on to more important things. In this area C++11 became even more of a craftsman’s language where every case needs to be carefully analyzed and an intricate mechanism used to achieve the best result.

If we can’t have a single, fit-all solution, let’s at least try to come up with a set of guidelines that would allow us to select an appropriate method without spending too much time thinking about it.

Let’s start with the smart reference approach since it comes closest to the fit-all solution. As you may remember from last week’s post, its main issue is the need for a custom wrapper type and the resulting non-idiomatic interface. This is a problem both at the interface level (people looking at the function signature that uses the in class template may not know what’s going on) as well as at the implementation level (we have to “unwrap” the argument to access its member functions). As a result, I wouldn’t recommend using this approach in code that is meant to be used by a wider audience (e.g., libraries, frameworks, etc). However, for application code that is only meant to be seen and understood by the people developing it, smart references can free your team from agonizing about which method to use in each specific case in order to achieve the best performance.

If we decide not to use the smart reference approach, then we have the other three alternatives to choose from. Let’s first say that we want to select only one method and always use that. This may not be a bad idea since what you get in return is the freedom not to think about this stuff anymore. You simply apply the rule and concentrate on more important things. One can also argue that all this discussion is one misguided exercise in premature optimization because, in the majority of cases and in the grand scheme of things, it won’t matter which approach we use. And in the few cases that do matter (which, as experience tells us, we can only recognize with the help of a profiler), we can always switch to a more optimal method.

Ok, so if we had to choose just one method, which one would it be? The overload on lvalue/rvalue references is out since it epitomizes premature optimization that we pay for with complexity and code bloat. So that leaves us with pass by const reference and pass by value. If we use pass by reference and our function makes a copy of the argument, we will miss out on the move optimization in case the argument is an rvalue. If we use pass by value and our function doesn’t make a copy of the argument, we will incur a copy overhead in case the argument is an lvalue. Predictably, the loss aversion principle kicks in (people’s tendency to strongly prefer avoiding losses to acquiring gains) and I personally prefer to miss out on the optimization rather than incur the overhead. More rationally, though, I tend to think that in the general case more functions will simply use the argument rather than making a copy.

So it is the pass by const reference method if we had to choose only one. It has a couple of other nice properties. First of all, it is the same as what we would use in C++98. So if our code has to compile in both C++98 and C++11 modes or if we are migrating from C++98 to C++11, then it makes our life a little bit easier. The other nice property of this approach is that we can convert it to the overload on lvalue/rvalue method by simply adding another function.
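As a hypothetical illustration of that last point, if the profiler later shows that the copy made from a pass-by-const-reference parameter actually matters, we can add an rvalue overload next to the existing function without disturbing any callers (the class and member names below are made up):

#include <string>
#include <utility>

class person
{
public:
  // The original interface: pass by const reference, same as in C++98.
  void name (const std::string& n) {name_ = n;}

  // Added later, only if this copy shows up in the profiler: the rvalue
  // overload moves instead of copying. Existing callers are unaffected.
  void name (std::string&& n) {name_ = std::move (n);}

private:
  std::string name_;
};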

What if we relax our requirements a little and allow ourselves to select between two methods? Can we come up with a set of simple rules that would allow us to make a correct choice in most cases and without spending too much time thinking about it? The choice here is between pass by reference and pass by value, with overload on lvalue/rvalue references reserved for fine-tuning a select few cases. As we know, whether the first two methods will result in optimal performance depends solely on whether the function makes a copy of its argument. And, as we have discussed in Part 1 of this post, in quite a few real-world situations this can be really hard and often impossible to determine. It also makes the signature of the function (i.e., the interface) depend on its implementation, which can have all sorts of negative consequences.

One approximation that we can use to resolve this problem is to think of argument copying conceptually rather than actually. That is, when we decide how to pass an argument, we ask ourselves whether this function conceptually needs to make a copy of this argument. For example, for the email class constructor that we’ve seen in Part 1 and 2, the answer is clearly yes, since the resulting email instance is expected to contain copies of the passed data.
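Applied to that constructor, the guideline gives us pass by value: each parameter is conceptually a copy, so we accept it by value and move it into the member. Here is a sketch of what this might look like (the member names are assumptions, not the actual code from Parts 1 and 2):

#include <string>
#include <utility>

class email
{
public:
  // Conceptually, the constructor makes copies of its arguments, so we pass
  // by value and move into the members. An lvalue argument is copied once
  // into the parameter and then moved; an rvalue argument is moved all the
  // way through.
  email (std::string from, std::string to, std::string body)
      : from_ (std::move (from)),
        to_ (std::move (to)),
        body_ (std::move (body))
  {
  }

private:
  std::string from_;
  std::string to_;
  std::string body_;
};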

Similarly, if we ask ourselves whether the matrix operator+ conceptually makes copies of its arguments, then the answer is no, even though the implementation is most likely to make a copy of one of its arguments and use operator+= on that (as we have seen, passing one or both arguments by value in operator+ doesn’t really produce the desired optimization in all the cases).
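In code, that reasoning keeps both operator+ parameters as const references, with the copy treated as an implementation detail (a sketch assuming a hypothetical matrix class; the stub is there only to make the example self-contained):

// Hypothetical matrix stub.
struct matrix
{
  matrix& operator+= (const matrix&) {return *this;}
};

// Conceptually, operator+ does not copy its arguments, so both are passed by
// const reference. The copy of x below is an implementation detail, not part
// of the function's conceptual interface.
matrix operator+ (const matrix& x, const matrix& y)
{
  matrix r (x);
  r += y;
  return r;
}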

As another example, consider operator+= itself. For matrix it clearly doesn’t make a copy of its argument, conceptually and actually. For std::string, on the other hand, it does make a copy of its argument, conceptually but, most likely, not actually. For std::list, it does make a copy of its argument, conceptually and, chances are good, actually.

While this approximation won’t produce the optimal result every time, I believe it will have a pretty good average while significantly simplifying the decision making. So these are the rules I am going to start using in my C++11 code, summarized in the following list:

  1. Does the function conceptually make a copy of its argument?
  2. If the answer is NO, then pass by const reference.
  3. If the answer is YES, then pass by value.
  4. Based on the profiler result or other evidence, optimize a select few cases by providing lvalue/rvalue overloads.

I think the only kind of code where going straight to lvalue/rvalue overloads is justified is things like generic containers, matrices, etc. I would also like to know what you think. You can do so in the comments below or in the /r/cpp discussion of this post.