build2 – C++ Build Toolchain
Wednesday, February 3rd, 2016Wondering what I’ve been up to lately? It’s a C++ build system… it’s a C++ package manager… it’s a C++ package repository! Check it out: build2
.
Wondering what I’ve been up to lately? It’s a C++ build system… it’s a C++ package manager… it’s a C++ package repository! Check it out: build2
.
In my work on ODB I periodically run into a need to define a set container that uses only a subset of data members from a class. Occasionally I also need to have several such sets for a single class, each containing the same elements but using different subsets of the data members. Here is a motivating example (I am omitting accessors and instead making all the data members public for brevity):
struct person { person (const std::string& email, const std::string& name, unsigned short age); std::string email; std::string name; unsigned short age; };
Let’s say we want to have std::set
of person
elements that only use the email
data member for ordering. We can create such a set fairly easily by defining a custom comparator:
struct email_comparator { bool operator() (const person& x, const person& y) const { return x.email < y.email; } }; typedef std::set<person, email_comparator> person_set;
The limitations of this approach, however, become obvious as soon as we try to use find()
. We cannot just pass the email to this function. Instead, we have to pass a person
instance with the desired email:
person_set s; ... auto i (s.find (person ("john@doe.com", "", 0)));
This is inelegant. And potentially inefficient, if creating a dummy person
instance is expensive. While we cannot do much about inefficiency, we can hide the ugliness by providing a custom version of the find()
function:
struct person_set: std::set<person, email_comparator> { typedef std::set<person, email_comparator> base; iterator find (const std::string& email) const { return base::find (person (email, "", 0)); } };
Things get worse if we need to be able to find elements using different ordering criteria. In our example, besides email, we may also want to lookup people based on their name. Standard set
doesn’t provide this functionality and what people normally resort to is maintaining multiple sets each containing all the elements, perhaps using shared_ptr
in order to avoid duplication.
If you are familiar with Boost multi_index_container
then you are probably jumping out of your seat screaming that this is exactly the problem this container is here to solve. And you would be absolutely right. With Boost multi_index_container
our person_set
can be defined like this:
namespace mi = boost::multi_index; typedef mi::multi_index_container< person, mi::indexed_by< mi::ordered_unique< mi::member<person, std::string, &person::email> > > > person_set;
While the declaration is quite a bit more involved, in return we get a perfect find()
signature that takes email as its argument:
person_set s; ... auto i (s.find ("john@doe.com"));
It is also easy to extend the above person_set
container to include another ordering criteria (called index):
struct by_email {}; struct by_name {}; typedef mi::multi_index_container< person, mi::indexed_by< mi::ordered_unique< mi::tag<by_email>, mi::member<person, std::string, &person::email> >, mi::ordered_unique< mi::tag<by_name>, mi::member<person, std::string, &person::name> > > > person_set; person_set s; ... auto i (s.get<by_email> ().find ("john@doe.com")); auto j (s.get<by_name> ().find ("Jane Doe"));
Now, if your project is already using Boost or if you need functionality beyond simple insert()
/find()
, then by all means use multi_index
. On the other hand, if you are not using Boost, then adding this dependency for a simple use-case like this definitely sounds like an overkill. Furthermore, some projects, for various reasons, may not be allowed to add this dependency. For example, while you can use ODB with Boost (support for Boost and Qt is provided as add-on profile libraries), we choose to maintain the ODB compiler itself as dependency-free as possible.
So if we cannot use multi_index
, are we forever relegated to using ugly and inefficient hacks discussed above? As it turns out, we can emulate functionality offered by multi_index
using just the standard containers. And it will be pretty efficient and convenient to use, too.
Let’s start with the first case where we want a set that indexes on a specific data member (email
in our case). Our goal is to get rid of the requirement to create a dummy person
instance during lookup. Which standard container would allow us to associate a single value (email
) with the whole object (person
)? std::map
seem to fit the bill. So, here is our first draft of a solution:
typedef std::map<std::string, person> person_set;
While this solves our immediate problem (we can now call find()
with just the email), it added a couple of new ones. Let’s start with the fact that we now have two copies of the email stored for each entry: one as the map key and the other in the person
object. That is wasteful. What can we do about it? Since both copies contain exactly the same value and their lifetime is exactly the same, can’t we make one just point to the other? Making person::email
point to the map key is probably a bad idea. After all, person
instances can exist outside our container. But the other way around sounds reasonable. That is, we make the map key point to the string stored in the value:
typedef std::map<const std::string*, person> person_set;
There is still a little problem with this approach. When we insert an element into a map (as a key and value pair), we don’t know the address of the email
member in the value that is to be created (remember std::map
will make a copy of the passed pair). Or, in other words, when we call insert()
, we have to pass the address to something that will only be created inside this insert()
. Bummer! Maybe we should add that Boost dependency after all…
For those who are still with me, the way we can work around this limitation is to pass a temporary pointer to insert()
and change it to point to the inserted value after insert()
has completed. To be able to do this we first have to circumvent the const
-ness of the key in the map. And for that we use a little helper class template called key
:
template <typename T> struct key { mutable const T* p; key (const T* v = 0): p (v) {} bool operator< (const key& x) const {return *p < *x.p;} };
The key
’s only purpose is to give us a mutable
pointer that we can update. While at it, it also conveniently provides operator<
. Ok, let’s look at our next draft (we will attend to that ???
in a second):
struct person_set { typedef std::map<key<std::string>, person> email_map; std::pair<???, bool> insert (const person& v) { auto r (email_map_.insert (email_map::value_type (&v.email, v))); if (r.second) r.first->first.p = &r.first->second.email; return std::make_pair (r.first, r.second); } private: email_map email_map_; };
Ok, that wasn’t as bad as it sounded. We basically insert a key/value pair into the map using the passed value’s email
member as a temporary pointer. If the insertion succeeded, we update that pointer with the email
member from the inserted value. From the map’s perspective nothing changed since the pointed-to string is the same. That was the trickiest part of the whole thing. The rest is just bringing it home.
The other problem with using std::map
as a base of our set implementation is that the iterator no longer points to person
. Instead, we have std::pair
containing the key and the value. This is not terribly convenient and fairly easy to fix with another little helper called map_iterator_adapter
. Essentially it takes an std::map
iterator and turns it into an iterator that iterates over map’s values while ignoring the keys:
template <typename I> struct map_iterator_adapter: I { typedef const typename I::value_type::second_type value_type; typedef value_type* pointer; typedef value_type& reference; map_iterator_adapter () {} map_iterator_adapter (I i): I (i) {} reference operator* () const {return I::operator* ().second;} pointer operator-> () const {return &I::operator-> ()->second;} };
Ok, let’s put everything together:
struct person_set { typedef std::map<key<std::string>, person> email_map; typedef map_iterator_adapter<email_map::const_iterator> iterator; std::pair<iterator, bool> insert (const person& v) { auto r (email_map_.insert (email_map::value_type (&v.email, v))); iterator i (r.first); if (r.second) r.first->first.p = &i->email; return std::make_pair (i, r.second); } iterator find (const std::string& email) const { return email_map_.find (&email); } iterator begin () const {return email_map_.begin ();} iterator end () const {return email_map_.end ();} private: email_map email_map_; };
For our purposes, this version of person_set
can be used just like the one based on Boost multi_index
:
person_set s; s.insert (person ("john@doe.com", "John Doe", 29)); s.insert (person ("jane@doe.com", "Jane Doe", 27)); auto i (s.find ("john@doe.com"));
What if we want to add another index, say for name
? The idea is to add another map with the iterator from the first map as its value. The highlighted with <--
fragments correspond to the changes necessary to add support for another index:
struct person_set { typedef std::map<key<std::string>, person> email_map; typedef map_iterator_adapter<email_map::const_iterator> iterator; typedef std::map<key<std::string>, iterator> name_map; // <-- std::pair<iterator, bool> insert (const person& v) { // First check that we don't have any collisions in // the secondary indexes. // { auto i (name_map_.find (&v.name)); // <-- if (i != name_map_.end ()) // <-- return std::make_pair (i->second, false); // <-- } auto r (email_map_.insert (email_map::value_type (&v.email, v))); iterator i (r.first); if (r.second) { r.first->first.p = &i->email; name_map_.insert (name_map::value_type (&i->name, i)); // <-- } return std::make_pair (i, r.second); } iterator // <-- find_email (const std::string& email) const // <-- { // <-- return email_map_.find (&email); // <-- } // <-- iterator find_name (const std::string& name) const { auto i (name_map_.find (&name)); return i != name_map_.end () ? i->second : end (); } iterator begin () const {return email_map_.begin ();} iterator end () const {return email_map_.end ();} private: email_map email_map_; name_map name_map_; // <-- };
What about multi-member (composite) indexes? Say we change our person
class to store the first and last names separately but would still like to do lookup based on these two members:
struct person { person (const std::string& email, const std::string& first, const std::string& last, unsigned short age); std::string email; std::string first; std::string last; unsigned short age; };
Boost multi_index
can handle this quite easily. We can also support composite indexes in our emulation if we extend our key
helper to handle multi-member keys:
template <typename T, typename... R> struct key: key<R...> { typedef key<R...> base; mutable const T* p; key (): p (0) {} key (const T* v, const R*... r): base (r...), p (v) {} void assign (const T* v, const R*... r) const { p = v; base::assign (r...); } bool operator< (const key& x) const { return *p < *x.p || (!(*x.p < *p) && base::operator< (x)); } }; template <typename T> struct key<T> { mutable const T* p; key (const T* v = 0): p (v) {} void assign (const T* v) const {p = v;} bool operator< (const key& x) const {return *p < *x.p;} };
Based on this improvement we can now implement a set with first
and last
members as a key:
struct person_set { typedef key<std::string, std::string> name_key; typedef std::map<name_key, person> name_map; typedef map_iterator_adapter<name_map::const_iterator> iterator; std::pair<iterator, bool> insert (const person& v) { auto r (name_map_.insert ( std::make_pair (name_key (&v.first, &v.last), v))); iterator i (r.first); if (r.second) r.first->first.assign (&i->first, &i->last); return std::make_pair (i, r.second); } iterator find (const std::string& first, const std::string& last) const { return name_map_.find (name_key (&first, &last)); } iterator begin () const {return name_map_.begin ();} iterator end () const {return name_map_.end ();} private: name_map name_map_; };
Given a choice one should probably prefer Boost.MultiIndex to the approach shown above. However, there is one potential advantage of our custom solution compared to multi_index
which has serious limitations on the mutability of elements. On the other hand, because we store elements in a map, it is fairly straightforward to allow mutation provided we are careful not to modify any members that are involved in indexes.
Just a quick note on a post called “Evil, or Just Misunderstood?” by Jon Kalb and Dave Abrahams over at C++Next. While the article itself is quite thought-provoking, what I found really insightful is the discussion in the comments that followed. Some really good stuff if you are interested in software design. Check it out.