Emulating Boost.MultiIndex with Standard Containers

In my work on ODB I periodically run into a need to define a set container that uses only a subset of data members from a class. Occasionally I also need to have several such sets for a single class, each containing the same elements but using different subsets of the data members. Here is a motivating example (I am omitting accessors and instead making all the data members public for brevity):

 
struct person
{
  person (const std::string& email,
          const std::string& name,
          unsigned short age);
 
  std::string email;
  std::string name;
  unsigned short age;
};

Let’s say we want to have std::set of person elements that only use the email data member for ordering. We can create such a set fairly easily by defining a custom comparator:

 
struct email_comparator
{
  bool operator() (const person& x, const person& y) const
  {
    return x.email < y.email;
  }
};
 
typedef std::set<person, email_comparator> person_set;

The limitations of this approach, however, become obvious as soon as we try to use find(). We cannot just pass the email to this function. Instead, we have to pass a person instance with the desired email:

 
person_set s;
...
auto i (s.find (person ("john@doe.com", "", 0)));

This is inelegant. And potentially inefficient, if creating a dummy person instance is expensive. While we cannot do much about inefficiency, we can hide the ugliness by providing a custom version of the find() function:

 
struct person_set: std::set<person, email_comparator>
{
  typedef std::set<person, email_comparator> base;
 
  iterator find (const std::string& email) const
  {
    return base::find (person (email, "", 0));
  }
};

Things get worse if we need to be able to find elements using different ordering criteria. In our example, besides email, we may also want to lookup people based on their name. Standard set doesn’t provide this functionality and what people normally resort to is maintaining multiple sets each containing all the elements, perhaps using shared_ptr in order to avoid duplication.

If you are familiar with Boost multi_index_container then you are probably jumping out of your seat screaming that this is exactly the problem this container is here to solve. And you would be absolutely right. With Boost multi_index_container our person_set can be defined like this:

 
namespace mi = boost::multi_index;
 
typedef mi::multi_index_container<
  person,
  mi::indexed_by<
    mi::ordered_unique<
      mi::member<person, std::string, &person::email>
    >
  >
> person_set;

While the declaration is quite a bit more involved, in return we get a perfect find() signature that takes email as its argument:

 
person_set s;
...
auto i (s.find ("john@doe.com"));

It is also easy to extend the above person_set container to include another ordering criteria (called index):

 
struct by_email {};
struct by_name {};
 
typedef mi::multi_index_container<
  person,
  mi::indexed_by<
    mi::ordered_unique<
      mi::tag<by_email>,
      mi::member<person, std::string, &person::email>
    >,
    mi::ordered_unique<
      mi::tag<by_name>,
      mi::member<person, std::string, &person::name>
    >
  >
> person_set;
 
person_set s;
...
auto i (s.get<by_email> ().find ("john@doe.com"));
auto j (s.get<by_name> ().find ("Jane Doe"));

Now, if your project is already using Boost or if you need functionality beyond simple insert()/find(), then by all means use multi_index. On the other hand, if you are not using Boost, then adding this dependency for a simple use-case like this definitely sounds like an overkill. Furthermore, some projects, for various reasons, may not be allowed to add this dependency. For example, while you can use ODB with Boost (support for Boost and Qt is provided as add-on profile libraries), we choose to maintain the ODB compiler itself as dependency-free as possible.

So if we cannot use multi_index, are we forever relegated to using ugly and inefficient hacks discussed above? As it turns out, we can emulate functionality offered by multi_index using just the standard containers. And it will be pretty efficient and convenient to use, too.

Let’s start with the first case where we want a set that indexes on a specific data member (email in our case). Our goal is to get rid of the requirement to create a dummy person instance during lookup. Which standard container would allow us to associate a single value (email) with the whole object (person)? std::map seem to fit the bill. So, here is our first draft of a solution:

 
typedef std::map<std::string, person> person_set;

While this solves our immediate problem (we can now call find() with just the email), it added a couple of new ones. Let’s start with the fact that we now have two copies of the email stored for each entry: one as the map key and the other in the person object. That is wasteful. What can we do about it? Since both copies contain exactly the same value and their lifetime is exactly the same, can’t we make one just point to the other? Making person::email point to the map key is probably a bad idea. After all, person instances can exist outside our container. But the other way around sounds reasonable. That is, we make the map key point to the string stored in the value:

 
typedef std::map<const std::string*, person> person_set;

There is still a little problem with this approach. When we insert an element into a map (as a key and value pair), we don’t know the address of the email member in the value that is to be created (remember std::map will make a copy of the passed pair). Or, in other words, when we call insert(), we have to pass the address to something that will only be created inside this insert(). Bummer! Maybe we should add that Boost dependency after all…

For those who are still with me, the way we can work around this limitation is to pass a temporary pointer to insert() and change it to point to the inserted value after insert() has completed. To be able to do this we first have to circumvent the const-ness of the key in the map. And for that we use a little helper class template called key:

 
template <typename T>
struct key
{
  mutable const T* p;
 
  key (const T* v = 0): p (v) {}
  bool operator< (const key& x) const {return *p < *x.p;}
};

The key’s only purpose is to give us a mutable pointer that we can update. While at it, it also conveniently provides operator<. Ok, let’s look at our next draft (we will attend to that ??? in a second):

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
 
  std::pair<???, bool>
  insert (const person& v)
  {
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
 
    if (r.second)
      r.first->first.p = &r.first->second.email;
 
    return std::make_pair (r.first, r.second);
  }
 
private:
  email_map email_map_;
};

Ok, that wasn’t as bad as it sounded. We basically insert a key/value pair into the map using the passed value’s email member as a temporary pointer. If the insertion succeeded, we update that pointer with the email member from the inserted value. From the map’s perspective nothing changed since the pointed-to string is the same. That was the trickiest part of the whole thing. The rest is just bringing it home.

The other problem with using std::map as a base of our set implementation is that the iterator no longer points to person. Instead, we have std::pair containing the key and the value. This is not terribly convenient and fairly easy to fix with another little helper called map_iterator_adapter. Essentially it takes an std::map iterator and turns it into an iterator that iterates over map’s values while ignoring the keys:

 
template <typename I>
struct map_iterator_adapter: I
{
  typedef const typename I::value_type::second_type value_type;
  typedef value_type* pointer;
  typedef value_type& reference;
 
  map_iterator_adapter () {}
  map_iterator_adapter (I i): I (i) {}
 
  reference operator* () const {return I::operator* ().second;}
  pointer operator-> () const {return &I::operator-> ()->second;}
};

Ok, let’s put everything together:

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
  typedef map_iterator_adapter<email_map::const_iterator> iterator;
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
    iterator i (r.first);
 
    if (r.second)
      r.first->first.p = &i->email;
 
    return std::make_pair (i, r.second);
  }
 
  iterator
  find (const std::string& email) const
  {
    return email_map_.find (&email);
  }
 
  iterator begin () const {return email_map_.begin ();}
  iterator end () const {return email_map_.end ();}
 
private:
  email_map email_map_;
};

For our purposes, this version of person_set can be used just like the one based on Boost multi_index:

 
person_set s;
s.insert (person ("john@doe.com", "John Doe", 29));
s.insert (person ("jane@doe.com", "Jane Doe", 27));
auto i (s.find ("john@doe.com"));

What if we want to add another index, say for name? The idea is to add another map with the iterator from the first map as its value. The highlighted with <-- fragments correspond to the changes necessary to add support for another index:

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
  typedef map_iterator_adapter<email_map::const_iterator> iterator;
 
  typedef std::map<key<std::string>, iterator> name_map;           // <--
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    // First check that we don't have any collisions in
    // the secondary indexes.
    //
    {
      auto i (name_map_.find (&v.name));                           // <--
      if (i != name_map_.end ())                                   // <--
        return std::make_pair (i->second, false);                  // <--
    }
 
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
    iterator i (r.first);
 
    if (r.second)
    {
      r.first->first.p = &i->email;
      name_map_.insert (name_map::value_type (&i->name, i));       // <--
    }
 
    return std::make_pair (i, r.second);
  }
 
  iterator                                                         // <--
  find_email (const std::string& email) const                      // <--
  {                                                                // <--
    return email_map_.find (&email);                               // <--
  }                                                                // <--
 
  iterator
  find_name (const std::string& name) const
  {
    auto i (name_map_.find (&name));
    return i != name_map_.end () ? i->second : end ();
  }
 
  iterator begin () const {return email_map_.begin ();}
  iterator end () const {return email_map_.end ();}
 
private:
  email_map email_map_;
  name_map name_map_;                                              // <--
};

What about multi-member (composite) indexes? Say we change our person class to store the first and last names separately but would still like to do lookup based on these two members:

 
struct person
{
  person (const std::string& email,
          const std::string& first,
          const std::string& last,
          unsigned short age);
 
  std::string email;
  std::string first;
  std::string last;
  unsigned short age;
};

Boost multi_index can handle this quite easily. We can also support composite indexes in our emulation if we extend our key helper to handle multi-member keys:

 
template <typename T, typename... R>
struct key: key<R...>
{
  typedef key<R...> base;
 
  mutable const T* p;
 
  key (): p (0) {}
  key (const T* v, const R*... r): base (r...), p (v) {}
 
  void assign (const T* v, const R*... r) const 
  {
    p = v; base::assign (r...);
  }
 
  bool operator< (const key& x) const
  {
    return *p < *x.p || (!(*x.p < *p) && base::operator< (x));
  }
};
 
template <typename T>
struct key<T>
{
  mutable const T* p;
 
  key (const T* v = 0): p (v) {}
  void assign (const T* v) const {p = v;}
  bool operator< (const key& x) const {return *p < *x.p;}
};

Based on this improvement we can now implement a set with first and last members as a key:

 
struct person_set
{
  typedef key<std::string, std::string> name_key;
  typedef std::map<name_key, person> name_map;
  typedef map_iterator_adapter<name_map::const_iterator> iterator;
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    auto r (name_map_.insert (
              std::make_pair (name_key (&v.first, &v.last), v)));
    iterator i (r.first);
 
    if (r.second)
      r.first->first.assign (&i->first, &i->last);
 
    return std::make_pair (i, r.second);
  }
 
  iterator
  find (const std::string& first, const std::string& last) const
  {
    return name_map_.find (name_key (&first, &last));
  }
 
  iterator begin () const {return name_map_.begin ();}
  iterator end () const {return name_map_.end ();}
 
private:
  name_map name_map_;
};

Given a choice one should probably prefer Boost.MultiIndex to the approach shown above. However, there is one potential advantage of our custom solution compared to multi_index which has serious limitations on the mutability of elements. On the other hand, because we store elements in a map, it is fairly straightforward to allow mutation provided we are careful not to modify any members that are involved in indexes.

This entry was posted on Tuesday, September 11th, 2012 at 7:38 am and is filed under Design, C++. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

One Response to “Emulating Boost.MultiIndex with Standard Containers”

Maxim Yegorushkin Says:
September 13th, 2012 at 6:40 am
> … multi_index which has serious limitations on the mutability of elements

Not sure if const === serious.

Boost multi_index tries to protect one from accidentally changing the key by returning references to const elements from find(). There is nothing wrong with casting away that const qualifier, it is like saying: “yeah, I know the key must not change, I promise not to do that and I know what I am doing”.